Supplements

This is a Quarto website that I created for ‘A Practical Guide . . .’

Code and Supplements for “A Practical Guide to Data Analysis Using R”

This site provides code and other supplementary content for the new Cambridge University Press text:
“A Practical Guide to Data Analysis Using R – An Example-Based Approach”,
by John H Maindonald, W John Braun, and Jeffrey L Andrews.

The new text builds on “Data Analysis and Graphics Using R” (Maindonald and Braun, CUP, 3rd edn, 2010).

Solutions to selected exercises

Corrections

The following replaces Exercise 3 in Chapter 8 of the published text, whose second sentence refers to a non-existent Chapter 3 model fit:

8.3. Use qqnorm() to check for departures from normality in DAAG::nsw74psid1$re78. What do you notice? Use tree-based regression to predict re78, and check for departures from normality in the distribution of residuals. What do you notice about the tails of the distribution?
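As a starting point, the opening part of the exercise might be sketched as follows. This is one possible approach, not the authors' solution: it assumes that the DAAG package (which supplies nsw74psid1) and rpart are installed, and it uses all other columns as predictors with default rpart settings. The seed value and the object names nsw, re78.rp and res are arbitrary choices.

```r
## A sketch only: assumes the DAAG and rpart packages are installed
library(rpart)
nsw <- DAAG::nsw74psid1

## Normal quantile-quantile plot of the untransformed 1978 earnings
qqnorm(nsw$re78, main = "re78")
qqline(nsw$re78)

## Regression tree for re78 on all other columns (default rpart settings)
set.seed(29)   # arbitrary seed; rpart's cross-validation uses random folds
re78.rp <- rpart(re78 ~ ., data = nsw, method = "anova")

## Normal quantile-quantile plot of the residuals from the tree
res <- resid(re78.rp)
qqnorm(res, main = "Residuals from rpart fit to re78")
qqline(res)
```

Adding qqline() gives a reference line against which departures in the tails are easier to judge.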

  1. Use the function car::powerTransform() with family='bcnPower' to search for a transformation that will bring the distribution of re78 closer to normality. Run summary() on the output to get values of lambda and gamma (suitably rounded values are usually preferred) that you can then supply as arguments to car::bcnPower() to obtain transformed values tran78 of re78. Use qqnorm() with the transformed data to compare its distribution with a normal distribution. The distribution should now be much closer to normal, so that choosing splits to maximize the between-groups sum of squares about the mean comes closer to an optimal procedure.

  2. Use tree-based regression to predict tran78, and check differences from normality in the distribution of residuals. What do you now notice about the tails of the distribution? What are the variable importance ranks i) if the tree is chosen that gives the minimum cross-validated error; ii) if the tree is chosen following the one standard error criterion? In each case, calculate the cross-validated relative error.

  3. Fit a random forest to the transformed data, and compare the bootstrap error with the cross-validated error from the rpart fits.
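The three steps above might be sketched as follows. This is a hedged outline under stated assumptions, not the authors' solution. It assumes that DAAG, car, rpart and randomForest are installed, and that the fitted powerTransform object stores its estimates in components named lambda and gamma (check these against the summary() output). The setting cp = 0.001, the seed, and all object names are arbitrary choices. The "bootstrap error" is taken here to mean the random forest out-of-bag error, expressed relative to the total variance so that it is comparable with rpart's cross-validated relative error.

```r
## A sketch only: assumes DAAG, car, rpart and randomForest are installed
library(rpart)
nsw <- DAAG::nsw74psid1

## Step 1: estimate a Box-Cox-with-negatives (bcnPower) transformation
p78 <- car::powerTransform(re78 ~ 1, data = nsw, family = "bcnPower")
summary(p78)                    # read off lambda and gamma; rounding is optional
lam <- as.numeric(p78$lambda)   # assumed component name; check against summary()
gam <- as.numeric(p78$gamma)    # assumed component name; check against summary()
nsw$tran78 <- car::bcnPower(nsw$re78, lambda = lam, gamma = gam)
qqnorm(nsw$tran78, main = "tran78")
qqline(nsw$tran78)

## Step 2: tree for tran78, leaving out the untransformed re78
nsw2 <- subset(nsw, select = -re78)
set.seed(29)                    # arbitrary seed for the cross-validation folds
tran.rp <- rpart(tran78 ~ ., data = nsw2, method = "anova", cp = 0.001)
cpt <- tran.rp$cptable
imin <- which.min(cpt[, "xerror"])            # row with minimum xerror
## One-SE rule: smallest tree with xerror within one SE of the minimum
ise <- min(which(cpt[, "xerror"] <= cpt[imin, "xerror"] + cpt[imin, "xstd"]))
tree.min <- prune(tran.rp, cp = cpt[imin, "CP"])
tree.1se <- prune(tran.rp, cp = cpt[ise, "CP"])
rank(-tree.min$variable.importance)           # importance ranks, min-xerror tree
rank(-tree.1se$variable.importance)           # importance ranks, one-SE tree
cpt[c(imin, ise), "xerror"]                   # cross-validated relative errors
qqnorm(resid(tree.1se), main = "Residuals, one-SE tree")
qqline(resid(tree.1se))

## Step 3: random forest, with out-of-bag error on the relative scale
set.seed(29)
tran.rf <- randomForest::randomForest(tran78 ~ ., data = nsw2)
oob.mse <- tail(tran.rf$mse, 1)               # OOB mean squared error, all trees
oob.rel <- oob.mse / mean((nsw2$tran78 - mean(nsw2$tran78))^2)
oob.rel
```

Dividing the out-of-bag mean squared error by the mean squared deviation from the overall mean puts it on the same scale as the xerror column of the rpart cptable, which is expressed relative to the root-node error.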
