13 Association Plots in R

Bivariate Analysis - single numeric outcome against a categorical independent variable

For illustration, we will use the ggplot2::diamonds dataset.

> diamonds = ggplot2::diamonds

> price.means = aggregate(price ~ cut, diamonds, mean)

> head(price.means)
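
The formula in aggregate() reads as "outcome ~ grouping variable". A minimal sketch of the same idea on the built-in mtcars data, used here only so the example runs without ggplot2 (the names mpg.means and mpg.means.vec are illustrative):

```r
# Mean mpg for each number of cylinders; the result is a data frame
# with one row per group and columns named after the formula terms.
mpg.means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg.means)

# tapply() gives the same group means as a named vector.
mpg.means.vec <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mpg.means.vec)
```

The data frame form is convenient for sorting and plotting, which is why the text uses aggregate().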

Plot mean price

> barplot(price.means$price, names.arg = price.means$cut)

Let's make the visual better

  • It is a good idea to sort the data before plotting

  • Add title

  • Add x and y labels

  • Add a color to the bars

> price.means = price.means[order(price.means$price, decreasing = TRUE),]

> barplot(price.means$price,
          names.arg = price.means$cut,
          col = "Steelblue",
          xlab = "Cut",
          ylab = "Avg Price",
          main = "Association plot between Cut and Average Price",
          border = "white")
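
barplot() invisibly returns the x positions of the bar midpoints, which is handy for placing value labels. A sketch on the built-in mtcars data, assuming labels above the bars are wanted (the names mpg.means and mids are illustrative):

```r
# Group means, sorted from highest to lowest as in the text.
mpg.means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg.means <- mpg.means[order(mpg.means$mpg, decreasing = TRUE), ]

# Leave headroom above the tallest bar for the labels.
mids <- barplot(mpg.means$mpg,
                names.arg = mpg.means$cyl,
                col = "steelblue",
                border = "white",
                ylim = c(0, max(mpg.means$mpg) * 1.15),
                xlab = "Cylinders",
                ylab = "Avg mpg",
                main = "Average mpg by number of cylinders")

# Write each mean just above its bar, using the returned midpoints.
text(x = mids, y = mpg.means$mpg, labels = round(mpg.means$mpg, 1), pos = 3)
```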

Distribution of Outcome (e.g. Price) against Common Categorical Variable

We are going to use a grouped boxplot. For illustration, we will use the built-in mtcars dataset.

> data(mtcars)

> boxplot(mpg ~ cyl, data = mtcars)
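
boxplot() also invisibly returns the summary statistics it draws, which helps when reading the plot. A short sketch (the name bp is illustrative):

```r
bp <- boxplot(mpg ~ cyl, data = mtcars)

# One column per group; rows are lower whisker, lower hinge, median,
# upper hinge, and upper whisker.
print(bp$stats)

# Group labels, in the order the boxes are drawn.
print(bp$names)
```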

Let's modify this plot

  1. Add x and y labels

  2. Add Title

  3. Add color to each group to differentiate

> library(RColorBrewer) # provides brewer.pal()

> boxplot(mpg ~ cyl,
          data = mtcars,
          col = brewer.pal(3, "Paired"),
          xlab = "Number of Cylinders",
          ylab = "Mileage (mpg)",
          main = "Mileage by Number of Cylinders \n mtcars dataset",
          outpch = 16,
          outcol = brewer.pal(3, "Paired"),
          staplelty = 0,
          whisklty = 1)


For more practice, use the MASS::painters dataset and plot Expression ~ School.
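
One possible starting point for this exercise, assuming the MASS package is installed (it ships with standard R distributions). The colour and labels are illustrative choices:

```r
# painters holds scores for 54 classical painters; School is a factor
# with levels A to H.
data(painters, package = "MASS")

boxplot(Expression ~ School,
        data = painters,
        col = "lightblue",
        xlab = "School",
        ylab = "Expression score",
        main = "Expression by School \n MASS::painters dataset")
```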

Scatter Plot

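Load a package for retrieving data from the web. (read.csv() can also read from a URL directly, so the code below does not call any RCurl functions.)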

> require(RCurl)

Load Pearson's Height dataset

> url = "http://www.math.uah.edu/stat/data/Pearson.csv"

> pearson = read.csv(url)

Plot the data

> plot(Son ~ Father, pearson)

> plot(Son ~ Father, pearson,
       pch = 16, col = "Darkgrey",
       main = "Father's Height vs Son's Height \n Pearson Dataset",
       xlab = "Father's Height", ylab = "Son's Height")

Add a linear fitted line to the plot.

> abline(lm(Son ~ Father, data = pearson), col = "Blue", lwd = 2)

Add a locally weighted scatterplot smoothing (lowess) line. Note that lowess() takes the x variable first, so Father comes before Son.

> lines(lowess(pearson$Father, pearson$Son), col = "Darkred", lwd = 2)
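
lowess() takes the x values first and the y values second, and returns a list with components x and y sorted by x, so the result can be passed straight to lines(). A sketch on the built-in cars data, with speed on the horizontal axis; the span values passed to f are illustrative:

```r
plot(dist ~ speed, data = cars, pch = 16, col = "darkgrey")

# Argument order is lowess(x, y): the horizontal-axis variable comes first.
smooth.default <- lowess(cars$speed, cars$dist)
lines(smooth.default, col = "darkred", lwd = 2)

# f is the smoother span (default 2/3); a smaller value follows the
# data more closely and gives a wigglier curve.
smooth.tight <- lowess(cars$speed, cars$dist, f = 0.3)
lines(smooth.tight, col = "darkgreen", lwd = 2, lty = 2)
```

Swapping the two arguments fits x on y and draws the curve in the wrong coordinate system, which is easy to miss when both variables have similar ranges.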

A more advanced scatterplot using the car package

> require(car) #Companion to Applied Regression

> scatterplot(Son ~ Father, pearson,
              pch = 16, col = "Darkgrey",
              main = "Father's Height vs Son's Height \n Pearson Dataset",
              xlab = "Father's Height", ylab = "Son's Height")

For more practice, create a scatter plot of the built-in cars dataset.
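
A sketch of this exercise using only base graphics. The built-in cars data records stopping distance (dist) against speed (speed); the name fit and the styling are illustrative:

```r
plot(dist ~ speed, data = cars,
     pch = 16, col = "darkgrey",
     main = "Speed vs Stopping Distance \n cars dataset",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Linear fit and its line.
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "blue", lwd = 2)

# Lowess smoother for comparison with the straight line.
lines(lowess(cars$speed, cars$dist), col = "darkred", lwd = 2)

# The speed coefficient is the slope of the blue line.
print(coef(fit))
```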

