# 13 Association Plots in R

## Bivariate Analysis - single numeric outcome against categorical independent variable

For illustration, we will use the ggplot2::diamonds dataset.

> diamonds = ggplot2::diamonds

> price.means = aggregate(price ~ cut, diamonds, mean)
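To see what `aggregate` returns, here is a minimal sketch on a small made-up data frame (the `toy` data and its values are invented for illustration and are not part of `diamonds`). The formula `price ~ cut` groups `price` by the levels of `cut` and applies `mean` to each group, giving one row per level.

```r
# Hypothetical toy data: two cut levels with hand-picked prices
toy <- data.frame(
  cut   = factor(c("Fair", "Fair", "Ideal", "Ideal", "Ideal")),
  price = c(100, 300, 200, 400, 600)
)

# One row per level of cut, holding the mean price of that group
toy.means <- aggregate(price ~ cut, data = toy, FUN = mean)
print(toy.means)
```

The result is an ordinary data frame with a `cut` column and a `price` column, which is why `price.means$price` and `price.means$cut` can be passed straight to a plotting function.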

Plot mean price

> barplot(price.means$price, names.arg = price.means$cut)

Let's make the visual better

• It is a good idea to sort the data before plotting

• Add x and y labels

• Add a color to the bars

> price.means = price.means[order(price.means$price, decreasing = TRUE),]

> barplot(price.means$price,
    names.arg = price.means$cut,
    col = "Steelblue",
    xlab = "Cut",
    ylab = "Avg Price",
    main = "Association plot between Cut and Average Price",
    border = "white")
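`barplot()` invisibly returns the x positions of the bar midpoints, which is handy for writing the values on top of the bars. The sketch below assumes the built-in `mtcars` data (mean `mpg` by `cyl`) so that it runs without ggplot2; the names `mpg.means` and `mids` are illustrative only.

```r
# Mean mileage per cylinder count, sorted from highest to lowest
mpg.means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg.means <- mpg.means[order(mpg.means$mpg, decreasing = TRUE), ]

# barplot() returns the midpoint of each bar on the x axis
mids <- barplot(mpg.means$mpg,
                names.arg = mpg.means$cyl,
                col = "steelblue",
                border = "white",
                xlab = "Cylinders",
                ylab = "Avg Mileage",
                ylim = c(0, max(mpg.means$mpg) * 1.15))

# Write the rounded mean above each bar
text(x = mids, y = mpg.means$mpg, labels = round(mpg.means$mpg, 1), pos = 3)
```

The `ylim` is stretched by 15% only to leave room for the labels above the tallest bar.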

## Distribution of Outcome (e.g. Price) against Common Categorical Variable

We are going to use a grouped boxplot. For illustration, we will use the built-in mtcars dataset.

> data(mtcars)

> boxplot(mpg ~ cyl, data = mtcars)
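The numbers behind the boxes can be inspected without drawing anything: with `plot = FALSE`, `boxplot()` returns a list whose `stats` matrix holds, for each group, the lower whisker, lower hinge, median, upper hinge and upper whisker. A short sketch (the variable name `b` is arbitrary):

```r
# Compute the boxplot summaries without drawing the plot
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

b$names       # group labels, one per box
b$n           # number of observations in each group
b$stats[3, ]  # third row holds the group medians
b$out         # values that would be drawn as outliers
```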

Let's modify this plot

1. Add x and y labels

2. Add color to each group to differentiate

> require(RColorBrewer) #provides brewer.pal()

> boxplot(mpg ~ cyl,
    data = mtcars,
    col = brewer.pal(3, "Paired"),
    xlab = "No of Cylinder",
    ylab = "Mileage",
    main = "Mileage by Number of Cylinders \n mtcars dataset",
    outpch = 16,
    outcol = brewer.pal(3, "Paired"),
    staplelty = 0,
    whisklty = 1
    #name = c("", "")
  )

For more practice, use the MASS::painters dataset and plot Expression ~ School.
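One possible way to approach this exercise is sketched below. It assumes the MASS package is available (it normally ships with R) and uses base `hcl.colors()` (R 3.6.0 or later) instead of RColorBrewer to keep the example self-contained; the palette choice and axis labels are arbitrary.

```r
library(MASS)  # normally installed together with R

data(painters)
n.schools <- nlevels(painters$School)  # one colour per school

boxplot(Expression ~ School,
        data = painters,
        col = hcl.colors(n.schools, "Set 2"),
        xlab = "School",
        ylab = "Expression score",
        main = "Expression by School \n painters dataset",
        outpch = 16,
        staplelty = 0,
        whisklty = 1)
```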

## Scatter Plot

> require(RCurl)

> url = "http://www.math.uah.edu/stat/data/Pearson.csv"

> pearson = read.csv(text = getURL(url)) #download and parse the CSV into a data frame

Plot the data

> plot(Son ~ Father, pearson)

> plot(Son ~ Father, pearson, pch = 16, col = "Darkgrey", main = "Father's Height vs Son's Height \n Pearson Dataset", xlab = "Father's Height", ylab = "Son's Height")

Add a linear fitted line to the plot.

> abline(lm(Son ~ Father, data = pearson), col = "Blue", lwd = 2)

Add a locally weighted scatterplot smoothing (lowess) line. Note that lowess() takes the x variable first and the y variable second.

> lines(lowess(pearson$Father, pearson$Son), col = "Darkred", lwd = 2)

A more advanced scatterplot using the car package

> require(car) #Companion to Applied Regression

> scatterplot(Son ~ Father, pearson, pch = 16, col = "Darkgrey", main = "Father's Height vs Son's Height \n Pearson Dataset", xlab = "Father's Height", ylab = "Son's Height")

For more practice, draw a scatter plot for the built-in cars data.
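A minimal sketch of that exercise follows, applying the same steps (scatter plot, linear fit, lowess line) to the built-in `cars` data, which needs no download. The titles, labels and the variable name `fit` are illustrative choices.

```r
data(cars)  # built-in: speed (mph) and stopping distance (ft)

plot(dist ~ speed, cars,
     pch = 16, col = "darkgrey",
     main = "Speed vs Stopping Distance \n cars dataset",
     xlab = "Speed (mph)", ylab = "Stopping Distance (ft)")

# Linear fit: response on the left of ~, predictor on the right
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "blue", lwd = 2)

# lowess() takes x first, then y
lines(lowess(cars$speed, cars$dist), col = "darkred", lwd = 2)

coef(fit)  # intercept and slope of the fitted line
```

Keeping the fitted model in `fit` makes it easy to reuse it afterwards, for example with `summary(fit)`.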

You can view the correlations among all numeric columns together:

> require(GGally)

> data(diamonds, package = "ggplot2")

> ggcorr(diamonds)

Observe correlations across the entire dataset

> require(psych)

> data("iris")

> pairs.panels(iris)
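The correlation coefficients can also be computed directly with base `cor()`, which is useful when the numbers are needed rather than a picture. This is only a sketch restricted to the numeric columns of `iris`; the names `num.cols` and `cor.mat` are made up for the example.

```r
data(iris)

# Keep only the numeric columns; Species is a factor
num.cols <- iris[sapply(iris, is.numeric)]

# Pearson correlation matrix, rounded for readability
cor.mat <- round(cor(num.cols), 2)
print(cor.mat)
```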