13 Association Plots in R

Bivariate Analysis - single numeric outcome against a categorical independent variable

For illustration, we will use the ggplot2::diamonds dataset.

> diamonds = ggplot2::diamonds
> price.means = aggregate(price ~ cut, diamonds, mean)
> head(price.means)

Plot the mean price for each cut

> barplot(price.means$price, names.arg = price.means$cut)

Let's improve the plot

  • It is a good idea to sort the data before plotting
  • Add title
  • Add x and y labels
  • Add a color to the bars
> price.means = price.means[order(price.means$price, decreasing = TRUE),]
> barplot(price.means$price, 
    names.arg = price.means$cut, 
    col = "Steelblue", 
    xlab = "Cut", 
    ylab = "Avg Price", 
    main = "Association plot between Cut and Average Price",
    border = "white")
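
The aggregate, sort, and plot steps above can be sketched on a dataset that ships with R, so the example runs without ggplot2 installed. This is only an illustration: the variable names (mpg.means, mids) are made up for this sketch, and mtcars stands in for diamonds. It also shows one further refinement, printing each mean above its bar, which relies on barplot() returning the x positions of the bar midpoints.

```r
# Mean mpg per cylinder count, sorted from highest to lowest
mpg.means = aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg.means = mpg.means[order(mpg.means$mpg, decreasing = TRUE), ]

# barplot() returns the x positions of the bar midpoints
mids = barplot(mpg.means$mpg,
               names.arg = mpg.means$cyl,
               col = "steelblue",
               border = "white",
               ylim = c(0, max(mpg.means$mpg) * 1.15),  # headroom for labels
               xlab = "Cylinders",
               ylab = "Avg mpg",
               main = "Average mpg by Number of Cylinders")

# Print each mean just above its bar (pos = 3 means "above")
text(x = mids, y = mpg.means$mpg, labels = round(mpg.means$mpg, 1), pos = 3)
```

The extra ylim headroom keeps the label of the tallest bar inside the plot region.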

Distribution of an Outcome (e.g. Price) against a Categorical Variable

We will use a grouped box plot. For illustration, we will use the built-in mtcars dataset.

> data(mtcars)
> boxplot(mpg ~ cyl, data = mtcars)

Let's modify this plot

  1. Add x and y labels
  2. Add Title
  3. Add color to each group to differentiate
> library(RColorBrewer) # provides brewer.pal()
> boxplot(mpg ~ cyl, 
        data = mtcars, 
        col = brewer.pal(3, "Paired"),
        xlab = "No of Cylinders",
        ylab = "Mileage",
        main = "Mileage by Number of Cylinders \n mtcars dataset",
        outpch = 16,
        outcol = brewer.pal(3, "Paired"),
        staplelty = 0,
        whisklty = 1)
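
To read the numbers behind a grouped box plot, it can help to ask boxplot() for its statistics instead of a drawing. The sketch below assumes only base R; the names bp and stats are chosen for this illustration.

```r
# Compute the box plot statistics without drawing anything
bp = boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per group; the five rows follow the order used by boxplot()
stats = bp$stats
colnames(stats) = bp$names
rownames(stats) = c("lower whisker", "lower hinge", "median",
                    "upper hinge", "upper whisker")
print(stats)

# Number of observations in each group
print(setNames(bp$n, bp$names))

# Values that would be drawn as outlier points, with their group
print(data.frame(group = bp$names[bp$group], mpg = bp$out))
```

Comparing the median row across columns gives the same ordering the plot shows visually.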

For more practice, use the MASS::painters dataset and plot Expression ~ School.
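
One possible starting point for this exercise is sketched below. It assumes the MASS package is available (it normally ships with R) and uses the base function hcl.colors() rather than RColorBrewer so that nothing extra has to be installed; the palette choice and the name school.cols are arbitrary.

```r
# Load the painters data without attaching the whole MASS package
data(painters, package = "MASS")

# One fill colour per school
school.cols = hcl.colors(nlevels(painters$School), palette = "Pastel 1")

boxplot(Expression ~ School,
        data = painters,
        col = school.cols,
        xlab = "School",
        ylab = "Expression score",
        main = "Expression by School \n painters dataset",
        outpch = 16,
        staplelty = 0,
        whisklty = 1)
```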

Scatter Plot

Load the RCurl package, which helps with fetching data from the web. (read.csv() can also read a plain http URL directly.)

> require(RCurl)

Load Pearson's Height dataset

> url = "http://www.math.uah.edu/stat/data/Pearson.csv"
> pearson = read.csv(url) 

Plot the data

> plot(Son ~ Father, pearson)
> plot(Son ~ Father, pearson, 
    pch = 16, 
    col = "Darkgrey", 
    main = "Father's Height vs Son's Height \n Pearson Dataset", 
    xlab = "Father's Height", 
    ylab = "Son's Height")

Add a linear fitted line to the plot.

> abline(lm(Son ~ Father, data = pearson), col = "Blue", lwd = 2)

Add a locally weighted scatterplot smoothing (lowess) line. Note that lowess() takes x first and then y, so Father comes before Son.

> lines(lowess(pearson$Father, pearson$Son), col = "Darkred", lwd = 2)
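
How smooth the lowess curve looks depends on its span argument f, the fraction of points used for each local fit (2/3 by default). The sketch below compares two spans. It uses the built-in cars dataset so that it runs without a download; the span value 0.2 and the names smooth.wide and smooth.narrow are arbitrary choices for illustration.

```r
plot(dist ~ speed, data = cars,
     pch = 16, col = "darkgrey",
     xlab = "Speed", ylab = "Stopping distance",
     main = "Effect of the lowess span")

# lowess(x, y, f): a larger f uses more points per local fit, giving a smoother curve
smooth.wide   = lowess(cars$speed, cars$dist, f = 2/3)
smooth.narrow = lowess(cars$speed, cars$dist, f = 0.2)

lines(smooth.wide,   col = "darkred",   lwd = 2)
lines(smooth.narrow, col = "darkgreen", lwd = 2, lty = 2)

legend("topleft",
       legend = c("f = 2/3 (default)", "f = 0.2"),
       col = c("darkred", "darkgreen"),
       lwd = 2, lty = c(1, 2), bty = "n")
```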

A more advanced scatter plot using the car package

> require(car) #Companion to Applied Regression

> scatterplot(Son ~ Father, pearson, 
    pch = 16, 
    col = "Darkgrey", 
    main = "Father's Height vs Son's Height \n Pearson Dataset", 
    xlab = "Father's Height", 
    ylab = "Son's Height")

For more practice, draw a scatter plot for the built-in cars dataset.
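
A minimal sketch of this exercise, using only base R, might look like the following. The names fit, slope, and intercept are chosen for this illustration, and printing the fitted equation under the title is an optional extra.

```r
# Fit stopping distance as a linear function of speed
fit = lm(dist ~ speed, data = cars)

plot(dist ~ speed, data = cars,
     pch = 16, col = "darkgrey",
     xlab = "Speed", ylab = "Stopping distance",
     main = "Speed vs Stopping Distance \n cars dataset")

# abline() accepts a fitted lm object and draws its regression line
abline(fit, col = "blue", lwd = 2)

# Show the fitted equation just above the plot region
intercept = coef(fit)[["(Intercept)"]]
slope     = coef(fit)[["speed"]]
mtext(sprintf("dist = %.2f + %.2f * speed", intercept, slope),
      side = 3, line = 0.2, cex = 0.9)
```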

Among the numeric columns, you can view all correlations together

> require(GGally)
> data(diamonds, package = "ggplot2")
> ggcorr(diamonds) # non-numeric columns are dropped with a warning
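
The numbers behind a correlation plot can also be computed directly. The sketch below assumes only base R and uses the built-in iris dataset, which has one non-numeric column to filter out; the names numeric.cols, cor.matrix, and strongest are chosen for this illustration.

```r
# Keep only the numeric columns (iris$Species is a factor)
numeric.cols = sapply(iris, is.numeric)
cor.matrix = cor(iris[, numeric.cols])
print(round(cor.matrix, 2))

# Find the pair of variables with the strongest correlation,
# ignoring the diagonal (each variable with itself)
off.diag = cor.matrix
diag(off.diag) = 0
strongest = which(abs(off.diag) == max(abs(off.diag)), arr.ind = TRUE)[1, ]
print(rownames(cor.matrix)[strongest])
```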

Observe correlations across the entire dataset

> require(psych)
> data("iris")
> pairs.panels(iris)