13 Association Plots in R
Bivariate Analysis - single numeric outcome against a categorical independent variable
For illustration, we will use the ggplot2::diamonds dataset.
> diamonds = ggplot2::diamonds
> price.means = aggregate(price ~ cut, diamonds, mean)
> head(price.means)
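To see what aggregate() computes here, the following minimal sketch reproduces the group means on a small made-up data frame and cross-checks them with tapply(). The data frame toy and its prices are hypothetical and only stand in for diamonds.

```r
# Hypothetical toy data standing in for ggplot2::diamonds
toy = data.frame(
  cut   = factor(c("Fair", "Fair", "Ideal", "Ideal", "Ideal")),
  price = c(100, 300, 200, 400, 600)
)

# Formula interface: mean of price within each level of cut
toy.means = aggregate(price ~ cut, toy, mean)    # data frame, one row per cut

# Same group means as a named vector
toy.tapply = tapply(toy$price, toy$cut, mean)

print(toy.means)
print(toy.tapply)
```

aggregate() returns a data frame, which is convenient for the sorting and plotting steps, whereas tapply() returns a named vector.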
Plot mean price
> barplot(price.means$price, names.arg = price.means$cut)
Let's improve the plot:
Sort the data before plotting
Add a title
Add x and y labels
Add a color to the bars
> price.means = price.means[order(price.means$price, decreasing = TRUE),]
> barplot(price.means$price,
names.arg = price.means$cut,
col = "Steelblue",
xlab = "Cut",
ylab = "Avg Price",
main = "Association plot between Cut and Average Price",
border = "white")
Distribution of Outcome (e.g. Price) against Common Categorical Variable
We are going to use a grouped boxplot. For illustration, we will use the built-in mtcars dataset.
> data(mtcars)
> boxplot(mpg ~ cyl, data = mtcars)
Let's modify this plot:
Add x and y labels
Add a title
Add a color to each group to differentiate them
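> require(RColorBrewer) # provides brewer.pal()
> boxplot(mpg ~ cyl,
data = mtcars,
col = brewer.pal(3, "Paired"), # one fill color per cylinder group
xlab = "No of Cylinder",
ylab = "Mileage",
main = "Mileage by Number of Cylinders \n mtcars dataset",
outpch = 16, # solid circles for outliers
outcol = brewer.pal(3, "Paired"),
staplelty = 0, # hide the whisker end caps
whisklty = 1) # solid whisker lines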
For more practice, use the MASS::painters dataset and plot Expression ~ School.
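One possible starting point for this exercise is sketched below. It assumes the MASS package is installed (it ships with standard R distributions). The color, labels, and title are illustrative choices, not part of the original exercise.

```r
library(MASS)  # provides the painters dataset

# Expression score (numeric) grouped by School (a factor)
painters.box = boxplot(Expression ~ School,
                       data = painters,
                       col = "steelblue",
                       xlab = "School",
                       ylab = "Expression Score",
                       main = "Expression by School \n painters dataset")

# boxplot() invisibly returns the summary statistics it drew
print(painters.box$names)  # group labels
print(painters.box$n)      # number of painters in each school
```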
Scatter Plot
Load the RCurl package for fetching data from the web (read.csv can also read a plain http URL directly).
> require(RCurl)
Load Pearson's Height dataset
> url = "http://www.math.uah.edu/stat/data/Pearson.csv"
> pearson = read.csv(url)
Plot the data
> plot(Son ~ Father, pearson)
> plot(Son ~ Father, pearson, pch = 16, col = "Darkgrey", main = "Father's Height vs Son's Height \n Pearson Dataset", xlab = "Father's Height", ylab = "Son's Height")
Add a linear fitted line to the plot.
> abline(lm(Son ~ Father, data = pearson), col = "Blue", lwd = 2)
Add a locally weighted scatterplot smoothing (lowess) line. Note that lowess takes x first and y second, so Father comes before Son.
> lines(lowess(pearson$Father, pearson$Son), col = "Darkred", lwd = 2)
A more advanced scatter plot using the car package
> require(car) #Companion to Applied Regression
> scatterplot(Son ~ Father, pearson, pch = 16, col = "Darkgrey", main = "Father's Height vs Son's Height \n Pearson Dataset", xlab = "Father's Height", ylab = "Son's Height")
For more practice, create a scatter plot of the built-in cars data.
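A minimal sketch of this exercise, using only base R, might look as follows. It applies the same steps as the Pearson example (scatter plot, linear fit, lowess line) to cars. The names cars.fit and cars.smooth and the labels are illustrative assumptions.

```r
data(cars)  # built-in dataset: speed and stopping distance (dist)

# Scatter plot of stopping distance against speed
plot(dist ~ speed, cars,
     pch = 16, col = "darkgrey",
     main = "Speed vs Stopping Distance \n cars dataset",
     xlab = "Speed", ylab = "Stopping Distance")

# Linear fitted line
cars.fit = lm(dist ~ speed, data = cars)
abline(cars.fit, col = "blue", lwd = 2)

# Lowess line: x (speed) first, then y (dist)
cars.smooth = lowess(cars$speed, cars$dist)
lines(cars.smooth, col = "darkred", lwd = 2)
```

Where the two lines diverge, a straight line may not describe the relationship well over that range.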
Among the numeric columns, you can view all correlations together
> require(GGally)
> data(diamonds, package = "ggplot2")
> ggcorr(diamonds) # non-numeric columns are dropped with a warning
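The numbers behind such a correlation plot can be inspected with base R. The following is a minimal sketch that uses the built-in iris data as a stand-in and keeps only its numeric columns before calling cor(). The names num.cols and cor.mat are illustrative.

```r
data(iris)

# Keep only numeric columns; Species is a factor and is excluded
num.cols = sapply(iris, is.numeric)

# Pairwise Pearson correlations between the numeric columns
cor.mat = cor(iris[, num.cols])

print(round(cor.mat, 2))
```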
Observe correlations across the entire dataset
> require(psych)
> data("iris")
> pairs.panels(iris)