12 Plotting Variables in R

Plot Single Discrete / Qualitative Variables

You can plot discrete or qualitative variables using the following techniques:

  • barplot

  • pie (though it is not a good charting method)

We are going to use the diamonds dataset from the ggplot2 package for illustration.

require(ggplot2)

data(diamonds)

Summarize the data: find the frequency of each diamond color

diamond.colors = table(diamonds$color)
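
To see what table() returns, here is a minimal sketch on a small made-up vector (the name grades and its values are hypothetical, not taken from diamonds). table() counts how often each distinct value occurs, and prop.table() converts those counts to proportions.

```r
# Hypothetical sample of color grades; not taken from the diamonds data
grades = c("D", "E", "E", "G", "G", "G")

# table() counts the occurrences of each distinct value
freq = table(grades)
print(freq)

# prop.table() turns the counts into proportions that sum to 1
share = prop.table(freq)
print(round(share, 2))
```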

Simple barplot

barplot(diamond.colors)

Order the bars by decreasing frequency

diamond.colors = diamond.colors[order(diamond.colors, decreasing = TRUE)]

barplot(diamond.colors)
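
The same ordering can also be written with sort(). A minimal sketch, using the built-in mtcars data instead of diamonds so that it runs without extra packages (cyl.counts, by.order and by.sort are names made up for this example):

```r
# Frequency table of cylinder counts in the built-in mtcars data
cyl.counts = table(mtcars$cyl)

# Two equivalent ways to order the table by decreasing frequency
by.order = cyl.counts[order(cyl.counts, decreasing = TRUE)]
by.sort = sort(cyl.counts, decreasing = TRUE)

barplot(by.sort)
```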

Create a palette of 7 colors from RColorBrewer.

For more information about RColorBrewer, see http://blog.einext.com/r-1/working-with-colors-in-r

require(RColorBrewer)

blues = brewer.pal(7, "Blues")

Apply the color palette to the barplot. The rev function reverses the order of the palette, so the tallest bar gets the darkest shade

barplot(diamond.colors, col = rev(blues))

Tidy up the graph a little bit

Set the plot parameters

par(oma = c(1, 1, 1, 1)) # oma: outer margins (bottom, left, top, right)

par(mar = c(4, 5, 2, 1)) # mar: margin

barplot(diamond.colors,
        col = rev(blues),   # color of the bars
        horiz = TRUE,       # draw the bars horizontally
        las = 1,            # draw all axis labels horizontally
        border = NA,        # no borders on bars
        main = "Frequencies of Different Colors of Diamond",   # title of the graph
        xlab = "Number of observations",   # label along the x axis
        ylab = "Color of Diamond")         # label along the y axis
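
Settings made with par() stay in effect for every later plot on the same device. One common pattern, sketched here with the built-in mtcars data rather than diamonds, is to keep the old values that par() returns and restore them afterwards (old.par and cyl.counts are names made up for this example):

```r
# par() returns the previous values of the parameters it changes
old.par = par(mar = c(4, 5, 2, 1))

cyl.counts = table(mtcars$cyl)
barplot(cyl.counts,
        horiz = TRUE,   # draw the bars horizontally
        las = 1,        # keep axis labels horizontal
        xlab = "Number of observations",
        ylab = "Cylinders")

# Restore the margins that were in effect before
par(old.par)
```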

Display a Categorical Variable using a Pie Chart

  • Not recommended; use a bar chart instead (see why below)

pie(diamond.colors, col = blues)

It is hard to judge the relative sizes of the slices in a pie chart, while the bar chart shows the differences clearly. Below is an excerpt from the help page of the pie function.

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

Cleveland (1985), page 264: “Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements.” This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.
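
The dot chart that the help text recommends is available in base R as dotchart(). A minimal sketch, again on the built-in mtcars data rather than diamonds (cyl.counts is a name made up for this example):

```r
# Frequency table, sorted so the largest count is plotted at the top
cyl.counts = sort(table(mtcars$cyl))

# dotchart() expects a numeric vector or matrix, so drop the table class
dotchart(as.numeric(cyl.counts),
         labels = names(cyl.counts),
         pch = 19,
         xlab = "Number of observations",
         main = "Cars by Number of Cylinders")
```

Each count is read as a position along a common scale, which is the kind of judgement the quoted passage describes as more accurate than comparing angles.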

Plot Quantitative or Continuous Variables

You can plot continuous or quantitative variables using the following:

  • histogram

  • boxplot

Histogram

prices = diamonds$price

hist(prices, col = "orange")

Specify the number of buckets (bins) to create along the x axis, which holds the values of the continuous variable

hist(prices, col = "orange", breaks = 100)
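
A single number passed to breaks is only a suggestion: hist() rounds the breakpoints to "pretty" values, so the number of buckets may differ from what was requested. To get an exact number of equal-width buckets, pass a vector of breakpoints instead. A minimal sketch on simulated data (x is made up; it is not the diamonds prices):

```r
set.seed(1)
x = rexp(1000, rate = 1 / 4000)   # simulated right-skewed values

# Single number: hist() treats it as a suggestion
h1 = hist(x, breaks = 100, plot = FALSE)
print(length(h1$counts))

# Vector of 101 breakpoints: exactly 100 equal-width buckets
h2 = hist(x, breaks = seq(min(x), max(x), length.out = 101), plot = FALSE)
print(length(h2$counts))
```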

Plot density instead of frequency on the y axis

hist(prices, col = "orange", breaks = 100, freq = FALSE)
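
With freq = FALSE the bar heights are densities, scaled so that the bar areas sum to 1. That is what allows a probability density curve to be drawn on the same axis. A quick check on simulated data (x is made up; plot = FALSE is used so the current plot is left untouched):

```r
set.seed(1)
x = rnorm(500, mean = 10, sd = 2)   # simulated values

# The density component is computed whether or not the histogram is drawn
h = hist(x, breaks = 30, plot = FALSE)

# Area of each bar = height (density) * width; the areas sum to 1
bar.areas = h$density * diff(h$breaks)
print(sum(bar.areas))
```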

Add a normal distribution curve to the histogram

curve(dnorm(x, mean = mean(prices), sd = sd(prices)), col = "darkblue", lwd = 2, add = TRUE)

Boxplot

A boxplot is useful for spotting outliers and judging the symmetry of a distribution. For this illustration, let's use the iris dataset that comes with R.

data(iris)

str(iris)

Take a subset of the iris dataset: keep only the virginica species

virginica = iris[iris$Species == "virginica", ]

Simple boxplot

boxplot(virginica$Sepal.Length)
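
The numbers behind the plot can be inspected with boxplot.stats() from base R. It returns the five values the box and whiskers are drawn from, plus any points lying beyond the whiskers, which the boxplot draws as individual outliers. A minimal sketch (box.stats is a name made up here; the subset is recreated so the block runs on its own):

```r
virginica = iris[iris$Species == "virginica", ]

box.stats = boxplot.stats(virginica$Sepal.Length)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(box.stats$stats)

# Points beyond the whiskers, drawn as individual outliers
print(box.stats$out)
```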

Putting together a boxplot, histogram and normal curve on the same plot

carats = diamonds$carat

hist(carats, col = "lightgrey", breaks = 100, freq = FALSE)

boxplot(carats, col = "orange", horizontal = TRUE, add = TRUE)

curve(dnorm(x, mean = mean(carats), sd = sd(carats)), col = "darkblue", lwd = 2, add = TRUE)