12 Plotting Variables in R
Plot Single Discrete / Qualitative Variables
You can plot discrete or qualitative variables using the following techniques:
barplot
pie (though it is not a good charting method)
We are going to use the diamonds dataset from the ggplot2 package for illustration.
require(ggplot2)
data(diamonds)
Summarize the data: find the frequency of each diamond color
diamond.colors = table(diamonds$color)
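To make the result of table concrete, here is a minimal sketch on a small made-up factor (the `grades` vector is hypothetical, standing in for diamonds$color). It also shows prop.table, which turns the counts into relative frequencies.

```r
# Hypothetical stand-in for diamonds$color: a small factor of color grades
grades <- factor(c("D", "E", "E", "G", "G", "G", "J"),
                 levels = c("D", "E", "G", "J"))

counts <- table(grades)        # absolute frequency of each level
shares <- prop.table(counts)   # relative frequency; the shares sum to 1

print(counts)
print(round(shares, 2))
```

Either object can be passed to barplot; the bar heights are then counts or shares respectively.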
Simple barplot
barplot(diamond.colors)
Order the bars by decreasing frequency
diamond.colors = diamond.colors[order(diamond.colors, decreasing = TRUE)]
barplot(diamond.colors)
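The same ordering can also be written with sort, which some readers find easier to read than indexing with order. A small sketch on made-up data (the `grades` vector is hypothetical):

```r
# Hypothetical color grades with distinct frequencies
grades <- c("D", "E", "E", "G", "G", "G", "G", "J", "J", "J")
counts <- table(grades)

# sort() on a table reorders the entries and keeps their names
ordered_counts <- sort(counts, decreasing = TRUE)

barplot(ordered_counts)
```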
Create a palette of 7 colors with RColorBrewer.
For more information about RColorBrewer, see this page: http://blog.einext.com/r-1/working-with-colors-in-r
require(RColorBrewer)
blues = brewer.pal(7, "Blues")
Apply the color palette to the barplot. The rev function reverses the palette, so the tallest bar gets the darkest shade.
barplot(diamond.colors, col = rev(blues))
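If RColorBrewer is not available, a similar light-to-dark ramp can be approximated in base R with colorRampPalette. This is only a sketch: the two endpoint colors are an arbitrary choice and will not match the Brewer "Blues" palette exactly.

```r
# Build a function that interpolates between two colors, then ask for 7 steps
make_blues <- colorRampPalette(c("lightblue", "darkblue"))
base_blues <- make_blues(7)   # character vector of 7 hex color codes

# Made-up decreasing counts, just to show the palette on bars
barplot(7:1, col = rev(base_blues))
```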
Tidy up the graph a little bit
Set the plot parameters
par(oma = c(1, 1, 1, 1)) # oma: outer margins (bottom, left, top, right)
par(mar = c(4, 5, 2, 1)) # mar: plot margins (bottom, left, top, right)
barplot(diamond.colors,
col = rev(blues), # Color of the bars
horiz = TRUE, # Draw the bars horizontally
las = 1, # Keep the axis labels horizontal
border = NA, # No borders on bars
main = "Frequencies of Different Colors of Diamond", # Title of the graph
xlab = "Number of observations", # Label along the x axis
ylab = "Color of Diamond") # Label along the y axis
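Settings made with par stay in effect for later plots on the same device. One common pattern, sketched here with made-up bar heights, is to keep the list that par returns and pass it back once the plot is done.

```r
default_mar <- par("mar")   # remember the margins before changing them

# par() returns the previous values of the parameters it changes
old <- par(mar = c(4, 5, 2, 1), oma = c(1, 1, 1, 1))

# Made-up values, only to have something to draw
barplot(c(a = 3, b = 2, c = 1), horiz = TRUE, las = 1)

par(old)                    # restore the previous margins
```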
Display Categorical Variable using Pie Chart
(Not recommended; use a bar chart instead. See why below.)
pie(diamond.colors, col = blues)
It is hard to judge relative sizes from a pie chart, while a bar chart shows the differences clearly. Below is an excerpt from the help text of the pie function.
Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.
Cleveland (1985), page 264: “Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements.” This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.
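The dot chart mentioned in the help text is available in base R as dotchart. A minimal sketch, using the built-in mtcars dataset (not the diamonds data used above) so that it runs without extra packages:

```r
# Frequency of each cylinder count in the built-in mtcars data
cyl_counts <- table(mtcars$cyl)

# dotchart expects a numeric vector; labels are passed separately
dotchart(as.numeric(cyl_counts),
         labels = names(cyl_counts),
         pch = 19,
         xlab = "Number of cars",
         main = "Cars by number of cylinders")
```

Each category is read as a position along one common scale, which is the comparison the quote describes as more accurate than judging angles.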
Plot Quantitative or Continuous Variables
You can plot continuous or quantitative variables using the following:
histogram
boxplot
Histogram
prices = diamonds$price
hist(prices, col = "orange")
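Specify the number of buckets (bins) to create along the x axis, which holds the values of the continuous variable. R treats a single number given for breaks as a suggestion and may choose a nearby count that gives round break points.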
hist(prices, col = "orange", breaks = 100)
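To check how many bins were actually used, the histogram can be computed without drawing it. The sketch below uses simulated right-skewed values (the `values` vector is a hypothetical stand-in for prices) and also shows that passing an explicit vector of break points fixes the bin count exactly.

```r
set.seed(1)
values <- rexp(1000, rate = 1 / 4000)   # hypothetical stand-in for prices

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(values, breaks = 100, plot = FALSE)
print(length(h$counts))                 # number of bins R actually chose

# An explicit vector of 101 break points yields exactly 100 bins
exact_breaks <- seq(min(values), max(values), length.out = 101)
h_exact <- hist(values, breaks = exact_breaks, plot = FALSE)
print(length(h_exact$counts))
```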
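Plot density instead of counts. With freq = FALSE the bar areas sum to 1, so the heights are densities rather than relative frequencies.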
hist(prices, col = "orange", breaks = 100, freq = FALSE)
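The difference between density and relative frequency can be verified on the histogram object. This is a sketch on simulated data (the `values` vector is hypothetical); overwriting the counts component before plotting is one possible way to draw relative frequencies, not the only one.

```r
set.seed(42)
values <- rexp(500, rate = 1 / 4000)    # hypothetical stand-in for prices

h <- hist(values, breaks = 50, plot = FALSE)

# Density: bar height times bar width, summed over all bins, equals 1
area <- sum(h$density * diff(h$breaks))

# Relative frequency: share of observations in each bin
rel_freq <- h$counts / sum(h$counts)

# plot() with freq = TRUE draws the counts component, so replace it
h$counts <- rel_freq
plot(h, freq = TRUE, col = "orange",
     ylab = "Relative frequency", main = "Relative frequency histogram")
```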
Add a normal distribution curve to the histogram
curve(dnorm(x, mean = mean(prices), sd = sd(prices)), col = "darkblue", lwd = 2, add = TRUE)
Boxplot
A boxplot is useful for spotting outliers and assessing the symmetry of a distribution. For this illustration, let's use the iris dataset that comes with R.
data(iris)
str(iris)
Take a subset of the iris dataset
virginica = iris[iris$Species == "virginica", ]
Simple boxplot
boxplot(virginica$Sepal.Length)
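Two follow-ups that often help when reading a boxplot, sketched on the same iris data: boxplot.stats returns the numbers behind the drawing, including the points flagged as outliers, and the formula interface draws one box per group for comparison.

```r
virginica <- iris[iris$Species == "virginica", ]

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
# out:   observations lying beyond the whiskers
bp <- boxplot.stats(virginica$Sepal.Length)
print(bp$stats)
print(bp$out)

# One box per species, using the formula interface
boxplot(Sepal.Length ~ Species, data = iris,
        col = "orange", ylab = "Sepal length")
```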
Putting together boxplot, histogram and normal curve on same plot
carats = diamonds$carat
hist(carats, col = "lightgrey", breaks = 100, freq = FALSE)
boxplot(carats, col = "orange", horizontal = TRUE, add = TRUE)
curve(dnorm(x, mean = mean(carats), sd = sd(carats)), col = "darkblue", lwd = 2, add = TRUE)
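When the box is drawn on top of the histogram it can hide some of the bars. An alternative layout, sketched here with the built-in faithful dataset (an assumption made so the example runs without ggplot2), stacks the boxplot above the histogram with a shared x range.

```r
waiting <- faithful$waiting             # built-in data: waiting times in minutes

# Two rows: a short panel for the boxplot, a taller one for the histogram
layout(matrix(1:2, nrow = 2), heights = c(1, 3))

old <- par(mar = c(0, 4, 1, 1))         # no bottom margin for the top panel
boxplot(waiting, horizontal = TRUE, ylim = range(waiting),
        axes = FALSE, col = "orange")

par(mar = c(4, 4, 0, 1))                # no top margin for the bottom panel
hist(waiting, breaks = 30, freq = FALSE, xlim = range(waiting),
     col = "lightgrey", main = "", xlab = "Waiting time (minutes)")
curve(dnorm(x, mean = mean(waiting), sd = sd(waiting)),
      col = "darkblue", lwd = 2, add = TRUE)

par(old)                                # restore the margins
layout(1)                               # back to a single panel
```

The data vector is deliberately not called x, because curve evaluates its expression with its own x.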