03 Slice and Dice in R
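For illustration we will use the diamonds data set from the ggplot2 package (ggplot2::diamonds). Recent ggplot2 releases ship it as a tibble, so we convert it to a plain data frame to get the indexing behaviour described in this section.
> diamonds = as.data.frame(ggplot2::diamonds)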
> head(diamonds)
  carat       cut color clarity depth table price    x    y    z
1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
Find the mean carat for all observations
> mean(diamonds$carat)
Find the mean carat for Premium cut diamonds by indexing with a logical vector
> mean(diamonds$carat[diamonds$cut == "Premium"])
Here is another way, using subset() to filter the rows first
> mean(subset(diamonds, cut == "Premium")$carat)
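The two approaches above agree on diamonds, but they can differ when the filter column contains missing values. The sketch below uses a small made-up data frame (toy, not part of ggplot2) with one NA in cut to show the difference: logical indexing keeps an NA element, while subset() drops rows where the condition is NA.

```r
# Hypothetical toy data with a missing cut; the values are invented for illustration.
toy <- data.frame(
  carat = c(0.2, 0.3, 0.4),
  cut   = c("Premium", NA, "Premium")
)

# Logical indexing: the NA in the condition produces an NA element,
# so the mean is NA unless na.rm = TRUE is passed.
by_index <- mean(toy$carat[toy$cut == "Premium"])

# subset(): rows where the condition is NA are dropped before averaging.
by_subset <- mean(subset(toy, cut == "Premium")$carat)

print(by_index)
print(by_subset)
```

If missing values are possible, mean(..., na.rm = TRUE) or which() around the condition brings the indexing form in line with subset().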
Create another data frame with the carat and price columns for Premium cut diamonds
> diamonds.premium = diamonds[diamonds$cut == "Premium", c("carat", "price")]
> str(diamonds.premium)
Verify the row count by using the table function on the cut column of the diamonds data set. The count for Premium should match the number of observations reported by str above.
> table(diamonds$cut)
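The check can also be done programmatically instead of by eye. This is a minimal sketch on a made-up stand-in for diamonds (the toy data frame and its values are assumptions, chosen only so the example runs without ggplot2).

```r
# Toy stand-in for diamonds; values are invented for illustration.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.29, 0.31, 0.24, 0.30),
  cut   = factor(c("Ideal", "Premium", "Premium", "Good", "Premium", "Ideal")),
  price = c(326, 326, 334, 335, 336, 339)
)

# Same pattern as in the text: filter rows by cut, keep two columns.
toy.premium <- toy[toy$cut == "Premium", c("carat", "price")]

# table() counts rows per level of cut.
counts <- table(toy$cut)

# The Premium count should equal the row count of the subset.
rows_match <- nrow(toy.premium) == as.integer(counts["Premium"])
print(rows_match)
```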
Find the median carat for Premium or Ideal cut diamonds.
> median(diamonds$carat[diamonds$cut == "Premium" | diamonds$cut == "Ideal"])
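When the list of accepted values grows, chaining | comparisons gets long. One common alternative is %in%, sketched here on a small made-up data frame (toy and its values are assumptions for illustration).

```r
# Toy data; values are invented for illustration.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.29, 0.31, 0.24, 0.30),
  cut   = c("Ideal", "Premium", "Premium", "Good", "Premium", "Ideal")
)

# Two ways to build the same logical filter.
either <- toy$cut == "Premium" | toy$cut == "Ideal"
member <- toy$cut %in% c("Premium", "Ideal")

same_filter <- identical(either, member)
med <- median(toy$carat[member])

print(same_filter)
print(med)
```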
Take a random sample of 10 records from the data set
> diamonds[sample(nrow(diamonds), 10), ]
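sample() draws different rows on every call. If the sample needs to be reproducible, set.seed() can be called first. The sketch below uses a made-up data frame (toy) and an arbitrary seed value; neither comes from the original text.

```r
# Toy data frame with 100 rows; contents are invented for illustration.
toy <- data.frame(id = 1:100, value = (1:100)^2)

set.seed(42)                            # arbitrary seed; any fixed value works
first <- toy[sample(nrow(toy), 10), ]

set.seed(42)                            # resetting the seed repeats the draw
second <- toy[sample(nrow(toy), 10), ]

print(identical(first, second))
print(nrow(first))
```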
Select rows and columns in different combinations
# The comments below describe a plain data.frame. On a tibble, [ always returns a tibble, so "returns a vector" does not apply there.
diamonds[, c("price")] # returns a vector
diamonds[, c("price"), drop = FALSE] # returns a single-column data frame
diamonds[, 2] # returns column 2 (cut) as a vector
diamonds[, 2, drop = FALSE] # returns a single-column data frame
diamonds[, c("price", "cut")] # returns all rows with the price and cut columns
diamonds[1, c("price", "cut")] # returns row 1 with the price and cut columns, 1 being the first row
diamonds[1:4, c("price", "cut")] # returns rows 1 to 4 with the price and cut columns
diamonds[1:4, 1:3] # returns rows 1 to 4 with columns 1 to 3, 1 being the first column
diamonds[1:4, c(1, 2, 4)] # returns rows 1 to 4 and columns 1, 2, and 4
diamonds[c(1, 2, 4), c(4, 1)] # returns rows 1, 2, 4 and columns 4 and 1, in that order
diamonds[4, 2] # returns the cell value from row 4 and column 2
diamonds[["cut"]] # returns the cut column itself as a factor vector
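The difference between a vector result and a single-column data frame can be checked directly. A minimal sketch on a made-up plain data.frame (toy and its values are assumptions for illustration):

```r
# Toy plain data.frame; values are invented for illustration.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.29),
  cut   = c("Ideal", "Premium", "Good"),
  price = c(326, 326, 334)
)

as_vector <- toy[, "price"]                # single column is dropped to a vector
as_frame  <- toy[, "price", drop = FALSE]  # drop = FALSE keeps the data frame
by_name   <- toy[["price"]]                # [[ extracts the column itself

print(is.data.frame(as_vector))
print(is.data.frame(as_frame))
print(identical(as_vector, by_name))

# Row and column selection together keeps the requested column order.
picked <- toy[c(1, 3), c("price", "cut")]
print(dim(picked))
print(names(picked))
```

Keeping drop = FALSE is mostly useful in functions that must always receive a data frame, regardless of how many columns were selected.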
Apply functions in R
Plyr Package Equivalent
Base function   Input   Output   plyr function
--------------------------------------------------
aggregate       d       d        ddply + colwise
apply           a       a/l      aaply / alply
by              d       l        dlply
lapply          l       l        llply
mapply          a       a/l      maply / mlply
replicate       r       a/l      raply / rlply
sapply          l       a        laply
http://had.co.nz/plyr/
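To make the Input and Output columns of the table concrete (a = array, d = data frame, l = list), here is a short sketch of four of the base functions. It uses only base R and a made-up data frame (toy and its values are assumptions); the plyr equivalents are not shown.

```r
# Toy data; values are invented for illustration.
toy <- data.frame(
  carat = c(0.20, 0.30, 0.40, 0.50),
  cut   = c("Ideal", "Premium", "Ideal", "Premium"),
  price = c(300, 400, 500, 600)
)
num_cols <- toy[c("carat", "price")]

# lapply: list in, list out (a data frame is a list of columns).
col_means_list <- lapply(num_cols, mean)

# sapply: list in, result simplified to a named numeric vector.
col_means_vec <- sapply(num_cols, mean)

# aggregate: data frame in, data frame out (mean price per cut).
price_by_cut <- aggregate(price ~ cut, data = toy, FUN = mean)

# apply: matrix in, one value per row when MARGIN = 1.
row_sums <- apply(as.matrix(num_cols), 1, sum)

print(col_means_list)
print(col_means_vec)
print(price_by_cut)
print(row_sums)
```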