03 Slice and Dice in R

For illustration, we will use the diamonds data set that ships with the ggplot2 package. The ggplot2:: prefix names the package the data set comes from.

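> # recent ggplot2 versions ship diamonds as a tibble; convert it to a plain data.frame
> diamonds = as.data.frame(ggplot2::diamonds)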
> head(diamonds)
  carat       cut color clarity depth table price    x    y    z
1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48

Find the mean carat across all observations.

> mean(diamonds$carat)

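Find the mean carat for Premium cut diamonds. The comparison diamonds$cut == "Premium" produces one TRUE/FALSE value per row, and indexing carat with it keeps only the matching rows.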

> mean(diamonds$carat[diamonds$cut == "Premium"])

Here is another way, using subset() to filter the rows first.

> mean(subset(diamonds, cut == "Premium")$carat)
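To make the two approaches above concrete, here is a minimal sketch on a toy data frame. The data frame d is an assumption for illustration: it copies three columns from the six rows printed by head(diamonds), so the numbers are not results for the full data set.

```r
# Toy data frame copied from the first six rows shown by head(diamonds)
d <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336)
)

# The comparison yields a logical vector with one element per row
is_premium <- d$cut == "Premium"

# Indexing with the logical vector keeps only the carat values where it is TRUE
premium_carat <- d$carat[is_premium]
mean_by_index <- mean(premium_carat)

# subset() evaluates the condition inside the data frame and keeps the same rows
mean_by_subset <- mean(subset(d, cut == "Premium")$carat)

# Both routes give the same answer
all.equal(mean_by_index, mean_by_subset)
```

The indexing form is explicit about what is happening, while subset() is shorter to type at the console.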

Create another data frame that holds the carat and price columns for Premium cut diamonds.

> diamonds.premium = diamonds[diamonds$cut == "Premium", c("carat", "price")]
> str(diamonds.premium)

Verify the row count by using the table function on the cut column of the diamonds data set. The count for Premium should match the number of observations reported by str.

> table(diamonds$cut)
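The check described above can also be done in code rather than by eye. This is a sketch on a toy data frame d (an assumption for illustration, copied from the six rows printed by head(diamonds)); the variable names are hypothetical.

```r
# Toy data frame copied from the first six rows shown by head(diamonds)
d <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336)
)

# Same filter as above: Premium rows, carat and price columns only
premium <- d[d$cut == "Premium", c("carat", "price")]

# table() counts how many rows fall into each cut
counts <- table(d$cut)

# The filtered row count should equal the Premium entry of the table
n_premium_rows  <- nrow(premium)
n_premium_table <- counts[["Premium"]]
n_premium_rows == n_premium_table
```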

Find the median carat for Premium or Ideal cut diamonds.

> median(diamonds$carat[diamonds$cut == "Premium" | diamonds$cut == "Ideal"])
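When the list of accepted values grows, chaining | comparisons gets long. A possible alternative is the %in% operator, sketched here on a toy data frame d (an assumption for illustration, copied from the six rows printed by head(diamonds)).

```r
# Toy data frame copied from the first six rows shown by head(diamonds)
d <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336)
)

# Explicit OR of two comparisons, as in the command above
with_or <- d$carat[d$cut == "Premium" | d$cut == "Ideal"]

# %in% tests each cut against a vector of accepted values
wanted  <- c("Premium", "Ideal")
with_in <- d$carat[d$cut %in% wanted]

# Both select the same rows, so the medians agree
median_carat <- median(with_in)
identical(with_or, with_in)
```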

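Take a random sample of 10 records from the data set. sample(nrow(diamonds), 10) draws 10 row numbers without replacement, and those row numbers are then used to index the rows.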

> diamonds[sample(nrow(diamonds), 10), ]
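Because sample() is random, each run returns different rows. If a repeatable sample is wanted, one option is to call set.seed() first. The sketch below uses a toy data frame d (an assumption for illustration, copied from the six rows printed by head(diamonds)), and the seed value 42 is arbitrary.

```r
# Toy data frame copied from the first six rows shown by head(diamonds)
d <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336)
)

# Fix the random number generator state (42 is an arbitrary choice)
set.seed(42)
idx1 <- sample(nrow(d), 3)   # 3 distinct row numbers between 1 and nrow(d)

# Resetting the same seed reproduces the same draw
set.seed(42)
idx2 <- sample(nrow(d), 3)

# Use the sampled row numbers to pick the rows
picked <- d[idx1, ]
identical(idx1, idx2)
```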

Select records in different combinations. The return types noted in the comments are those of a plain data.frame; if diamonds is a tibble, [ does not drop a single column or cell to a vector.

> diamonds[, c("price")]                # returns the price column as a vector
> diamonds[, c("price"), drop = FALSE]  # returns a single-column data frame
> diamonds[, 2]                         # returns column 2 (cut) as a vector
> diamonds[, 2, drop = FALSE]           # returns column 2 as a single-column data frame
> diamonds[, c("price", "cut")]         # returns all rows with the price and cut columns
> diamonds[1, c("price", "cut")]        # returns row 1 with the price and cut columns; row numbering starts at 1
> diamonds[1:4, c("price", "cut")]      # returns rows 1 to 4 with the price and cut columns
> diamonds[1:4, 1:3]                    # returns rows 1 to 4 with columns 1 to 3; column numbering starts at 1
> diamonds[1:4, c(1, 2, 4)]             # returns rows 1 to 4 with columns 1, 2, and 4
> diamonds[c(1, 2, 4), c(4, 1)]         # returns rows 1, 2, and 4 with columns 4 and 1, in that order
> diamonds[4, 2]                        # returns the cell value at row 4, column 2
> diamonds[["cut"]]                     # returns the cut column as a vector, without the column name
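The difference between a vector result and a single-column data frame is easy to miss at the console, so it can help to check the result type directly. This sketch uses a toy plain data.frame d (an assumption for illustration, copied from the six rows printed by head(diamonds)); a tibble would behave differently for the first and third selections.

```r
# Toy plain data.frame copied from the first six rows shown by head(diamonds)
d <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336)
)

# Selecting one column with [ drops to a vector by default
col_vec <- d[, "price"]

# drop = FALSE keeps the data frame structure
col_df <- d[, "price", drop = FALSE]

# A single row and column gives a single cell value
cell <- d[4, 2]

# [[ ]] extracts one column as a vector
by_name <- d[["cut"]]

is.data.frame(col_vec)  # FALSE
is.data.frame(col_df)   # TRUE
```

Keeping drop = FALSE is useful when later code expects a data frame regardless of how many columns were selected.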

Apply functions in R

Plyr Package Equivalent

Base function   Input   Output   plyr function
----------------------------------------------
aggregate       d       d        ddply + colwise
apply           a       a/l      aaply / alply
by              d       l        dlply
lapply          l       l        llply
mapply          a       a/l      maply / mlply
replicate       r       a/l      raply / rlply
sapply          l       a        laply

http://had.co.nz/plyr/
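
As a rough illustration of the input and output types in the table (a = array, d = data frame, l = list), here is a sketch of four of the base functions. It uses only base R and a toy data frame d (an assumption for illustration, copied from the six rows printed by head(diamonds)); the plyr counterparts are not run here.

```r
# Toy data frame copied from the first six rows shown by head(diamonds)
d <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336)
)

# A data frame is a list of columns, so list-based functions work on it
num_cols <- d[c("carat", "price")]

# lapply: list in, list out (one mean per column)
col_means_list <- lapply(num_cols, mean)

# sapply: list in, result simplified to a named numeric vector
col_means_vec <- sapply(num_cols, mean)

# apply: array in; margin 2 applies the function to each column of the matrix
m <- as.matrix(num_cols)
col_max <- apply(m, 2, max)

# aggregate: data frame in, data frame out (mean carat per cut)
mean_by_cut <- aggregate(carat ~ cut, data = d, FUN = mean)
```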