03 Slice and Dice in R

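For illustration we will use the diamonds data set that ships with the ggplot2 package (the ggplot2:: prefix names the package). Recent versions of ggplot2 provide diamonds as a tibble, so we convert it to a plain data frame; the subsetting results described below assume a plain data frame.

> diamonds = as.data.frame(ggplot2::diamonds)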

> head(diamonds)

  carat       cut color clarity depth table price    x    y    z
1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48

Find mean carat for all observations

> mean(diamonds$carat)

Find mean carat for Premium cut diamonds

> mean(diamonds$carat[diamonds$cut == "Premium"])

Here is another way, using subset() to filter the rows first

> mean(subset(diamonds, cut == "Premium")$carat)
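The two approaches above should give the same number. A minimal sketch that checks this on a small made-up data frame (toy and its values are hypothetical stand-ins for diamonds, so the example runs without ggplot2); with() is shown as a third base-R option:

```r
# Toy stand-in for diamonds; the values are made up for illustration
toy <- data.frame(
  carat = c(0.25, 0.5, 0.75, 1.0),
  cut   = factor(c("Ideal", "Premium", "Premium", "Good"))
)

# Logical indexing: keep the carat values where cut is "Premium"
m1 <- mean(toy$carat[toy$cut == "Premium"])

# subset(): filter the rows first, then pull out the column
m2 <- mean(subset(toy, cut == "Premium")$carat)

# with(): evaluate the expression using the data frame's columns
m3 <- with(toy, mean(carat[cut == "Premium"]))

print(c(m1, m2, m3))
```

All three forms select the same rows, so the choice is mostly about readability.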

Create another data frame containing only the carat and price columns of Premium cut diamonds

> diamonds.premium = diamonds[diamonds$cut == "Premium", c("carat", "price")]

> str(diamonds.premium)

Verify the row count: the number of rows reported by str() should match the Premium count returned by the table function on the cut column of the diamonds data set.

> table(diamonds$cut)
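The same check can be done programmatically rather than by eye. A minimal sketch, assuming a small made-up data frame (toy and toy.premium are hypothetical names, used so the example runs without ggplot2):

```r
# Toy stand-in for diamonds; the values are made up for illustration
toy <- data.frame(
  carat = c(0.25, 0.5, 0.75, 1.0, 1.25),
  cut   = factor(c("Ideal", "Premium", "Premium", "Good", "Premium")),
  price = c(400, 900, 1500, 2500, 4000)
)

# Same pattern as above: filter rows by cut, keep two columns
toy.premium <- toy[toy$cut == "Premium", c("carat", "price")]

# table() counts the rows per cut level; pick out the Premium count
counts <- table(toy$cut)
premium.rows <- as.integer(counts["Premium"])

# The filtered data frame should have exactly that many rows
print(nrow(toy.premium) == premium.rows)
```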

Find median carat for Premium or Ideal cut diamonds.

> median(diamonds$carat[diamonds$cut == "Premium" | diamonds$cut == "Ideal"])
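When more than one level has to match, chaining comparisons with | gets long. A minimal sketch of the %in% alternative on a made-up data frame (toy is a hypothetical stand-in for diamonds):

```r
# Toy stand-in for diamonds; the values are made up for illustration
toy <- data.frame(
  carat = c(0.2, 0.4, 0.6, 0.8, 1.0),
  cut   = factor(c("Ideal", "Premium", "Good", "Premium", "Fair"))
)

# Explicit OR of two comparisons, as in the example above
by.or <- median(toy$carat[toy$cut == "Premium" | toy$cut == "Ideal"])

# %in% tests membership in a set of values and scales to more levels
by.in <- median(toy$carat[toy$cut %in% c("Premium", "Ideal")])

print(c(by.or, by.in))
```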

Take a random sample of 10 records from the data set

> diamonds[sample(nrow(diamonds), 10), ]
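sample() draws different rows on every call. If the same sample is needed again, one option is to fix the random seed first. A minimal sketch on a made-up data frame (toy and the seed value 42 are arbitrary choices for illustration):

```r
# Toy data frame with 100 rows; the values are made up for illustration
toy <- data.frame(id = 1:100, value = (1:100) * 2)

# Fix the seed, then draw 10 row indices without replacement
set.seed(42)
first <- toy[sample(nrow(toy), 10), ]

# Resetting the same seed reproduces the same draw
set.seed(42)
second <- toy[sample(nrow(toy), 10), ]

print(identical(first, second))
```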

Select rows and columns in different combinations

diamonds[, c("price")] #returns a vector

diamonds[, c("price"), drop = FALSE] #returns a single column dataframe

diamonds[, 2] #returns a vector (column 2 is cut, so the result is a factor)

diamonds[, 2, drop = FALSE] #returns a single column dataframe

diamonds[, c("price", "cut")] # returns all rows and Price and Cut columns

diamonds[1, c("price", "cut")] # returns row 1 with the price and cut columns, 1 being the first row

diamonds[1:4, c("price", "cut")] #returns rows from 1 to 4 with Price and Cut columns

diamonds[1:4, 1:3] # returns rows from 1 to 4 with columns from 1 to 3, 1 being the first column

diamonds[1:4, c(1, 2, 4)] #returns rows from 1 to 4 and column 1, 2, and 4

diamonds[c(1, 2, 4), c(4, 1)] # returns rows 1, 2, 4 and column 4 and 1

diamonds[4, 2] # returns cell value from row 4 and column 2

diamonds[["cut"]] # returns the column as a vector, without the data frame wrapper
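The difference between a vector result and a single-column data frame is easy to confirm by inspecting the results. A minimal sketch on a made-up plain data frame (toy is hypothetical; tibbles behave differently, since single-bracket subsetting of a tibble keeps returning a tibble):

```r
# Toy plain data frame; the values are made up for illustration
toy <- data.frame(
  price = c(326, 334, 335),
  cut   = factor(c("Ideal", "Premium", "Good"))
)

as.vec    <- toy[, "price"]                # single column is dropped to a vector
as.df     <- toy[, "price", drop = FALSE]  # stays a one-column data frame
by.name   <- toy[["price"]]                # [[ ]] always extracts the column itself
by.dollar <- toy$price                     # $ is shorthand for [[ ]] by name
cell      <- toy[2, 1]                     # single value: row 2, column 1

print(class(as.vec))
print(class(as.df))
```

Keeping drop = FALSE is useful in functions that must return a data frame regardless of how many columns are selected.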

Apply functions in R

Plyr Package Equivalent

Base function   Input   Output   plyr function
-------------   -----   ------   ---------------
aggregate       d       d        ddply + colwise
apply           a       a/l      aaply / alply
by              d       l        dlply
lapply          l       l        llply
mapply          a       a/l      maply / mlply
replicate       r       a/l      raply / rlply
sapply          l       a        laply

Key: a = array, d = data frame, l = list, r = replicated expression.

http://had.co.nz/plyr/
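To make the Input and Output columns of the table concrete, here is a minimal sketch of four of the base functions on a made-up data frame (toy is a hypothetical stand-in for diamonds; the plyr equivalents are not shown):

```r
# Toy stand-in for diamonds; the values are made up for illustration
toy <- data.frame(
  carat = c(0.2, 0.4, 0.6, 0.8),
  price = c(300, 700, 1100, 1900),
  cut   = factor(c("Ideal", "Premium", "Ideal", "Premium"))
)
num.cols <- toy[c("carat", "price")]

# lapply: list in, list out (a data frame is a list of columns)
col.means.list <- lapply(num.cols, mean)

# sapply: list in, simplified to a named vector where possible
col.means.vec <- sapply(num.cols, mean)

# apply: array in; margin 2 applies the function to each column
col.max <- apply(as.matrix(num.cols), 2, max)

# aggregate: data frame in, data frame out; mean carat per cut
by.cut <- aggregate(carat ~ cut, data = toy, FUN = mean)

print(col.means.vec)
print(by.cut)
```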