# 03 Slice and Dice in R

For illustration we will use the ggplot2::diamonds data set (ggplot2 is the name of the package, diamonds is a data set that ships with it).

> diamonds = ggplot2::diamonds

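First six rows of the data set:

|   | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|-------|-----|-------|---------|-------|-------|-------|---|---|---|
| 1 | 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 2 | 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 3 | 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 4 | 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
| 5 | 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
| 6 | 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |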

Find mean carat for all observations

> mean(diamonds\$carat)

Find mean carat for premium diamonds
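A minimal sketch of one way to do this with logical indexing. It assumes ggplot2 is installed and converts the data to a plain data frame first; `is_premium` and `premium_mean` are illustrative names, not part of the data set.

```r
# Assumption: ggplot2 is installed. as.data.frame() gives plain data frame behaviour.
diamonds <- as.data.frame(ggplot2::diamonds)

# Logical vector: TRUE for every row whose cut is "Premium"
is_premium <- diamonds$cut == "Premium"

# Keep only the carat values of those rows, then average them
premium_mean <- mean(diamonds$carat[is_premium])
premium_mean
```

The comparison `diamonds$cut == "Premium"` is case sensitive, so the level must be spelled exactly as it appears in the data.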

Here is another way
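Two possible alternatives for the mean carat of Premium diamonds, sketched with base R only: `subset()` filters the rows first, and `tapply()` computes the mean for every cut in one call. The variable names are illustrative, and ggplot2 is assumed to be installed.

```r
# Assumption: ggplot2 is installed. as.data.frame() gives plain data frame behaviour.
diamonds <- as.data.frame(ggplot2::diamonds)

# subset() evaluates the condition inside the data frame and keeps matching rows
premium <- subset(diamonds, cut == "Premium")
mean_subset <- mean(premium$carat)

# tapply() groups carat by cut and applies mean() to each group
mean_by_cut <- tapply(diamonds$carat, diamonds$cut, mean)
mean_tapply <- unname(mean_by_cut["Premium"])

# Both routes should agree
all.equal(mean_subset, mean_tapply)
```

`tapply()` is convenient when the means for all cuts are wanted, while `subset()` reads more naturally for a single group.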

Create another data frame with price and cut columns
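A short sketch of how such a data frame could be built by selecting two columns by name. `price_cut` is a hypothetical name chosen for this example, and ggplot2 is assumed to be installed.

```r
# Assumption: ggplot2 is installed. as.data.frame() gives plain data frame behaviour.
diamonds <- as.data.frame(ggplot2::diamonds)

# Empty row index = all rows; the character vector picks the columns, in this order
price_cut <- diamonds[, c("price", "cut")]

str(price_cut)                      # structure of the new data frame
nrow(price_cut) == nrow(diamonds)   # selecting columns does not change the row count
```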

Verify the row count by using the table function on the cut column of the diamonds data set.

> table(diamonds\$cut)
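One way the counts from `table()` might be used as a check, sketched under the assumption that the row count in question is that of a subset by cut (Premium is used here as an example). The names `cut_counts` and `premium` are illustrative.

```r
# Assumption: ggplot2 is installed. as.data.frame() gives plain data frame behaviour.
diamonds <- as.data.frame(ggplot2::diamonds)

# Number of rows per cut level
cut_counts <- table(diamonds$cut)

# The per-level counts add up to the total number of rows
sum(cut_counts) == nrow(diamonds)

# A subset by cut should have exactly as many rows as table() reports for that level
premium <- diamonds[diamonds$cut == "Premium", ]
nrow(premium) == unname(cut_counts["Premium"])
```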

Find median carat for Premium or Ideal cut diamonds.

> median(diamonds\$carat[diamonds\$cut == "Premium" | diamonds\$cut == "Ideal"])
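When more than one level is allowed, `%in%` is a common shorthand for a chain of `==` comparisons joined with `|`. The sketch below compares both forms; the variable names are illustrative and ggplot2 is assumed to be installed.

```r
# Assumption: ggplot2 is installed. as.data.frame() gives plain data frame behaviour.
diamonds <- as.data.frame(ggplot2::diamonds)

# Explicit OR of two comparisons
median_or <- median(diamonds$carat[diamonds$cut == "Premium" | diamonds$cut == "Ideal"])

# Same filter written with %in% and a vector of allowed levels
median_in <- median(diamonds$carat[diamonds$cut %in% c("Premium", "Ideal")])

identical(median_or, median_in)
```

The `%in%` form scales better as the list of accepted levels grows.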

Take 10 sample records from the dataset

> diamonds[sample(nrow(diamonds), 10), ]
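`sample()` draws different rows on every run. A small sketch of how the draw can be made repeatable with `set.seed()`; the seed value 42 is arbitrary, the variable names are illustrative, and ggplot2 is assumed to be installed.

```r
# Assumption: ggplot2 is installed. as.data.frame() gives plain data frame behaviour.
diamonds <- as.data.frame(ggplot2::diamonds)

set.seed(42)                         # arbitrary seed, fixes the random draw
idx <- sample(nrow(diamonds), 10)    # 10 distinct row numbers, drawn without replacement
sample_rows <- diamonds[idx, ]

# Resetting the same seed reproduces the same row numbers
set.seed(42)
idx_again <- sample(nrow(diamonds), 10)
identical(idx, idx_again)
```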

Select rows and columns in different combinations. ggplot2 ships diamonds as a tibble, and when the tibble package is loaded the [ operator keeps a single-column result as a tibble. The "returns a vector" comments below describe a plain data frame, e.g. after diamonds = as.data.frame(diamonds).

> diamonds[, c("price")] # returns a vector (plain data frame)

> diamonds[, c("price"), drop = FALSE] # returns a single-column data frame

> diamonds[, 2] # returns column 2 (cut) as a vector (plain data frame)

> diamonds[, 2, drop = FALSE] # returns a single-column data frame

> diamonds[, c("price", "cut")] # returns all rows with the price and cut columns

> diamonds[1, c("price", "cut")] # returns row 1 with the price and cut columns, 1 being the first row

> diamonds[1:4, c("price", "cut")] # returns rows 1 to 4 with the price and cut columns

> diamonds[1:4, 1:3] # returns rows 1 to 4 with columns 1 to 3, 1 being the first column

> diamonds[1:4, c(1, 2, 4)] # returns rows 1 to 4 and columns 1, 2, and 4

> diamonds[c(1, 2, 4), c(4, 1)] # returns rows 1, 2, 4 and columns 4 and 1, in that order

> diamonds[4, 2] # returns the cell value at row 4, column 2 (plain data frame)

> diamonds[["cut"]] # returns the cut column as a vector, without the column name
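The difference between these forms is easiest to see by inspecting what each one returns. The sketch below does that on a plain data frame; `df` is an illustrative name and ggplot2 is assumed to be installed.

```r
# Assumption: ggplot2 is installed. as.data.frame() gives plain data frame behaviour.
df <- as.data.frame(ggplot2::diamonds)

# A single column selected with [ , ] is dropped to a vector by default
is.data.frame(df[, "price"])                  # FALSE
is.integer(df[, "price"])                     # TRUE: price is an integer column

# drop = FALSE keeps the data frame structure
is.data.frame(df[, "price", drop = FALSE])    # TRUE

# A single cell is a length-one value; cut is a factor, so convert to see the label
as.character(df[4, 2])                        # "Premium" (row 4 in the listing above)

# [[ ]] extracts one column as a vector, whatever the class of the container
is.factor(df[["cut"]])                        # TRUE
```

Calling `class()` on any of the expressions above is a quick way to check which behaviour applies in a given session.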

## Apply functions in R

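plyr package equivalents of the base apply functions. In the Input and Output columns, a = array, d = data frame, l = list, and r = number of replications.

| Base function | Input | Output | plyr function |
|---------------|-------|--------|---------------|
| aggregate | d | d | ddply + colwise |
| apply | a | a/l | aaply / alply |
| by | d | l | dlply |
| lapply | l | l | llply |
| mapply | a | a/l | maply / mlply |
| replicate | r | a/l | raply / rlply |
| sapply | l | a | laply |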
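A brief sketch of three of the base functions from the table applied to the diamonds data, so the input and output types can be seen side by side. The variable names are illustrative and ggplot2 is assumed to be installed. The plyr call at the end runs only if plyr is installed, and it uses `ddply` with `summarise` rather than the `colwise` form listed in the table.

```r
# Assumption: ggplot2 is installed. as.data.frame() gives plain data frame behaviour.
diamonds <- as.data.frame(ggplot2::diamonds)
num_cols <- diamonds[, c("carat", "depth", "price")]

# lapply: list in (a data frame is a list of columns), list out
means_list <- lapply(num_cols, mean)

# sapply: list in, simplified to a named numeric vector
means_vec <- sapply(num_cols, mean)

# aggregate: data frame in, data frame out (mean carat per cut)
carat_by_cut <- aggregate(carat ~ cut, data = diamonds, FUN = mean)

# Optional plyr counterpart of the aggregate() call, skipped when plyr is missing
if (requireNamespace("plyr", quietly = TRUE)) {
  carat_by_cut_plyr <- plyr::ddply(diamonds, "cut", plyr::summarise,
                                   carat = mean(carat))
  print(carat_by_cut_plyr)
}
```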