Summarizing Data
Before doing any data analysis, we need to check that the data is good
Why?
- Data too big to look at
- Need to find problems before analyzing
Problems:
- Missing values
- Values outside of expected ranges
- Values that seem to be in the wrong units
- Mislabeled variables/columns
- Variables that are the wrong class
Summarizing Data in R
- summary(x) - summarizes all quantitative and qualitative variables
- quantile(x) - range of variables
- sapply(x[1, ], class) - calls class for every element of the 1st row; tells if the data was loaded properly
- names(x) - column names
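
A minimal sketch of these calls on a small made-up data frame. The name df and its columns are assumptions for illustration only and do not come from the notes:

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(
  height = c(150, 162, NA, 175),
  group  = factor(c("a", "b", "a", "b"))
)

# Per-column summary: quartiles (and NA count) for numbers, level counts for factors
summary(df)

# Min, quartiles and max of one numeric variable;
# na.rm = TRUE is needed here because height contains an NA
q <- quantile(df$height, na.rm = TRUE)

# Class of every element in the first row, i.e. the class of each column
col_classes <- sapply(df[1, ], class)

# Column names
col_names <- names(df)
```

If col_classes shows "character" or "factor" for a column that should be numeric, that is usually a hint that the file was not read as intended.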
Sizes:
- dim(x) - size of the dataset, same as nrow(x) and ncol(x)
- length(x) and unique(x) - number of elements and the distinct values
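
A short sketch of the size functions, again on a made-up data frame (df and its columns are hypothetical):

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(
  id    = 1:5,
  color = c("red", "blue", "red", "green", "blue"),
  stringsAsFactors = FALSE
)

size    <- dim(df)            # c(number of rows, number of columns)
n_rows  <- nrow(df)           # number of rows
n_cols  <- ncol(df)           # number of columns
len_df  <- length(df)         # for a data frame: number of columns
len_vec <- length(df$color)   # for a vector: number of elements
colors  <- unique(df$color)   # distinct values, in order of first appearance
```

Note that length() on a data frame counts columns, not rows, so nrow() is the safer choice when the number of observations is what matters.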
tables
- table(x) - unique values with their counts
- table(x, y) - two-dimensional table
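
A small sketch of one-way and two-way tables. The vectors color and size are made up for illustration:

```r
# Hypothetical vectors, used only for illustration
color <- c("red", "blue", "red", "green", "blue", "red")
size  <- c("S", "M", "S", "L", "M", "M")

# One-way table: each unique value with its count
counts <- table(color)

# Two-way table: count for each (color, size) combination
cross <- table(color, size)

# Individual cells can be looked up by name
red_small <- cross["red", "S"]
```

By default table() leaves NA values out of the counts; passing useNA = "ifany" includes them, which can help when looking for missing values.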
logical tests
- any(x > 10) - are there any TRUEs?
- all(x > 10) - are all elements TRUE?
- which(x > 10) - which elements are TRUEs?
- which(is.na(x)) - which are NAs
- use ! (not), & (and), | (or):
- which(!is.na(x) & x > 10)
- sum(is.na(x)) - how many NAs
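
The logical tests can be tried on a short vector that contains a missing value. The vector x below is a made-up example; in R, negation is written with !:

```r
# Hypothetical vector, used only for illustration
x <- c(3, 12, NA, 7, 25)

any_big   <- any(x > 10)                 # TRUE: at least one element exceeds 10
all_big   <- all(x > 10, na.rm = TRUE)   # FALSE: 3 and 7 do not exceed 10
idx_big   <- which(x > 10)               # positions 2 and 5; which() skips NA
idx_na    <- which(is.na(x))             # position 3
idx_valid <- which(!is.na(x) & x > 10)   # non-missing AND greater than 10
n_na      <- sum(is.na(x))               # TRUE counts as 1, so this counts NAs
```

Comparisons with NA return NA rather than TRUE or FALSE, which is why any() and all() accept na.rm = TRUE.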
summarizing by columns or rows
- rowSums, rowMeans
- colSums, colMeans
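
A brief sketch of the row and column summaries on a made-up matrix (m and m_na are hypothetical names). The last line combines colSums with is.na to count missing values per column:

```r
# Hypothetical numeric matrix, filled column by column:
# row 1 is (1, 3, 5) and row 2 is (2, 4, 6)
m <- matrix(1:6, nrow = 2)

row_totals <- rowSums(m)    # 9 12
row_avgs   <- rowMeans(m)   # 3 4
col_totals <- colSums(m)    # 3 7 11
col_avgs   <- colMeans(m)   # 1.5 3.5 5.5

# Copy of m with one missing value, to count NAs per column
m_na <- m
m_na[1, 2] <- NA
na_per_col <- colSums(is.na(m_na))   # 0 1 0
```

All four functions accept na.rm = TRUE if missing values should be ignored rather than propagated into the result.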