Summarizing Data
Before doing any data analysis, we need to check whether the data are good.
Why?
- Data are usually too big to inspect by eye
- Problems need to be found before analyzing
Problems:
- Missing values
- Values outside of expected ranges
- Values that seem to be in the wrong units
- Mislabeled variables/columns
- Variables that are the wrong class
Summarizing Data in R
summary(x)
- summarizes all quantitative and qualitative variables
quantile(x)
- quantiles of a numeric variable (min, 25%, median, 75%, max), which show its range
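A minimal sketch of summary() and quantile() on a small made-up data frame. The names df, age, sex and income and all values are hypothetical. The 290 in age is planted to show how the maximum exposes an out-of-range value.

```r
# Hypothetical data; names and values are made up for illustration
df <- data.frame(
  age    = c(23, 35, NA, 41, 290),   # 290 is a planted out-of-range value
  sex    = factor(c("F", "M", "M", "F", "F")),
  income = c(52000, 61000, 48000, NA, 75000)
)

# summary() summarizes each column according to its class:
# numeric columns get Min, quartiles, Mean, Max and an NA count,
# factor columns get a count per level
summary(df)

# quantile() works on one numeric vector; na.rm = TRUE drops the NA first
q_age <- quantile(df$age, na.rm = TRUE)
q_age   # 0%: 23, 25%: 32, 50%: 38, 75%: 103.25, 100%: 290

# Other probabilities can be requested with probs
quantile(df$income, probs = c(0.1, 0.5, 0.9), na.rm = TRUE)
```

Without na.rm = TRUE, quantile() stops with an error when the vector contains NA, so the missing value has to be handled explicitly.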
sapply(x[1, ], class)
- calls class() on every element of the 1st row (i.e., on each column); tells whether the data were loaded with the expected types
names(x)
- column names
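A sketch of using sapply(x[1, ], class) and names(x) to catch a badly loaded column. The data frame and its columns are hypothetical: a stray "kg" is assumed to have turned the weight column into text.

```r
# Hypothetical data: one weight value carries a unit, so the column is text
df <- data.frame(
  id     = 1:3,
  weight = c("70", "82", "75kg"),
  group  = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Class of every column, read from the first row (x[1, ] keeps column classes)
col_classes <- sapply(df[1, ], class)
col_classes   # id "integer", weight "character", group "character"

# Column names, useful for spotting mislabeled variables
names(df)
```

Seeing "character" for weight is the signal that it did not load as numbers. For simple columns like these, sapply(df, class) gives the same result without subsetting a row.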
Sizes:
dim(x)
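- size of the dataset (rows, columns); same as c(nrow(x), ncol(x))
length(x)
- number of elements of a vector (for a data frame, the number of columns)
unique(x)
- distinct values of x; length(unique(x)) counts them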
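A short sketch of the size functions on a hypothetical four-row data frame (the city and temp columns are made up). It shows that length() means different things for a vector and for a data frame.

```r
# Hypothetical data frame with 4 rows and 2 columns
df <- data.frame(
  city = c("Boston", "Austin", "Boston", "Denver"),
  temp = c(12, 25, 14, 8),
  stringsAsFactors = FALSE
)

dim(df)                   # 4 2 (rows, columns)
c(nrow(df), ncol(df))     # same information: 4 2

length(df$city)           # elements in a vector: 4
length(df)                # for a data frame, the number of columns: 2

unique(df$city)           # "Boston" "Austin" "Denver" (order of first appearance)
length(unique(df$city))   # number of distinct values: 3
```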
tables
table(x)
- unique values with their counts
table(x, y)
- two-dimensional table of counts for each combination of x and y
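A sketch of one- and two-way tables on hypothetical survey vectors (sex and smoker are made-up names and values). It also shows the useNA argument, since table() silently leaves out missing values by default.

```r
# Hypothetical survey responses
sex    <- c("F", "M", "F", "F", "M")
smoker <- c("yes", "no", "no", "yes", "no")

# One-way table: each unique value and how often it occurs (F: 3, M: 2)
t_sex <- table(sex)
t_sex

# Two-way table: counts for every sex/smoker combination
t_both <- table(sex, smoker)
t_both   # F: no 1, yes 2; M: no 2, yes 0

# table() leaves out NA by default; useNA = "ifany" adds a count for it
t_na <- table(c("a", NA, "a"), useNA = "ifany")
t_na     # a: 2, <NA>: 1
```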
logical tests
- any(x > 10) - are there any TRUEs?
- all(x > 10) - are all TRUE?
- which(x > 10) - which elements are TRUEs?
- which(is.na(x)) - which elements are NA?
- use ! for not, & for and, | for or: which(!is.na(x) & x > 10)
- sum(is.na(x)) - how many NAs
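A sketch of the logical tests on a hypothetical vector x that contains NAs. The last two lines show a case worth knowing: all() returns NA when an NA is present and no element is FALSE.

```r
# Hypothetical vector with two missing values
x <- c(3, 15, NA, 8, 22, NA)

any(x > 10)                 # TRUE: 15 and 22 exceed 10
all(x > 10)                 # FALSE: 3 and 8 do not
which(x > 10)               # 2 5 (NA comparisons are not counted as TRUE)
which(is.na(x))             # 3 6
which(!is.na(x) & x > 10)   # 2 5, with the NA check made explicit
sum(is.na(x))               # 2, because each TRUE counts as 1

# With an NA and no FALSE, all() cannot decide and returns NA;
# na.rm = TRUE drops the NAs before testing
all(c(12, NA) > 10)                # NA
all(c(12, NA) > 10, na.rm = TRUE)  # TRUE
```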
summarizing by columns or rows
rowSums, rowMeans
- sum or mean of each row
colSums, colMeans
- sum or mean of each column
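A sketch of the row and column summaries on a hypothetical score matrix (students, tests and values are made up). These functions accept a numeric matrix or an all-numeric data frame, and, like the logical tests, they return NA for any row or column containing NA unless na.rm = TRUE.

```r
# Hypothetical scores: 3 students (rows) by 2 tests (columns); bob missed test2
scores <- matrix(c(80, 90, 70,    # test1 (matrix() fills column by column)
                   85, NA, 75),   # test2
                 nrow = 3,
                 dimnames = list(c("ann", "bob", "cy"), c("test1", "test2")))

rowSums(scores)                  # ann 165, bob NA, cy 145
rowSums(scores, na.rm = TRUE)    # ann 165, bob 90, cy 145
rowMeans(scores, na.rm = TRUE)   # ann 82.5, bob 90, cy 72.5
colSums(scores, na.rm = TRUE)    # test1 240, test2 160
colMeans(scores, na.rm = TRUE)   # test1 80, test2 80
```

The results match apply(scores, 1, sum) or apply(scores, 2, mean) with the same na.rm setting, but the dedicated functions are much faster on large data.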