Bivariate Analysis
Analyzes the relationship between two variables
Recall that there are two Types of Variables:
- Categorical Variables - values that can be organized into categories (not numerical)
- Quantitative Variables - numerical values for which arithmetic operations make sense
So there can be the following combinations:
- Quantitative vs Quantitative
- Quantitative vs Categorical
- Categorical vs Categorical
Independence
Typically, the most interesting question is:
- “Are these variables independent?”
- if they are dependent (e.g., strongly correlated), one variable may be redundant and can be removed
Quantitative vs Quantitative
If two variables are numeric:
- plot a Scatter Plot
- try to fit a regression line
- and compute the Correlation between them
- or Discretize one of them and do a Quantitative vs Categorical analysis (see below)
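The regression line, the correlation and the discretization step can be sketched in pure Python as follows. This is a minimal illustration, not a library API: the function names, the example data (hours studied vs. exam score) and the single cut point 2.5 are hypothetical. The formulas are the standard ones: with S_xy, S_xx, S_yy the sums of products of deviations from the means, r = S_xy / sqrt(S_xx * S_yy), slope = S_xy / S_xx, and intercept = mean(y) - slope * mean(x).

```python
from bisect import bisect_right
from statistics import mean

def _centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of products of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy), a value in [-1, 1]."""
    sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / (sxx * syy) ** 0.5

def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

def discretize(xs, cut_points, labels):
    """Replace each value by the label of its bin.

    `cut_points` must be sorted; len(labels) == len(cut_points) + 1.
    A value equal to a cut point goes into the upper bin.
    """
    return [labels[bisect_right(cut_points, x)] for x in xs]

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, score)
print(f"score ~ {slope:.2f} * hours + {intercept:.2f}")  # score ~ 4.10 * hours + 47.70
print(round(pearson_r(hours, score), 3))                # 0.994
print(discretize(hours, [2.5], ["low", "high"]))        # ['low', 'low', 'high', 'high', 'high']
```

In practice, `statistics.linear_regression` and `statistics.correlation` (Python 3.10+) or NumPy/SciPy cover the first two computations; the discretized list can then be treated as a categorical variable.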
Quantitative vs Categorical
If one is numeric, and another is categorical:
- Visualize one variable w.r.t. another
- typically, group the values of the numerical variable by the values of the categorical variable
- Box Plot
- Bar Chart
- Histogram
- Density Plot
- a One-Way ANOVA F-Test checks whether the mean of the numerical variable differs between categories, which would indicate dependence between the variables
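The grouping step and the one-way ANOVA F statistic can be sketched in pure Python. With k groups and n observations in total, F = (SS_between / (k - 1)) / (SS_within / (n - k)). The function names and the data (diet vs. weight loss) are hypothetical; turning F into a p-value needs the F distribution, which `scipy.stats.f_oneway` handles if SciPy is available.

```python
from collections import defaultdict
from statistics import mean

def group_by(categories, values):
    """Group the numerical values by the category of each observation."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    return dict(groups)

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a dict {category: list of values}.

    A large F means the group means differ more than the within-group spread
    would explain, which suggests the two variables are dependent.
    """
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    group_means = {c: mean(g) for c, g in groups.items()}
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (group_means[c] - grand_mean) ** 2 for c, g in groups.items())
    # Variation of each value around its own group mean.
    ss_within = sum((v - group_means[c]) ** 2 for c, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: diet (categorical) vs. weight loss in kg (numerical).
diet = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
loss = [2, 3, 4, 5, 6, 7, 8, 9, 10]

groups = group_by(diet, loss)
print(groups)                   # {'A': [2, 3, 4], 'B': [5, 6, 7], 'C': [8, 9, 10]}
print(one_way_anova_f(groups))  # 27.0 with (k - 1, n - k) = (2, 6) degrees of freedom
```

The same `groups` dict is what a per-category box plot or histogram would draw from.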
Categorical vs Categorical
To compare two categorical variables:
- start by building a Contingency Table that shows the (relative) frequencies of each combination of values
- Marginal Distribution - distribution of only one of the variables in a contingency table
- Conditional Distribution - distribution of one variable within a fixed value of the other variable
- this makes it easy to see from the table alone whether there is any association between the two variables
- run Tests of Independence on the table, e.g. Pearson's chi-squared test
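The contingency-table workflow can be sketched in pure Python: build the table, read off a marginal and a conditional distribution, and compute Pearson's chi-squared statistic as one possible test of independence. Under independence, the expected count in a cell is its row total times its column total divided by n. The function names and the smoking/cough data are hypothetical. `scipy.stats.chi2_contingency` returns this statistic together with a p-value, but by default it applies Yates' continuity correction when there is one degree of freedom, so on a 2x2 table its statistic differs from the uncorrected one below.

```python
from collections import Counter

def contingency_table(xs, ys):
    """Count each (x, y) combination; rows are values of x, columns are values of y."""
    counts = Counter(zip(xs, ys))
    rows, cols = sorted(set(xs)), sorted(set(ys))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

def row_marginal(table):
    """Marginal distribution of the row variable (ignores the column variable)."""
    n = sum(map(sum, table))
    return [sum(row) / n for row in table]

def conditional_on_row(table, i):
    """Distribution of the column variable given the i-th row value."""
    total = sum(table[i])
    return [count / total for count in table[i]]

def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom (no continuity correction)."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical survey: does the person smoke, and do they have a chronic cough?
smokes = ["no"] * 40 + ["yes"] * 40
cough = ["no"] * 30 + ["yes"] * 10 + ["no"] * 10 + ["yes"] * 30

rows, cols, table = contingency_table(smokes, cough)
print(rows, cols, table)             # ['no', 'yes'] ['no', 'yes'] [[30, 10], [10, 30]]
print(row_marginal(table))           # [0.5, 0.5]
print(conditional_on_row(table, 1))  # [0.25, 0.75] (cough distribution among smokers)
print(chi_squared(table))            # (20.0, 1)
```

Comparing the conditional distributions of the two rows ([0.75, 0.25] vs. [0.25, 0.75]) already hints at dependence; the chi-squared statistic quantifies how far the observed counts are from what independence would predict.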
Links
- http://en.wikipedia.org/wiki/Bivariate_analysis
- Introduction to Bivariate Analysis (slides)