Types of Variables
In a data table, rows correspond to “observation units” (subjects, etc.) and columns correspond to “variables”.
- NB: don’t confuse these with Random Variables from Probability Theory
There are several types of variables:
- Categorical Variables - values that can be organized into categories (not numerical)
- Quantitative Variables - with numerical values for which arithmetic operations make sense
- Ordinal Variables - categorical variables whose categories have a natural order
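A minimal Python sketch of the distinction, using a made-up table (the column names and the `Size` ordering are hypothetical). Each dict is one observation unit and each key is one variable; each type of variable only supports certain operations.

```python
from collections import Counter
from enum import IntEnum
from statistics import mean


class Size(IntEnum):
    """Hypothetical ordinal variable: the order of the levels is meaningful,
    the numeric codes themselves are not."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


# Each dict is one observation unit (row); each key is one variable (column).
table = [
    {"color": "red",   "size": Size.MEDIUM, "price": 12.5},
    {"color": "blue",  "size": Size.SMALL,  "price": 8.0},
    {"color": "red",   "size": Size.LARGE,  "price": 20.0},
    {"color": "green", "size": Size.SMALL,  "price": 9.5},
]

# Categorical ("color"): counting and grouping make sense.
color_counts = Counter(row["color"] for row in table)

# Ordinal ("size"): sorting and comparisons make sense. IntEnum would also
# allow arithmetic on the codes, but averaging them is not meaningful.
sizes_in_order = sorted(row["size"] for row in table)
largest = max(row["size"] for row in table)

# Quantitative ("price"): arithmetic such as the mean makes sense.
mean_price = mean(row["price"] for row in table)
```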
Problems with Variables
We may also encounter the following problems:
- Outliers - unusually large or small values; sometimes they are errors, so we have to find an explanation for them
  - use Anomaly Detection techniques to detect outliers
- Missing values - values that are not present, which can bias the results
  - require Handling Missing Values techniques to avoid that
- Noise - an unwanted modification of the original value
  - looks like normal input, but it’s faulty
  - very hard to detect
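A minimal sketch of the first two problems on a made-up sample, assuming `None` marks a missing value. Tukey's fences (1.5 * IQR beyond the quartiles) are used here as one simple outlier rule, not as the only Anomaly Detection technique, and mean imputation is just one of many ways of Handling Missing Values. Noise has no comparably simple check, which is why it is so hard to detect.

```python
from statistics import mean, quantiles

# Hypothetical sample; None marks a missing value.
values = [4.1, 3.9, None, 4.3, 4.0, 25.0, 3.8, None, 4.2]

# Missing values: count them before computing any statistic.
observed = [v for v in values if v is not None]
n_missing = len(values) - len(observed)

# Outliers via Tukey's fences: flag points more than 1.5 * IQR
# below the first quartile or above the third quartile.
q1, _, q3 = quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < low or v > high]

# Mean imputation: fill the gaps with the mean of the non-outlying values,
# so that a single extreme value does not distort the fill value.
# Outliers are only flagged here, not removed: that decision depends on
# the explanation found for them.
fill = mean(v for v in observed if low <= v <= high)
imputed = [fill if v is None else v for v in values]
```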
Relationships
Types of variables in the analysis:
- outcome - the variables we are interested in
- explanatory - the variables that are used to analyze and explain the outcome
Types of Relationships
The relationship between the explanatory variables and the outcome can be one of the following:
- independent: there is no association between the variables
- association: the variables are dependent, but the kind of relationship between them is not clear
- causes: changes in the explanatory variables cause the outcome to change
- reverse causation: changes in the outcome cause the explanatory variables to change
- coincidence: the apparent association is due to pure chance
- common cause: some other variable causes both the explanatory variables and the outcome to change - see Lurking Variables and Confounding Variables
Multivariate Analysis
The following methods can be used to analyze relationships between variables:
- Bivariate Analysis
  - e.g. Correlation, Regression Analysis, ANOVA, Statistical Test of Independence
- and many others
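As an example of a Statistical Test of Independence, here is a minimal sketch of Pearson's chi-square test for a contingency table of two categorical variables, with hypothetical counts. The p-value helper only covers the 1-degree-of-freedom case (a 2 x 2 table), where it can be computed with `math.erfc`.

```python
import math


def chi_square_independence(table):
    """Pearson's chi-square statistic and degrees of freedom for an r x c
    table of counts (all row and column totals assumed to be positive)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_dof1(stat):
    """Upper-tail probability of a chi-square distribution with 1 dof."""
    # If Z ~ N(0, 1), then Z**2 ~ chi-square(1), so P(Z**2 > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = levels of the explanatory variable,
# columns = levels of the outcome.
counts = [[10, 20],
          [30, 40]]
stat, dof = chi_square_independence(counts)
p = p_value_dof1(stat) if dof == 1 else None
```

Here the statistic is 50/63 (about 0.79) and the p-value is about 0.37, so these counts give no evidence against independence. For larger tables, `scipy.stats.chi2.sf(stat, dof)` gives the p-value for any number of degrees of freedom, and `scipy.stats.chi2_contingency` runs the whole test; for 2 x 2 tables it applies Yates' continuity correction by default, so its statistic differs from the uncorrected one computed here.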