Data Scientist Interview Question - Amazon

A common interview question for data scientist roles is about the different correlation measures and when to apply each one.

Question: Explain the correlation measures used between different types of variables: continuous, nominal (categorical), and ordinal.

Correlation Measures Between Different Types of Variables

A correlation measure quantifies the strength (and, for ordered variables, the direction) of the association between two variables. The appropriate method depends on the types of the two variables:

Continuous vs. Continuous
Continuous vs. Categorical (nominal, including binary)
Categorical vs. Categorical (nominal)
Ordinal vs. Ordinal
Continuous vs. Ordinal

1. Continuous vs. Continuous Variables

Use: Pearson, Spearman, Kendall

Method	When to Use	Assumptions
Pearson Correlation (r)	When both variables are continuous and the relationship is linear	Assumes linearity; normality matters mainly for significance tests and confidence intervals; sensitive to outliers
Spearman’s Rank Correlation (ρ)	When data are not normally distributed, contain outliers, or have a monotonic but not necessarily linear relationship	Assumes a monotonic relationship; no normality assumption
Kendall’s Tau (τ)	When the dataset is small or has many tied ranks	Assumes monotonicity; well behaved in small samples; the tau-b variant corrects for ties
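For reference, the usual textbook definitions are below. Here x̄ and ȳ are the sample means, d_i is the difference between the ranks of x_i and y_i, n_c and n_d count concordant and discordant pairs, n_0 is the total number of pairs, and n_1 and n_2 count the pairs tied on x and on y. The Spearman shortcut is exact only when there are no ties; with ties, Spearman's ρ is simply Pearson's r computed on average ranks.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \qquad \text{(no ties)}

\tau_b = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}, \qquad n_0 = \frac{n(n - 1)}{2}
```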

Example

Pearson: Relationship between height and weight
Spearman: Relationship between income and happiness (monotonic but not linear)
Kendall: Relationship between customer satisfaction and product rating in a small sample with many tied scores
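A minimal pure-Python sketch of the three coefficients, written without libraries so the definitions stay visible. The function names and the toy data are illustrative assumptions; in practice SciPy's scipy.stats.pearsonr, spearmanr and kendalltau compute the same quantities together with p-values. Tied values receive average ranks, and Kendall's tau is the tie-corrected tau-b variant.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; a run of tied values shares the mean of the ranks it spans."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values tied with values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, with a tie correction."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from every term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    # Each factor counts the pairs that are not tied on one of the variables
    not_tied_x = concordant + discordant + tied_y_only
    not_tied_y = concordant + discordant + tied_x_only
    return (concordant - discordant) / math.sqrt(not_tied_x * not_tied_y)


# Hypothetical data: y grows with x, but along a curve rather than a line
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(pearson(x, y))        # about 0.94: high, but below 1 because the trend is curved
print(spearman(x, y))       # 1.0: the ranks agree perfectly
print(kendall_tau_b(x, y))  # 1.0: every pair is concordant
```

The gap between Pearson and the two rank-based coefficients on this curved but perfectly monotonic data is exactly the distinction the table draws between linear and monotonic relationships.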

2. Continuous vs. Categorical Variables

Use: ANOVA, t-test, Point-Biserial, Rank-Biserial

Method	When to Use	Assumptions
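t-test	Comparing the means of a continuous variable between two groups (binary categorical variable)	Roughly normal data (or large samples) in each group; Student’s t-test also assumes equal variances, Welch’s t-test does not
ANOVA (Analysis of Variance)	Comparing the means of a continuous variable across more than two categorical groups	Normality within groups, equal variances, independent observations; η² can report the strength of association
Point-Biserial Correlation	When a continuous variable is correlated with a binary categorical variable (e.g., Male/Female vs. Salary)	Pearson’s r with the binary variable coded 0/1; assumes the continuous variable is roughly normal with equal variance in both groups
Rank-Biserial Correlation	When the other variable is binary and the continuous variable is non-normal (or ordinal)	Rank-based counterpart of point-biserial (effect size for the Mann-Whitney U test); no normality assumption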
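The two biserial coefficients can be written as follows, where M_1 and M_0 are the means of the continuous variable in the groups coded 1 and 0, n_1 and n_0 are the group sizes, n = n_1 + n_0, s_n is the standard deviation of all n values computed with divisor n, and U_1 is the Mann-Whitney U statistic for group 1 (the number of cross-group pairs in which the group-1 value is larger, counting ties as one half).

```latex
r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}}

r_{rb} = \frac{2 U_1}{n_1 n_0} - 1
```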

Example

t-test: Comparing salaries between males and females
ANOVA: Comparing test scores between different education levels
Point-Biserial: Relationship between gender (0/1) and height
Rank-Biserial: Relationship between pass/fail (0/1) and test scores
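The sketch below implements Welch's t statistic, the one-way ANOVA F statistic with η² as an effect size, and the two biserial coefficients, using only the standard library. The function names and scores are hypothetical, and the sketch returns test statistics only; p-values would need the t and F distributions (for example scipy.stats.ttest_ind with equal_var=False, or scipy.stats.f_oneway).

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic: mean difference over the unpooled standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of groups of numbers."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.mean(values)
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)  # share of variance explained by group
    return f_stat, eta_squared


def point_biserial(labels, values):
    """Pearson's r between 0/1 labels and a continuous variable, via group means."""
    group1 = [v for lab, v in zip(labels, values) if lab == 1]
    group0 = [v for lab, v in zip(labels, values) if lab == 0]
    s_n = statistics.pstdev(values)  # standard deviation with divisor n
    n = len(values)
    return (statistics.mean(group1) - statistics.mean(group0)) / s_n * math.sqrt(
        len(group1) * len(group0) / n ** 2
    )


def rank_biserial(group1, group0):
    """2*U1/(n1*n0) - 1, with U1 the Mann-Whitney U for group1 (ties count 1/2)."""
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in group1 for b in group0)
    return 2 * u1 / (len(group1) * len(group0)) - 1


# Hypothetical scores for two groups coded 1 and 0
scores1 = [5, 6, 7]
scores0 = [1, 2, 3]

print(welch_t(scores1, scores0))                               # about 4.899
print(one_way_anova([scores1, scores0]))                       # (24.0, 0.857...)
print(point_biserial([1, 1, 1, 0, 0, 0], scores1 + scores0))   # about 0.926
print(rank_biserial(scores1, scores0))                         # 1.0: every group-1 score is higher
```

With two equal-sized groups, Welch's t coincides with Student's t and the ANOVA F equals t², which makes a handy sanity check on both functions.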

3. Categorical vs. Categorical Variables

Use: Chi-Square, Cramér’s V, Phi Coefficient, Contingency Coefficient

Method	When to Use	Assumptions
Chi-Square Test	Testing whether two nominal variables are independent	Expected count of at least 5 in at least 80% of cells (none below 1); tests for an association but does not measure its strength
Cramér’s V	When measuring the strength of association in a contingency table of any size	Ranges from 0 to 1; equals |φ| for a 2x2 table
Phi Coefficient (φ)	When both variables are binary (2x2 contingency table)	Equivalent to Pearson’s r computed on the two 0/1-coded variables
Contingency Coefficient (C)	When both variables are nominal and a simple chi-square-based measure is enough	Ranges from 0 to a maximum below 1 that depends on table size, so values are hard to compare across tables of different sizes
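All of these measures are built from the chi-square statistic of an r × c table with n observations, observed counts O_ij, row totals n_i· and column totals n_·j. For a 2x2 table the signed φ uses the four cell counts n_11, n_12, n_21, n_22:

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{n_{i\cdot} \, n_{\cdot j}}{n}

\varphi = \frac{n_{11} n_{22} - n_{12} n_{21}}{\sqrt{n_{1\cdot} \, n_{2\cdot} \, n_{\cdot 1} \, n_{\cdot 2}}}, \qquad |\varphi| = \sqrt{\frac{\chi^2}{n}}

V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\, c - 1)}}

C = \sqrt{\frac{\chi^2}{\chi^2 + n}}, \qquad C_{\max} = \sqrt{\frac{k - 1}{k}} \ \text{for a } k \times k \text{ table}
```

The upper bound on C is why it can never reach 1 and why it is hard to compare across table sizes; Cramér's V divides by the largest attainable χ² so that its maximum is 1.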

Example

Chi-Square: Relationship between education level and voting preference
Cramér’s V: Relationship between job role and department
Phi Coefficient: Relationship between smoking (yes/no) and lung disease (yes/no)
Contingency Coefficient: Relationship between marital status and region
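A standard-library sketch that computes the chi-square statistic and the three strength-of-association measures from a table of counts. The smoking table holds made-up counts for illustration, and the code assumes every row and column total is positive. scipy.stats.chi2_contingency would also return a p-value, but it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from the uncorrected value computed here.

```python
import math


def chi_square(table):
    """Pearson's chi-square statistic for an r x c table of observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat


def phi_signed(table):
    """Phi for a 2x2 table [[a, b], [c, d]]; the sign shows the direction."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def cramers_v(table):
    """Cramér's V: chi-square rescaled so that the maximum is 1 for any table size."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1  # min(r - 1, c - 1)
    return math.sqrt(chi_square(table) / (n * k))


def contingency_coefficient(table):
    """Pearson's contingency coefficient C; its maximum depends on the table size."""
    chi2 = chi_square(table)
    n = sum(sum(row) for row in table)
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical counts. Rows: smoker yes/no. Columns: lung disease yes/no.
smoking = [[30, 70],
           [10, 90]]

print(chi_square(smoking))               # 12.5
print(phi_signed(smoking))               # 0.25
print(cramers_v(smoking))                # 0.25 (equals |phi| for a 2x2 table)
print(contingency_coefficient(smoking))  # about 0.243
```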

4. Ordinal vs. Ordinal Variables

Use: Spearman’s Rank, Kendall’s Tau, Gamma, Somers’ D

Method	When to Use	Assumptions
Spearman’s Rank Correlation (ρ)	When both variables are ordinal (ranked)	Monotonic relationship, no normality assumption
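Kendall’s Tau (τ)	When both variables are ordinal, especially with small datasets	The tau-b variant adjusts for tied ranks
Goodman-Kruskal Gamma	When both variables are ordinal with few categories (many ties) and you want the direction and strength of association	Ignores tied pairs entirely, so its magnitude is at least that of tau-b; no linearity assumption
Somers' D	Similar to Gamma but treats one variable as dependent on the other (asymmetric)	Keeps pairs tied on the dependent variable in the denominator, so D(Y|X) and D(X|Y) generally differ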
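Both coefficients compare concordant pairs (n_c) with discordant pairs (n_d) and differ only in how ties are handled. With Y as the dependent variable, t_Y counts the pairs tied on Y but not on X, so the Somers' D denominator is the number of pairs not tied on X:

```latex
\gamma = \frac{n_c - n_d}{n_c + n_d}

d_{Y \mid X} = \frac{n_c - n_d}{n_c + n_d + t_Y}
```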

Example

Spearman/Kendall: Relationship between satisfaction rating (1–5) and performance rating (1–5)
Gamma: Relationship between job experience level (junior, mid, senior) and likelihood of promotion
Somers' D: Relationship between education level and income level
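A pure-Python sketch of gamma and Somers' D based on counting pairs. The integer codes stand in for ordered categories (hypothetical data), and the O(n²) pair loop is written for clarity rather than for large datasets. Swapping the arguments of somers_d shows the asymmetry described in the table.

```python
def pair_counts(x, y):
    """Return (concordant, discordant, tied on x only, tied on y only) pair counts."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both: ignored by gamma and Somers' D
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    return concordant, discordant, tied_x_only, tied_y_only


def goodman_kruskal_gamma(x, y):
    """Gamma: uses only untied pairs, so every tie is dropped."""
    c, d, _, _ = pair_counts(x, y)
    return (c - d) / (c + d)


def somers_d(x, y):
    """Somers' D of y given x (y dependent): pairs tied only on y stay in the denominator."""
    c, d, _, tied_y_only = pair_counts(x, y)
    return (c - d) / (c + d + tied_y_only)


# Hypothetical ordinal codes: education level (1-4) and income band (1-2)
education = [1, 2, 3, 4]
income_band = [1, 1, 2, 2]

print(goodman_kruskal_gamma(education, income_band))  # 1.0: no discordant pairs
print(somers_d(education, income_band))  # about 0.667: income given education
print(somers_d(income_band, education))  # 1.0: education given income
```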

5. Continuous vs. Ordinal Variables

Use: Spearman’s Rank, Kendall’s Tau, Point-Biserial

Method	When to Use	Assumptions
Spearman’s Rank Correlation	When continuous and ordinal data have a monotonic relationship	No normality assumption
Kendall’s Tau	When the dataset is small or has tied ranks	Monotonicity assumption
Point-Biserial Correlation	If the ordinal variable is dichotomous (binary)	Assumes the continuous variable is roughly normal within each group

Example

Spearman/Kendall: Relationship between employee experience (years) and performance rating (1–5)
Point-Biserial: Relationship between exam score and pass/fail status
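When one variable is ordinal, only the order of its categories carries information, so labels have to be mapped to codes that follow that order rather than, say, alphabetical order. The sketch below (hypothetical rating labels and data) shows why a rank-based measure suits this case: recoding the scale in any order-preserving way leaves Spearman's ρ unchanged, whereas Pearson's r shifts with the arbitrary spacing of the codes.

```python
import math

# Hypothetical ordinal scale, listed from lowest to highest
RATING_ORDER = ["poor", "fair", "good", "very good", "excellent"]


def pearson(x, y):
    """Pearson's r between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values get the mean of the first and last rank they occupy."""
    first, last = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho: Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: years of experience and a performance rating label
years = [0.5, 1.0, 2.5, 3.0, 4.5, 6.0, 8.0, 10.0]
labels = ["poor", "fair", "fair", "good", "good", "very good", "excellent", "excellent"]

# Encode labels by their position in the scale (1..5), not alphabetically
codes = [RATING_ORDER.index(label) + 1 for label in labels]
# A different order-preserving coding of the same scale, with uneven gaps
stretched = [{1: 1, 2: 2, 3: 3, 4: 10, 5: 20}[c] for c in codes]

print(spearman(years, codes) == spearman(years, stretched))  # True: same ranks
print(pearson(years, codes), pearson(years, stretched))      # two different values
```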

Dataloopr