
Careem Interview Questions for Data Scientist
Careem (a subsidiary of Uber) is one of the most sought-after companies to work for in the Middle East. Careem leverages vast amounts of data for ride-hailing, food delivery, and digital payments across the MENA region, making it an ideal place for aspiring data scientists to solve real-world, large-scale problems.
I recently went through their Data Scientist interview process, and in this post, I’ll share my experience in detail - including the kinds of questions I faced, how I approached them, and some tips for preparation.
The interview process was divided into coding, probability/statistics, machine learning fundamentals, evaluation metrics, and case studies. Here’s a breakdown.

1. Leetcode-Style Coding Question (Medium Level)
The first round involved a coding test on arrays and hashing, similar to what you’d find on Leetcode.
Question:
Given an integer array, return the length of the longest subarray whose sum is divisible by k.
Example:
- Input: arr = [2, 3, 5, 1, 9], k = 5
- Output: 5 (the whole array has sum = 20, which is divisible by 5)
Approach:
- Use prefix sums and a hashmap to store remainders.
- If the same remainder appears at two indices, the subarray sum between them is divisible by k.
Solution (Python):

```python
def longest_subarray_divisible_by_k(arr, k):
    prefix_sum = 0
    remainder_map = {0: -1}  # remainder -> earliest index where it was seen
    max_len = 0
    for i, num in enumerate(arr):
        prefix_sum += num
        remainder = prefix_sum % k
        if remainder in remainder_map:
            max_len = max(max_len, i - remainder_map[remainder])
        else:
            remainder_map[remainder] = i
    return max_len

print(longest_subarray_divisible_by_k([2, 3, 5, 1, 9], 5))
```

Answer: 5 (the entire array, with sum 20, is divisible by 5)
This tested my problem-solving speed, understanding of hashmaps, and ability to write clean code under time pressure.
You can see more practice problems in Python for data interviews here: Python Practice Problems for Data Interviews

2. Coin Toss Probability
Next, I was asked a classic probability puzzle.
Question:
Suppose you toss a fair coin until you get 2 consecutive heads. What is the expected number of tosses?
Approach:
- Let E = expected number of tosses.
- First toss = Tail (probability 0.5) → 1 toss wasted, then we restart: contributes 0.5 * (1 + E).
- First toss = Head, second toss = Head (probability 0.25) → done in 2 tosses: contributes 0.25 * 2.
- First toss = Head, second toss = Tail (probability 0.25) → 2 tosses wasted, then we restart: contributes 0.25 * (2 + E).
Calculation:
E = 0.5 * (1 + E) + 0.25 * 2 + 0.25 * (2 + E)
Simplifying: E = 1.5 + 0.75E → 0.25E = 1.5 → E = 6
Answer: On average, it takes 6 tosses to get 2 consecutive heads.
This checks probability fundamentals and ability to set up recursive expectation problems.
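The analytical answer can be sanity-checked with a quick Monte Carlo simulation (my own sketch, not part of the interview):

```python
import random

def tosses_until_two_heads(rng):
    # Toss a fair coin until two consecutive heads appear; return the toss count.
    tosses, streak = 0, 0
    while streak < 2:
        tosses += 1
        if rng.random() < 0.5:   # heads
            streak += 1
        else:                    # tails resets the streak
            streak = 0
    return tosses

def estimate_expected_tosses(trials=200_000, seed=42):
    rng = random.Random(seed)
    return sum(tosses_until_two_heads(rng) for _ in range(trials)) / trials

print(estimate_expected_tosses())   # should be close to the analytical answer of 6
```

With a couple hundred thousand trials the estimate lands very close to 6, confirming the recursive expectation setup.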
For a refresher on probability fundamentals, check out: Real-Life Examples of Probability: 25+ Scenarios Explained With Math & Python
For basics of Markov Chains in probability, see this: Top Markov Chain Interview Questions and Answers for Data Science & Analytics

3. Confusion Matrix
They moved into classification evaluation questions.
Question:
Explain a confusion matrix with an example.
Answer:
A confusion matrix is a table that compares predicted vs. actual values in a classification model.
For a binary classifier predicting fraudulent transactions:
| | Predicted Fraud | Predicted Not Fraud |
|---|---|---|
| Actual Fraud | True Positive (TP) | False Negative (FN) |
| Actual Not Fraud | False Positive (FP) | True Negative (TN) |
- Accuracy = (TP + TN) / Total
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1-score = 2 * (Precision * Recall) / (Precision + Recall)
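These formulas are easy to verify in a few lines of Python. The counts below are hypothetical fraud-detection numbers I made up for illustration:

```python
def classification_metrics(tp, fp, fn, tn):
    # Compute the four confusion-matrix metrics from raw counts.
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 80 TP, 20 FP, 40 FN, 860 TN
acc, prec, rec, f1 = classification_metrics(tp=80, fp=20, fn=40, tn=860)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Note how accuracy (0.94) looks great while recall (0.67) reveals that a third of fraud cases are missed, which is exactly why the follow-up question matters.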
The interviewer then asked:
“If you were building a fraud detection system, would you prioritize precision or recall?”
Answer:
I would prioritize recall, because missing fraudulent cases (false negatives) is more costly than flagging some legitimate transactions (false positives).
For a detailed refresher on these metrics, see this article: Sensitivity vs Precision in Machine Learning: Key Differences Explained

4. How Will You Evaluate the Model’s Performance?
Question:
If you train a binary classification model for churn prediction, how will you evaluate its performance?
Answer:
- Start with confusion-matrix metrics: precision, recall, F1.
- For imbalanced datasets, accuracy is misleading → use ROC-AUC and PR-AUC.
- Tie evaluation to business KPIs: for churn, recall (catching at-risk customers) matters most.
- Perform cross-validation to ensure the results generalize.
- Use lift/gain charts to show business value: how many churners can be identified in the top deciles.
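ROC-AUC has a useful interpretation worth knowing for interviews: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal dependency-free sketch (the churn scores below are invented for illustration):

```python
def roc_auc(y_true, y_score):
    # ROC-AUC as the probability a random positive outranks a random negative
    # (the Mann-Whitney U interpretation); ties count as half a win.
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical churn model scores: higher = more likely to churn
y_true  = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
print(round(roc_auc(y_true, y_score), 3))
```

In practice you would use a library implementation (e.g. scikit-learn's `roc_auc_score`), but being able to explain the rank-based definition is a common interview follow-up.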
5. Fundamental Questions on Statistics & Machine Learning
The interviewer tested basic knowledge with rapid-fire questions.
Q1: What is the difference between variance and bias in ML?
- Bias = error from oversimplifying the model (underfitting).
- Variance = error from excessive sensitivity to the training data (overfitting).
- Goal: find the right bias-variance tradeoff.
Q2: What is regularization?
Regularization adds a penalty term to the loss function to prevent overfitting. Examples:
- L1 (Lasso): can shrink coefficients exactly to zero (implicit feature selection).
- L2 (Ridge): shrinks coefficients toward zero but keeps them nonzero and small.
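The shrinkage effect is easy to see in the one-feature case, where ridge regression has a closed form. This toy example (my own illustration, not an interview question) shows the coefficient shrinking as the penalty grows:

```python
def ridge_1d(xs, ys, lam):
    # Closed-form ridge for a single-feature, no-intercept model:
    # minimizing sum((y - w*x)^2) + lam * w^2 gives w = Sxy / (Sxx + lam).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]          # roughly y = 2x, with noise
for lam in (0.0, 1.0, 10.0):
    print(lam, round(ridge_1d(xs, ys, lam), 3))
```

As `lam` increases, the fitted slope moves from the unpenalized least-squares value toward zero, which is exactly the variance-reducing behavior regularization is meant to provide.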
Q3: What metrics would you use for regression problems?
- MSE, RMSE, MAE for error magnitude.
- R² for goodness of fit.
- Business-specific metrics depending on context.
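All four regression metrics follow directly from their definitions; a short sketch with made-up numbers:

```python
import math

def regression_metrics(y_true, y_pred):
    # MSE, RMSE, MAE, and R² computed from first principles.
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_tot   # 1 - SS_res / SS_tot
    return mse, rmse, mae, r2

mse, rmse, mae, r2 = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.5, 8.5])
print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```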
If you want to go over some of the fundamental questions commonly asked, see this article: Common Data Science Interview Questions
To learn the fundamentals of probability distributions with hands-on Python examples, see this article: Probability Distributions Explained: Intuition, Math, and Python Examples (Complete Guide)

6. Case Study on a Practical Company Issue
Scenario:
Careem wants to reduce driver cancellations. You are given a dataset with features like:
- Driver ID
- Ride request time
- Pickup location
- Distance to customer
- Customer rating
- Driver acceptance history
Question: How would you approach this problem?
Answer:
- Understand the problem: cancellations increase wait times and reduce customer satisfaction.
- EDA: check patterns → are cancellations higher at rush hour? For longer distances? With low-rated customers?
- Feature engineering:
  - Driver availability at request time.
  - Distance between driver and customer.
  - Customer's cancellation history.
- Modeling:
  - Classification model (cancel vs. not cancel).
  - Algorithms: logistic regression, Random Forest, XGBoost.
- Evaluation:
  - Use recall to minimize missed high-risk cancellations.
  - Track the business metric: cancellation rate reduced by X%.
- Recommendations:
  - Incentives for drivers accepting long trips.
  - Improve the driver-customer matching algorithm.
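As a rough illustration of the modeling step, here is a from-scratch logistic regression on a tiny invented dataset. The features, values, and labels are all hypothetical; in practice you would use scikit-learn or XGBoost on the real data:

```python
import math

def sigmoid(z):
    z = max(-35.0, min(35.0, z))           # clamp to avoid overflow in exp
    return 1 / (1 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi                     # gradient of the log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up features: [distance_to_customer_km, customer_cancellation_rate]
X = [[0.5, 0.00], [1.0, 0.10], [6.0, 0.40], [8.0, 0.50], [0.8, 0.05], [7.0, 0.60]]
y = [0, 0, 1, 1, 0, 1]                     # 1 = ride was cancelled
w, b = train_logistic(X, y)
print(round(predict_proba([7.5, 0.5], w, b), 3))   # long pickup → high risk
print(round(predict_proba([0.6, 0.0], w, b), 3))   # short pickup → low risk
```

The point is the workflow, not the model: engineered features go in, a cancellation probability comes out, and that probability can drive interventions like re-matching or driver incentives.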
This was a business-oriented question, testing if I could link ML with practical impact.
Take a look at the alternative of A/B testing, often used by Uber: A/B Testing Alternative - Switchback Design
Understand the nuances of loss function with this article: Cross Entropy vs MSE: Choosing the Right Loss Function in ML
Take a look at this Logistic Regression question which puzzles many candidates: Data Scientist Interview - Netflix

7. Basic Data Science Questions on Algorithms
They also tested fundamentals of common algorithms.
Q1: What are CARTs (Classification and Regression Trees)?
- Decision-tree algorithms that split data on feature thresholds.
- Simple and interpretable, but prone to overfitting.
Q2: Difference between Random Forest and Boosting?
- Random Forest: builds multiple decision trees in parallel and averages their results (reduces variance).
- Boosting: builds trees sequentially, each focusing on the errors of the previous ones (reduces bias).
Q3: What is clustering, and when would you use it?
- Clustering = unsupervised grouping of data (e.g., K-means, DBSCAN).
- Used for customer segmentation, anomaly detection, etc.
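K-means is compact enough to sketch from scratch, which interviewers sometimes ask for. The 2-D points below are invented to form two obvious "customer segments":

```python
import random

def kmeans(points, k, iters=50, seed=0):
    # Minimal k-means: assign each point to its nearest centroid,
    # then move each centroid to the mean of its assigned cluster.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]          # keep an empty cluster's centroid
            for i, c in enumerate(clusters)
        ]
    return centroids

# Two clear segments around (0, 0) and (10, 10)
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(sorted(kmeans(pts, 2)))
```

Real k-means implementations add smarter initialization (k-means++) and convergence checks, but this captures the assign-then-update loop worth explaining in an interview.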
Q4: Basics of time series analysis?
- Stationarity, seasonality, autocorrelation.
- ARIMA and SARIMA for forecasting.
- Newer methods: LSTMs, Prophet.
Final Thoughts & Tips
The Careem interview tested not only coding and ML fundamentals but also the ability to:
- Connect technical answers with real-world business impact.
- Think critically under time pressure.
- Communicate clearly and structure answers.
Preparation Tips:
- Practice medium Leetcode problems (arrays, strings, hashing).
- Revise probability, statistics, and ML basics.
- Be ready for case studies - practice with open datasets.
- Focus on business alignment - always link metrics back to customer or company goals.

This experience taught me that data science interviews in the Middle East (especially at companies like Careem) balance technical depth with applied problem-solving. If you’re preparing, focus equally on coding, ML, and business case analysis.
Related Articles
- Netflix Interview Question: How to Design a Metric to Compare Rankings of Lists of Shows
- Data Scientist Interview Question - Amazon
- AI Interview Questions with Solutions for Beginners
- Top 5 Platforms to Learn Data Science and Prepare for Interviews
- Common Data Science Interview Questions: Guide for Data Scientists, Analysts, Quants, and ML/AI Engineers
