
5 Data Scientist Interview Questions from Google and Meta 2026
In this article, we will explore five interesting data scientist interview questions commonly asked at Google and Meta. We’ll not only provide detailed, step-by-step answers, but also explain all underlying concepts, including statistics, probability, experimentation, and SQL. Whether you're preparing for your next interview or simply want to sharpen your data science skills, this guide is for you.
1. A/B Test Analysis: Should the New Feature Launch?
Question
A company has developed a new feature and performed an A/B test. Here are the results:
- Comments: +5%
- Likes: -10%
- Time spent: +1%
- All else neutral
How would you decide whether to put the feature into production based on these A/B test results? Any ideas?
Answer & Explanation
This is a classic product sense and experimentation analysis question. Companies like Google and Meta rely heavily on A/B testing to make data-driven decisions. Let’s break down the steps you should follow when evaluating these results.
Step 1: Understand the Metrics
- Comments (+5%): An increase in user engagement via comments, which may indicate deeper interaction.
- Likes (-10%): A decrease in quick, shallow engagement.
- Time spent (+1%): A slight increase in the time users spend on the platform, often a core key performance indicator (KPI).
- “All else neutral”: No significant change in other monitored metrics.
Step 2: Consider Statistical and Practical Significance
- Statistical Significance: Are these changes statistically significant? Before making any decision, check the p-values/confidence intervals for each metric. If the changes might be due to noise, you should not rely on them.
- Practical Significance: Even if the effects are statistically significant, are they big enough to matter for the business? For example, is a 1% increase in time spent meaningful for user retention or revenue?
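As a rough illustration of the statistical-significance check, the sketch below runs a two-sided z-test on the share of users who commented in each arm. The counts are made up for illustration and are not from the question; the function name `two_proportion_z_test` is also hypothetical. A real analysis would use the experiment's actual per-user data and whatever test the experimentation platform prescribes.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: users who left at least one comment
control_commenters, control_users = 2000, 10000
treatment_commenters, treatment_users = 2100, 10000

relative_lift = (treatment_commenters / treatment_users) / (control_commenters / control_users) - 1
z, p_value = two_proportion_z_test(
    control_commenters, control_users, treatment_commenters, treatment_users
)
print(f"lift={relative_lift:.1%}, z={z:.2f}, p={p_value:.3f}")
```

With these invented counts, a +5% relative lift does not reach significance at the 0.05 level, which shows why the headline percentages alone are not enough to decide.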
Step 3: Holistic Product Impact
- Trade-offs: The feature increases comments (potentially higher engagement) but decreases likes (possibly less positive reinforcement). Is this trade-off acceptable?
- Core Metrics: At companies like Meta and Google, time spent is often a north star metric. A 1% increase can be huge at scale, but only if it doesn't come at the cost of user satisfaction or other business goals.
- Unintended Consequences: Why did likes decrease? Could the feature be making it harder to like, or is it changing the type of content seen? Is there a risk of negative user sentiment?
- Context: Consider seasonality, user segments, and external factors that could have impacted the results.
Step 4: Recommendation
Before recommending a full rollout:
- Confirm which of the observed changes are statistically significant.
- Ensure the increases in comments and time spent align with business objectives.
- Investigate the cause of the decrease in likes. Is it a UI issue, or are users less happy?
- Run additional analyses (e.g., user feedback, segmentation by region or user type) to check for unintended negative impacts.
- Consider a phased or limited rollout if unsure.
Step 5: Possible Next Steps
- Run Surveys: Ask users for qualitative feedback about the feature.
- Segment Analysis: Break down metrics by user cohort, geography, device, etc. Maybe the feature is positive for some users and negative for others.
- Iterate on Feature: If likes are crucial, try to tweak the feature to recover this metric.
Conclusion
There is rarely a simple yes/no answer in real-world A/B testing. The key is to weigh all metrics, understand the “why” behind changes, and consider both business goals and user experience.
2. Probability Game: Should You Play the Dice Game?
Question
There’s a game where you are given two fair six-sided dice and asked to roll. If the sum of the values on the dice equals seven, you win $21. However, you must pay $5 to play each time you roll both dice. Do you play this game? As a follow-up: What is the probability of making money from this game?
Answer & Explanation
Step 1: Calculate the Probability of Winning
Two six-sided dice have outcomes from 1 to 6 each. The total number of possible outcomes is:
\( 6 \times 6 = 36 \)
What are the combinations that sum to 7?
- (1,6)
- (2,5)
- (3,4)
- (4,3)
- (5,2)
- (6,1)
There are 6 such combinations.
So, the probability of rolling a sum of 7 is:
\( P(\text{sum}=7) = \frac{6}{36} = \frac{1}{6} \)
Step 2: Calculate the Expected Value
Let’s define the expected value (EV) per game:
- Win: +$21 (probability 1/6)
- Lose: $0 (probability 5/6)
- Cost to play: -$5
\( EV = (\frac{1}{6} \times \$21) + (\frac{5}{6} \times \$0) - \$5 \)
\( EV = \$3.50 - \$5 = -\$1.50 \)
Conclusion: On average, you lose $1.50 per game. Therefore, you should not play this game if your goal is to make money.
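The expected value can be double-checked by enumerating all 36 outcomes. This is a minimal sketch; the constant names `PRIZE` and `COST` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

PRIZE = 21  # payout when the dice sum to seven
COST = 5    # price of one play

# Enumerate every ordered outcome of two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))
wins = sum(1 for a, b in outcomes if a + b == 7)

p_win = Fraction(wins, len(outcomes))
expected_value = p_win * PRIZE - COST

print(p_win, float(expected_value))
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation.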
Step 3: Probability of Making Money
If you play one game:
- Probability of winning (i.e., make $16, since you win $21 but spent $5): \( \frac{1}{6} \)
- Probability of losing $5: \( \frac{5}{6} \)
If you play n games, the probability of coming out ahead is more complex, as you need at least enough wins to offset your total cost.
For a single play, the probability of making money is:
\( P(\text{profit} > 0) = \frac{1}{6} \)
Generalization
Suppose you play \( n \) times, and let \( k \) be the number of wins. The total cost is \( 5n \), and the total winnings are \( 21k \).
You make money if \( 21k > 5n \), that is, if \( k > \frac{5n}{21} \).
The number of wins is binomial: \( k \sim Binomial(n, \frac{1}{6}) \).
The probability of making money after \( n \) games is therefore:
\( P\left(k > \frac{5n}{21}\right) \)
This can be calculated for any n using the binomial cumulative distribution function.
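A minimal sketch of that calculation, summing the binomial probabilities for every win count that covers the total cost (the function name `prob_profit` is hypothetical):

```python
from math import comb

def prob_profit(n, p_win=1 / 6, prize=21, cost=5):
    """Probability that total winnings exceed total cost after n independent games."""
    total = 0.0
    for k in range(n + 1):
        # Integer comparison avoids floating-point issues at the threshold
        if prize * k > cost * n:
            total += comb(n, k) * p_win**k * (1 - p_win) ** (n - k)
    return total

for n in (1, 4, 5, 21):
    print(n, round(prob_profit(n), 4))
```

The probability does not fall smoothly with \( n \). For \( n = 4 \), a single win already covers the $20 cost, so the chance of finishing ahead is \( 1 - (5/6)^4 \), slightly above one half, even though the expected value per game is still negative.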
3. Probability: Cards with Colors and Numbers
Question
There are 50 cards of 5 different colors: 10 Red, 10 Blue, 10 Orange, 10 Green, and 10 Yellow. Each color has cards numbered 1 to 10. You pick 2 cards at random. What is the probability that they are not of the same color and not of the same number?
Answer & Explanation
Step 1: Total Number of Ways to Pick 2 Cards
Total number of ways to pick 2 cards from 50:
\( C_{50}^{2} = \frac{50 \times 49}{2} = 1225 \)
Step 2: Number of Favorable Outcomes
We want the number of pairs where the two cards are not the same color and not the same number.
- Total pairs = 1225
- Pairs with same color: For each color (say red), there are 10 cards. Number of ways to pick 2 of same color: \( C_{10}^2 = 45 \). For 5 colors: \( 45 \times 5 = 225 \)
- Pairs with same number: For each number (say 1), there are 5 cards (one of each color). Number of ways to pick 2 of same number: \( C_{5}^2 = 10 \). For 10 numbers: \( 10 \times 10 = 100 \)
- Pairs with same color and same number: None. Each color-number combination appears on exactly one card, so two distinct cards cannot share both.
To count the pairs that are neither the same color nor the same number, apply the inclusion-exclusion principle.
Number of pairs that are same color or same number:
\( = (\text{same color}) + (\text{same number}) - (\text{both}) = 225 + 100 - 0 = 325 \)
Number of pairs that are neither the same color nor the same number:
\( = 1225 - 325 = 900 \)
Inclusion-exclusion normally requires subtracting the intersection (pairs that are both same color and same number) so that it is not counted twice. Here the intersection is empty, so no correction is needed.
So, the number of favorable outcomes is 900.
Step 3: Final Probability
\( P = \frac{900}{1225} = \frac{36}{49} \approx 0.7347 \)
Conclusion
The probability that two cards drawn at random are not of the same color and not of the same number is 36/49 (or about 73.47%).
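The count can be verified by brute force. The sketch below builds the 50-card deck and checks every unordered pair; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

colors = ["Red", "Blue", "Orange", "Green", "Yellow"]
deck = list(product(colors, range(1, 11)))  # 50 (color, number) cards

pairs = list(combinations(deck, 2))
favorable = sum(
    1 for (c1, n1), (c2, n2) in pairs if c1 != c2 and n1 != n2
)

probability = Fraction(favorable, len(pairs))
print(len(pairs), favorable, probability)
```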
4. SQL Data Science Questions: Calculating Post Rates
Question #1
You have the following table:
| User_id | Action (post, edit, cancel) | Date |
|---|---|---|
| 123 | post | 2024-06-01 |
| 124 | edit | 2024-06-01 |
| 123 | cancel | 2024-06-02 |
What is the post rate?
Answer & Explanation
The post rate is typically defined as the ratio of number of “post” actions to the total number of actions.
SELECT
COUNT(CASE WHEN action = 'post' THEN 1 END) * 1.0 / COUNT(*) AS post_rate
FROM
user_actions
This query counts the number of “post” actions and divides by the total number of actions to get the post rate.
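To try the query without a database server, one option is Python's built-in `sqlite3` module. This sketch assumes the table is named `user_actions` with columns `user_id`, `action`, and `date`, and loads only the three sample rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_actions (user_id INTEGER, action TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?, ?)",
    [
        (123, "post", "2024-06-01"),
        (124, "edit", "2024-06-01"),
        (123, "cancel", "2024-06-02"),
    ],
)

query = """
SELECT COUNT(CASE WHEN action = 'post' THEN 1 END) * 1.0 / COUNT(*) AS post_rate
FROM user_actions
"""
post_rate = conn.execute(query).fetchone()[0]
print(post_rate)  # one post out of three actions
```

Multiplying by `1.0` forces floating-point division; without it, many SQL engines would perform integer division and return 0.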
Question #2
You have an additional table (active users):
| User_id | Country | Active | Date |
|---|---|---|---|
| 123 | USA | 1 | 2024-06-01 |
| 124 | India | 1 | 2024-06-01 |
What is the average post rate for yesterday for all active users by country?
Answer & Explanation
You need to:
- Find all users who were active “yesterday” by country.
- For each user, calculate their post rate for “yesterday”.
- Then average the post rates by country.
WITH user_post_rate AS (
    SELECT
        a.user_id,
        au.country,
        COUNT(CASE WHEN a.action = 'post' THEN 1 END) * 1.0 / COUNT(*) AS post_rate
    FROM user_actions a
    INNER JOIN active_users au
        ON a.user_id = au.user_id
        AND a.date = au.date
    WHERE a.date = CURRENT_DATE - INTERVAL '1 day'
        AND au.active = 1
    GROUP BY a.user_id, au.country
)
SELECT
    country,
    AVG(post_rate) AS avg_post_rate_yesterday
FROM user_post_rate
GROUP BY country
ORDER BY avg_post_rate_yesterday DESC;
Explanation:
- The user_post_rate CTE computes the individual post rate for each active user in each country for yesterday.
- The main query then averages these user-level post rates for each country.
- COUNT(CASE WHEN action = 'post' ...) counts posts, COUNT(*) gives total actions, and their ratio is the user's post rate.
- Joining on both user_id and date ensures we only consider actions from active users for “yesterday.”
Further Enhancements & Edge Cases
- If a user performed no actions yesterday, you may want to handle division by zero or exclude such users.
- If you want to include users with zero actions (post rate = 0), consider using a LEFT JOIN and handling NULL values accordingly.
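The per-country query can be exercised the same way with Python's built-in `sqlite3` module. This is a sketch under several assumptions: the tables are named `user_actions` and `active_users`, the rows are invented for illustration, and "yesterday" is passed in as a parameter because SQLite does not support the `CURRENT_DATE - INTERVAL '1 day'` syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_actions (user_id INTEGER, action TEXT, date TEXT)")
conn.execute(
    "CREATE TABLE active_users (user_id INTEGER, country TEXT, active INTEGER, date TEXT)"
)

# Hypothetical sample data for a single day
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?, ?)",
    [
        (123, "post", "2024-06-01"),
        (123, "cancel", "2024-06-01"),
        (124, "post", "2024-06-01"),
        (125, "edit", "2024-06-01"),
    ],
)
conn.executemany(
    "INSERT INTO active_users VALUES (?, ?, ?, ?)",
    [
        (123, "USA", 1, "2024-06-01"),
        (124, "India", 1, "2024-06-01"),
        (125, "USA", 1, "2024-06-01"),
    ],
)

query = """
WITH user_post_rate AS (
    SELECT a.user_id, au.country,
           COUNT(CASE WHEN a.action = 'post' THEN 1 END) * 1.0 / COUNT(*) AS post_rate
    FROM user_actions a
    INNER JOIN active_users au
        ON a.user_id = au.user_id AND a.date = au.date
    WHERE a.date = ? AND au.active = 1
    GROUP BY a.user_id, au.country
)
SELECT country, AVG(post_rate) AS avg_post_rate_yesterday
FROM user_post_rate
GROUP BY country
ORDER BY avg_post_rate_yesterday DESC
"""
yesterday = "2024-06-01"  # stand-in for the real "yesterday"
rows = conn.execute(query, (yesterday,)).fetchall()
print(rows)
```

In this data, user 123 has a post rate of 0.5 and user 125 has 0.0, so the USA average is 0.25, while India's single user has a rate of 1.0. Averaging user-level rates weights every user equally, which differs from dividing total posts by total actions per country.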
5. Statistics: How Can You Tell if a Coin is Biased?
Question
How can you tell if a given coin is biased?
Answer & Explanation
This is a classic hypothesis testing question, commonly asked to test your understanding of statistics and experimental design.
Step 1: State the Hypotheses
- Null Hypothesis (\( H_0 \)): The coin is fair (i.e., probability of heads, \( p = 0.5 \)).
- Alternative Hypothesis (\( H_1 \)): The coin is biased (i.e., \( p \neq 0.5 \)).
Step 2: Collect Data
- Flip the coin n times, and record the number of heads (\( k \)).
Step 3: Perform a Hypothesis Test
Since each flip is a Bernoulli trial, the number of heads follows a Binomial distribution: \( Binomial(n, p) \).
- If \( n \) is large, you can use a normal approximation.
- Otherwise, use the exact binomial calculation.
The sample proportion of heads is: \( \hat{p} = \frac{k}{n} \)
If the coin is fair, the expected number of heads is \( n \times 0.5 \) and the standard deviation is \( \sqrt{n \times 0.5 \times 0.5} \).
Step 4: Compute the p-value
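You want to know the probability, given the null hypothesis, of seeing a result at least as extreme as \( k \) heads in either direction. Because the null distribution is symmetric around \( n/2 \), the two-sided p-value is:
\( p\text{-value} = P\left(\left|X - \frac{n}{2}\right| \geq \left|k - \frac{n}{2}\right|\right) \)
For \( k \leq n/2 \), this equals \( P(X \leq k) + P(X \geq n-k) \); for \( k > n/2 \), it equals \( P(X \geq k) + P(X \leq n-k) \),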
where \( X \sim Binomial(n, 0.5) \).
If the p-value is less than your chosen significance level (\( \alpha \), e.g., 0.05), you reject the null hypothesis and say the coin is likely biased.
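For small or moderate \( n \), the exact two-sided p-value can be computed directly from binomial counts. The sketch below counts every outcome at least as far from \( n/2 \) as the observed \( k \); the function name and the 15-heads-in-20-flips input are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for H0: p = 0.5, using the symmetry of the null."""
    distance = abs(k - n / 2)
    # Number of equally likely flip sequences whose head count is at least as extreme
    extreme = sum(comb(n, x) for x in range(n + 1) if abs(x - n / 2) >= distance)
    return extreme / 2**n

# Hypothetical data: 15 heads in 20 flips
p_value = binomial_two_sided_p(15, 20)
print(round(p_value, 4))
```

This relies on the null being \( p = 0.5 \), where every sequence of flips is equally likely. Testing against a different null value would need the general binomial probabilities instead.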
Step 5: Example Calculation
Suppose you flip the coin 100 times and get 60 heads.
- Expected heads: 50.
- Standard deviation: \( \sqrt{100 \times 0.5 \times 0.5} = 5 \).
- Z-score: \( z = \frac{60 - 50}{5} = 2 \).
- The two-tailed p-value for \( z = 2 \) is about 0.0455.
- Since 0.0455 < 0.05, you would reject the null hypothesis and conclude the coin is likely biased.
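The arithmetic in this example can be reproduced with the standard library alone. The sketch below also offers an optional continuity correction, which is an addition to the steps above and not part of them; the function name `z_test_fair_coin` is hypothetical.

```python
import math

def z_test_fair_coin(k, n, continuity=False):
    """Normal-approximation two-sided test of H0: p = 0.5. Returns (|z|, p-value)."""
    mean = n * 0.5
    sd = math.sqrt(n * 0.5 * 0.5)
    diff = abs(k - mean)
    if continuity:
        # Shrink the gap by half a flip to account for the discrete distribution
        diff = max(diff - 0.5, 0.0)
    z = diff / sd
    p_value = math.erfc(z / math.sqrt(2))
    return z, p_value

z, p = z_test_fair_coin(60, 100)
z_cc, p_cc = z_test_fair_coin(60, 100, continuity=True)
print(f"plain:     z={z:.2f}, p={p:.4f}")
print(f"corrected: z={z_cc:.2f}, p={p_cc:.4f}")
```

With the continuity correction the p-value rises slightly above 0.05, so 60 heads in 100 flips is a borderline result. In an interview it is worth saying that the conclusion depends on which test is used.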
Step 6: Confidence Intervals
Alternatively, construct a 95% confidence interval for the true probability of heads using:
\( \hat{p} \pm z_{\alpha/2} \sqrt{ \frac{\hat{p}(1-\hat{p})}{n} } \)
If 0.5 is not within this interval, the coin is likely biased.
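As a quick sketch, the interval for the 60-heads-in-100-flips example can be computed as follows, assuming \( z_{\alpha/2} = 1.96 \) for a 95% level (the function name `wald_interval` is hypothetical):

```python
import math

def wald_interval(k, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = k / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = wald_interval(60, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The lower bound lands only just above 0.5, which matches how close the p-value in the example is to the 0.05 threshold.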
Step 7: Practical Considerations
- Sample size: The larger the sample size, the more power your test has to detect small biases.
- Randomness: Make sure your flips are independent and unbiased by the flipper.
Conclusion
Interview questions at Google, Meta, and other top tech companies challenge candidates not just to recall facts, but to think analytically and holistically about data problems. Whether it’s interpreting A/B test results, analyzing probabilities, writing robust SQL queries, or designing statistical tests, success comes from understanding both the technical and business context. Use these questions and detailed answers to deepen your understanding and sharpen your interview skills. Good luck on your data science journey!
