blog-cover-image

Tricky SQL Interview Question: Calculating Revenue from Loyal Customers

If you’re preparing for SQL interview questions for data analyst or data scientist roles, you’ll often encounter real-world business scenarios where you need to write queries that not only return correct results but also demonstrate structured problem-solving.

In this post, we’ll walk through an actual SQL interview-style problem that a major U.S. hotel chain could ask during a data role interview. The challenge involves identifying loyal customers and calculating the percentage of total revenue they contributed in a given year.


Interview Question: The Business Scenario

Imagine you are an analyst for a major hotel chain with locations across the United States. The marketing team is planning a new promotion focused on loyal customers. Before launching it, they want to know:

👉 What percentage of our 2017 revenue came from loyal customers?


What Defines a Loyal Customer?

For this problem, a loyal customer must meet all three of these criteria:

  1. Membership – The customer must be part of the company’s points system (is_member = true).

  2. Repeat Stays per Hotel – At each hotel they visited, they must have stayed more than 2 times.

  3. Multiple Locations – They must have stayed at 3 different hotels across the U.S.


Table Schema

You’re given a dataset customer_transactions with the following schema:

Column Name Data Type Description
customer_id id Unique customer identifier
hotel_id integer Unique ID for hotel
transaction_id integer Unique ID for transaction
first_night string Start date of stay, format "YYYY-mm-dd"
number_of_nights integer Number of nights stayed
total_spend integer Total spend in USD for the stay
is_member boolean Whether the customer is a member of the points system

Step 1: Understanding the Problem

We need two numbers:

  • Revenue from loyal customers

  • Total revenue (all customers)

Then compute:

\(Percent Revenue from Loyal Customers= \frac{\text{Revenue Loyal}}{\text{Revenue Total}} \times 100\)

The challenge lies in identifying loyal customers correctly before aggregating revenue.

You need to be familiar with CTEs and joins to solve this problem. 


Step 2: Break Down the Logic

  1. Filter to members onlyWHERE is_member = true.

  2. Check repeat stays per hotel → Customers must have more than 2 stays at each hotel they visited.

    • This means counting transactions per customer_id + hotel_id.

  3. Check different hotel count → Customer must have stayed at least in 3 unique hotels.

  4. Revenue aggregation → Once loyal customers are identified, calculate their total spend and compare to total revenue.


Step 3: SQL Query

Here’s a possible solution:

We create CTEs for member transactions, hotel stays and "valid customer" - to find customers who satisfy the constraints given in the question. Similary, we create two more CTEs for revenue - total and loyal. The loyal revenue is calculated using the "valid customers" CTEs with the help of a join as shown below.

WITH member_transactions AS (
    SELECT *
    FROM customer_transactions
    WHERE is_member = TRUE
),
hotel_stays AS (
    SELECT 
        customer_id,
        hotel_id,
        COUNT(DISTINCT transaction_id) AS num_stays
    FROM member_transactions
    GROUP BY customer_id, hotel_id
),
valid_customers AS (
    SELECT 
        customer_id
    FROM hotel_stays
    GROUP BY customer_id
    HAVING 
        MIN(num_stays) > 2     -- more than 2 stays at each hotel visited
        AND COUNT(DISTINCT hotel_id) >= 3  -- at least 3 unique hotels
),
loyal_revenue AS (
    SELECT 
        SUM(ct.total_spend) AS loyal_spend
    FROM customer_transactions ct
    JOIN valid_customers vc
      ON ct.customer_id = vc.customer_id
    WHERE ct.is_member = TRUE
      AND strftime('%Y', ct.first_night) = '2017'
),
total_revenue AS (
    SELECT 
        SUM(total_spend) AS total_spend
    FROM customer_transactions
    WHERE strftime('%Y', first_night) = '2017'
)
SELECT 
    ROUND(100.0 * loyal_spend / total_spend, 2) AS loyal_revenue_percent
FROM loyal_revenue, total_revenue;

Step 4: Walkthrough of the Query

  • member_transactions → filters down to only members.

  • hotel_stays → counts stays per customer per hotel.

  • valid_customers → ensures:

    • Every hotel they visited had >2 stays (MIN(num_stays) > 2).

    • They stayed in 3+ different hotels.

  • loyal_revenue → sums total spend from valid loyal customers in 2017.

  • total_revenue → sums all customer spend in 2017.

  • Final SELECT → computes % revenue from loyal customers.


Step 5: Interviewer Follow-Up Questions

In interviews, solving the query is only half the battle. Expect follow-up questions such as:

  1. Performance Optimization

    • How would this query perform on billions of rows?

    • Could you use indexes on customer_id, hotel_id, and first_night?

  2. Edge Cases

    • What if a customer stayed 3 times in one hotel and 0 times in others?

    • How would you handle cancellations or refunds?

  3. Business Understanding

    • Why does the company define “loyal” this way?

    • How would changing the definition (e.g., 2 different hotels instead of 3) impact the results?


Key Takeaways

  • SQL interview questions often combine business rules with technical filtering and aggregation.

  • Always break down complex conditions into smaller logical steps using CTEs.

  • Be prepared to discuss query performance, edge cases, and business implications.


Final Thoughts

This problem is a great example of the type of SQL case study questions asked in interviews for data analysts, data scientists, and data engineers. Practicing problems like this will help you stand out in interviews—not just by writing the query, but by demonstrating structured thinking and real-world awareness.

If you want to practice more real interview questions like this, sign up on Dataloopr.

Related Articles