
Emirates NBD Interview Question - Customer Propensity Model

In the retail banking sector, understanding and predicting customer behavior is crucial for growth and profitability. Data analytics and machine learning play a pivotal role in enabling banks to offer personalized services, optimize marketing campaigns, and enhance the customer experience. A popular analytical technique in this domain is the Customer Propensity Model, which predicts the likelihood of a customer performing a certain action, such as purchasing a new product, upgrading an account, or churning.

In an interview for a senior role in data analytics and machine learning at Emirates NBD, a candidate was recently asked the following question:

Explain step by step how you would develop a customer propensity model.

This article provides a comprehensive, step-by-step guide to building a Customer Propensity Model in retail banking, starting from the fundamentals of propensity modeling to advanced implementation and deployment strategies.

Understanding Propensity Modeling: Fundamentals

Propensity modeling is a statistical approach used to predict the probability (or propensity) that a customer will take a specific action. In retail banking, these actions could include responding to a marketing offer, applying for a loan, or closing an account. The output of a propensity model is typically a score between 0 and 1, representing the estimated probability of the event of interest for each customer.

Key Concepts in Propensity Modeling

  • Target Event: The specific action or behavior we wish to predict (e.g., product purchase, account upgrade).
  • Features: Customer attributes (demographics, transaction history, engagement metrics) used as inputs to the model.
  • Supervised Learning: Propensity models are typically built using supervised machine learning techniques, requiring labeled historical data.
  • Probability Output: Models output a probability score for each customer, which helps in personalized targeting.

Mathematically, the propensity score for customer \( i \) can be defined as:

\[ P_i = P(Y_i = 1 | X_i) \] where:

  • \( Y_i \): Binary indicator of the target event for customer \( i \) (1 if event occurs, 0 otherwise)
  • \( X_i \): Feature vector for customer \( i \)

 

Why is Propensity Modeling Important in Retail Banking?

  • Personalized Marketing: Enables targeted campaigns, increasing ROI and customer satisfaction.
  • Cross-Sell/Upsell: Identifies customers likely to buy additional products or upgrade services.
  • Churn Prevention: Predicts customers at risk of leaving, allowing proactive retention efforts.
  • Resource Optimization: Allocates sales and marketing resources efficiently.

Step-by-Step Guide: Building a Customer Propensity Model for Emirates NBD

Let’s walk through a detailed, structured approach to developing a customer propensity model in the context of retail banking at Emirates NBD, covering all key stages from problem definition to deployment and monitoring.

1. Problem Definition & Business Understanding

The first and most critical step in any machine learning project is to clearly define the problem. For a customer propensity model, work closely with business stakeholders to answer:

  • What is the target event? (e.g., response to a credit card offer, taking out a personal loan, digital channel adoption)
  • What is the business objective? (e.g., increase product uptake, reduce churn, improve campaign ROI)
  • How will the model’s predictions be used? (e.g., targeted campaigns, personalized recommendations)
  • How will success be measured? (e.g., lift in conversion rate, incremental revenue, reduction in churn)

2. Data Collection & Exploration

High-quality, relevant data is the backbone of a successful propensity model. The data typically comes from various sources:

  • Core banking systems (account balances, transactions)
  • CRM systems (customer profiles, service interactions)
  • Digital channels (website/app activity, email engagement)
  • External data (credit bureau, demographic data)

2.1 Data Types & Examples

Data Source     | Examples
Demographic     | Age, Gender, Location, Occupation
Product Holding | Number of products, Product type, Tenure
Transaction     | Monthly debit/credit, ATM usage, POS transactions
Engagement      | Logins, Mobile app usage, Email opens
Behavioral      | Branch visits, Call center interactions
External        | Credit score, Bureau flags, Socioeconomic status

2.2 Data Exploration

Exploratory Data Analysis (EDA) is essential to understand data distributions, identify outliers, and detect missing values. Key EDA steps:

  • Univariate analysis (histogram, boxplot of each variable)
  • Bivariate analysis (correlation with target, cross-tabulations)
  • Missing value analysis
  • Temporal analysis (seasonality, trends)

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('customer_data.csv')
sns.histplot(df['monthly_debit'])
plt.show()

3. Data Preprocessing & Feature Engineering

Data preprocessing transforms raw data into a format suitable for modeling. Feature engineering creates new variables that enhance model predictive power.

3.1 Data Cleaning

  • Impute missing values (mean/median for numeric, mode for categorical, or using advanced imputation methods)
  • Remove duplicates
  • Handle outliers (capping, transformation, or removal)
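The cleaning steps above can be sketched in pandas. The column names (`monthly_debit`, `occupation`) and the tiny sample frame are illustrative placeholders, not actual Emirates NBD fields:

```python
import numpy as np
import pandas as pd

# Illustrative customer frame with a missing value and an extreme outlier
df = pd.DataFrame({
    'monthly_debit': [1200.0, np.nan, 800.0, 95000.0, 1500.0],
    'occupation': ['engineer', 'teacher', None, 'engineer', 'teacher'],
})

# Impute: median for numeric columns, mode for categorical columns
df['monthly_debit'] = df['monthly_debit'].fillna(df['monthly_debit'].median())
df['occupation'] = df['occupation'].fillna(df['occupation'].mode()[0])

# Cap outliers at the 1st/99th percentiles (winsorizing)
lo, hi = df['monthly_debit'].quantile([0.01, 0.99])
df['monthly_debit'] = df['monthly_debit'].clip(lo, hi)

# Remove exact duplicate rows
df = df.drop_duplicates()
```

Capping (rather than dropping) outliers preserves the customer record while limiting the influence of extreme values on the model.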

3.2 Feature Engineering

  • Aggregation: Monthly averages, transaction counts, recent activity windows
  • Ratios: Credit/Debit ratio, product per customer
  • Recency, Frequency, Monetary (RFM) features: For customer engagement and value
  • Interaction features: Combining two or more features to capture complex patterns
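A minimal sketch of RFM feature construction from a transaction log; the column names, dates, and amounts below are made up for illustration:

```python
import pandas as pd

# Toy transaction log; in practice this would come from the core banking system
tx = pd.DataFrame({
    'customer_id': [1, 1, 2, 2, 2],
    'txn_date': pd.to_datetime(['2024-01-05', '2024-03-20',
                                '2024-02-10', '2024-03-01', '2024-03-25']),
    'amount': [200.0, 450.0, 80.0, 120.0, 60.0],
})
snapshot = pd.Timestamp('2024-04-01')  # scoring reference date

# Recency (days since last transaction), Frequency (count), Monetary (total spend)
rfm = tx.groupby('customer_id').agg(
    recency=('txn_date', lambda d: (snapshot - d.max()).days),
    frequency=('txn_date', 'count'),
    monetary=('amount', 'sum'),
).reset_index()

# Example ratio feature: average transaction value
rfm['avg_txn_value'] = rfm['monetary'] / rfm['frequency']
```

The same pattern extends to rolling windows (e.g., 30/90-day aggregates) for the "recent activity" features mentioned above.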

3.3 Categorical Encoding & Scaling

  • One-hot encoding or label encoding for categorical variables
  • Standardization or normalization for numerical features

from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Scaling numeric features
scaler = StandardScaler()
df['monthly_debit_scaled'] = scaler.fit_transform(df[['monthly_debit']])

# Encoding categorical features
encoder = OneHotEncoder(sparse_output=False)  # use sparse=False on scikit-learn < 1.2
encoded_gender = encoder.fit_transform(df[['gender']])

4. Feature Selection

With numerous features, selecting the most relevant ones prevents overfitting and improves model interpretability. Techniques include:

  • Correlation analysis: Remove highly correlated (multicollinear) features
  • Univariate statistical tests (Chi-square, ANOVA)
  • Recursive Feature Elimination (RFE)
  • Tree-based feature importance (Random Forest, XGBoost)

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
importances = rf.feature_importances_  # higher value = more influential feature

5. Model Selection & Training

Propensity modeling is a classification problem (usually binary). The choice of algorithm depends on interpretability, accuracy, and business requirements.

5.1 Common Algorithms for Propensity Modeling

  • Logistic Regression: Highly interpretable, outputs calibrated probabilities
  • Decision Trees & Random Forests: Non-linear relationships, handle mixed data types
  • Gradient Boosting (XGBoost, LightGBM): State-of-the-art performance, robust to outliers
  • Neural Networks: For complex interactions and large datasets

5.2 Handling Class Imbalance

In banking, the target event (e.g., product uptake) may be rare. Address class imbalance using:

  • Resampling (SMOTE, undersampling the majority class)
  • Class weights in the loss function
  • Threshold tuning for optimal precision-recall trade-off

from imblearn.over_sampling import SMOTE

smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
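Threshold tuning, the third technique above, can be sketched with scikit-learn's `precision_recall_curve`. The synthetic data here stands in for real campaign-response labels, and in practice the threshold would be chosen on a validation set rather than the training data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Synthetic imbalanced data (~5% positives) stands in for campaign responses
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

# Pick the threshold that maximizes F1 along the precision-recall curve
precision, recall, thresholds = precision_recall_curve(y, scores)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best_threshold = thresholds[np.argmax(f1[:-1])]  # f1[:-1] aligns with thresholds
print(f'Best threshold: {best_threshold:.3f}')
```

With rare positives, the F1-optimal threshold is usually well below the default 0.5, which is exactly why tuning it matters.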

5.3 Model Training Example (Logistic Regression)


from sklearn.linear_model import LogisticRegression

# The SMOTE-resampled training set is already balanced, so class weighting is not needed on top of it
model = LogisticRegression(max_iter=1000)
model.fit(X_resampled, y_resampled)

6. Model Evaluation & Validation

Model evaluation ensures that the propensity model is both accurate and reliable. Use appropriate metrics, especially for imbalanced datasets.

6.1 Key Metrics

  • Area Under ROC Curve (AUC-ROC): Measures model discrimination ability
  • Precision, Recall, F1-Score: Especially important for imbalanced data
  • Lift & Gain Charts: Business-relevant metrics showing improvement over random targeting
  • Calibration: Checks if predicted probabilities reflect actual outcomes

6.2 Cross-Validation

Split data into train-test or use k-fold cross-validation to ensure robustness.
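A stratified k-fold sketch on synthetic data; stratification keeps the responder ratio identical across folds, which matters for imbalanced banking targets:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the customer feature matrix (~10% positives)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Stratified folds preserve the class ratio in every train/validation split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring='roc_auc')
print(f'AUC per fold: {scores.round(3)}, mean: {scores.mean():.3f}')
```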


from sklearn.metrics import roc_auc_score, precision_score, recall_score

y_pred_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_pred_proba)
precision = precision_score(y_test, y_pred_proba > 0.5)  # 0.5 is a default cut-off; tune it for the business trade-off
recall = recall_score(y_test, y_pred_proba > 0.5)
print(f'AUC: {auc:.3f}, Precision: {precision:.3f}, Recall: {recall:.3f}')

6.3 Lift and Gain Chart Example

Lift at decile \( d \) is defined as:

\[ \text{Lift}_d = \frac{\text{Response Rate in Top } d\%}{\text{Average Response Rate}} \]
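A sketch of a decile lift table built from model scores; the scores and outcomes here are simulated purely for illustration:

```python
import numpy as np
import pandas as pd

# Simulated holdout: higher score -> higher chance of response
rng = np.random.default_rng(42)
scores = rng.random(1000)
outcomes = (rng.random(1000) < scores * 0.2).astype(int)

df = pd.DataFrame({'score': scores, 'y': outcomes})
df['decile'] = pd.qcut(df['score'], 10, labels=False)  # 9 = highest scores
overall_rate = df['y'].mean()

# Response rate and lift per decile, best decile first
table = (df.groupby('decile')['y'].mean()
           .sort_index(ascending=False)
           .rename('response_rate')
           .to_frame())
table['lift'] = table['response_rate'] / overall_rate
print(table.round(2))
```

A well-ranked model shows lift well above 1.0 in the top deciles, decaying toward (and below) 1.0 in the bottom ones.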


7. Model Interpretation & Explainability

In banking, regulatory compliance and business trust require interpretable models. Key techniques:

  • Coefficients (Logistic Regression): Directly show impact of features
  • Feature Importance (Tree-based models): Rank features by influence
  • SHAP/LIME: Model-agnostic tools for local and global interpretability

import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

8. Deployment & Monitoring

Once validated, deploy the model to production systems (e.g., CRM, marketing automation). Steps include:

  • Model serialization (using pickle, joblib, or ONNX)
  • Integration with business applications via APIs
  • Batch or real-time scoring pipelines

Monitoring Model Performance

  • Track key metrics over time (AUC, lift, conversion rate)
  • Monitor data drift and feature importance changes
  • Periodic retraining using fresh data
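Data drift is commonly tracked with the Population Stability Index (PSI), which compares a feature's current distribution against its training-time baseline. A minimal implementation, with the widely used rule-of-thumb cutoffs noted in the docstring:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline distribution and a recent one.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift.
    """
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Guard against empty bins before taking logs
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10000)
shifted = rng.normal(0.5, 1, 10000)  # simulated drifted feature
print(population_stability_index(baseline, baseline[:5000]))  # near zero
print(population_stability_index(baseline, shifted))          # clearly elevated
```

Computing PSI per feature on every scoring run, and alerting when it crosses 0.25, is a simple way to trigger the periodic retraining mentioned above.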

import pickle

# Save model
with open('propensity_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load model
with open('propensity_model.pkl', 'rb') as f:
    model = pickle.load(f)

9. Best Practices & Challenges in Retail Banking Propensity Modeling

  • Data Privacy & Compliance: Ensure adherence to regulations (GDPR, local laws)
  • Bias Detection: Regularly test for and mitigate bias in model predictions
  • Segmented Modeling: Build separate models for different customer segments if behaviors differ significantly
  • Feedback Loops: Incorporate business feedback to improve models iteratively
  • Collaboration: Work closely with marketing, product, and IT teams for successful adoption

Case Study: Building a Customer Propensity Model for Emirates NBD

Let’s illustrate the process with a simplified case study relevant to Emirates NBD.

  • Objective: Predict which existing customers are most likely to respond to a new credit card offer.
  • Target variable: Binary indicator (1 = responded to offer, 0 = did not).
  • Data sources: Demographics, account activity, digital engagement, previous campaign responses.

Step 1: Data Preparation

  • Extract past 12 months’ data on all customers offered a new credit card.
  • Label as 1 (responded) or 0 (did not respond).

Step 2: Feature Engineering

  • Number of products held
  • Avg. monthly salary credit
  • Recent login activity (recency)
  • Past campaign responses
  • Age, tenure, location

Step 3: Modeling & Evaluation

  • Model Selection: Start with Logistic Regression for interpretability, then compare with Random Forest and XGBoost for potentially higher accuracy.
  • Data Split: Partition data into train (70%) and test (30%) sets using stratified sampling to preserve the proportion of responders to non-responders.
  • Feature Scaling: Standardize numerical variables, encode categorical ones.
  • Imbalance Handling: If only 5% responded, apply SMOTE on the training set to balance classes.
  • Model Training: Train each model using cross-validation to tune hyperparameters.
  • Model Evaluation: Evaluate on the holdout test set using AUC-ROC, Precision-Recall, and Lift at Top Deciles.

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
from sklearn.metrics import roc_auc_score, classification_report

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Logistic Regression
lr = LogisticRegression(class_weight='balanced', max_iter=1000)
lr.fit(X_train, y_train)
lr_pred = lr.predict_proba(X_test)[:,1]
print("Logistic Regression AUC:", roc_auc_score(y_test, lr_pred))

# Random Forest
rf = RandomForestClassifier(class_weight='balanced', n_estimators=100)
rf.fit(X_train, y_train)
rf_pred = rf.predict_proba(X_test)[:,1]
print("Random Forest AUC:", roc_auc_score(y_test, rf_pred))

# XGBoost
xgb_model = xgb.XGBClassifier(scale_pos_weight=10)  # ≈ ratio of negatives to positives; set from the actual class balance
xgb_model.fit(X_train, y_train)
xgb_pred = xgb_model.predict_proba(X_test)[:,1]
print("XGBoost AUC:", roc_auc_score(y_test, xgb_pred))

Interpreting Results

  • Feature Importance: For tree-based models, extract and display the top features influencing predictions.
  • Lift Chart: Calculate the lift at the top 10% of customers to show the improvement in targeting efficiency.

import pandas as pd
import numpy as np

# Calculate lift at top decile
def lift_at_decile(y_true, y_scores, decile=0.1):
    df = pd.DataFrame({'y_true': y_true, 'y_scores': y_scores})
    df = df.sort_values('y_scores', ascending=False)
    cutoff = int(len(df) * decile)
    top = df.head(cutoff)
    lift = top['y_true'].mean() / df['y_true'].mean()
    return lift

lift = lift_at_decile(y_test, xgb_pred)
print("Lift at Top Decile:", lift)

Step 4: Interpretation & Business Communication

  • Explain Key Drivers: Use feature importance from XGBoost or coefficient values from logistic regression to identify what drives customer response (e.g., recent logins, high salary credits, prior campaign engagement).
  • Actionable Segments: Identify segments (e.g., young professionals with high engagement) where the model predicts highest propensity, and recommend targeted offers.
  • Simulation: Estimate the business impact (increased conversions, revenue) if the marketing team targets only the top 20% scored by the model.
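The simulation can be a simple back-of-the-envelope calculation. All figures below (campaign sizes, response rates, revenue per conversion) are hypothetical placeholders, not Emirates NBD data:

```python
import pandas as pd

# Hypothetical comparison: model-based targeting vs. random targeting
base = pd.DataFrame({
    'segment': ['Top 20% by score', 'Random 20%'],
    'customers_targeted': [20000, 20000],
    'expected_response_rate': [0.16, 0.05],  # from lift analysis vs. average rate
})
revenue_per_conversion = 300  # assumed average revenue per new card (illustrative)

base['expected_conversions'] = (base['customers_targeted']
                                * base['expected_response_rate']).astype(int)
base['expected_revenue'] = base['expected_conversions'] * revenue_per_conversion
incremental = base.loc[0, 'expected_revenue'] - base.loc[1, 'expected_revenue']
print(base)
print(f'Incremental revenue from model-based targeting: {incremental:,}')
```

Framing the model's value as incremental conversions and revenue, rather than AUC, is what resonates with marketing stakeholders.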

Communicating Results

Share charts, tables, and key metrics with business stakeholders in clear, non-technical language. For example:

Decile   | Response Rate | Lift
Top 10%  | 21.0%         | 4.2x
Next 10% | 11.0%         | 2.2x
Average  | 5.0%          | 1.0x

Step 5: Deployment & Monitoring in Production

  • Deployment: Save the trained model and deploy via an API or integrate with the Emirates NBD CRM system.
  • Batch Scoring: Schedule regular scoring of the customer base (e.g., weekly or monthly), updating propensity scores in the database.
  • Monitoring: Track conversion rates among customers targeted by the model. Set up alerts for significant drops in model performance.
  • Retraining: Periodically retrain the model with fresh data to capture evolving customer behavior patterns.

# Example: Model deployment using Flask API
from flask import Flask, request, jsonify
import pickle

app = Flask(__name__)
with open('propensity_model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    features = [data['feature1'], data['feature2'], ...]  # placeholder names; replace with the model's actual feature list
    pred = model.predict_proba([features])[0, 1]
    return jsonify({'propensity_score': float(pred)})  # cast NumPy float for JSON serialization

if __name__ == '__main__':
    app.run(port=5000)

Step 6: Continuous Improvement

  • Feedback Loop: Collect feedback from marketing teams and analyze post-campaign data to refine the model.
  • Expand Use Cases: Apply propensity modeling to other products (e.g., loans, insurance) and customer actions (e.g., digital adoption, churn).
  • Advanced Modeling: Incorporate deep learning, time-series features, or multi-action propensity (multi-label classification) as data and business needs grow.

Advanced Topics in Customer Propensity Modeling

1. Uplift Modeling

While propensity modeling predicts likelihood of action, uplift modeling estimates the incremental impact of an intervention (e.g., marketing campaign). It answers: “Who is most likely to respond because of the offer?” This helps avoid wasted effort on customers who would buy anyway (sure things) or never buy (lost causes).

The uplift for customer \(i\) can be estimated as:

\[ \text{Uplift}_i = P(Y_i = 1 | \text{Treatment}, X_i) - P(Y_i = 1 | \text{Control}, X_i) \]

Implement using two-model or single-model approaches and design A/B tests for campaign targeting.
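A sketch of the two-model approach on synthetic treatment/control data; the simulated treatment effect and feature setup are assumptions made purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic campaign data: features, a random treatment flag, and responses
rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 3))
treated = rng.integers(0, 2, n).astype(bool)
# Baseline response driven by X[:, 0]; treatment adds effect for a sub-segment
logits = X[:, 0] + 0.8 * treated * (X[:, 1] > 0) - 2
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Two-model approach: separate response models for treatment and control
m_treat = LogisticRegression(max_iter=1000).fit(X[treated], y[treated])
m_ctrl = LogisticRegression(max_iter=1000).fit(X[~treated], y[~treated])

# Estimated uplift = P(response | treatment) - P(response | control)
uplift = m_treat.predict_proba(X)[:, 1] - m_ctrl.predict_proba(X)[:, 1]
print(f'Mean estimated uplift: {uplift.mean():.3f}')
```

Ranking customers by estimated uplift (rather than raw propensity) focuses the campaign budget on "persuadables" instead of sure things and lost causes.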

2. Time-to-Event (Survival) Analysis

For predicting when a customer will perform an action, use survival analysis techniques (e.g., Cox Proportional Hazards model), especially relevant for churn or product adoption timing.

3. Real-time Scoring & Personalization

  • Integrate models with real-time event streams (e.g., website actions) for instant personalized recommendations.
  • Use scalable tools (Spark, AWS SageMaker, Azure ML) for large-scale banking datasets.

4. Ethical AI and Fairness in Banking

  • Regularly audit models for bias against sensitive groups (e.g., age, gender, income).
  • Implement explainability tools (SHAP, LIME) in dashboards for compliance and transparency.

Common Interview Questions on Propensity Modeling at Emirates NBD

  • What features would you use for a customer propensity model in retail banking?
    • Discuss demographic, behavioral, transactional, engagement, and external features.
  • How would you handle imbalanced classes in campaign response prediction?
    • Mention SMOTE, class weights, precision-recall metrics, and threshold tuning.
  • How would you evaluate the business impact of your model?
    • Describe lift/gain charts, incremental conversions, and ROI estimation.
  • How do you ensure your model is explainable and fair?
    • Talk about feature importance, SHAP/LIME, and bias checks.
  • How would you deploy and monitor the model in production?
    • Describe model serialization, API deployment, regular scoring, and monitoring for drift.

Conclusion

Building a Customer Propensity Model for Emirates NBD or any leading retail bank is a multi-stage process that blends technical, business, and regulatory expertise. Mastery of data exploration, feature engineering, model selection, evaluation, and deployment is essential. Propensity modeling unlocks significant value by enabling personalized, data-driven marketing and customer strategies, ultimately driving growth and customer satisfaction. In interviews, demonstrating a structured, business-centric approach—while highlighting technical rigor, interpretability, and ethical considerations—will set you apart as an expert in data analytics and machine learning.

Emirates NBD’s data science teams look for candidates who not only understand the end-to-end modeling process but can also communicate insights effectively and ensure models translate into real business impact. Practice, stay current with new modeling techniques, and always keep the customer at the heart of your analytics efforts.
