RevoU Python Assignment
Project Summary
- Imported and joined the datasets from the source files.
- Cleaned the data by removing null values, outliers, and irrelevant variables, and prepared it for machine learning.
- Conducted exploratory data analysis on the dataset and conveyed the important findings.
- Created user segmentation using k-means clustering.
Insights
- The highest-value customer group is made up of both promo-sensitive and non-promo-sensitive customers.
- High-value customers have the highest transaction amount in the past 6 months.
- Transaction count is dominated by the “High Activity” group.
- The lowest-value customers are highly sensitive to promotions.
- The lowest-value customers have the highest amount of transactions; based on the previous graphs, this could be an effect of the promotion program.
Project Files
For a more comprehensive analysis and visualization, please open the project files.
Project Background

Python is one of the most challenging yet exciting programming languages for data work. In this assignment we practiced Python skills such as data cleaning and exploratory data analysis, along with more advanced skills such as user segmentation using cluster analysis. We used Google Colab as the Python notebook tool.
Data Scope, Goals & Objectives
In this assignment we used data from Kaggle. The dataset contains several kinds of information, such as profile information, average transaction, and promo transaction.
Goals
Objectives
- Conducting data cleaning using various methods so that the results are more accurate.
- Conducting exploratory data analysis of the dataset to find the problem within REVOU BANK.
- Creating user segmentation using cluster analysis to support targeted marketing.
Data Analysis
Note: only the important steps are shown, to keep the explanation of the analysis simple.
Data Preparation & Cleaning
Data Prep
Prepared the Python environment by loading the necessary libraries.
Import Dataset
Imported the dataset from Google Sheets using a CSV reader.
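A minimal sketch of the import and join, assuming the sheets are available as CSV and are read with pandas `read_csv`. The in-memory text, the column names (`account_id`, `homeowner_status`, `avg_transaction`, `promo_transaction`) and the join key are hypothetical stand-ins for the real source files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two CSV exports of the source sheets.
profile_csv = io.StringIO(
    "account_id,homeowner_status\n"
    "1001,1\n"
    "1002,0\n"
    "1003,1\n"
)
transaction_csv = io.StringIO(
    "account_id,avg_transaction,promo_transaction\n"
    "1001,250.0,2\n"
    "1002,80.5,7\n"
    "1003,410.0,0\n"
)

# read_csv also accepts a file path or URL in place of these buffers.
profiles = pd.read_csv(profile_csv)
transactions = pd.read_csv(transaction_csv)

# Join the two tables on the shared key (assumed here to be account_id).
df = profiles.merge(transactions, on="account_id", how="inner")
print(df.shape)
```

An inner join keeps only accounts present in both tables; a left join would be the choice if every profile row must survive.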
Handling Data
Removed irrelevant features from the dataset.
Removed duplicate values from the dataset.
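The two handling steps above can be sketched with pandas as follows. The sample frame is hypothetical, and `notes` is an invented column that plays the role of an irrelevant feature.

```python
import pandas as pd

# Hypothetical sample: "notes" plays the role of an irrelevant feature.
df = pd.DataFrame(
    {
        "account_id": [1001, 1002, 1002, 1003],
        "avg_transaction": [250.0, 80.5, 80.5, 410.0],
        "notes": ["a", "b", "b", "c"],
    }
)

# Drop columns that are not needed for the analysis.
cleaned = df.drop(columns=["notes"])

# Drop fully duplicated rows, keeping the first occurrence.
cleaned = cleaned.drop_duplicates().reset_index(drop=True)
print(len(df), "->", len(cleaned))
```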
Feature Format
Converted the date feature to datetime, account id to str, and homeowner status to int. This is necessary to analyze the data further.
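As an illustration of these conversions, here is a small sketch using pandas. The column names (`birth_date` in particular) and the date format are assumptions; the real dataset may use different ones.

```python
import pandas as pd

# Hypothetical sample; the real column names may differ.
df = pd.DataFrame(
    {
        "account_id": [1001, 1002],
        "birth_date": ["1990-05-17", "1985-11-02"],
        "homeowner_status": ["1", "0"],
    }
)

# Parse date strings into datetime values (format assumed to be ISO).
df["birth_date"] = pd.to_datetime(df["birth_date"], format="%Y-%m-%d")

# Treat the account id as a label rather than a number.
df["account_id"] = df["account_id"].astype(str)

# Store the homeowner flag as an integer so it can be aggregated.
df["homeowner_status"] = df["homeowner_status"].astype(int)
print(df.dtypes)
```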
Exploratory Data Analysis
Evaluative descriptive statistics
Described the numerical features using the describe function in Python.
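For reference, a short sketch of what pandas `describe` returns, using a hypothetical four-row frame rather than the assignment's data.

```python
import pandas as pd

# Hypothetical sample with one label column and one numeric column.
df = pd.DataFrame(
    {
        "account_id": ["1001", "1002", "1003", "1004"],
        "avg_transaction": [100.0, 200.0, 300.0, 400.0],
    }
)

# describe() summarises numeric columns by default: count, mean, std,
# min, quartiles and max. The string column is skipped.
summary = df.describe()
print(summary)
```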
Customer Demographic

Promo-Sensitive by MAPP_Active_Group

The highest-value customer group is made up of both promo-sensitive and non-promo-sensitive customers.
Transaction Amount Customer by MAPP_Active_Group

High-value customers have the highest transaction amount in the past 6 months.
User Segmentation
Preparing the data for cluster analysis
How many clusters? We used the Elbow Method and Silhouette Analysis.

The turning point in the elbow plot falls between 3 and 4 clusters, so choosing the number of clusters requires further examination using the silhouette method.


We chose 3 clusters: although a 2-cluster solution has a high silhouette score, it doesn't provide sufficient insight for segmentation analysis.
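The comparison described above can be sketched with scikit-learn as follows. This is an illustration on synthetic data from `make_blobs`, not the assignment's notebook; the feature scaling, the range of k, and the random seeds are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared customer features.
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=42)

# Scaling is an assumption of this sketch: K-Means is distance-based,
# so features on larger scales would otherwise dominate.
X_scaled = StandardScaler().fit_transform(X)

inertias = {}
silhouettes = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    # Elbow method: within-cluster sum of squared distances.
    inertias[k] = model.inertia_
    # Silhouette analysis: mean silhouette coefficient over all samples.
    silhouettes[k] = silhouette_score(X_scaled, labels)

for k in inertias:
    print(k, round(inertias[k], 1), round(silhouettes[k], 3))
```

Plotting `inertias` against k gives the elbow curve, while `silhouettes` gives a second opinion when the elbow is ambiguous. As in the text, the k with the best score is not automatically the most useful one for segmentation.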
Creating cluster using K-Means
K-Means clustering was used because the data have more numerical features than categorical features.
- The distribution of data across the clusters is quite good (no cluster has a small count).
- Cluster 0: highest average transaction frequency and highest revenue generated.
- Cluster 1: highest average sales.
- Cluster 2: the most promo-sensitive clients.
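A minimal sketch of how cluster sizes and per-cluster averages like the ones above can be produced with scikit-learn and pandas. The feature names (`transaction_freq`, `avg_sales`, `promo_share`) are hypothetical, the data is synthetic, and scaling before clustering is an assumption.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; synthetic values stand in for real customers.
features = ["transaction_freq", "avg_sales", "promo_share"]
X, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=0)
df = pd.DataFrame(X, columns=features)

# Fit K-Means with 3 clusters on the scaled features.
X_scaled = StandardScaler().fit_transform(df[features])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X_scaled)

# Cluster sizes show whether any cluster is too small to be useful.
sizes = df["cluster"].value_counts().sort_index()

# Per-cluster means of the original (unscaled) features describe each segment.
profile = df.groupby("cluster")[features].mean()
print(sizes)
print(profile)
```

Profiling on the unscaled columns keeps the averages in their original units, which makes the segments easier to describe.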
Recommendation
- For Cluster 0: investment and wealth management services (Deposito).
- For Cluster 1: offer higher credit limits.
- For Cluster 2: cashback and reward programs.