Simon S. OBAME
Project Manager
Hi! My name is Simon, I'm a Project Manager with over 12 years of experience in the energy and water sector.
DOB: October 28, 1985
Email : contact @ stevenobame.com
Website: www.stevenobame.com
This project focuses on optimizing marketing campaigns for UFood, Brazil's premier food delivery app. As the lead data analyst on this endeavor, I delved deep into a dataset to uncover actionable insights aimed at enhancing customer engagement and driving value for the company.
Consider a well-established company operating in the retail food sector. Presently they have around several hundred thousands of registered customers and serve almost one million consumers a year. They sell products from 5 major categories: wines, rare meat products, exotic fruits, specially prepared fish and sweet products. These can further be divided into gold and regular products. The customers can order and acquire products through 3 sales channels: physical stores, catalogs and company’s website. Globally, the company had solid revenues and a healthy bottom line in the past 3 years, but the profit growth perspectives for the next 3 years are not promising... For this reason, several strategic initiatives are being considered to invert this situation.
One is to improve the performance of marketing activities, with a special focus on marketing campaigns.
UFood is the lead food delivery app in Brazil, present in over a thousand cities.
Keeping a high customer engagement is key for growing and consolidating the company’s position as the market leader.
Data Analysts working within the data team are constantly challenged to provide insights and value to the company through open scope projects. This case intends to simulate that. In this case, you are presented a sample dataset, that mocks metainformation on the customer and on UFood campaign interactions with that customer.
It is your challenge to understand the data, find business opportunities & insights and to propose any data driven action to optimize the campaigns results & generate value to the company. You should consider that you have to present your results to both technical and business stakeholders.
The marketing department was pressured to spend its annual budget more wisely. Desirably, the success of these activities will prove the value of the approach and convince the more skeptical within the company.
Ask: The project seeks to address several critical questions to optimize marketing campaigns for UFood:
The Marketing department doesn't explicitly specify the business tasks. However, to accomplish these objectives, the following tasks need to be undertaken:
For the purpose of this project, I have worked with the provided dataset from UFood, which encompasses a wide range of variables related to customer behavior, campaign interactions, and demographic characteristics. This dataset offers a valuable foundation for conducting thorough analysis and deriving actionable insights to optimize marketing strategies and enhance customer engagement on the UFood platform.
Analyze the data
import pandas as pd food = pd.read_csv(r"C:/Users/HP/Documents/Formations_Certifications/Alex_Analyst/Python_Pandas/Projects/Analysis_Project2/u_food_marketing.csv", encoding="ISO-8859-1") food.head() food.info()
## Output : <class 'pandas.core.frame.DataFrame'> RangeIndex: 2205 entries, 0 to 2204 Data columns (total 39 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Income 2205 non-null float64 1 Kidhome 2205 non-null int64 2 Teenhome 2205 non-null int64 3 Recency 2205 non-null int64 4 MntWines 2205 non-null int64 5 MntFruits 2205 non-null int64 6 MntMeatProducts 2205 non-null int64 7 MntFishProducts 2205 non-null int64 8 MntSweetProducts 2205 non-null int64 9 MntGoldProds 2205 non-null int64 10 NumDealsPurchases 2205 non-null int64 11 NumWebPurchases 2205 non-null int64 12 NumCatalogPurchases 2205 non-null int64 13 NumStorePurchases 2205 non-null int64 14 NumWebVisitsMonth 2205 non-null int64 15 AcceptedCmp3 2205 non-null int64 16 AcceptedCmp4 2205 non-null int64 17 AcceptedCmp5 2205 non-null int64 18 AcceptedCmp1 2205 non-null int64 19 AcceptedCmp2 2205 non-null int64 20 Complain 2205 non-null int64 21 Z_CostContact 2205 non-null int64 22 Z_Revenue 2205 non-null int64 23 Response 2205 non-null int64 24 Age 2205 non-null int64 25 Customer_Days 2205 non-null int64 26 marital_Divorced 2205 non-null int64 27 marital_Married 2205 non-null int64 28 marital_Single 2205 non-null int64 29 marital_Together 2205 non-null int64 30 marital_Widow 2205 non-null int64 31 education_2n Cycle 2205 non-null int64 32 education_Basic 2205 non-null int64 33 education_Graduation 2205 non-null int64 34 education_Master 2205 non-null int64 35 education_PhD 2205 non-null int64 36 MntTotal 2205 non-null int64 37 MntRegularProds 2205 non-null int64 38 AcceptedCmpOverall 2205 non-null int64 dtypes: float64(1), int64(38) memory usage: 672.0 KB
# In this dataset we have 2205 rows and 39 columns pd.set_option("display.max.rows", 2300) pd.set_option("display.max.columns", 60)
Data Cleaning
# Checking the duplicates food[food.duplicated()].sum()
Income 9366769.0 Kidhome 79.0 Teenhome 87.0 Recency 9277.0 MntWines 55672.0 MntFruits 4936.0 MntMeatProducts 28906.0 MntFishProducts 7256.0 MntSweetProducts 4708.0 MntGoldProds 8380.0 NumDealsPurchases 403.0 NumWebPurchases 725.0 NumCatalogPurchases 488.0 NumStorePurchases 1104.0 NumWebVisitsMonth 975.0 AcceptedCmp3 12.0 AcceptedCmp4 9.0 AcceptedCmp5 15.0 AcceptedCmp1 9.0 AcceptedCmp2 4.0 Complain 1.0 Z_CostContact 552.0 Z_Revenue 2024.0 Response 22.0 Age 9358.0 Customer_Days 464574.0 marital_Divorced 16.0 marital_Married 69.0 marital_Single 34.0 marital_Together 59.0 marital_Widow 6.0 education_2n Cycle 15.0 education_Basic 5.0 education_Graduation 98.0 education_Master 29.0 education_PhD 37.0 MntTotal 101478.0 MntRegularProds 93098.0 AcceptedCmpOverall 49.0 dtype: float64
# Removing the duplicates food.drop_duplicates(keep=False, inplace=True) food.info()
## Output : <class 'pandas.core.frame.DataFrame'> Index: 1843 entries, 0 to 2204 Data columns (total 39 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Income 1843 non-null float64 1 Kidhome 1843 non-null int64 2 Teenhome 1843 non-null int64 3 Recency 1843 non-null int64 4 MntWines 1843 non-null int64 5 MntFruits 1843 non-null int64 6 MntMeatProducts 1843 non-null int64 7 MntFishProducts 1843 non-null int64 8 MntSweetProducts 1843 non-null int64 9 MntGoldProds 1843 non-null int64 10 NumDealsPurchases 1843 non-null int64 11 NumWebPurchases 1843 non-null int64 12 NumCatalogPurchases 1843 non-null int64 13 NumStorePurchases 1843 non-null int64 14 NumWebVisitsMonth 1843 non-null int64 15 AcceptedCmp3 1843 non-null int64 16 AcceptedCmp4 1843 non-null int64 17 AcceptedCmp5 1843 non-null int64 18 AcceptedCmp1 1843 non-null int64 19 AcceptedCmp2 1843 non-null int64 20 Complain 1843 non-null int64 21 Z_CostContact 1843 non-null int64 22 Z_Revenue 1843 non-null int64 23 Response 1843 non-null int64 24 Age 1843 non-null int64 25 Customer_Days 1843 non-null int64 26 marital_Divorced 1843 non-null int64 27 marital_Married 1843 non-null int64 28 marital_Single 1843 non-null int64 29 marital_Together 1843 non-null int64 30 marital_Widow 1843 non-null int64 31 education_2n Cycle 1843 non-null int64 32 education_Basic 1843 non-null int64 33 education_Graduation 1843 non-null int64 34 education_Master 1843 non-null int64 35 education_PhD 1843 non-null int64 36 MntTotal 1843 non-null int64 37 MntRegularProds 1843 non-null int64 38 AcceptedCmpOverall 1843 non-null int64 dtypes: float64(1), int64(38) memory usage: 575.9 KB
After removing the duplicates, we are left with 1843 rows and 39 columns.
# Some columns can be grouped in one because they have the same information food["Total_Children"] = food[["Kidhome", "Teenhome"]].sum(axis=1) food["marital_Divorced"] = food["marital_Divorced"].replace({1:5,0:0}) food["marital_Married"] = food["marital_Married"].replace({1:4,0:0}) food["marital_Single"] = food["marital_Single"].replace({1:3,0:0}) food["marital_Together"] = food["marital_Together"].replace({1:2,0:0}) food["marital_Widow"] = food["marital_Widow"].replace({1:1,0:0}) food["marital_Status"] = food[["marital_Divorced", "marital_Married", "marital_Single", "marital_Together", "marital_Widow"]].sum(axis=1) # Mapping the marital columns into a new column. food["marital_Status_str"] = food["marital_Status"].map({5 : "Divorced", 4 : "Married", 3 : "Single", 2 : "Together", 1 : "Widow"}) food.info()
## Output : <class 'pandas.core.frame.DataFrame'> RangeIndex: 2205 entries, 0 to 2204 Data columns (total 42 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Income 2205 non-null float64 1 Kidhome 2205 non-null int64 2 Teenhome 2205 non-null int64 3 Recency 2205 non-null int64 4 MntWines 2205 non-null int64 5 MntFruits 2205 non-null int64 6 MntMeatProducts 2205 non-null int64 7 MntFishProducts 2205 non-null int64 8 MntSweetProducts 2205 non-null int64 9 MntGoldProds 2205 non-null int64 10 NumDealsPurchases 2205 non-null int64 11 NumWebPurchases 2205 non-null int64 12 NumCatalogPurchases 2205 non-null int64 13 NumStorePurchases 2205 non-null int64 14 NumWebVisitsMonth 2205 non-null int64 15 AcceptedCmp3 2205 non-null int64 16 AcceptedCmp4 2205 non-null int64 17 AcceptedCmp5 2205 non-null int64 18 AcceptedCmp1 2205 non-null int64 19 AcceptedCmp2 2205 non-null int64 20 Complain 2205 non-null int64 21 Z_CostContact 2205 non-null int64 22 Z_Revenue 2205 non-null int64 23 Response 2205 non-null int64 24 Age 2205 non-null int64 25 Customer_Days 2205 non-null int64 26 marital_Divorced 2205 non-null int64 27 marital_Married 2205 non-null int64 28 marital_Single 2205 non-null int64 29 marital_Together 2205 non-null int64 30 marital_Widow 2205 non-null int64 31 education_2n Cycle 2205 non-null int64 32 education_Basic 2205 non-null int64 33 education_Graduation 2205 non-null int64 34 education_Master 2205 non-null int64 35 education_PhD 2205 non-null int64 36 MntTotal 2205 non-null int64 37 MntRegularProds 2205 non-null int64 38 AcceptedCmpOverall 2205 non-null int64 39 Total_Children 2205 non-null int64 40 marital_Status 2205 non-null int64 41 marital_Status_str 2205 non-null object dtypes: float64(1), int64(40), object(1) memory usage: 723.6+ KB
We can performi the same operations on the education information.
food["education_PhD"] = food["education_PhD"].replace({1:5,0:0}) food["education_Master"] = food["education_Master"].replace({1:4,0:0}) food["education_Graduation"] = food["education_Graduation"].replace({1:3,0:0}) food["education_Basic"] = food["education_Basic"].replace({1:2,0:0}) food["education_2n Cycle"] = food["education_2n Cycle"].replace({1:1,0:0}) food["education_Status"] = food[["education_PhD", "education_Master", "education_Graduation", "education_Basic", "education_2n Cycle"]].sum(axis=1) # Mapping the education columns into a new column food["education_Status_str"] = food["education_Status"].map({5 : "PhD", 4 : "Master", 3 : "Graduation", 2 : "Basic", 1 : "2n Cycle"}) food["Accepted_Campaigns"] = food[["AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3", "AcceptedCmp4", "AcceptedCmp5"]].sum(axis=1) food["Accepted_Campaigns"] = (food["Accepted_Campaigns"] != 0).astype(int) food.corr(method="pearson", numeric_only=True)["Accepted_Campaigns"].sort_values(ascending=False)
## Output : Accepted_Campaigns 1.000000 AcceptedCmpOverall 0.859326 AcceptedCmp4 0.553623 AcceptedCmp3 0.551797 AcceptedCmp5 0.548133 AcceptedCmp1 0.512399 MntWines 0.463465 MntTotal 0.414528 MntRegularProds 0.413418 Response 0.367892 Income 0.336105 NumCatalogPurchases 0.333796 MntMeatProducts 0.285121 AcceptedCmp2 0.229374 NumWebPurchases 0.210733 MntGoldProds 0.188876 NumStorePurchases 0.185159 MntFishProducts 0.159548 MntSweetProducts 0.158667 MntFruits 0.124605 education_Status 0.039992 education_PhD 0.038395 Age 0.030203 marital_Widow 0.013568 marital_Married 0.008298 marital_Single -0.000211 marital_Status -0.000400 education_Master -0.004837 marital_Divorced -0.006486 education_Graduation -0.007113 marital_Together -0.010172 Customer_Days -0.013342 education_2n Cycle -0.016136 Recency -0.016200 Complain -0.025402 education_Basic -0.037729 NumDealsPurchases -0.087632 Teenhome -0.100674 NumWebVisitsMonth -0.130340 Kidhome -0.203011 Total_Children -0.218689 Z_CostContact NaN Z_Revenue NaN Name: Accepted_Campaigns, dtype: float64
Data Visualization and Share
import seaborn as sns all_correlations = food.corr(method="pearson", numeric_only=True) all_correlations = all_correlations[(all_correlations > 0.3) & (all_correlations < 1)] sns.heatmap(all_correlations)
For the next step, we will define ranges from 23 to 80 and assign ages within those ranges
age_groups = [(23,30), (31,40), (41,50), (51,60), (61,70), (71,80)] def assign_age_group(Age): for age_range in age_groups: if age_range[0] <= Age <= age_range[1]: return f"{age_range[0]} - {age_range[1]}" return("Unknown") food["Age_group"] = food["Age"].apply(assign_age_group) food[["Age", "Age_group"]].head()
## Output : Age Age_group 0 63 61 - 70 1 66 61 - 70 2 55 51 - 60 3 36 31 - 40 4 39 31 - 40
food["Age_group"].value_counts()
## Output : Age_group 41 - 50 727 51 - 60 502 61 - 70 453 31 - 40 357 71 - 80 105 23 - 30 61 Name: count, dtype: int64
# let's visualize age_order = ["23 - 30","31 - 40","41 - 50","51 - 60","61 - 70","71 - 80"] sns.pointplot(data = food, x = "Age_group", y = "Accepted_Campaigns", order = age_order)
There are many people in the age ranges from 31 to 70, so let's visualize these ranges by percentage.
counts = food["Age_group"].value_counts() percentage = counts / food.shape[0] percentage_food = percentage.reset_index() percentage_food.columns = ["age_group", "percentage"] percentage_food = percentage_food.sort_values("age_group") percentage_food
## Output : age_group percentage 5 23 - 30 0.027664 3 31 - 40 0.161905 0 41 - 50 0.329705 1 51 - 60 0.227664 2 61 - 70 0.205442 4 71 - 80 0.047619
import matplotlib.pyplot as plt sns.barplot(x = "age_group", y = "percentage", data = percentage_food) plt.title("Percentage of Accepted Campaigns per Age Group") plt.show()
Age segmentation - core audience for accepting campaigns is 31 - 70. We will now determine the total spending for this age range
food.groupby("Age_group")["MntTotal"].sum()
## Output : Age_group 23 - 30 45837 31 - 40 175176 41 - 50 350590 51 - 60 290520 61 - 70 285289 71 - 80 93484 Name: MntTotal, dtype: int64
grouped_food = food.groupby("Age_group")["MntTotal"].sum().reset_index() sns.barplot(x = "Age_group", y = "MntTotal", data = grouped_food) plt.title("Amount Spent Per Age Group") plt.show()
As the previous result we see that the range from 31 to 70 spend a lot of money, We will now examine the amount spent by individuals who accepted the campaign
accepted_campaign = food[food["Accepted_Campaigns"] != 0] grouped_food = accepted_campaign.groupby("Age_group")["MntTotal"].sum().reset_index() sns.barplot(x = "Age_group", y = "MntTotal", data = grouped_food) plt.title("Amount Spent Per Age Group") plt.show()
Now we can observe that the percentage spent by individuals who accepted the campaign has increased, especially for the age range of 71 to 80 years old, but not for the range of 23 to 30.
Let's delve into the number of purchases
food[["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]].sum()
## Output : NumWebPurchases 9042 NumCatalogPurchases 5833 NumStorePurchases 12841 dtype: int64
sum_food = pd.DataFrame(food[["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]].sum(), columns=["Sums"]) sum_food = sum_food.reset_index() sum_food.rename(columns={"index":"Type_of_Purchase"}, inplace=True) sum_food
## Output : Type_of_Purchase Sums 0 NumWebPurchases 9042 1 NumCatalogPurchases 5833 2 NumStorePurchases 12841
# let's visualize sns.barplot(x = "Type_of_Purchase", y="Sums", data = sum_food)
Let's visualize the number of purchases made by individuals who accepted the campaigns
accepted_campaign = food[food["Accepted_Campaigns"] != 0] sum_food = pd.DataFrame(accepted_campaign[["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]].sum(), columns=["Sums"]) sum_food = sum_food.reset_index() sum_food.rename(columns={"index":"Type_of_Purchase"}, inplace=True) sns.barplot(x = "Type_of_Purchase", y="Sums", data = sum_food)
x = sns.jointplot(data = food, x = "MntTotal", y = "NumWebPurchases", kind = "kde") x.plot_joint(sns.regplot, color = "r")
x = sns.jointplot(data = food, x = "MntTotal", y = "NumCatalogPurchases", kind = "kde") x.plot_joint(sns.regplot, color = "r")
x = sns.jointplot(data = food, x = "MntTotal", y = "NumStorePurchases", kind = "kde") x.plot_joint(sns.regplot, color = "r")
These visualizations demonstrate a correlation: the more individuals spend on the web, the more they tend to spend overall. Now, let's examine the distribution of the amount spent compared to whether individuals have children or not.
sns.regplot(x = "Total_Children", y="MntTotal", data = food)
As the number of children increases, individuals tend to spend less money.
sns.regplot(x = "Total_Children", y="Accepted_Campaigns", data = food)
As the number of children increases, individuals are less likely to accept the campaigns.
sns.regplot(x = "education_Status", y="Accepted_Campaigns", data = food)
As education levels increase, individuals are more likely to accept the campaign.
ns.regplot(x = "education_Status", y="MntTotal", data = food)
It appears that education level does not significantly influence whether individuals accept the campaign or spend more money. Let's analyze the distribution basing on the marital status
sns.countplot(x = "marital_Status_str", data = food)
Here, we have the number of people per marital status.
rel_food = food.groupby("marital_Status_str")["MntTotal"].sum().reset_index() sns.barplot(x = "marital_Status_str", y="MntTotal", data = rel_food)
Married individuals are spending more money compared to others.
accepted_campaign = food[food["Accepted_Campaigns"] != 0] rel_food = accepted_campaign.groupby("marital_Status_str")["MntTotal"].sum().reset_index() sns.barplot(x = "marital_Status_str", y="MntTotal", data = rel_food)
total_count = food["marital_Status_str"].value_counts() accepted = food[food["Accepted_Campaigns"] == 1]["marital_Status_str"].value_counts() percentage_marital = accepted / total_count * 100 percentage_food = percentage_marital.reset_index() percentage_food.columns = ["Marital_Status", "Percentage"] percentage_food
## Output : Marital_Status Percentage 0 Married 21.194379 1 Together 20.070423 2 Single 20.754717 3 Divorced 20.000000 4 Widow 23.684211
sns.barplot(x = "Marital_Status", y="Percentage", data = percentage_food)
As observed, individuals who accepted campaigns tend to spend more money. Therefore, "Married," "Single," and "Together" individuals spend more money compared to widowed and divorced individuals.
Simon Steven OBAME
Hi! My name is Simon, I'm a Project Manager.