Food Marketing Data Analysis Project

This project focuses on optimizing marketing campaigns for UFood, Brazil's premier food delivery app. As the lead data analyst on this endeavor, I delved deep into a dataset to uncover actionable insights 

aimed at enhancing customer engagement and driving value for the company.

About the company

Consider a well-established company operating in the retail food sector. Presently they have around several hundred thousands of registered customers and serve almost one million consumers a year. They sell products from 5 major categories: wines, rare meat products, exotic fruits, specially prepared fish and sweet products. These can further be divided into gold and regular products. The customers can order and acquire products through 3 sales channels: physical stores, catalogs and company’s website. Globally, the company had solid revenues and a healthy bottom line in the past 3 years, but the profit growth perspectives for the next 3 years are not promising... For this reason, several strategic initiatives are being considered to invert this situation.

One is to improve the performance of marketing activities, with a special focus on marketing campaigns.

UFood Data Analyst Case

UFood is the lead food delivery app in Brazil, present in over a thousand cities.

Keeping a high customer engagement is key for growing and consolidating the company’s position as the market leader.

Data Analysts working within the data team are constantly challenged to provide insights and value to the company through open scope projects. This case intends to simulate that. In this case, you are presented a sample dataset, that mocks metainformation on the customer and on UFood campaign interactions with that customer.

It is your challenge to understand the data, find business opportunities & insights and to propose any data driven action to optimize the campaigns results & generate value to the company. You should consider that you have to present your results to both technical and business stakeholders.

Key Objectives are:

  1. Explore the data – don’t just plot means and counts. Provide insights, define cause and effect. Provide a better understanding of the characteristic features of respondents;
  2. Propose and describe a customer segmentation based on customers behaviors;
  3. Visualize data and provide written reasoning behind discoveries;

The Marketing Department

The marketing department was pressured to spend its annual budget more wisely. Desirably, the success of these activities will prove the value of the approach and convince the more skeptical within the company.

Questions to ask

Ask: The project seeks to address several critical questions to optimize marketing campaigns for UFood:

  1. How do customers respond to different marketing campaigns conducted by UFood?
  2. What factors influence customers to accept offers in various campaigns?
  3. How can UFood effectively target customers to increase acceptance rates in future campaigns?

Business Task

The Marketing department doesn't explicitly specify the business tasks. However, to accomplish these objectives, the following tasks need to be undertaken:

  1. Analyze customer responses to each marketing campaign.
  2. Identify key customer characteristics, such as education, marital status, household composition, and income, that correlate with offer acceptance across campaigns.
  3. Evaluate the effectiveness of various marketing channels in influencing customer behavior.
  4. Assess recency of customer purchases and its impact on campaign response rates.
  5. Develop data-driven insights and recommendations to optimize future marketing strategies, improve customer engagement, and drive higher acceptance rates across campaigns.
  6. And more

Prepare the data

For the purpose of this project, I have worked with the provided dataset from UFood, which encompasses a wide range of variables related to customer behavior, campaign interactions, and demographic characteristics. This dataset offers a valuable foundation for conducting thorough analysis and deriving actionable insights to optimize marketing strategies and enhance customer engagement on the UFood platform.

Analyze the data

import pandas as pd

food = pd.read_csv(r"C:/Users/HP/Documents/Formations_Certifications/Alex_Analyst/Python_Pandas/Projects/Analysis_Project2/u_food_marketing.csv",

                            encoding="ISO-8859-1")

food.head()

food.info()
## Output : 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2205 entries, 0 to 2204
Data columns (total 39 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Income                2205 non-null   float64
 1   Kidhome               2205 non-null   int64  
 2   Teenhome              2205 non-null   int64  
 3   Recency               2205 non-null   int64  
 4   MntWines              2205 non-null   int64  
 5   MntFruits             2205 non-null   int64  
 6   MntMeatProducts       2205 non-null   int64  
 7   MntFishProducts       2205 non-null   int64  
 8   MntSweetProducts      2205 non-null   int64  
 9   MntGoldProds          2205 non-null   int64  
 10  NumDealsPurchases     2205 non-null   int64  
 11  NumWebPurchases       2205 non-null   int64  
 12  NumCatalogPurchases   2205 non-null   int64  
 13  NumStorePurchases     2205 non-null   int64  
 14  NumWebVisitsMonth     2205 non-null   int64  
 15  AcceptedCmp3          2205 non-null   int64  
 16  AcceptedCmp4          2205 non-null   int64  
 17  AcceptedCmp5          2205 non-null   int64  
 18  AcceptedCmp1          2205 non-null   int64  
 19  AcceptedCmp2          2205 non-null   int64  
 20  Complain              2205 non-null   int64  
 21  Z_CostContact         2205 non-null   int64  
 22  Z_Revenue             2205 non-null   int64  
 23  Response              2205 non-null   int64  
 24  Age                   2205 non-null   int64  
 25  Customer_Days         2205 non-null   int64  
 26  marital_Divorced      2205 non-null   int64  
 27  marital_Married       2205 non-null   int64  
 28  marital_Single        2205 non-null   int64  
 29  marital_Together      2205 non-null   int64  
 30  marital_Widow         2205 non-null   int64  
 31  education_2n Cycle    2205 non-null   int64  
 32  education_Basic       2205 non-null   int64  
 33  education_Graduation  2205 non-null   int64  
 34  education_Master      2205 non-null   int64  
 35  education_PhD         2205 non-null   int64  
 36  MntTotal              2205 non-null   int64  
 37  MntRegularProds       2205 non-null   int64  
 38  AcceptedCmpOverall    2205 non-null   int64  
dtypes: float64(1), int64(38)
memory usage: 672.0 KB
# In this dataset we have 2205 rows and 39 columns
pd.set_option("display.max.rows", 2300)
pd.set_option("display.max.columns", 60)

Data Cleaning

# Checking the duplicates
food[food.duplicated()].sum()
Income                  9366769.0
Kidhome                      79.0
Teenhome                     87.0
Recency                    9277.0
MntWines                  55672.0
MntFruits                  4936.0
MntMeatProducts           28906.0
MntFishProducts            7256.0
MntSweetProducts           4708.0
MntGoldProds               8380.0
NumDealsPurchases           403.0
NumWebPurchases             725.0
NumCatalogPurchases         488.0
NumStorePurchases          1104.0
NumWebVisitsMonth           975.0
AcceptedCmp3                 12.0
AcceptedCmp4                  9.0
AcceptedCmp5                 15.0
AcceptedCmp1                  9.0
AcceptedCmp2                  4.0
Complain                      1.0
Z_CostContact               552.0
Z_Revenue                  2024.0
Response                     22.0
Age                        9358.0
Customer_Days            464574.0
marital_Divorced             16.0
marital_Married              69.0
marital_Single               34.0
marital_Together             59.0
marital_Widow                 6.0
education_2n Cycle           15.0
education_Basic               5.0
education_Graduation         98.0
education_Master             29.0
education_PhD                37.0
MntTotal                 101478.0
MntRegularProds           93098.0
AcceptedCmpOverall           49.0
dtype: float64
# Removing the duplicates
food.drop_duplicates(keep=False, inplace=True)

food.info()
## Output : 
<class 'pandas.core.frame.DataFrame'>
Index: 1843 entries, 0 to 2204
Data columns (total 39 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Income                1843 non-null   float64
 1   Kidhome               1843 non-null   int64  
 2   Teenhome              1843 non-null   int64  
 3   Recency               1843 non-null   int64  
 4   MntWines              1843 non-null   int64  
 5   MntFruits             1843 non-null   int64  
 6   MntMeatProducts       1843 non-null   int64  
 7   MntFishProducts       1843 non-null   int64  
 8   MntSweetProducts      1843 non-null   int64  
 9   MntGoldProds          1843 non-null   int64  
 10  NumDealsPurchases     1843 non-null   int64  
 11  NumWebPurchases       1843 non-null   int64  
 12  NumCatalogPurchases   1843 non-null   int64  
 13  NumStorePurchases     1843 non-null   int64  
 14  NumWebVisitsMonth     1843 non-null   int64  
 15  AcceptedCmp3          1843 non-null   int64  
 16  AcceptedCmp4          1843 non-null   int64  
 17  AcceptedCmp5          1843 non-null   int64  
 18  AcceptedCmp1          1843 non-null   int64  
 19  AcceptedCmp2          1843 non-null   int64  
 20  Complain              1843 non-null   int64  
 21  Z_CostContact         1843 non-null   int64  
 22  Z_Revenue             1843 non-null   int64  
 23  Response              1843 non-null   int64  
 24  Age                   1843 non-null   int64  
 25  Customer_Days         1843 non-null   int64  
 26  marital_Divorced      1843 non-null   int64  
 27  marital_Married       1843 non-null   int64  
 28  marital_Single        1843 non-null   int64  
 29  marital_Together      1843 non-null   int64  
 30  marital_Widow         1843 non-null   int64  
 31  education_2n Cycle    1843 non-null   int64  
 32  education_Basic       1843 non-null   int64  
 33  education_Graduation  1843 non-null   int64  
 34  education_Master      1843 non-null   int64  
 35  education_PhD         1843 non-null   int64  
 36  MntTotal              1843 non-null   int64  
 37  MntRegularProds       1843 non-null   int64  
 38  AcceptedCmpOverall    1843 non-null   int64  
dtypes: float64(1), int64(38)
memory usage: 575.9 KB

After removing the duplicates, we are left with 1843 rows and 39 columns.

# Some columns can be grouped in one because they have the same information
food["Total_Children"] = food[["Kidhome", "Teenhome"]].sum(axis=1)

food["marital_Divorced"] = food["marital_Divorced"].replace({1:5,0:0})
food["marital_Married"] = food["marital_Married"].replace({1:4,0:0})
food["marital_Single"] = food["marital_Single"].replace({1:3,0:0})
food["marital_Together"] = food["marital_Together"].replace({1:2,0:0})
food["marital_Widow"] = food["marital_Widow"].replace({1:1,0:0})

food["marital_Status"] = food[["marital_Divorced", "marital_Married", "marital_Single", "marital_Together", "marital_Widow"]].sum(axis=1)

# Mapping the marital columns into a new column.
food["marital_Status_str"] = food["marital_Status"].map({5 : "Divorced", 4 : "Married", 3 : "Single", 2 : "Together", 1 : "Widow"})

food.info()
## Output : 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2205 entries, 0 to 2204
Data columns (total 42 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Income                2205 non-null   float64
 1   Kidhome               2205 non-null   int64  
 2   Teenhome              2205 non-null   int64  
 3   Recency               2205 non-null   int64  
 4   MntWines              2205 non-null   int64  
 5   MntFruits             2205 non-null   int64  
 6   MntMeatProducts       2205 non-null   int64  
 7   MntFishProducts       2205 non-null   int64  
 8   MntSweetProducts      2205 non-null   int64  
 9   MntGoldProds          2205 non-null   int64  
 10  NumDealsPurchases     2205 non-null   int64  
 11  NumWebPurchases       2205 non-null   int64  
 12  NumCatalogPurchases   2205 non-null   int64  
 13  NumStorePurchases     2205 non-null   int64  
 14  NumWebVisitsMonth     2205 non-null   int64  
 15  AcceptedCmp3          2205 non-null   int64  
 16  AcceptedCmp4          2205 non-null   int64  
 17  AcceptedCmp5          2205 non-null   int64  
 18  AcceptedCmp1          2205 non-null   int64  
 19  AcceptedCmp2          2205 non-null   int64  
 20  Complain              2205 non-null   int64  
 21  Z_CostContact         2205 non-null   int64  
 22  Z_Revenue             2205 non-null   int64  
 23  Response              2205 non-null   int64  
 24  Age                   2205 non-null   int64  
 25  Customer_Days         2205 non-null   int64  
 26  marital_Divorced      2205 non-null   int64  
 27  marital_Married       2205 non-null   int64  
 28  marital_Single        2205 non-null   int64  
 29  marital_Together      2205 non-null   int64  
 30  marital_Widow         2205 non-null   int64  
 31  education_2n Cycle    2205 non-null   int64  
 32  education_Basic       2205 non-null   int64  
 33  education_Graduation  2205 non-null   int64  
 34  education_Master      2205 non-null   int64  
 35  education_PhD         2205 non-null   int64  
 36  MntTotal              2205 non-null   int64  
 37  MntRegularProds       2205 non-null   int64  
 38  AcceptedCmpOverall    2205 non-null   int64  
 39  Total_Children        2205 non-null   int64  
 40  marital_Status        2205 non-null   int64  
 41  marital_Status_str    2205 non-null   object 
dtypes: float64(1), int64(40), object(1)
memory usage: 723.6+ KB

We can performi the same operations on the education information.

food["education_PhD"] = food["education_PhD"].replace({1:5,0:0})
food["education_Master"] = food["education_Master"].replace({1:4,0:0})
food["education_Graduation"] = food["education_Graduation"].replace({1:3,0:0})
food["education_Basic"] = food["education_Basic"].replace({1:2,0:0})
food["education_2n Cycle"] = food["education_2n Cycle"].replace({1:1,0:0})

food["education_Status"] = food[["education_PhD", "education_Master", "education_Graduation", "education_Basic", "education_2n Cycle"]].sum(axis=1)

# Mapping the education columns into a new column
food["education_Status_str"] = food["education_Status"].map({5 : "PhD", 4 : "Master", 3 : "Graduation", 2 : "Basic", 1 : "2n Cycle"})

food["Accepted_Campaigns"] = food[["AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3", "AcceptedCmp4", "AcceptedCmp5"]].sum(axis=1)

food["Accepted_Campaigns"] = (food["Accepted_Campaigns"] != 0).astype(int)

food.corr(method="pearson", numeric_only=True)["Accepted_Campaigns"].sort_values(ascending=False)
## Output : 
Accepted_Campaigns      1.000000
AcceptedCmpOverall      0.859326
AcceptedCmp4            0.553623
AcceptedCmp3            0.551797
AcceptedCmp5            0.548133
AcceptedCmp1            0.512399
MntWines                0.463465
MntTotal                0.414528
MntRegularProds         0.413418
Response                0.367892
Income                  0.336105
NumCatalogPurchases     0.333796
MntMeatProducts         0.285121
AcceptedCmp2            0.229374
NumWebPurchases         0.210733
MntGoldProds            0.188876
NumStorePurchases       0.185159
MntFishProducts         0.159548
MntSweetProducts        0.158667
MntFruits               0.124605
education_Status        0.039992
education_PhD           0.038395
Age                     0.030203
marital_Widow           0.013568
marital_Married         0.008298
marital_Single         -0.000211
marital_Status         -0.000400
education_Master       -0.004837
marital_Divorced       -0.006486
education_Graduation   -0.007113
marital_Together       -0.010172
Customer_Days          -0.013342
education_2n Cycle     -0.016136
Recency                -0.016200
Complain               -0.025402
education_Basic        -0.037729
NumDealsPurchases      -0.087632
Teenhome               -0.100674
NumWebVisitsMonth      -0.130340
Kidhome                -0.203011
Total_Children         -0.218689
Z_CostContact                NaN
Z_Revenue                    NaN
Name: Accepted_Campaigns, dtype: float64

Data Visualization and Share

import seaborn as sns

all_correlations = food.corr(method="pearson", numeric_only=True)
all_correlations = all_correlations[(all_correlations > 0.3) & (all_correlations < 1)]

sns.heatmap(all_correlations)

For the next step, we will define ranges from 23 to 80 and assign ages within those ranges

age_groups = [(23,30), (31,40), (41,50), (51,60), (61,70), (71,80)]

def assign_age_group(Age):
    for age_range in age_groups:
        if age_range[0] <= Age <= age_range[1]:
            return f"{age_range[0]} - {age_range[1]}"
    return("Unknown")

food["Age_group"] = food["Age"].apply(assign_age_group)

food[["Age", "Age_group"]].head()
## Output :
Age	Age_group
0	63	61 - 70
1	66	61 - 70
2	55	51 - 60
3	36	31 - 40
4	39	31 - 40
food["Age_group"].value_counts()
## Output : 
Age_group
41 - 50    727
51 - 60    502
61 - 70    453
31 - 40    357
71 - 80    105
23 - 30     61
Name: count, dtype: int64
# let's visualize 

age_order = ["23 - 30","31 - 40","41 - 50","51 - 60","61 - 70","71 - 80"]

sns.pointplot(data = food, x = "Age_group", y = "Accepted_Campaigns", order = age_order)

There are many people in the age ranges from 31 to 70, so let's visualize these ranges by percentage.

counts = food["Age_group"].value_counts()

percentage = counts / food.shape[0]
percentage_food = percentage.reset_index()
percentage_food.columns = ["age_group", "percentage"]
percentage_food = percentage_food.sort_values("age_group")

percentage_food
## Output : 
	age_group	percentage
5	23 - 30	0.027664
3	31 - 40	0.161905
0	41 - 50	0.329705
1	51 - 60	0.227664
2	61 - 70	0.205442
4	71 - 80	0.047619
import matplotlib.pyplot as plt

sns.barplot(x = "age_group", y = "percentage", data = percentage_food)
plt.title("Percentage of Accepted Campaigns per Age Group")
plt.show()

Age segmentation - core audience for accepting campaigns is 31 - 70. We will now determine the total spending for this age range

food.groupby("Age_group")["MntTotal"].sum()
## Output : 
Age_group
23 - 30     45837
31 - 40    175176
41 - 50    350590
51 - 60    290520
61 - 70    285289
71 - 80     93484
Name: MntTotal, dtype: int64
grouped_food = food.groupby("Age_group")["MntTotal"].sum().reset_index()

sns.barplot(x = "Age_group", y = "MntTotal", data = grouped_food)
plt.title("Amount Spent Per Age Group")
plt.show()

As the previous result we see that the range from 31 to 70 spend a lot of money, We will now examine the amount spent by individuals who accepted the campaign

accepted_campaign = food[food["Accepted_Campaigns"] != 0]

grouped_food = accepted_campaign.groupby("Age_group")["MntTotal"].sum().reset_index()

sns.barplot(x = "Age_group", y = "MntTotal", data = grouped_food)
plt.title("Amount Spent Per Age Group")
plt.show()

Now we can observe that the percentage spent by individuals who accepted the campaign has increased, especially for the age range of 71 to 80 years old, but not for the range of 23 to 30.

Let's delve into the number of purchases

food[["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]].sum()
## Output :
NumWebPurchases         9042
NumCatalogPurchases     5833
NumStorePurchases      12841
dtype: int64
sum_food = pd.DataFrame(food[["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]].sum(), columns=["Sums"])
sum_food = sum_food.reset_index()
sum_food.rename(columns={"index":"Type_of_Purchase"}, inplace=True)

sum_food
## Output : 
Type_of_Purchase	Sums
0	NumWebPurchases	9042
1	NumCatalogPurchases	5833
2	NumStorePurchases	12841
# let's visualize

sns.barplot(x = "Type_of_Purchase", y="Sums", data = sum_food)

Let's visualize the number of purchases made by individuals who accepted the campaigns

accepted_campaign = food[food["Accepted_Campaigns"] != 0]
sum_food = pd.DataFrame(accepted_campaign[["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]].sum(), columns=["Sums"])
sum_food = sum_food.reset_index()
sum_food.rename(columns={"index":"Type_of_Purchase"}, inplace=True)
sns.barplot(x = "Type_of_Purchase", y="Sums", data = sum_food)
x = sns.jointplot(data = food, x = "MntTotal", y = "NumWebPurchases", kind = "kde")
x.plot_joint(sns.regplot, color = "r")
x = sns.jointplot(data = food, x = "MntTotal", y = "NumCatalogPurchases", kind = "kde")
x.plot_joint(sns.regplot, color = "r")
x = sns.jointplot(data = food, x = "MntTotal", y = "NumStorePurchases", kind = "kde")
x.plot_joint(sns.regplot, color = "r")

These visualizations demonstrate a correlation: the more individuals spend on the web, the more they tend to spend overall. Now, let's examine the distribution of the amount spent compared to whether individuals have children or not.

sns.regplot(x = "Total_Children", y="MntTotal", data = food)

As the number of children increases, individuals tend to spend less money.

sns.regplot(x = "Total_Children", y="Accepted_Campaigns", data = food)

As the number of children increases, individuals are less likely to accept the campaigns.

sns.regplot(x = "education_Status", y="Accepted_Campaigns", data = food)

As education levels increase, individuals are more likely to accept the campaign.

ns.regplot(x = "education_Status", y="MntTotal", data = food)

It appears that education level does not significantly influence whether individuals accept the campaign or spend more money. Let's analyze the distribution basing on the marital status

sns.countplot(x = "marital_Status_str", data = food)

Here, we have the number of people per marital status.

rel_food = food.groupby("marital_Status_str")["MntTotal"].sum().reset_index()

sns.barplot(x = "marital_Status_str", y="MntTotal", data = rel_food)

Married individuals are spending more money compared to others.

accepted_campaign = food[food["Accepted_Campaigns"] != 0]

rel_food = accepted_campaign.groupby("marital_Status_str")["MntTotal"].sum().reset_index()

sns.barplot(x = "marital_Status_str", y="MntTotal", data = rel_food)
total_count = food["marital_Status_str"].value_counts()
accepted = food[food["Accepted_Campaigns"] == 1]["marital_Status_str"].value_counts()
percentage_marital = accepted / total_count * 100
percentage_food = percentage_marital.reset_index()
percentage_food.columns = ["Marital_Status", "Percentage"]
percentage_food
## Output : 
Marital_Status	Percentage
0	Married	21.194379
1	Together	20.070423
2	Single	20.754717
3	Divorced	20.000000
4	Widow	23.684211
sns.barplot(x = "Marital_Status", y="Percentage", data = percentage_food)

As observed, individuals who accepted campaigns tend to spend more money. Therefore, "Married," "Single," and "Together" individuals spend more money compared to widowed and divorced individuals.

Overall Findings

  1. Age : 30-70 were spending more money, but less likely to accept campaigns. Higher volume here though.
  2. Catalog was more likely to accept campaigns, but in person spend more. Recommend a split between all.
  3. Focus on people with less kids
  4. Education has no impact - don't target any group
  5. Marital Status - doesn't play a big part. Married, Single, Together, spend more money

Recommendations to the Marketing department

  1. Money Making : Middle Aged, People, high earners, with no kids. Target on different platforms with split above.
  2. For New Users to Spend Money : Focus on 21-30, 70 and up- who statistically accepted campaigns at a higher rate.
  3. Since the age range from 31 to 70 is spending a significant amount of money, the marketing department can focus their efforts on this group and tailor their campaigns to encourage even more spending.
  4. The age ranges from 23 to 30 years old and 71 to 85 years old may be spending less, but they are more receptive to marketing campaigns. Therefore, the marketing department should target these groups with campaigns designed to increase their spending.
  5. In order to increase spending among individuals aged 23 to 30 who accept the campaign, they can adapt and focus the campaign specifically on this age range.
  6. If they want, the marketing department can enhance their efforts towards customers with higher percentage catalog purchases and then they should prioritize their focus on "In Store/Web" channels as they attract more traffic.
  7. "Married," "Single," and "Together" individuals spend more money compared to widowed and divorced individuals. Given this insight, the marketing department should focus on targeting this segment.

Steven OBAME