Project Hypothesis Analysis Process
Assigned hypothesis: latent-conversion users (low-to-mid price)
- mid_50k_150k + new + search inflow + review click X + discount exposure O
# Keep only mid_50k_150k + new + search inflow + review click X + discount exposure O
import pandas as pd
df5 = pd.read_csv("df5.csv")
filtered_df = df5[
(df5['price_band'] == 'mid_50k_150k') &
(df5['user_type'] == 'new') &
(df5['traffic_source'] == 'search') &
(df5['review_clicked'] == False) &
(df5['discount_exposed'] == True)
]
filtered_df
# Dropout rate by product type
# Create the 'churned' flag: True when add_to_cart == 'No'
filtered_df = filtered_df.copy()
filtered_df['churned'] = filtered_df['add_to_cart'] == 'No'
# Compute the dropout rate per product_category
churn_summary = (
filtered_df
.groupby('product_category')
.agg(
total_users=('user_id', 'count'),
churned_users=('churned', 'sum')
)
.assign(dropout_rate=lambda x: (x['churned_users'] / x['total_users']) * 100)
.reset_index()
)
# Print the result
print(churn_summary)
product_category | total_users | churned_users | dropout_rate |
accessory | 6 | 4 | 66% |
bag | 6 | 4 | 66% |
fashion | 8 | 5 | 62% |
outlet | 10 | 10 | 100% |
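The percentages in the table look truncated rather than rounded. As a quick check, the sketch below recomputes dropout_rate from the total_users and churned_users counts listed above, using the same churned / total * 100 formula as the analysis code. The round(1) step is an added assumption and is not part of the original snippet.

```python
import pandas as pd

# Counts copied from the summary table above.
check = pd.DataFrame({
    "product_category": ["accessory", "bag", "fashion", "outlet"],
    "total_users": [6, 6, 8, 10],
    "churned_users": [4, 4, 5, 10],
})

# Same formula as the analysis code, rounded to one decimal place (assumption).
check["dropout_rate"] = (check["churned_users"] / check["total_users"] * 100).round(1)
print(check)
```

With only 6 to 10 users per category, a single user moves the rate by 10 points or more, so these figures are best read as rough signals.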
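- Outlet: 100% dropout
These users arrived via search and saw the discount rate.
It was a chance to buy low-to-mid-priced products cheaply, yet they did not add anything to the cart.
A lukewarm discount may actually weaken the motivation to purchase.
(Even with stacked discounts, they did not purchase.)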
# pdp
# 1. Filter the analysis target
target_users = df5[
(df5['price_band'] == 'mid_50k_150k') &
(df5['user_type'] == 'new') &
(df5['traffic_source'] == 'search') &
(df5['review_clicked'] == False) &
(df5['discount_exposed'] == True)
]
# 2. Analyze by dwell time + product category
summary = target_users.groupby(['duration_group_label', 'product_category']).agg(
total_users=('user_id', 'count'),
avg_pdp_duration=('pdp_duration_sec', 'mean'),
churned_users=('add_to_cart', lambda x: (x == 'No').sum()),
converted_users=('purchase_completed', lambda x: (x == 'Yes').sum())
).reset_index()
# 3. Compute the rates
summary['dropout_rate'] = (summary['churned_users'] / summary['total_users']) * 100
summary['conversion_rate'] = (summary['converted_users'] / summary['total_users']) * 100
# 4. Round
summary['avg_pdp_duration'] = summary['avg_pdp_duration'].round(1)
summary['dropout_rate'] = summary['dropout_rate'].round(1)
summary['conversion_rate'] = summary['conversion_rate'].round(1)
summary
Dwell-time group | Product | Dropout rate | Conversion rate |
Very short | fashion | 0% | 0% |
Very short | accessory | 50% | 0% |
Very short | outlet | 100% | 0% |
Short | fashion | 60% | 20% |
Short | accessory | 0% | 0% |
Short | outlet | 100% | 0% |
Short | bag | 50% | 0% |
Long | fashion | 100% | 0% |
Long | accessory | 100% | 0% |
Long | outlet | 100% | 0% |
Long | bag | 66.7% | 0% |
Very long | fashion | 100% | 0% |
Very long | accessory | 100% | 0% |
Very long | bag | 100% | 0% |
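The grouping above relies on a duration_group_label column whose construction is not shown in these notes. One possible way to derive it from pdp_duration_sec is pd.cut. The cut points (10, 30, 90 seconds), the English label names, and the demo rows below are all hypothetical placeholders, not the values used in the original analysis.

```python
import pandas as pd

# Hypothetical thresholds in seconds; the real ones are not given in the notes.
BINS = [0, 10, 30, 90, float("inf")]
LABELS = ["Very short", "Short", "Long", "Very long"]

def add_duration_group(df, col="pdp_duration_sec"):
    """Return a copy of df with a categorical dwell-time label column."""
    out = df.copy()
    out["duration_group_label"] = pd.cut(
        out[col], bins=BINS, labels=LABELS, include_lowest=True
    )
    return out

# Made-up durations, one per bucket, just to show the mapping.
demo = pd.DataFrame({"pdp_duration_sec": [3, 25, 60, 240]})
labeled = add_duration_group(demo)
print(labeled)
```

Quantile-based buckets (pd.qcut) would be an alternative if equal-sized groups matter more than fixed thresholds.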
Dwell-time visualization
Assigned hypothesis: unresponsive new users
• (under_50k + mid_50k_150k) + new + ad inflow + review click X + discount exposure X
Dropout rate by category
# Filter users
import pandas as pd
# Load the CSV file
df = pd.read_csv("df.csv")
# Filter by conditions
df_filtered = df[
(df['price_band'].isin(['under_50k', 'mid_50k_150k'])) &
(df['user_type'] == 'new') &
(df['traffic_source'] == 'ad') &
(df['review_clicked'] == False) &
(df['discount_exposed'] == False)
]
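print("Total users:", df.shape[0])
# Apply the per-category dropout rate
# 1. Create the dropout flag (copy first so the assignment does not write to a slice of df)
df_filtered = df_filtered.copy()
df_filtered['is_abandon'] = df_filtered['add_to_cart'] == "No"
# 2. Aggregate total users and dropped-off users per product_category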
abandon_summary = (
df_filtered
.groupby('product_category')
.agg(
total_users=('add_to_cart', 'count'),
abandon_users=('is_abandon', 'sum')
)
.reset_index()
)
# 3. Compute the dropout rate (%)
abandon_summary['abandon_rate'] = (abandon_summary['abandon_users'] / abandon_summary['total_users']) * 100
# Print the result
abandon_summary
Visualization
Ad-only inflow
Excluding 'outlet', every user dropped off = ads alone are not enough; some additional action is needed.
• Only outlet, which at least has a price advantage, gets added to the cart a little.
Assigned hypothesis: review-responsive new users
(under_50k + mid_50k_150k) + new + ad inflow + review click O + discount exposure X
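No code is shown for this hypothesis. A minimal sketch, assuming the same columns as the earlier blocks (price_band, user_type, traffic_source, review_clicked, discount_exposed, product_category, add_to_cart), could reuse one helper for every segment. The helper name dropout_by_category, the demo rows, and the 'over_150k' band value are illustrative inventions, not part of the original data.

```python
import pandas as pd

def dropout_by_category(df, conditions):
    """Filter df by {column: value or list of values} and summarize dropout per category."""
    mask = pd.Series(True, index=df.index)
    for column, allowed in conditions.items():
        values = list(allowed) if isinstance(allowed, (list, tuple, set)) else [allowed]
        mask &= df[column].isin(values)
    segment = df[mask].copy()  # copy so the new column is not written to a slice
    segment["is_abandon"] = segment["add_to_cart"] == "No"
    summary = (
        segment.groupby("product_category")
        .agg(total_users=("add_to_cart", "count"), abandon_users=("is_abandon", "sum"))
        .reset_index()
    )
    summary["abandon_rate"] = (summary["abandon_users"] / summary["total_users"] * 100).round(1)
    return summary

# Conditions for the review-responsive new-user hypothesis stated above.
review_responsive = {
    "price_band": ["under_50k", "mid_50k_150k"],
    "user_type": "new",
    "traffic_source": "ad",
    "review_clicked": True,
    "discount_exposed": False,
}

# Made-up rows for demonstration only; only the first two match the segment.
demo = pd.DataFrame({
    "price_band": ["under_50k", "mid_50k_150k", "mid_50k_150k", "over_150k", "under_50k"],
    "user_type": ["new", "new", "new", "new", "returning"],
    "traffic_source": ["ad", "ad", "ad", "ad", "ad"],
    "review_clicked": [True, True, False, True, True],
    "discount_exposed": [False, False, False, False, False],
    "product_category": ["fashion", "fashion", "bag", "outlet", "fashion"],
    "add_to_cart": ["No", "Yes", "No", "No", "No"],
})
result = dropout_by_category(demo, review_responsive)
print(result)
```

Passing a different conditions dict (for example review_clicked set to False) would reproduce the unresponsive-user segment without duplicating the filter code.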
Visualization
Ad + review new users
• 'fashion' users drop off: they arrive via the ad but back out after viewing the reviews.






