์ž‘์—… 1์œ ํ˜•

Contents

์ž‘์—… 1์œ ํ˜•#

Hits

Click here

9ํšŒ ๋น…๋ถ„๊ธฐ ์‹ค๊ธฐ ๋Œ€๋น„ ๊ฐ•์˜, ๋ธ”๋กœ๊ทธ๋งŒ์œผ๋กœ๋Š” ๋„์ €ํžˆ ์•ˆ๋˜๊ฒ ๋‹คใ…  ํ•˜์‹œ๋Š”๋ถ„๋“ค์—๊ฒŒ ์ถ”์ฒœํ•ฉ๋‹ˆ๋‹ค.

๋ณธ๊ฒฉ์ ์œผ๋กœ ์‹œ์ž‘ํ•˜์‹œ๊ธฐ ์ „์— ์›น๊ด‘๊ณ  ํ•œ๋ฒˆ ๋ˆŒ๋Ÿฌ์ฃผ์„ธ์š”
๋ฌธ์ œ ํ’€์ด ์˜์ƒ : https://www.youtube.com/watch?v=hzHPpybySJE
  • ๋น…๋ถ„๊ธฐ, adp ์ •๋ณด๊ณต์œ  ์˜คํ”ˆ์นดํ†ก๋ฐฉ1
    @@@์ฐธ์—ฌ ๋งํฌ@@@
    ์ฐธ์—ฌ ์ฝ”๋“œ : dbscan (์ˆ˜์‹œ ์—…๋ฐ์ดํŠธ, ์นดํ†ก ํ™”๋ฉด ํ•˜๋‹จ์— ๋ฌธ์ œ ํ™•์ธ ํ•ด์ฃผ์„ธ์š”)

  • ๋น…๋ถ„๊ธฐ, adp ์ •๋ณด๊ณต์œ  ์˜คํ”ˆ์นดํ†ก๋ฐฉ2
    @@@์ฐธ์—ฌ ๋งํฌ@@@

๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ#

์œ ํŠœ๋ธŒ ์ธ๊ธฐ๋™์˜์ƒ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/rsrishav/youtube-trending-video-dataset?select=KR_youtube_trending_data.csv
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์œ ํŠœ๋ธŒ ๋ฐ์ผ๋ฆฌ ์ธ๊ธฐ๋™์˜์ƒ (ํ•œ๊ตญ)
dataurl : https://raw.githubusercontent.com/Datamanim/datarepo/main/youtube/youtube.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/youtube/youtube.csv",index_col=0)
df.head()
title channelTitle categoryId view_count likes dislikes comment_count channelId trending_date2
0 [์‹ ๋ณ‘] ๋ฌผ์ž์ฐฝ๊ณ  ์žฅ์‚์ญˆ 23 1893473 38249 730 8595 UChbE5OZQ6dRHECsX0tEPEZQ 2021-01-01
1 RAIN(๋น„) - ๋‚˜๋กœ ๋ฐ”๊พธ์ž Switch to me (duet with JYP) MV RAIN's Official Channel 10 2600864 0 0 20129 UCxXgIeE5hxWxHG6dz9Scg2w 2021-01-01
2 2020๋…„ ์ œ์•ผ์˜ ์ข… ์˜จ๋ผ์ธ ํƒ€์ข…ํ–‰์‚ฌ | ๋ณด์‹ ๊ฐ ํ˜„์žฅ ํ–‰์‚ฌ๋Š” ์ง„ํ–‰ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์„œ์šธ์‹œ ยท Seoul 29 347049 3564 120 178 UCZUPZW5idAxYp-Asj__lVAA 2021-01-01
3 ๊ณ ๊ธฐ๋‚จ์ž์˜ ์น ๋ฉด์กฐ ํŒŒํ‹ฐ ๊ณ ๊ธฐ๋‚จ์ž MeatMan 26 528458 15372 280 3470 UCT3CumbFIJiW33uq0UI3zlg 2021-01-01
4 ๊ณจ๋ชฉ 3mc๋ฅผ ๋ถ„๋…ธํ•˜๊ฒŒ ๋งŒ๋“  ๋งˆ์Œ๊ณ ์ƒ์ด ์‹ฌํ–ˆ์„ ๊ณต๋ฆ‰ ๋ฐฑ๋ฐ˜์ง‘ ์‚ฌ์žฅ๋‹˜์˜ ํ‘ธ๋…?! [์˜ˆ๋Šฅ... ์Šค๋ธŒ์Šค๋ฐฅ์ง‘ 24 494904 3918 111 3142 UCdWgRSfttvDucq4ApcCg5Mw 2021-01-01

Question 1

์ธ๊ธฐ๋™์˜์ƒ ์ œ์ž‘ํšŸ์ˆ˜๊ฐ€ ๋งŽ์€ ์ฑ„๋„ ์ƒ์œ„ 10๊ฐœ๋ช…์„ ์ถœ๋ ฅํ•˜๋ผ (๋‚ ์งœ๊ธฐ์ค€, ์ค‘๋ณตํฌํ•จ)

Hide code cell source
answer =list(df.loc[df.channelId.isin(df.channelId.value_counts().head(10).index)].channelTitle.unique())
print(answer)
['์žฅ์‚์ญˆ', '์ด๋ช‡๋ช…', 'ํŒŒ๋ฟŒ๋ฆฌ', '์งคํˆฐ', '๋Ÿฐ๋‹๋งจ - ์Šค๋ธŒ์Šค ๊ณต์‹ ์ฑ„๋„', '์— ๋šœ๋ฃจ๋งˆ๋šœ๋ฃจ : MBC ๊ณต์‹ ์ข…ํ•ฉ ์ฑ„๋„', 'SPOTV', '์ฑ„๋„ ์‹ญ์˜ค์•ผ', '์ด๊ณผ์žฅ', 'BANGTANTV']

Question 2

๋…ผ๋ž€์œผ๋กœ ์ธ๊ธฐ๋™์˜์ƒ์ด ๋œ ์ผ€์ด์Šค๋ฅผ ํ™•์ธํ•˜๊ณ  ์‹ถ๋‹ค. dislikes์ˆ˜๊ฐ€ like ์ˆ˜๋ณด๋‹ค ๋†’์€ ๋™์˜์ƒ์„ ์ œ์ž‘ํ•œ ์ฑ„๋„์„ ๋ชจ๋‘ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
answer =list(df.loc[df.likes < df.dislikes].channelTitle.unique())
print(answer)
['ํ•ซ๋„๊ทธTV', 'ASMR ์• ์ •TV', 'ํ•˜์–€ํŠธ๋ฆฌHayanTree', '์–‘ํŒก YangPang', '์ฒ ๊ตฌํ˜• (CHULTUBE)', '์™œ๋ƒ๋งจํ•˜์šฐ์Šค', '(MUTUBE)์™€๊พธ๋Œ€์žฅ๋ด‰์ค€', '์˜ค๋ฉ”ํ‚ดTV', '์œก์ง€๋‹ด', 'MapleStory_KR', 'ROAD FIGHTING CHAMPIONSHIP', '์‚ฌ๋‚˜์ด ๊น€๊ธฐํ›ˆ', '๋‚˜ํ˜ผ์ž์‚ฐ๋‹ค STUDIO', 'Gen.G esports']

Question 3

์ฑ„๋„๋ช…์„ ๋ฐ”๊พผ ์ผ€์ด์Šค๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๊ณ  ์‹ถ๋‹ค. channelId์˜ ๊ฒฝ์šฐ ๊ณ ์œ ๊ฐ’์ด๋ฏ€๋กœ ์ด๋ฅผ ํ†ตํ•ด ์ฑ„๋„๋ช…์„ ํ•œ๋ฒˆ์ด๋ผ๋„ ๋ฐ”๊พผ ์ฑ„๋„์˜ ๊ฐฏ์ˆ˜๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
change = df[['channelTitle','channelId']].drop_duplicates().channelId.value_counts()
target = change[change>1]
print(len(target))
71

Question 4

์ผ์š”์ผ์— ์ธ๊ธฐ์žˆ์—ˆ๋˜ ์˜์ƒ๋“ค์ค‘ ๊ฐ€์žฅ๋งŽ์€ ์˜์ƒ ์ข…๋ฅ˜(categoryId)๋Š” ๋ฌด์—‡์ธ๊ฐ€?

Hide code cell source
df['trending_date2'] = pd.to_datetime(df['trending_date2'])
answer =df.loc[df['trending_date2'].dt.day_name() =='Sunday'].categoryId.value_counts().index[0]
print(answer)
24

Question 5

๊ฐ ์š”์ผ๋ณ„ ์ธ๊ธฐ ์˜์ƒ๋“ค์˜ categoryId๋Š” ๊ฐ๊ฐ ๋ช‡๊ฐœ ์”ฉ์ธ์ง€ ํ•˜๋‚˜์˜ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ํ‘œํ˜„ํ•˜๋ผ

Hide code cell source
group = df.groupby([df['trending_date2'].dt.day_name(),'categoryId'],as_index=False).size()
answer= group.pivot(index='categoryId',columns='trending_date2')
display(answer)
size
trending_date2 Friday Monday Saturday Sunday Thursday Tuesday Wednesday
categoryId
1 243 263 255 274 246 257 234
2 120 105 119 99 128 119 129
10 833 837 776 830 890 894 917
15 187 215 198 217 207 208 207
17 633 668 592 636 682 708 706
19 90 92 87 91 92 89 85
20 283 298 296 289 282 285 291
22 1288 1373 1289 1337 1341 1375 1333
23 568 594 570 556 560 569 566
24 2976 3148 3066 3096 2954 3084 3090
25 444 453 422 437 470 452 468
26 369 378 364 363 375 394 385
27 183 205 183 199 194 194 212
28 171 160 173 167 166 161 165
29 12 10 10 9 13 11 12

Question 6

๋Œ“๊ธ€์˜ ์ˆ˜๋กœ (comment_count) ์˜์ƒ ๋ฐ˜์‘์— ๋Œ€ํ•œ ํŒ๋‹จ์„ ํ•  ์ˆ˜ ์žˆ๋‹ค. viewcount๋Œ€๋น„ ๋Œ“๊ธ€์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋†’์€ ์˜์ƒ์„ ํ™•์ธํ•˜๋ผ (view_count๊ฐ’์ด 0์ธ ๊ฒฝ์šฐ๋Š” ์ œ์™ธํ•œ๋‹ค)

Hide code cell source
target2= df.loc[df.view_count!=0]
t = target2.copy()
t['ratio'] = (target2['comment_count']/target2['view_count']).dropna()
result = t.sort_values(by='ratio', ascending=False).iloc[0].title
print(result)
60๋ถ„ ๋™์•ˆ ๋Œ“๊ธ€์ด ๋‹ฌ๋ฆฌ์ง€ ์•Š์œผ๋ฉด, ์˜์ƒ์ด ์‚ญ์ œ๋ฉ๋‹ˆ๋‹ค. (์ฑŒ๋ฆฐ์ง€)

Question 7

๋Œ“๊ธ€์˜ ์ˆ˜๋กœ (comment_count) ์˜์ƒ ๋ฐ˜์‘์— ๋Œ€ํ•œ ํŒ๋‹จ์„ ํ•  ์ˆ˜ ์žˆ๋‹ค.viewcount๋Œ€๋น„ ๋Œ“๊ธ€์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋‚ฎ์€ ์˜์ƒ์„ ํ™•์ธํ•˜๋ผ (view_counts, ratio๊ฐ’์ด 0์ธ๊ฒฝ์šฐ๋Š” ์ œ์™ธํ•œ๋‹ค.)

Hide code cell source
ratio = (df['comment_count'] / df['view_count']).dropna().sort_values()
ratio[ratio!=0].index[0]

result= df.iloc[ratio[ratio!=0].index[0]].title
print(result)
Join the BTS #PermissiontoDance Challenge only on YouTube #Shorts

Question 8

like ๋Œ€๋น„ dislike์˜ ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ์ ์€ ์˜์ƒ์€ ๋ฌด์—‡์ธ๊ฐ€? (like, dislike ๊ฐ’์ด 0์ธ๊ฒฝ์šฐ๋Š” ์ œ์™ธํ•œ๋‹ค)

Hide code cell source
target = df.loc[(df.likes !=0) & (df.dislikes !=0)]
num = (target['dislikes']/target['likes']).sort_values().index[0]

answer = df.iloc[num].title
print(answer)
[์คŒํ„ฐ๋ทฐ] *์ตœ์ดˆ๊ณต๊ฐœ* ์‚ฌ๋ถ€์ž‘์ฆˆ๐Ÿฐ๐Ÿถ์˜ ๋น„๊ณต์‹ ์ด๋ฆ„์€ ์•„์ด๋ผ์ธ์ฆˆ? ๊ฟ€์กฐํ•ฉ ํ‹ฐํ‚คํƒ€์นด ๊ฐ€๋“ํ•œ NCT 127 ๋„์˜&์ •์šฐ์˜ ์คŒํ„ฐ๋ทฐ

Question 9

๊ฐ€์žฅ๋งŽ์€ ํŠธ๋ Œ๋“œ ์˜์ƒ์„ ์ œ์ž‘ํ•œ ์ฑ„๋„์˜ ์ด๋ฆ„์€ ๋ฌด์—‡์ธ๊ฐ€? (๋‚ ์งœ๊ธฐ์ค€, ์ค‘๋ณตํฌํ•จ)

Hide code cell source
answer = df.loc[df.channelId ==df.channelId.value_counts().index[0]].channelTitle.unique()[0]
print(answer)
์งคํˆฐ

Question 10

20ํšŒ(20์ผ)์ด์ƒ ์ธ๊ธฐ๋™์˜์ƒ ๋ฆฌ์ŠคํŠธ์— ํฌํ•จ๋œ ๋™์˜์ƒ์˜ ์ˆซ์ž๋Š”?

Hide code cell source
answer= (df[['title','channelId']].value_counts()>=20).sum()
print(answer)
40

์œ ํŠœ๋ธŒ ๊ณต๋ฒ”์ปจํ…์ธ  ๋™์˜์ƒ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/kukuroo3/youtube-episodic-contents-kr(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์œ ํŠœ๋ธŒ โ€œ๊ณต๋ฒ”โ€ ์ปจํ…์ธ  ๋™์˜์ƒ ์ •๋ณด ( 10๋ถ„ ๊ฐ„๊ฒฉ ์ˆ˜์ง‘)
dataurl1 (๋น„๋””์˜ค ์ •๋ณด) = https://raw.githubusercontent.com/Datamanim/datarepo/main/youtube/videoInfo.csv
dataurl2 (์ฐธ๊ฐ€์ž ์ฑ„๋„ ์ •๋ณด)= https://raw.githubusercontent.com/Datamanim/datarepo/main/youtube/channelInfo.csv

import pandas as pd


channel =pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/youtube/channelInfo.csv')
video =pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/youtube/videoInfo.csv')
display(channel.head())
display(video.head())
channelid subcnt viewcnt videocnt ct channelname
0 UCkQCwnkQfgSuPTTnw_Y7v7w 1310000 410238653 736 2021-09-30 03:01:03 ๊ฝˆ๋šœ๋ฃน
1 UCkQCwnkQfgSuPTTnw_Y7v7w 1310000 412531322 736 2021-09-30 09:01:03 ๊ฝˆ๋šœ๋ฃน
2 UCkQCwnkQfgSuPTTnw_Y7v7w 1310000 412531322 735 2021-09-30 15:01:03 ๊ฝˆ๋šœ๋ฃน
3 UCkQCwnkQfgSuPTTnw_Y7v7w 1310000 412531322 737 2021-09-30 21:01:03 ๊ฝˆ๋šœ๋ฃน
4 UCkQCwnkQfgSuPTTnw_Y7v7w 1320000 412531322 737 2021-10-01 03:01:04 ๊ฝˆ๋šœ๋ฃน
videopk viewcnt likecnt dislikecnt favoritecnt cmcnt ct videoname
0 c5JQp6xafqc 1667010 30474 706 0 6587 2021-10-10 15:20:03 ๊ณต๋ฒ” EP1
1 c5JQp6xafqc 1669089 30495 707 0 6589 2021-10-10 15:30:03 ๊ณต๋ฒ” EP1
2 c5JQp6xafqc 1674759 30522 711 0 6596 2021-10-10 15:40:02 ๊ณต๋ฒ” EP1
3 c5JQp6xafqc 1677026 30555 712 0 6604 2021-10-10 15:50:03 ๊ณต๋ฒ” EP1
4 c5JQp6xafqc 1681824 30585 713 0 6600 2021-10-10 16:00:03 ๊ณต๋ฒ” EP1

Question 11

๊ฐ ๋ฐ์ดํ„ฐ์˜ โ€˜ctโ€™์ปฌ๋Ÿผ์„ ์‹œ๊ฐ„์œผ๋กœ ์ธ์‹ํ• ์ˆ˜ ์žˆ๊ฒŒ datatype์„ ๋ณ€๊ฒฝํ•˜๊ณ  video ๋ฐ์ดํ„ฐ์˜ videoname์˜ ๊ฐ value ๋งˆ๋‹ค ๋ช‡๊ฐœ์˜ ๋ฐ์ดํ„ฐ์”ฉ ๊ฐ€์ง€๊ณ  ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๋ผ

Hide code cell source
video['ct'] = pd.to_datetime(video['ct'])
answer = video.videoname.value_counts()
print(answer)
 ๊ณต๋ฒ” EP1    3492
 ๊ณต๋ฒ” EP2    3204
 ๊ณต๋ฒ” EP3    2568
 ๊ณต๋ฒ” EP4    2280
 ๊ณต๋ฒ” EP5    1562
 ๊ณต๋ฒ” EP6    1274
 ๊ณต๋ฒ” EP7     555
 ๊ณต๋ฒ” EP8     266
Name: videoname, dtype: int64

Question 12

์ˆ˜์ง‘๋œ ๊ฐ video์˜ ๊ฐ€์žฅ ์ตœ์‹ ํ™” ๋œ ๋‚ ์งœ์˜ viewcount๊ฐ’์„ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
answer = video.sort_values(['videoname','ct']).drop_duplicates('videoname',keep='last')[['viewcnt','videoname','ct']].reset_index(drop=True)
display(answer)
viewcnt videoname ct
0 3180532 ๊ณต๋ฒ” EP1 2021-11-01 15:30:03
1 2199328 ๊ณต๋ฒ” EP2 2021-11-01 15:30:03
2 1671294 ๊ณต๋ฒ” EP3 2021-11-01 15:30:03
3 1818493 ๊ณต๋ฒ” EP4 2021-11-01 15:30:03
4 1503435 ๊ณต๋ฒ” EP5 2021-11-01 15:30:04
5 1750222 ๊ณต๋ฒ” EP6 2021-11-01 15:30:04
6 1630200 ๊ณต๋ฒ” EP7 2021-11-01 15:30:05
7 1289088 ๊ณต๋ฒ” EP8 2021-11-01 15:30:05

Question 13

Channel ๋ฐ์ดํ„ฐ์ค‘ 2021-10-03์ผ ์ดํ›„ ๊ฐ ์ฑ„๋„์˜ ์ฒ˜์Œ ๊ธฐ๋ก ๋๋˜ ๊ตฌ๋…์ž ์ˆ˜(subcnt)๋ฅผ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
channel.ct = pd.to_datetime(channel.ct)
target = channel[channel.ct >= pd.to_datetime('2021-10-03')].sort_values(['ct','channelname']).drop_duplicates('channelname')
answer = target[['channelname','subcnt']].reset_index(drop=True)
print(answer)
        channelname   subcnt
0            ๋…ผ๋ฆฌ์™• ์ „๊ธฐ   922000
1             ์•ผ์ „์‚ฝ์งฑ์žฌ   257000
2        ์ธ„์ •ChuJeong   322000
3        ๊น€๋†๋ฐ€์˜ ๋†๋ฐ€ํ•œ ์‚ถ     7520
4       ๋ฆดํŽ„ Lilpearl    10100
5   ์™€๊ธ€์™€๊ธ€ WagleWagle    55000
6               ์กฐ๋‚˜๋‹จ   215000
7           ํ˜•์‚ฌ!ํƒ์ •๋˜๋‹ค    14900
8     Balming Tiger    54300
9     ๊ณฝํ† ๋ฆฌ kwak tori   471000
10              ๊ฝˆ๋šœ๋ฃน  1330000

Question 14

๊ฐ์ฑ„๋„์˜ 2021-10-03 03:00:00 ~ 2021-11-01 15:00:00 ๊นŒ์ง€ ๊ตฌ๋…์ž์ˆ˜ (subcnt) ์˜ ์ฆ๊ฐ€๋Ÿ‰์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
end = channel.loc[channel.ct.dt.strftime('%Y-%m-%d %H') =='2021-11-01 15']
start = channel.loc[channel.ct.dt.strftime('%Y-%m-%d %H') =='2021-10-03 03']

end_df = end[['channelname','subcnt']].reset_index(drop=True)
start_df = start[['channelname','subcnt']].reset_index(drop=True)

end_df.columns = ['channelname','end_sub']
start_df.columns = ['channelname','start_sub']


tt = pd.merge(start_df,end_df)
tt['del'] = tt['end_sub'] - tt['start_sub']
result = tt[['channelname','del']]
display(result)
channelname del
0 ๊ฝˆ๋šœ๋ฃน 70000
1 ์•ผ์ „์‚ฝ์งฑ์žฌ 11000
2 ์ธ„์ •ChuJeong 1000
3 ๋…ผ๋ฆฌ์™• ์ „๊ธฐ -11000
4 ์™€๊ธ€์™€๊ธ€ WagleWagle 0
5 ์กฐ๋‚˜๋‹จ 12000
6 ํ˜•์‚ฌ!ํƒ์ •๋˜๋‹ค 10300
7 ๊น€๋†๋ฐ€์˜ ๋†๋ฐ€ํ•œ ์‚ถ 1540
8 ๋ฆดํŽ„ Lilpearl 11000
9 ๊ณฝํ† ๋ฆฌ kwak tori -2000
10 Balming Tiger 2500

Question 15

๊ฐ ๋น„๋””์˜ค๋Š” 10๋ถ„ ๊ฐ„๊ฒฉ์œผ๋กœ ๊ตฌ๋…์ž์ˆ˜, ์ข‹์•„์š”, ์‹ซ์–ด์š”์ˆ˜, ๋Œ“๊ธ€์ˆ˜๊ฐ€ ์ˆ˜์ง‘๋œ๊ฒƒ์œผ๋กœ ์•Œ๋ ค์กŒ๋‹ค. ๊ณต๋ฒ” EP1์˜ ๋น„๋””์˜ค์ •๋ณด ๋ฐ์ดํ„ฐ์ค‘ ์ˆ˜์ง‘๊ฐ„๊ฒฉ์ด 5๋ถ„ ์ดํ•˜, 20๋ถ„์ด์ƒ์ธ ๋ฐ์ดํ„ฐ ๊ตฌ๊ฐ„( ํ•ด๋‹น ์‹œ์  ์ „,ํ›„) ์˜ ์‹œ๊ฐ์„ ๋ชจ๋‘ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
import datetime

ep_one = video.loc[video.videoname.str.contains('1')].sort_values('ct').reset_index(drop=True)

ep_one[
        (ep_one.ct.diff(1) >=datetime.timedelta(minutes=20)) | \
        (ep_one.ct.diff(1) <=datetime.timedelta(minutes=5))
      
      ]

answer = ep_one[ep_one.index.isin([720,721,722,723,1635,1636,1637])]
display(answer)
videopk viewcnt likecnt dislikecnt favoritecnt cmcnt ct videoname
720 c5JQp6xafqc 2153676 34104 830 0 7120 2021-10-12 19:20:03 ๊ณต๋ฒ” EP1
721 c5JQp6xafqc 2228250 34559 849 0 7191 2021-10-13 09:41:37 ๊ณต๋ฒ” EP1
722 c5JQp6xafqc 2228250 34559 849 0 7191 2021-10-13 09:41:37 ๊ณต๋ฒ” EP1
723 c5JQp6xafqc 2228669 34566 849 0 7194 2021-10-13 09:50:03 ๊ณต๋ฒ” EP1
1635 c5JQp6xafqc 2706691 37893 987 0 7410 2021-10-19 17:50:02 ๊ณต๋ฒ” EP1
1636 c5JQp6xafqc 2707933 37901 988 0 7410 2021-10-19 18:20:03 ๊ณต๋ฒ” EP1
1637 c5JQp6xafqc 2708448 37903 988 0 7410 2021-10-19 18:30:03 ๊ณต๋ฒ” EP1

Question 16

๊ฐ ์—ํ”ผ์†Œ๋“œ์˜ ์‹œ์ž‘๋‚ ์งœ(๋…„-์›”-์ผ)๋ฅผ ์—ํ”ผ์†Œ๋“œ ์ด๋ฆ„๊ณผ ๋ฌถ์–ด ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ๋งŒ๋“ค๊ณ  ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
start_date = video.sort_values(['ct','videoname']).drop_duplicates('videoname')[['ct','videoname']]
start_date['date'] = start_date.ct.dt.date
answer = start_date[['date','videoname']]
display(answer)
date videoname
10 2021-10-07 ๊ณต๋ฒ” EP1
3496 2021-10-09 ๊ณต๋ฒ” EP2
6696 2021-10-14 ๊ณต๋ฒ” EP3
9264 2021-10-16 ๊ณต๋ฒ” EP4
11544 2021-10-21 ๊ณต๋ฒ” EP5
13661 2021-10-23 ๊ณต๋ฒ” EP6
13106 2021-10-28 ๊ณต๋ฒ” EP7
14935 2021-10-30 ๊ณต๋ฒ” EP8

Question 17

โ€œ๊ณต๋ฒ”โ€ ์ปจํ…์ธ ์˜ ๊ฒฝ์šฐ 19:00์‹œ์— ๊ณต๊ฐœ ๋˜๋Š”๊ฒƒ์œผ๋กœ ์•Œ๋ ค์ ธ์žˆ๋‹ค. ๊ณต๊ฐœ๋œ ๋‚ ์˜ 21์‹œ์˜ viewcnt, ct, videoname ์œผ๋กœ ๊ตฌ์„ฑ๋œ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์„ viewcnt๋ฅผ ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌํ•˜์—ฌ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
video['time']= video.ct.dt.hour

answer = video.loc[video['time'] ==21] \
            .sort_values(['videoname','ct'])\
            .drop_duplicates('videoname') \
            .sort_values('viewcnt',ascending=False)[['videoname','viewcnt','ct']]\
            .reset_index(drop=True)

display(answer)
videoname viewcnt ct
0 ๊ณต๋ฒ” EP8 264029 2021-10-30 21:00:08
1 ๊ณต๋ฒ” EP7 252032 2021-10-28 21:00:04
2 ๊ณต๋ฒ” EP4 217674 2021-10-16 21:00:04
3 ๊ณต๋ฒ” EP6 213899 2021-10-23 21:00:06
4 ๊ณต๋ฒ” EP5 201179 2021-10-21 21:00:04
5 ๊ณต๋ฒ” EP2 148144 2021-10-09 21:00:03
6 ๊ณต๋ฒ” EP3 147183 2021-10-14 21:00:04
7 ๊ณต๋ฒ” EP1 117340 2021-10-07 21:00:03

Question 18

video ์ •๋ณด์˜ ๊ฐ€์žฅ ์ตœ๊ทผ ๋ฐ์ดํ„ฐ๋“ค์—์„œ ๊ฐ ์—ํ”ผ์†Œ๋“œ์˜ ์‹ซ์–ด์š”/์ข‹์•„์š” ๋น„์œจ์„ ratio ์ปฌ๋Ÿผ์œผ๋กœ ๋งŒ๋“ค๊ณ  videoname, ratio๋กœ ๊ตฌ์„ฑ๋œ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์„ ratio๋ฅผ ์˜ค๋ฆ„์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌํ•˜๋ผ

Hide code cell source
target = video.sort_values('ct').drop_duplicates('videoname',keep='last')
target['ratio'] =target['dislikecnt'] / target['likecnt']

answer = target.sort_values('ratio')[['videoname','ratio']].reset_index(drop=True)
answer
videoname ratio
0 ๊ณต๋ฒ” EP7 0.013636
1 ๊ณต๋ฒ” EP6 0.015881
2 ๊ณต๋ฒ” EP4 0.017971
3 ๊ณต๋ฒ” EP5 0.021128
4 ๊ณต๋ฒ” EP3 0.022360
5 ๊ณต๋ฒ” EP1 0.027118
6 ๊ณต๋ฒ” EP2 0.029844
7 ๊ณต๋ฒ” EP8 0.141357

Question 19

2021-11-01 00:00:00 ~ 15:00:00๊นŒ์ง€ ๊ฐ ์—ํ”ผ์†Œ๋“œ๋ณ„ viewcnt์˜ ์ฆ๊ฐ€๋Ÿ‰์„ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ๋งŒ๋“œ์‹œ์˜ค

Hide code cell source
start = pd.to_datetime("2021-11-01 00:00:00")
end = pd.to_datetime("2021-11-01 15:00:00")

target = video.loc[(video["ct"] >= start) & (video['ct'] <= end)].reset_index(drop=True)

def check(x):
    result = max(x) - min(x)
    return result

answer = target[['videoname','viewcnt']].groupby("videoname").agg(check)
answer
viewcnt
videoname
๊ณต๋ฒ” EP1 13298
๊ณต๋ฒ” EP2 10300
๊ณต๋ฒ” EP3 9927
๊ณต๋ฒ” EP4 9824
๊ณต๋ฒ” EP5 10824
๊ณต๋ฒ” EP6 14141
๊ณต๋ฒ” EP7 26949
๊ณต๋ฒ” EP8 89147

Question 20

video ๋ฐ์ดํ„ฐ ์ค‘์—์„œ ์ค‘๋ณต๋˜๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•œ๋‹ค. ์ค‘๋ณต๋˜๋Š” ๊ฐ ๋ฐ์ดํ„ฐ์˜ ์‹œ๊ฐ„๋Œ€์™€ videoname ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer  = video[video.index.isin(set(video.index) -  set(video.drop_duplicates().index))]
result = answer[['videoname','ct']]
display(result)
videoname ct
722 ๊ณต๋ฒ” EP1 2021-10-13 09:41:37
3927 ๊ณต๋ฒ” EP2 2021-10-13 09:41:37

์›”๋“œ์ปต ์ถœ์ „์„ ์ˆ˜ ๊ณจ๊ธฐ๋ก ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/darinhawley/fifa-world-cup-goalscorers-19302018(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : 1930 ~2018๋…„๋„ ์›”๋“œ์ปต ์ถœ์ „์„ ์ˆ˜ ๊ณจ๊ธฐ๋ก
data url = https://raw.githubusercontent.com/Datamanim/datarepo/main/worldcup/worldcupgoals.csv

import pandas as pd

df= pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/worldcup/worldcupgoals.csv')
df.head()
Player Goals Years Country
0 Miroslav Klose 16 2002-2006-2010-2014 Germany
1 Ronaldo 15 1998-2002-2006 Brazil
2 Gerd Muller 14 1970-1974 Germany
3 Just Fontaine 13 1958 France
4 Pele 12 1958-1962-1966-1970 Brazil

Question 21

์ฃผ์–ด์ง„ ์ „์ฒด ๊ธฐ๊ฐ„์˜ ๊ฐ ๋‚˜๋ผ๋ณ„ ๊ณจ๋“์ ์ˆ˜ ์ƒ์œ„ 5๊ฐœ ๊ตญ๊ฐ€์™€ ๊ทธ ๋“์ ์ˆ˜๋ฅผ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ํ˜•ํƒœ๋กœ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
result = df.groupby('Country').sum().sort_values('Goals',ascending=False).head(5)
display(result)
Goals
Country
Brazil 228
Germany 226
Argentina 135
Italy 123
France 115

Question 22

์ฃผ์–ด์ง„ ์ „์ฒด๊ธฐ๊ฐ„๋™์•ˆ ๊ณจ๋“์ ์„ ํ•œ ์„ ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ๋‚˜๋ผ ์ƒ์œ„ 5๊ฐœ ๊ตญ๊ฐ€์™€ ๊ทธ ์„ ์ˆ˜ ์ˆซ์ž๋ฅผ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ํ˜•์‹์œผ๋กœ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
result = df.groupby('Country').size().sort_values(ascending=False).head(5)
print(result)
Country
Brazil       81
Germany      78
Italy        60
Argentina    59
France       58
dtype: int64

Question 23

Years ์ปฌ๋Ÿผ์€ ๋…„๋„ -๋…„๋„ ํ˜•์‹์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด์žˆ๊ณ , ๊ฐ ๋…„๋„๋Š” 4์ž๋ฆฌ ์ˆซ์ž์ด๋‹ค. ๋…„๋„ ํ‘œ๊ธฐ๊ฐ€ 4์ž๋ฆฌ ์ˆซ์ž๋กœ ์•ˆ๋œ ์ผ€์ด์Šค๊ฐ€ ์กด์žฌํ•œ๋‹ค. ํ•ด๋‹น ๊ฑด์€ ๋ช‡๊ฑด์ธ์ง€ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
df['yearLst'] = df.Years.str.split('-')

def checkFour(x):
    for value in x:
        if len(str(value)) != 4:
            return False
        
    return True
    
df['check'] = df['yearLst'].apply(checkFour)

result = len(df[df.check ==False])
result
45

Question 24

**Q3์—์„œ ๋ฐœ์ƒํ•œ ์˜ˆ์™ธ ์ผ€์ด์Šค๋ฅผ ์ œ์™ธํ•œ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ df2๋ผ๊ณ  ์ •์˜ํ•˜๊ณ  ๋ฐ์ดํ„ฐ์˜ ํ–‰์˜ ์ˆซ์ž๋ฅผ ์ถœ๋ ฅํ•˜๋ผ (์•„๋ž˜ ๋ฌธ์ œ๋ถ€ํ„ฐ๋Š” df2๋กœ ํ’€์ดํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค) **

Hide code cell source
df2 = df[df.check ==True].reset_index(drop=True)
print(df2.shape[0])
1250

Question 25

์›”๋“œ์ปต ์ถœ์ „ํšŸ์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” โ€˜LenCupโ€™ ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•˜๊ณ  4ํšŒ ์ถœ์ „ํ•œ ์„ ์ˆ˜์˜ ์ˆซ์ž๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
df2['LenCup'] =df2['yearLst'].str.len()
result = df2['LenCup'].value_counts()[4]
print(result)
16

Question 26

Yugoslavia ๊ตญ๊ฐ€์˜ ์›”๋“œ์ปต ์ถœ์ „ํšŸ์ˆ˜๊ฐ€ 2ํšŒ์ธ ์„ ์ˆ˜๋“ค์˜ ์ˆซ์ž๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = len(df2[(df2.LenCup==2) & (df2.Country =='Yugoslavia')])
print(result)
7

Question 27

2002๋…„๋„์— ์ถœ์ „ํ•œ ์ „์ฒด ์„ ์ˆ˜๋Š” ๋ช‡๋ช…์ธ๊ฐ€?

Hide code cell source
result =len(df2[df2.Years.str.contains('2002')])
print(result)
156

Question 28

์ด๋ฆ„์— โ€˜carlosโ€™ ๋‹จ์–ด๊ฐ€ ๋“ค์–ด๊ฐ€๋Š” ์„ ์ˆ˜์˜ ์ˆซ์ž๋Š” ๋ช‡ ๋ช…์ธ๊ฐ€? (๋Œ€, ์†Œ๋ฌธ์ž ๊ตฌ๋ถ„ x)

Hide code cell source
result = len(df2[df2.Player.str.lower().str.contains('carlos')])
print(result)
13

Question 29

์›”๋“œ์ปต ์ถœ์ „ ํšŸ์ˆ˜๊ฐ€ 1ํšŒ๋ฟ์ธ ์„ ์ˆ˜๋“ค ์ค‘์—์„œ ๊ฐ€์žฅ ๋งŽ์€ ๋“์ ์„ ์˜ฌ๋ ธ๋˜ ์„ ์ˆ˜๋Š” ๋ˆ„๊ตฌ์ธ๊ฐ€?

Hide code cell source
result = df2[df2.LenCup==1].sort_values('Goals',ascending=False).Player.values[0]
print(result)
Just Fontaine

Question 30

์›”๋“œ์ปต ์ถœ์ „ํšŸ์ˆ˜๊ฐ€ 1ํšŒ ๋ฟ์ธ ์„ ์ˆ˜๋“ค์ด ๊ฐ€์žฅ ๋งŽ์€ ๊ตญ๊ฐ€๋Š” ์–ด๋””์ธ๊ฐ€?

Hide code cell source
result= df2[df2.LenCup==1].Country.value_counts().index[0]
print(result)
Brazil

์„œ์šธ์‹œ ๋”ฐ๋ฆ‰์ด ์ด์šฉ์ •๋ณด ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.data.go.kr/data/15051872/fileData.do(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์„œ์šธํŠน๋ณ„์‹œ_๊ณต๊ณต์ž์ „๊ฑฐ ์‹œ๊ฐ„๋Œ€๋ณ„ ์ด์šฉ์ •๋ณด
data url = https://raw.githubusercontent.com/Datamanim/datarepo/main/bicycle/seoul_bi.csv

import pandas as pd
df =pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/bicycle/seoul_bi.csv')
df.head()
๋Œ€์—ฌ์ผ์ž ๋Œ€์—ฌ์‹œ๊ฐ„ ๋Œ€์—ฌ์†Œ๋ฒˆํ˜ธ ๋Œ€์—ฌ๊ตฌ๋ถ„์ฝ”๋“œ ์„ฑ๋ณ„ ์—ฐ๋ น๋Œ€์ฝ”๋“œ ์ด์šฉ๊ฑด์ˆ˜ ์šด๋™๋Ÿ‰ ํƒ„์†Œ๋Ÿ‰ ์ด๋™๊ฑฐ๋ฆฌ ์‚ฌ์šฉ์‹œ๊ฐ„
0 2021-06-01 0 3541 ์ •๊ธฐ๊ถŒ F ~10๋Œ€ 1 0.00 0.00 0.00 8
1 2021-06-01 0 765 ์ •๊ธฐ๊ถŒ F ~10๋Œ€ 1 27.21 0.35 1526.81 19
2 2021-06-01 0 2637 ์ •๊ธฐ๊ถŒ F ~10๋Œ€ 1 41.40 0.37 1608.56 18
3 2021-06-01 0 2919 ์ •๊ธฐ๊ถŒ F ~10๋Œ€ 1 0.00 0.00 0.00 75
4 2021-06-01 0 549 ์ •๊ธฐ๊ถŒ F ~10๋Œ€ 1 13.04 0.17 731.55 6

Question 31

๋Œ€์—ฌ์ผ์ž๋ณ„ ๋ฐ์ดํ„ฐ์˜ ์ˆ˜๋ฅผ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์œผ๋กœ ์ถœ๋ ฅํ•˜๊ณ , ๊ฐ€์žฅ ๋งŽ์€ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ๋‚ ์งœ๋ฅผ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
result = df['๋Œ€์—ฌ์ผ์ž'].value_counts().sort_index().to_frame()
answer = result[result.๋Œ€์—ฌ์ผ์ž  == result.๋Œ€์—ฌ์ผ์ž.max()].index[0]

display(result)
print(answer)
๋Œ€์—ฌ์ผ์ž
2021-06-01 48081
2021-06-02 56100
2021-06-03 18985
2021-06-04 56788
2021-06-05 52730
2021-06-06 47787
2021-06-07 52623
2021-06-08 55623
2021-06-09 54507
2021-06-10 36992
2021-06-11 52089
2021-06-12 48569
2021-06-13 43131
2021-06-14 54945
2021-06-04

Question 32

๊ฐ ์ผ์ž์˜ ์š”์ผ์„ ํ‘œ๊ธฐํ•˜๊ณ  (โ€˜Mondayโ€™ ~โ€™Sundayโ€™) โ€˜day_nameโ€™์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•˜๊ณ  ์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ฐ ์š”์ผ๋ณ„ ์ด์šฉ ํšŸ์ˆ˜์˜ ์ดํ•ฉ์„ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
df['๋Œ€์—ฌ์ผ์ž'] = pd.to_datetime(df['๋Œ€์—ฌ์ผ์ž'])
df['day_name']  = df['๋Œ€์—ฌ์ผ์ž'].dt.day_name()

result =  df.day_name.value_counts().to_frame()
print(result)
           day_name
Wednesday    110607
Friday       108877
Monday       107568
Tuesday      103704
Saturday     101299
Sunday        90918
Thursday      55977

Question 33

๊ฐ ์š”์ผ๋ณ„ ๊ฐ€์žฅ ๋งŽ์ด ์ด์šฉํ•œ ๋Œ€์—ฌ์†Œ์˜ ์ด์šฉํšŸ์ˆ˜์™€ ๋Œ€์—ฌ์†Œ ๋ฒˆํ˜ธ๋ฅผ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
result = df.groupby(['day_name','๋Œ€์—ฌ์†Œ๋ฒˆํ˜ธ']).size().to_frame('size').sort_values(['day_name','size'],ascending=False).reset_index()
answer  = result.drop_duplicates('day_name',keep='first').reset_index(drop=True)
display(answer)
day_name ๋Œ€์—ฌ์†Œ๋ฒˆํ˜ธ size
0 Wednesday 502 282
1 Tuesday 502 267
2 Thursday 2715 137
3 Sunday 502 372
4 Saturday 502 378
5 Monday 502 242
6 Friday 502 277

Question 34

๋‚˜์ด๋Œ€๋ณ„ ๋Œ€์—ฌ๊ตฌ๋ถ„ ์ฝ”๋“œ์˜ (์ผ์ผ๊ถŒ/์ „์ฒดํšŸ์ˆ˜) ๋น„์œจ์„ ๊ตฌํ•œ ํ›„ ๊ฐ€์žฅ ๋†’์€ ๋น„์œจ์„ ๊ฐ€์ง€๋Š” ๋‚˜์ด๋Œ€๋ฅผ ํ™•์ธํ•˜๋ผ. ์ผ์ผ๊ถŒ์˜ ๊ฒฝ์šฐ ์ผ์ผ๊ถŒ ๊ณผ ์ผ์ผ๊ถŒ(๋น„ํšŒ์›)์„ ๋ชจ๋‘ ํฌํ•จํ•˜๋ผ

Hide code cell source
daily = df[df.๋Œ€์—ฌ๊ตฌ๋ถ„์ฝ”๋“œ.isin(['์ผ์ผ๊ถŒ','์ผ์ผ๊ถŒ(๋น„ํšŒ์›)'])].์—ฐ๋ น๋Œ€์ฝ”๋“œ.value_counts().sort_index()
total = df.์—ฐ๋ น๋Œ€์ฝ”๋“œ.value_counts().sort_index()

ratio = (daily /total).sort_values(ascending=False)
print(ratio)
print('max ratio age ',ratio.index[0])
~10๋Œ€    0.312525
20๋Œ€     0.239913
30๋Œ€     0.166702
40๋Œ€     0.110585
50๋Œ€     0.065414
70๋Œ€~    0.048962
60๋Œ€     0.042462
Name: ์—ฐ๋ น๋Œ€์ฝ”๋“œ, dtype: float64
max ratio age  ~10๋Œ€

Question 35

์—ฐ๋ น๋Œ€๋ณ„ ํ‰๊ท  ์ด๋™๊ฑฐ๋ฆฌ๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df[['์—ฐ๋ น๋Œ€์ฝ”๋“œ','์ด๋™๊ฑฐ๋ฆฌ']].groupby(['์—ฐ๋ น๋Œ€์ฝ”๋“œ']).mean()
print(result)
              ์ด๋™๊ฑฐ๋ฆฌ
์—ฐ๋ น๋Œ€์ฝ”๋“œ             
20๋Œ€    3211.890552
30๋Œ€    3341.443859
40๋Œ€    3514.857416
50๋Œ€    3593.668100
60๋Œ€    3538.145737
70๋Œ€~   3085.039641
~10๋Œ€   2634.426279

Question 36

์—ฐ๋ น๋Œ€ ์ฝ”๋“œ๊ฐ€ 20๋Œ€์ธ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•˜๊ณ ,์ด๋™๊ฑฐ๋ฆฌ๊ฐ’์ด ์ถ”์ถœํ•œ ๋ฐ์ดํ„ฐ์˜ ์ด๋™๊ฑฐ๋ฆฌ๊ฐ’์˜ ํ‰๊ท  ์ด์ƒ์ธ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•œ๋‹ค.์ตœ์ข… ์ถ”์ถœ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋Œ€์—ฌ์ผ์ž, ๋Œ€์—ฌ์†Œ ๋ฒˆํ˜ธ ์ˆœ์„œ๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ ํ›„ 1ํ–‰๋ถ€ํ„ฐ 200ํ–‰๊นŒ์ง€์˜ ํƒ„์†Œ๋Ÿ‰์˜ ํ‰๊ท ์„ ์†Œ์ˆซ์  3์งธ ์ž๋ฆฌ๊นŒ์ง€ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
tw = df[df.์—ฐ๋ น๋Œ€์ฝ”๋“œ =='20๋Œ€'].reset_index(drop=True)
tw_mean = tw[tw.์ด๋™๊ฑฐ๋ฆฌ >= tw.์ด๋™๊ฑฐ๋ฆฌ.mean()].reset_index(drop=True)
tw_mean['ํƒ„์†Œ๋Ÿ‰'] =tw_mean['ํƒ„์†Œ๋Ÿ‰'].astype('float')
target =tw_mean.sort_values(['๋Œ€์—ฌ์ผ์ž','๋Œ€์—ฌ์†Œ๋ฒˆํ˜ธ'], ascending=False).reset_index(drop=True).iloc[:200].ํƒ„์†Œ๋Ÿ‰
result = round(target.sum()/len(target),3)
print(result)
1.613

Question 37

6์›” 7์ผ ~10๋Œ€์˜ โ€œ์ด์šฉ๊ฑด์ˆ˜โ€์˜ ์ค‘์•™๊ฐ’์€?

Hide code cell source
df['๋Œ€์—ฌ์ผ์ž']  =pd.to_datetime(df['๋Œ€์—ฌ์ผ์ž'])
result = df[(df.์—ฐ๋ น๋Œ€์ฝ”๋“œ =='~10๋Œ€') & (df.๋Œ€์—ฌ์ผ์ž ==pd.to_datetime('2021-06-07'))].์ด์šฉ๊ฑด์ˆ˜.median()
print(result)
1.0

Question 38

ํ‰์ผ (์›”~๊ธˆ) ์ถœ๊ทผ ์‹œ๊ฐ„๋Œ€(์˜ค์ „ 6,7,8์‹œ)์˜ ๋Œ€์—ฌ์†Œ๋ณ„ ์ด์šฉ ํšŸ์ˆ˜๋ฅผ ๊ตฌํ•ด์„œ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ํ˜•ํƒœ๋กœ ํ‘œํ˜„ํ•œ ํ›„ ๊ฐ ๋Œ€์—ฌ์‹œ๊ฐ„๋ณ„ ์ด์šฉ ํšŸ์ˆ˜์˜ ์ƒ์œ„ 3๊ฐœ ๋Œ€์—ฌ์†Œ์™€ ์ด์šฉํšŸ์ˆ˜๋ฅผ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
target = df[(df.day_name.isin(['Tuesday', 'Wednesday', 'Thursday', 'Friday','Monday'])) & (df.๋Œ€์—ฌ์‹œ๊ฐ„.isin([6,7,8]))]
result = target.groupby(['๋Œ€์—ฌ์‹œ๊ฐ„','๋Œ€์—ฌ์†Œ๋ฒˆํ˜ธ']).size().to_frame('์ด์šฉ ํšŸ์ˆ˜')

answer = result.sort_values(['๋Œ€์—ฌ์‹œ๊ฐ„','์ด์šฉ ํšŸ์ˆ˜'],ascending=False).groupby('๋Œ€์—ฌ์‹œ๊ฐ„').head(3)
display(answer)
์ด์šฉ ํšŸ์ˆ˜
๋Œ€์—ฌ์‹œ๊ฐ„ ๋Œ€์—ฌ์†Œ๋ฒˆํ˜ธ
8 2701 119
646 115
1152 92
7 259 104
230 77
726 77
6 2744 45
1125 40
1028 36

Question 39

์ด๋™๊ฑฐ๋ฆฌ์˜ ํ‰๊ท  ์ด์ƒ์˜ ์ด๋™๊ฑฐ๋ฆฌ ๊ฐ’์„ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•˜์—ฌ ์ถ”์ถœ๋ฐ์ดํ„ฐ์˜ ์ด๋™๊ฑฐ๋ฆฌ์˜ ํ‘œ๋ณธํ‘œ์ค€ํŽธ์ฐจ ๊ฐ’์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result  = df[df.์ด๋™๊ฑฐ๋ฆฌ >= df.์ด๋™๊ฑฐ๋ฆฌ.mean()].reset_index(drop=True).์ด๋™๊ฑฐ๋ฆฌ.std()
print(result)
5092.139707505305

Question 40

๋‚จ์„ฑ(โ€˜Mโ€™ or โ€˜mโ€™)๊ณผ ์—ฌ์„ฑ(โ€˜Fโ€™ or โ€˜fโ€™)์˜ ์ด๋™๊ฑฐ๋ฆฌ๊ฐ’์˜ ํ‰๊ท ๊ฐ’์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
df['sex'] = df['์„ฑ๋ณ„'].map(lambda x: '๋‚จ' if x in ['M','m'] else '์—ฌ')
answer = df[['sex','์ด๋™๊ฑฐ๋ฆฌ']].groupby('sex').mean()
display(answer)
์ด๋™๊ฑฐ๋ฆฌ
sex
๋‚จ 3209.110871
์—ฌ 3468.575025

์ „์„ธ๊ณ„ ํ–‰๋ณต๋„ ์ง€ํ‘œ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/unsdsn/world-happiness(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์ „์„ธ๊ณ„ ํ–‰๋ณต๋„ ์ง€ํ‘œ ์กฐ์‚ฌ
data url = https://raw.githubusercontent.com/Datamanim/datarepo/main/happy2/happiness.csv

import pandas as pd
df =pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/happy2/happiness.csv',encoding='utf-8')
df.head()
ํ–‰๋ณต๋žญํ‚น ๋‚˜๋ผ๋ช… ์ ์ˆ˜ ์ƒ๋Œ€GDP ์‚ฌํšŒ์ ์ง€์› ํ–‰๋ณต๊ธฐ๋Œ€์น˜ ์„ ํƒ์˜ ์ž์œ ๋„ ๊ด€๋Œ€ํ•จ ๋ถ€ํŒจ์— ๋Œ€ํ•œ์ธ์‹ ๋…„๋„
0 1 Finland 7.769 1.340 1.587 0.986 0.596 0.153 0.393 2019
1 2 Denmark 7.600 1.383 1.573 0.996 0.592 0.252 0.410 2019
2 3 Norway 7.554 1.488 1.582 1.028 0.603 0.271 0.341 2019
3 4 Iceland 7.494 1.380 1.624 1.026 0.591 0.354 0.118 2019
4 5 Netherlands 7.488 1.396 1.522 0.999 0.557 0.322 0.298 2019

Question 41

๋ฐ์ดํ„ฐ๋Š” 2018๋…„๋„์™€ 2019๋…„๋„์˜ ์ „์„ธ๊ณ„ ํ–‰๋ณต ์ง€์ˆ˜๋ฅผ ํ‘œํ˜„ํ•œ๋‹ค. ๊ฐ๋…„๋„์˜ ํ–‰๋ณต๋žญํ‚น 10์œ„๋ฅผ ์ฐจ์ง€ํ•œ ๋‚˜๋ผ์˜ ํ–‰๋ณต์ ์ˆ˜์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df[df.ํ–‰๋ณต๋žญํ‚น ==10]['์ ์ˆ˜'].mean()
print(result)
7.259

Question 42

๋ฐ์ดํ„ฐ๋Š” 2018๋…„๋„์™€ 2019๋…„๋„์˜ ์ „์„ธ๊ณ„ ํ–‰๋ณต ์ง€์ˆ˜๋ฅผ ํ‘œํ˜„ํ•œ๋‹ค. ๊ฐ๋…„๋„์˜ ํ–‰๋ณต๋žญํ‚น 50์œ„์ด๋‚ด์˜ ๋‚˜๋ผ๋“ค์˜ ๊ฐ๊ฐ์˜ ํ–‰๋ณต์ ์ˆ˜ ํ‰๊ท ์„ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์œผ๋กœ ํ‘œ์‹œํ•˜๋ผ

Hide code cell source
result = df[df.ํ–‰๋ณต๋žญํ‚น<=50][['๋…„๋„','์ ์ˆ˜']].groupby('๋…„๋„').mean()
print(result)
           ์ ์ˆ˜
๋…„๋„           
2018  6.64678
2019  6.67002

Question 43

2018๋…„๋„ ๋ฐ์ดํ„ฐ๋“ค๋งŒ ์ถ”์ถœํ•˜์—ฌ ํ–‰๋ณต์ ์ˆ˜์™€ ๋ถ€ํŒจ์— ๋Œ€ํ•œ ์ธ์‹์— ๋Œ€ํ•œ ์ƒ๊ด€๊ณ„์ˆ˜๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df[df.๋…„๋„ ==2018][['์ ์ˆ˜','๋ถ€ํŒจ์— ๋Œ€ํ•œ์ธ์‹']].corr().iloc[0,1]
print(result)
0.40529152271510027

Question 44

2018๋…„๋„์™€ 2019๋…„๋„์˜ ํ–‰๋ณต๋žญํ‚น์ด ๋ณ€ํ™”ํ•˜์ง€ ์•Š์€ ๋‚˜๋ผ๋ช…์˜ ์ˆ˜๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = len(df[['ํ–‰๋ณต๋žญํ‚น','๋‚˜๋ผ๋ช…']]) - len(df[['ํ–‰๋ณต๋žญํ‚น','๋‚˜๋ผ๋ช…']].drop_duplicates())
print(result)
15

Question 45

2019๋…„๋„ ๋ฐ์ดํ„ฐ๋“ค๋งŒ ์ถ”์ถœํ•˜์—ฌ ๊ฐ๋ณ€์ˆ˜๊ฐ„ ์ƒ๊ด€๊ณ„์ˆ˜๋ฅผ ๊ตฌํ•˜๊ณ  ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌํ•œ ํ›„ ์ƒ์œ„ 5๊ฐœ๋ฅผ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ์ถœ๋ ฅํ•˜๋ผ. ์ปฌ๋Ÿผ๋ช…์€ v1,v2,corr์œผ๋กœ ํ‘œ์‹œํ•˜๋ผ

Hide code cell source
zz = df[df.๋…„๋„ ==2019].corr().unstack().to_frame().reset_index().dropna()

result = zz[zz[0] !=1].sort_values(0,ascending=False).drop_duplicates(0)
answer = result.head(5).reset_index(drop=True)
answer.columns = ['v1','v2','corr']
display(answer)
v1 v2 corr
0 ํ–‰๋ณต๊ธฐ๋Œ€์น˜ ์ƒ๋Œ€GDP 0.835462
1 ์ƒ๋Œ€GDP ์ ์ˆ˜ 0.793883
2 ํ–‰๋ณต๊ธฐ๋Œ€์น˜ ์ ์ˆ˜ 0.779883
3 ์‚ฌํšŒ์ ์ง€์› ์ ์ˆ˜ 0.777058
4 ์‚ฌํšŒ์ ์ง€์› ์ƒ๋Œ€GDP 0.754906

Question 46

๊ฐ ๋…„๋„๋ณ„ ํ•˜์œ„ ํ–‰๋ณต์ ์ˆ˜์˜ ํ•˜์œ„ 5๊ฐœ ๊ตญ๊ฐ€์˜ ํ‰๊ท  ํ–‰๋ณต์ ์ˆ˜๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df.groupby('๋…„๋„').tail(5).groupby('๋…„๋„').mean()[['์ ์ˆ˜']]
print(result)
          ์ ์ˆ˜
๋…„๋„          
2018  3.1800
2019  3.1408

Question 47

2019๋…„ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•˜๊ณ  ํ•ด๋‹น๋ฐ์ดํ„ฐ์˜ ์ƒ๋Œ€ GDP ํ‰๊ท  ์ด์ƒ์˜ ๋‚˜๋ผ๋“ค๊ณผ ํ‰๊ท  ์ดํ•˜์˜ ๋‚˜๋ผ๋“ค์˜ ํ–‰๋ณต์ ์ˆ˜ ํ‰๊ท ์„ ๊ฐ๊ฐ ๊ตฌํ•˜๊ณ  ๊ทธ ์ฐจ์ด๊ฐ’์„ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
over = df[df.์ƒ๋Œ€GDP >= df.์ƒ๋Œ€GDP.mean()]['์ ์ˆ˜'].mean()
under = df[df.์ƒ๋Œ€GDP <= df.์ƒ๋Œ€GDP.mean()]['์ ์ˆ˜'].mean()

result= over - under
print(result)
1.4212148925537225

Question 48

๊ฐ๋…„๋„์˜ ๋ถ€ํŒจ์— ๋Œ€ํ•œ์ธ์‹์„ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌํ–ˆ์„๋•Œ ์ƒ์œ„ 20๊ฐœ ๊ตญ๊ฐ€์˜ ๋ถ€ํŒจ์— ๋Œ€ํ•œ์ธ์‹์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df.sort_values(['๋…„๋„','๋ถ€ํŒจ์— ๋Œ€ํ•œ์ธ์‹'],ascending=False).groupby('๋…„๋„').head(20).groupby(['๋…„๋„']).mean()[['๋ถ€ํŒจ์— ๋Œ€ํ•œ์ธ์‹']]
print(result)
      ๋ถ€ํŒจ์— ๋Œ€ํ•œ์ธ์‹
๋…„๋„            
2018    0.3267
2019    0.3201

Question 49

2018๋…„๋„ ํ–‰๋ณต๋žญํ‚น 50์œ„ ์ด๋‚ด์— ํฌํ•จ๋๋‹ค๊ฐ€ 2019๋…„ 50์œ„ ๋ฐ–์œผ๋กœ ๋ฐ€๋ ค๋‚œ ๊ตญ๊ฐ€์˜ ์ˆซ์ž๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = set(df[(df.๋…„๋„==2018) & (df.ํ–‰๋ณต๋žญํ‚น <=50)].๋‚˜๋ผ๋ช…)  -set(df[(df.๋…„๋„==2019) & (df.ํ–‰๋ณต๋žญํ‚น <=50)].๋‚˜๋ผ๋ช…)
answer = len(result)
print(answer)
4

Question 50

2018๋…„,2019๋…„ ๋ชจ๋‘ ๊ธฐ๋ก์ด ์žˆ๋Š” ๋‚˜๋ผ๋“ค ์ค‘ ๋…„๋„๋ณ„ ํ–‰๋ณต์ ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ์ฆ๊ฐ€ํ•œ ๋‚˜๋ผ์™€ ๊ทธ ์ฆ๊ฐ€ ์ˆ˜์น˜๋Š”?

Hide code cell source
count = df.๋‚˜๋ผ๋ช….value_counts()
target = count[count>=2].index

df2 =df.copy()
multiple = df2[df2.๋‚˜๋ผ๋ช….isin(target)].reset_index(drop=True)
multiple.loc[multiple['๋…„๋„']==2018,'์ ์ˆ˜'] = multiple[multiple.๋…„๋„ ==2018]['์ ์ˆ˜'].values * (-1)
result = multiple.groupby('๋‚˜๋ผ๋ช…').sum()['์ ์ˆ˜'].sort_values().to_frame().iloc[-1]
result
์ ์ˆ˜    0.87
Name: Burundi, dtype: float64

์ง€์—ญ๊ตฌ ์—๋„ˆ์ง€ ์†Œ๋น„๋Ÿ‰ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ : https://archive.ics.uci.edu/ml/datasets/Power+consumption+of+Tetouan+city (์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ๊ธฐ์˜จ, ์Šต๋„,๋ฐ”๋žŒํ’์†์— ๋”ฐ๋ฅธ ๋„์‹œ์˜ 3๊ฐœ ์ง€์—ญ๊ตฌ์˜ ์—๋„ˆ์ง€ ์†Œ๋น„๋Ÿ‰
data url = https://raw.githubusercontent.com/Datamanim/datarepo/main/consum/Tetuan City power consumption.csv

import pandas as pd
df= pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/consum/Tetuan%20City%20power%20consumption.csv')
df.head()
DateTime Temperature Humidity Wind Speed general diffuse flows diffuse flows Zone 1 Power Consumption Zone 2 Power Consumption Zone 3 Power Consumption
0 1/1/2017 0:00 6.559 73.8 0.083 0.051 0.119 34055.69620 16128.87538 20240.96386
1 1/1/2017 0:10 6.414 74.5 0.083 0.070 0.085 29814.68354 19375.07599 20131.08434
2 1/1/2017 0:20 6.313 74.5 0.080 0.062 0.100 29128.10127 19006.68693 19668.43373
3 1/1/2017 0:30 6.121 75.0 0.083 0.091 0.096 28228.86076 18361.09422 18899.27711
4 1/1/2017 0:40 5.921 75.7 0.081 0.048 0.085 27335.69620 17872.34043 18442.40964

Question 51

DateTime์ปฌ๋Ÿผ์„ ํ†ตํ•ด ๊ฐ ์›”๋ณ„๋กœ ๋ช‡๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š”์ง€ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
df['DateTime'] = pd.to_datetime(df['DateTime'])
result = df['DateTime'].dt.month.value_counts().sort_index().to_frame()
print(result)
    DateTime
1       4464
2       4032
3       4464
4       4320
5       4464
6       4320
7       4464
8       4464
9       4320
10      4464
11      4320
12      4320

Question 52

3์›”๋‹ฌ์˜ ๊ฐ ์‹œ๊ฐ„๋Œ€๋ณ„ ์˜จ๋„์˜ ํ‰๊ท ๋“ค ์ค‘ ๊ฐ€์žฅ ๋‚ฎ์€ ์‹œ๊ฐ„๋Œ€์˜ ์˜จ๋„๋ฅผ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
target = df[df.DateTime.dt.month ==3]
result = target.groupby(target.DateTime.dt.hour)['Temperature'].mean().sort_values().values[0]
print(result)
11.506612903225806

Question 53

3์›”๋‹ฌ์˜ ๊ฐ ์‹œ๊ฐ„๋Œ€๋ณ„ ์˜จ๋„์˜ ํ‰๊ท ๋“ค ์ค‘ ๊ฐ€์žฅ ๋†’์€ ์‹œ๊ฐ„๋Œ€์˜ ์˜จ๋„๋ฅผ ์ถœ๋ ฅํ•˜๋ผ

Hide code cell source
target = df[df.DateTime.dt.month ==3]
result = target.groupby(target.DateTime.dt.hour)['Temperature'].mean().sort_values().values[-1]
print(result)
18.393602150537635

Question 54

Zone 1 Power Consumption ์ปฌ๋Ÿผ์˜ value๊ฐ’์˜ ํฌ๊ธฐ๊ฐ€ Zone 2 Power Consumption ์ปฌ๋Ÿผ์˜ value๊ฐ’์˜ ํฌ๊ธฐ๋ณด๋‹ค ํฐ ๋ฐ์ดํ„ฐ๋“ค์˜ Humidity์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df[df['Zone 1 Power Consumption'] > df['Zone 2  Power Consumption']].Humidity.mean()
print(result)
68.23624448055094

Question 55

๊ฐ zone์˜ ์—๋„ˆ์ง€ ์†Œ๋น„๋Ÿ‰์˜ ์ƒ๊ด€๊ด€๊ณ„๋ฅผ ๊ตฌํ•ด์„œ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ํ‘œ๊ธฐํ•˜๋ผ

Hide code cell source
result = df.iloc[:,-3:].corr()
display(result)
Zone 1 Power Consumption Zone 2 Power Consumption Zone 3 Power Consumption
Zone 1 Power Consumption 1.000000 0.834519 0.750733
Zone 2 Power Consumption 0.834519 1.000000 0.570932
Zone 3 Power Consumption 0.750733 0.570932 1.000000

Question 56

Temperature์˜ ๊ฐ’์ด 10๋ฏธ๋งŒ์˜ ๊ฒฝ์šฐ A, 10์ด์ƒ 20๋ฏธ๋งŒ์˜ ๊ฒฝ์šฐ B,20์ด์ƒ 30๋ฏธ๋งŒ์˜ ๊ฒฝ์šฐ C, ๊ทธ ์™ธ์˜ ๊ฒฝ์šฐ D๋ผ๊ณ  ํ• ๋•Œ ๊ฐ ๋‹จ๊ณ„์˜ ๋ฐ์ดํ„ฐ ์ˆซ์ž๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
def split_data(x):
    if x<10:
        return "A"
    elif x<20:
        return 'B'
    elif x<30:
        return 'C'
    else:
        return 'D'
    
df['sp'] = df.Temperature.map(split_data)
result = df['sp'].value_counts()
display(result)
B    26993
C    21105
A     2874
D     1444
Name: sp, dtype: int64

Question 57

6์›” ๋ฐ์ดํ„ฐ์ค‘ 12์‹œ์˜ Temperature์˜ ํ‘œ์ค€ํŽธ์ฐจ๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result =df[(df.DateTime.dt.month ==6) & (df.DateTime.dt.hour ==12)].Temperature.std()
print(result)
2.049941782795103

Question 58

6์›” ๋ฐ์ดํ„ฐ์ค‘ 12์‹œ์˜ Temperature์˜ ๋ถ„์‚ฐ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result =df[(df.DateTime.dt.month ==6) & (df.DateTime.dt.hour ==12)].Temperature.var()
print(result)
4.202261312849164

Question 59

Temperature์˜ ํ‰๊ท ์ด์ƒ์˜ Temperature์˜ ๊ฐ’์„ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ๋ฅผ Temperature๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌ ํ–ˆ์„๋•Œ 4๋ฒˆ์งธ ํ–‰์˜ Humidity ๊ฐ’์€?

Hide code cell source
result = df[df.Temperature >= df.Temperature.mean()].sort_values('Temperature').Humidity.values[3]
print(result)
87.9

Question 60

**Temperature์˜ ์ค‘๊ฐ„๊ฐ’ ์ด์ƒ์˜ Temperature์˜ ๊ฐ’์„ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ๋ฅผTemperature๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌ ํ–ˆ์„๋•Œ 4๋ฒˆ์งธ ํ–‰์˜ Humidity ๊ฐ’์€?

**

Hide code cell source
result = df[df.Temperature >= df.Temperature.median()].sort_values('Temperature').Humidity.values[3]
print(result)
80.3

ํฌ์ผ“๋ชฌ ์ •๋ณด ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ : https://www.kaggle.com/abcsds/pokemon (์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ํฌ์ผ“๋ชฌ ์ •๋ณด
data url = https://raw.githubusercontent.com/Datamanim/datarepo/main/pok/Pokemon.csv

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/pok/Pokemon.csv')
df.head()
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False

Question 61

Legendary ์ปฌ๋Ÿผ์€ ์ „์„คํฌ์ผ“๋ชฌ ์œ ๋ฌด๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.์ „์„คํฌ์ผ“๋ชฌ๊ณผ ๊ทธ๋ ‡์ง€ ์•Š์€ ํฌ์ผ“๋ชฌ๋“ค์˜ HPํ‰๊ท ์˜ ์ฐจ์ด๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
target = df.groupby('Legendary').mean()['HP']
result = target.values[1] -target.values[0]
print(result)
25.55614861329147

Question 62

Type 1์€ ์ฃผ์†์„ฑ Type 2 ๋Š” ๋ถ€์†์„ฑ์„ ๋‚˜ํƒ€๋‚ธ๋‹ค. ๊ฐ€์žฅ ๋งŽ์€ ๋ถ€์†์„ฑ ์ข…๋ฅ˜๋Š” ๋ฌด์—‡์ธ๊ฐ€?

Hide code cell source
result = df['Type 2'].value_counts().index[0]
print(result)
Flying

Question 63

๊ฐ€์žฅ ๋งŽ์€ Type 1 ์˜ ์ข…์˜ ํ‰๊ท  Attack ์„ ํ‰๊ท  Defense๋กœ ๋‚˜๋ˆˆ๊ฐ’์€?

Hide code cell source
Max = df['Type 1'].value_counts().index[0]

result = df[df['Type 1']== Max].Attack.mean() /df[df['Type 1']== Max].Defense.mean()
print(result)
1.0165238678090576

Question 64

ํฌ์ผ“๋ชฌ ์„ธ๋Œ€(Generation) ์ค‘ ๊ฐ€์žฅ๋งŽ์€ Legendary๋ฅผ ๋ณด์œ ํ•œ ์„ธ๋Œ€๋Š” ๋ช‡์„ธ๋Œ€์ธ๊ฐ€?

Hide code cell source
result =df[df.Legendary==True].Generation.value_counts().index[0]
result
3

Question 65

โ€˜HPโ€™, โ€˜Attackโ€™, โ€˜Defenseโ€™, โ€˜Sp. Atkโ€™, โ€˜Sp. Defโ€™, โ€˜Speedโ€™ ๊ฐ„์˜ ์ƒ๊ด€ ๊ณ„์ˆ˜์ค‘ ๊ฐ€์žฅ ์ ˆ๋Œ“๊ฐ’์ด ํฐ ๋‘ ๋ณ€์ˆ˜์™€ ๊ทธ ๊ฐ’์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
target = df[[ 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']].corr().unstack().reset_index().rename(columns={0: "corr"})
result = target[target['corr']!=1].sort_values('corr',ascending=False).iloc[0]
print(result)
level_0     Sp. Def
level_1     Defense
corr       0.510747
Name: 26, dtype: object

Question 66

๊ฐ Generation์˜ Attack์œผ๋กœ ์˜ค๋ฆ„์ฐจ์ˆœ ์ •๋ ฌ์‹œ ์ƒ์œ„ 3๊ฐœ ๋ฐ์ดํ„ฐ๋“ค(18๊ฐœ)์˜ Attack์˜ ์ „์ฒด ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result =  df.sort_values(['Generation','Attack']).groupby('Generation').head(3).Attack.mean()
print(result)
19.5

Question 67

๊ฐ Generation์˜ Attack์œผ๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ์‹œ ์ƒ์œ„ 5๊ฐœ ๋ฐ์ดํ„ฐ๋“ค(30๊ฐœ)์˜ Attack์˜ ์ „์ฒด ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

result =  df.sort_values(['Generation','Attack'],ascending=False).groupby('Generation').head(5).Attack.mean()
print(result)
157.23333333333332

Question 68

๊ฐ€์žฅ ํ”ํ•˜๊ฒŒ ๋ฐœ๊ฒฌ๋˜๋Š” (Type1 , Type2) ์˜ ์Œ์€ ๋ฌด์—‡์ธ๊ฐ€?

Hide code cell source
result = df[['Type 1','Type 2']].value_counts().head(1)
print(result)
Type 1  Type 2
Normal  Flying    24
dtype: int64

Question 69

ํ•œ๋ฒˆ์”ฉ๋งŒ ์กด์žฌํ•˜๋Š” (Type1 , Type2)์˜ ์Œ์˜ ๊ฐฏ์ˆ˜๋Š” ๋ช‡๊ฐœ์ธ๊ฐ€?

Hide code cell source
target = df[['Type 1','Type 2']].value_counts()
result = len(target[target==1])
print(result)
39

Question 70

ํ•œ๋ฒˆ์”ฉ๋งŒ ์กด์žฌํ•˜๋Š” (Type1 , Type2)์˜ ์Œ์„ ๊ฐ ์„ธ๋Œ€(Generation)์€ ๊ฐ๊ฐ ๋ช‡๊ฐœ์”ฉ ๊ฐ€์ง€๊ณ  ์žˆ๋Š”๊ฐ€?

Hide code cell source
target = df[['Type 1','Type 2']].value_counts()
target2 =target[target==1]

lst = []
for value in target2.reset_index().values:
    t1 = value[0]
    t2 = value[1]
    
    sp = df[(df['Type 1']==t1) & (df['Type 2']==t2)]
    lst.append(sp)

result = pd.concat(lst).reset_index(drop=True).Generation.value_counts().sort_index()
print(result)
1     1
2     4
3     5
4    13
5     7
6     9
Name: Generation, dtype: int64

๋Œ€ํ•œ๋ฏผ๊ตญ ์ฒด๋ ฅ์žฅ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ : ๊ตญ๋ฏผ์ฒด์œก์ง„ํฅ๊ณต๋‹จ (๋ฌธํ™” ๋น…๋ฐ์ดํ„ฐํ”Œ๋žซํผ) (์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ๋Œ€ํ•œ๋ฏผ๊ตญ ๊ตญ๋ฏผ ์ฒด๋ ฅ์žฅ ํ‰๊ฐ€
data url = https://raw.githubusercontent.com/Datamanim/datarepo/main/body/body.csv

import pandas as pd
df =pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/body/body.csv')
df.head()
์ธก์ •๋‚˜์ด ์ธก์ •ํšŒ์›์„ฑ๋ณ„ ์‹ ์žฅ : cm ์ฒด์ค‘ : kg ์ฒด์ง€๋ฐฉ์œจ : % ์ด์™„๊ธฐํ˜ˆ์••(์ตœ์ €) : mmHg ์ˆ˜์ถ•๊ธฐํ˜ˆ์••(์ตœ๊ณ ) : mmHg ์•…๋ ฅD : kg ์•‰์•„์œ—๋ชธ์•ž์œผ๋กœ๊ตฝํžˆ๊ธฐ : cm ๊ต์ฐจ์œ—๋ชธ์ผ์œผํ‚ค๊ธฐ : ํšŒ ์ œ์ž๋ฆฌ ๋ฉ€๋ฆฌ๋›ฐ๊ธฐ : cm ๋“ฑ๊ธ‰
0 59.0 M 175.0 70.6 19.2 91.0 150.0 40.6 12.2 30.0 179.0 C
1 40.0 F 161.4 45.8 21.7 62.0 119.0 22.1 16.0 32.0 165.0 B
2 27.0 M 176.6 84.8 19.2 87.0 153.0 45.9 13.2 61.0 216.0 B
3 38.0 M 167.6 76.7 24.7 63.0 132.0 43.6 16.0 45.0 231.0 A
4 21.0 M 165.2 66.2 21.5 83.0 106.0 33.5 10.6 46.0 198.0 C

Question 71

์ „์ฒด๋ฐ์ดํ„ฐ์˜ ์ˆ˜์ถ•๊ธฐํ˜ˆ์••(์ตœ๊ณ ) - ์ด์™„๊ธฐํ˜ˆ์••(์ตœ์ €)์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result  = (df['์ˆ˜์ถ•๊ธฐํ˜ˆ์••(์ตœ๊ณ ) : mmHg'] - df['์ด์™„๊ธฐํ˜ˆ์••(์ตœ์ €) : mmHg']).mean()
print(result)
52.19539414750672

Question 72

50~59์„ธ์˜ ์‹ ์žฅํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df[(df.์ธก์ •๋‚˜์ด <60) & (df.์ธก์ •๋‚˜์ด>=50)].iloc[:,2].mean()
print(result)
164.07490107405295

Question 73

์—ฐ๋ น๋Œ€ (20~29 : 20๋Œ€ โ€ฆ) ๋ณ„ ์ธ์›์ˆ˜๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
df['์—ฐ๋ น๋Œ€']  =df.์ธก์ •๋‚˜์ด //10 *10
result = df['์—ฐ๋ น๋Œ€'].value_counts()
print(result)
20.0    5831
30.0    2660
40.0    1801
50.0    1769
60.0    1335
Name: ์—ฐ๋ น๋Œ€, dtype: int64

Question 74

์—ฐ๋ น๋Œ€ (20~29 : 20๋Œ€ โ€ฆ) ๋ณ„ ๋“ฑ๊ธ‰์˜ ์ˆซ์ž๋ฅผ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ํ‘œํ˜„ํ•˜๋ผ

Hide code cell source
result = df.groupby(['์—ฐ๋ น๋Œ€','๋“ฑ๊ธ‰'],as_index=False).size()
print(result)
     ์—ฐ๋ น๋Œ€ ๋“ฑ๊ธ‰  size
0   20.0  A  1585
1   20.0  B  1443
2   20.0  C  1455
3   20.0  D  1348
4   30.0  A   743
5   30.0  B   697
6   30.0  C   626
7   30.0  D   594
8   40.0  A   386
9   40.0  B   428
10  40.0  C   455
11  40.0  D   532
12  50.0  A   321
13  50.0  B   410
14  50.0  C   474
15  50.0  D   564
16  60.0  A   314
17  60.0  B   371
18  60.0  C   339
19  60.0  D   311

Question 75

๋‚จ์„ฑ ์ค‘ A๋“ฑ๊ธ‰๊ณผ D๋“ฑ๊ธ‰์˜ ์ฒด์ง€๋ฐฉ๋ฅ  ํ‰๊ท ์˜ ์ฐจ์ด(ํฐ ๊ฐ’์—์„œ ์ž‘์€ ๊ฐ’์˜ ์ฐจ)๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = abs(df[(df.์ธก์ •ํšŒ์›์„ฑ๋ณ„ =='M') &(df.๋“ฑ๊ธ‰ =='A')].iloc[:,4].mean() -df[(df.์ธก์ •ํšŒ์›์„ฑ๋ณ„ =='M') &(df.๋“ฑ๊ธ‰ =='D')].iloc[:,4].mean())
print(result)
7.932086486137457

Question 76

์—ฌ์„ฑ ์ค‘ A๋“ฑ๊ธ‰๊ณผ D๋“ฑ๊ธ‰์˜ ์ฒด์ค‘์˜ ํ‰๊ท ์˜ ์ฐจ์ด(ํฐ ๊ฐ’์—์„œ ์ž‘์€ ๊ฐ’์˜ ์ฐจ)๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = abs(df[(df.์ธก์ •ํšŒ์›์„ฑ๋ณ„ =='F') &(df.๋“ฑ๊ธ‰ =='A')].iloc[:,3].mean() -df[(df.์ธก์ •ํšŒ์›์„ฑ๋ณ„ =='F') &(df.๋“ฑ๊ธ‰ =='D')].iloc[:,3].mean())
print(result)
5.176211590296511

Question 77

bmi๋Š” ์ž์‹ ์˜ ๋ชธ๋ฌด๊ฒŒ(kg)๋ฅผ ํ‚ค์˜ ์ œ๊ณฑ(m)์œผ๋กœ ๋‚˜๋ˆˆ๊ฐ’์ด๋‹ค. ๋ฐ์ดํ„ฐ์˜ bmi ๋ฅผ ๊ตฌํ•œ ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ๋งŒ๋“ค๊ณ  ๋‚จ์„ฑ์˜ bmi ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
df['bmi'] = df['์ฒด์ค‘ : kg'] / (df['์‹ ์žฅ : cm']/100) **2
result = df[df.์ธก์ •ํšŒ์›์„ฑ๋ณ„ =='M'].bmi.mean()

print(result)
24.461344098193027

Question 78

bmi๋ณด๋‹ค ์ฒด์ง€๋ฐฉ์œจ์ด ๋†’์€ ์‚ฌ๋žŒ๋“ค์˜ ์ฒด์ค‘ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result =df[df.bmi <df['์ฒด์ง€๋ฐฉ์œจ : %']]['์ฒด์ค‘ : kg'].mean()
print(result)
61.740880639254314

Question 79

๋‚จ์„ฑ๊ณผ ์—ฌ์„ฑ์˜ ์•…๋ ฅ ํ‰๊ท ์˜ ์ฐจ์ด๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
target= df.groupby('์ธก์ •ํšŒ์›์„ฑ๋ณ„')['์•…๋ ฅD : kg'].mean()

result = target.M - target.F
print(result)
17.55954185047465

Question 80

๋‚จ์„ฑ๊ณผ ์—ฌ์„ฑ์˜ ๊ต์ฐจ์œ—๋ชธ์ผ์œผํ‚ค๊ธฐ ํšŸ์ˆ˜์˜ ํ‰๊ท ์˜ ์ฐจ์ด๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
target= df.groupby('์ธก์ •ํšŒ์›์„ฑ๋ณ„')['๊ต์ฐจ์œ—๋ชธ์ผ์œผํ‚ค๊ธฐ : ํšŒ'].mean()

result = target.M - target.F
print(result)
14.243156833157627

๊ธฐ์˜จ ๊ฐ•์ˆ˜๋Ÿ‰ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ : https://data.kma.go.kr/cmmn/static/staticPage.do?page=intro
๋ฐ์ดํ„ฐ ์„ค๋ช… : 2020๋…„๋„ ์ดํ™”๋™(์„œ์šธ) , ์ˆ˜์˜๋™(๋ถ€์‚ฐ)์˜ ์‹œ๊ฐ„๋‹จ์œ„์˜ ๊ธฐ์˜จ๊ณผ ๊ฐ•์ˆ˜๋Ÿ‰
๋ฐ์ดํ„ฐ url : https://raw.githubusercontent.com/Datamanim/datarepo/main/weather/weather2.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/weather/weather2.csv")
df.head()
time ์ดํ™”๋™๊ฐ•์ˆ˜ ์ดํ™”๋™๊ธฐ์˜จ ์ˆ˜์˜๋™๊ฐ•์ˆ˜ ์ˆ˜์˜๋™๊ธฐ์˜จ
0 2020-01-01 00:00:00 0.1 -3.9 0.0 2.5
1 2020-01-01 01:00:00 0.0 -3.1 0.0 3.4
2 2020-01-01 02:00:00 0.0 -1.8 0.0 3.8
3 2020-01-01 03:00:00 0.0 -0.8 0.0 4.6
4 2020-01-01 04:00:00 0.0 -0.1 0.0 5.1

Question 81

์—ฌ๋ฆ„์ฒ (6์›”,7์›”,8์›”) ์ดํ™”๋™์ด ์ˆ˜์˜๋™๋ณด๋‹ค ๋†’์€ ๊ธฐ์˜จ์„ ๊ฐ€์ง„ ์‹œ๊ฐ„๋Œ€๋Š” ๋ช‡๊ฐœ์ธ๊ฐ€?

Hide code cell source
df.time = pd.to_datetime(df.time)
summer = df.loc[df.time.dt.month.isin([6,7,8])].reset_index(drop=True)
answer = len(summer.loc[summer['์ดํ™”๋™๊ธฐ์˜จ'] > summer['์ˆ˜์˜๋™๊ธฐ์˜จ']])
print(answer)
1415

Question 82

์ดํ™”๋™๊ณผ ์ˆ˜์˜๋™์˜ ์ตœ๋Œ€๊ฐ•์ˆ˜๋Ÿ‰์˜ ์‹œ๊ฐ„๋Œ€๋ฅผ ๊ฐ๊ฐ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer1 = df.loc[df['์ดํ™”๋™๊ฐ•์ˆ˜']==df['์ดํ™”๋™๊ฐ•์ˆ˜'].max()].time.values
answer2 = df.loc[df['์ˆ˜์˜๋™๊ฐ•์ˆ˜']==df['์ˆ˜์˜๋™๊ฐ•์ˆ˜'].max()].time.values
print(answer1,answer2)
['2020-09-30T09:00:00.000000000'] ['2020-07-23T12:00:00.000000000']

์„œ๋น„์Šค ์ดํƒˆ์˜ˆ์ธก ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ : https://www.kaggle.com/shubh0799/churn-modelling ์—์„œ ๋ณ€ํ˜•
๋ฐ์ดํ„ฐ ์„ค๋ช… : ๊ณ ๊ฐ์˜ ์‹ ์ƒ์ •๋ณด ๋ฐ์ดํ„ฐ๋ฅผ ํ†ตํ•œ ํšŒ์‚ฌ ์„œ๋น„์Šค ์ดํƒˆ ์˜ˆ์ธก (์ข…์†๋ณ€์ˆ˜ : Exited)
Data url : https://raw.githubusercontent.com/Datamanim/datarepo/main/churn/train.csv

import pandas as pd
#๋ฐ์ดํ„ฐ ๋กœ๋“œ
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/churn/train.csv")
df.head()
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 6842 15793491 Cherkasova 714 Germany Male 26 3 119545.48 2 1 0 65482.94 0
1 8963 15607874 Keane 687 France Male 38 0 144450.58 1 0 1 137276.83 0
2 7047 15737627 Rivero 589 Germany Female 20 2 121093.29 2 1 0 3529.72 0
3 7503 15697844 Whitehouse 721 Spain Female 32 10 0.00 1 1 0 136119.96 1
4 3439 15722404 Carpenter 445 France Female 30 3 0.00 2 1 1 127939.19 0

Question 83

๋‚จ์„ฑ ์ดํƒˆ(Exited)์ด ๊ฐ€์žฅ ๋งŽ์€ ๊ตญ๊ฐ€(Geography)๋Š” ์–ด๋””์ด๊ณ  ์ดํƒˆ ์ธ์›์€ ๋ช‡๋ช…์ธ๊ฐ€?

Hide code cell source
answer = df.loc[df.Gender=='Male'].groupby(['Geography']).sum().Exited.sort_values(ascending=False).head(1)
print(answer)
Geography
Germany    287
Name: Exited, dtype: int64

Question 84

**์นด๋“œ๋ฅผ ์†Œ์œ (HasCrCard ==1)ํ•˜๊ณ  ์žˆ์œผ๋ฉด์„œ ํ™œ์„ฑ๋ฉค๋ฒ„(IsActiveMember ==1) ์ธ ๊ณ ๊ฐ๋“ค์˜ ํ‰๊ท  ๋‚˜์ด๋ฅผ ์†Œ์ˆซ์ ์ดํ•˜ 4์ž๋ฆฌ๊นŒ์ง€ ๊ตฌํ•˜์—ฌ๋ผ? **

Hide code cell source
answer = df.loc[(df.HasCrCard==1) &(df.IsActiveMember==1)].Age.mean()
print(round(answer,4))
39.6102

Question 85

Balance ๊ฐ’์ด ์ค‘๊ฐ„๊ฐ’ ์ด์ƒ์„ ๊ฐ€์ง€๋Š” ๊ณ ๊ฐ๋“ค์˜ CreditScore์˜ ํ‘œ์ค€ํŽธ์ฐจ๋ฅผ ์†Œ์ˆซ์ ์ดํ•˜ 3์ž๋ฆฌ๊นŒ์ง€ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer =df.loc[df.Balance >= df.Balance.median()].CreditScore.std()
print(round(answer,3))
97.295

์„ฑ์ธ ๊ฑด๊ฐ•๊ฒ€์ง„ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.data.go.kr/data/15007122/fileData.do (์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : 2018๋…„๋„ ์„ฑ์ธ์˜ ๊ฑด๊ฐ•๊ฒ€ ์ง„๋ฐ์ดํ„ฐ (ํก์—ฐ์ƒํƒœ 1- ํก์—ฐ, 0-๋น„ํก์—ฐ )
๋ฐ์ดํ„ฐ url : https://raw.githubusercontent.com/Datamanim/datarepo/main/smoke/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/smoke/train.csv")
df.head()
์„ฑ๋ณ„์ฝ”๋“œ ์—ฐ๋ น๋Œ€์ฝ”๋“œ(5์„ธ๋‹จ์œ„) ์‹ ์žฅ(5Cm๋‹จ์œ„) ์ฒด์ค‘(5Kg๋‹จ์œ„) ํ—ˆ๋ฆฌ๋‘˜๋ ˆ ์‹œ๋ ฅ(์ขŒ) ์‹œ๋ ฅ(์šฐ) ์ฒญ๋ ฅ(์ขŒ) ์ฒญ๋ ฅ(์šฐ) ์ˆ˜์ถ•๊ธฐํ˜ˆ์•• ... ํ˜ˆ์ƒ‰์†Œ ์š”๋‹จ๋ฐฑ ํ˜ˆ์ฒญํฌ๋ ˆ์•„ํ‹ฐ๋‹Œ (ํ˜ˆ์ฒญ์ง€์˜คํ‹ฐ)AST (ํ˜ˆ์ฒญ์ง€์˜คํ‹ฐ)ALT ๊ฐ๋งˆ์ง€ํ‹ฐํ”ผ ํก์—ฐ์ƒํƒœ ๊ตฌ๊ฐ•๊ฒ€์ง„์ˆ˜๊ฒ€์—ฌ๋ถ€ ์น˜์•„์šฐ์‹์ฆ์œ ๋ฌด ์น˜์„
0 F 55 145 55 73.0 0.7 0.5 1.0 1.0 129.0 ... 12.9 1.0 0.9 172.0 209.0 15.0 0 Y 0.0 Y
1 M 40 180 55 74.0 1.5 1.5 1.0 1.0 102.0 ... 15.6 1.0 0.8 30.0 19.0 23.0 1 Y 0.0 Y
2 F 55 150 50 72.0 1.0 0.2 1.0 1.0 116.0 ... 14.1 1.0 1.0 31.0 19.0 10.0 0 Y 0.0 N
3 M 40 170 85 89.0 1.0 0.8 1.0 1.0 124.0 ... 16.7 1.0 1.0 26.0 38.0 108.0 1 Y 1.0 Y
4 F 40 155 45 62.0 0.5 1.0 1.0 1.0 127.0 ... 12.9 1.0 0.7 20.0 11.0 13.0 0 Y 0.0 N

5 rows ร— 26 columns

Question 86

์ˆ˜์ถ•๊ธฐํ˜ˆ์••๊ณผ ์ด์™„๊ธฐ ํ˜ˆ์••๊ธฐ ์ˆ˜์น˜์˜ ์ฐจ์ด๋ฅผ ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ(โ€˜ํ˜ˆ์••์ฐจโ€™) ์œผ๋กœ ์ƒ์„ฑํ•˜๊ณ , ์—ฐ๋ น๋Œ€ ์ฝ”๋“œ๋ณ„ ๊ฐ ๊ทธ๋ฃน ์ค‘ โ€˜ํ˜ˆ์••์ฐจโ€™ ์˜ ๋ถ„์‚ฐ์ด 5๋ฒˆ์งธ๋กœ ํฐ ์—ฐ๋ น๋Œ€ ์ฝ”๋“œ๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
df['ํ˜ˆ์••์ฐจ'] =df['์ˆ˜์ถ•๊ธฐํ˜ˆ์••'] -df['์ด์™„๊ธฐํ˜ˆ์••']
answer = df.groupby('์—ฐ๋ น๋Œ€์ฝ”๋“œ(5์„ธ๋‹จ์œ„)')['ํ˜ˆ์••์ฐจ'].var().sort_values()
print(answer.index[-5])
60

Question 87

๋น„๋งŒ๋„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ์ง€ํ‘œ์ธ WHtR๋Š” ํ—ˆ๋ฆฌ๋‘˜๋ ˆ / ํ‚ค๋กœ ํ‘œํ˜„ํ•œ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ 0.58์ด์ƒ์ด๋ฉด ๋น„๋งŒ์œผ๋กœ ๋ถ„๋ฅ˜ํ•œ๋‹ค. ๋ฐ์ดํ„ฐ์ค‘ WHtR ์ง€ํ‘œ์ƒ ๋น„๋งŒ์ธ ์ธ์›์˜ ๋‚จ/์—ฌ ๋น„์œจ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
df['๋น„๋งŒ']=df['ํ—ˆ๋ฆฌ๋‘˜๋ ˆ']/df['์‹ ์žฅ(5Cm๋‹จ์œ„)']
data = df.loc[df['๋น„๋งŒ']>=0.58].์„ฑ๋ณ„์ฝ”๋“œ.value_counts()
answer = data['M']/data['F']
print(answer)
1.1693877551020408

์ž๋™์ฐจ ๋ณดํ—˜๊ฐ€์ž… ์˜ˆ์ธก๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/anmolkumar/health-insurance-cross-sell-prediction(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์ž๋™์ฐจ ๋ณดํ—˜ ๊ฐ€์ž… ์˜ˆ์ธก
Data url : https://raw.githubusercontent.com/Datamanim/datarepo/main/insurance/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/insurance/train.csv")
df.head()
id Gender Age Driving_License Region_Code Previously_Insured Vehicle_Age Vehicle_Damage Annual_Premium Policy_Sales_Channel Vintage Response
0 188957 Female 25 1 17.0 1 < 1 Year No 38785.0 152.0 59 0
1 275631 Male 20 1 22.0 0 < 1 Year No 2630.0 160.0 113 0
2 329036 Female 40 1 41.0 0 1-2 Year Yes 33165.0 124.0 188 0
3 227288 Female 28 1 30.0 1 < 1 Year No 29038.0 152.0 233 0
4 300441 Female 50 1 28.0 0 1-2 Year Yes 31325.0 124.0 258 0

Question 88

Vehicle_Age ๊ฐ’์ด 2๋…„ ์ด์ƒ์ธ ์‚ฌ๋žŒ๋“ค๋งŒ ํ•„ํ„ฐ๋ง ํ•˜๊ณ  ๊ทธ์ค‘์—์„œ
Annual_Premium ๊ฐ’์ด ์ „์ฒด ๋ฐ์ดํ„ฐ์˜ ์ค‘๊ฐ„๊ฐ’ ์ด์ƒ์ธ ์‚ฌ๋žŒ๋“ค์„ ์ฐพ๊ณ , ๊ทธ๋“ค์˜ Vintage๊ฐ’์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer = df[(df['Vehicle_Age']=='> 2 Years') & (df['Annual_Premium'] >= df['Annual_Premium'].median())]['Vintage'].mean()
print(answer)
154.43647182359118

Question 89

vehicle_age์— ๋”ฐ๋ฅธ ๊ฐ ์„ฑ๋ณ„(gender)๊ทธ๋ฃน์˜ Annual_Premium๊ฐ’์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ ์•„๋ž˜ ํ…Œ์ด๋ธ”๊ณผ ๋™์ผํ•˜๊ฒŒ ๊ตฌํ˜„ํ•˜๋ผ

Hide code cell source
meandf = df.groupby(['Gender','Vehicle_Age'],as_index = False)['Annual_Premium'].mean()
meandf.pivot(index='Vehicle_Age',columns='Gender',values='Annual_Premium')
Gender Female Male
Vehicle_Age
1-2 Year 30762.245001 30413.088469
< 1 Year 29972.286702 30310.982212
> 2 Years 36108.366374 35303.870627

ํ•ธ๋“œํฐ ๊ฐ€๊ฒฉ ์˜ˆ์ธก๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/iabhishekofficial/mobile-price-classification?select=train.csv(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ํ•ธ๋“œํฐ ๊ฐ€๊ฒฉ์˜ˆ์ธก (price_range์ปฌ๋Ÿผ 0(์ €๋ ด) ~3(๋งค์šฐ๋น„์Œˆ) ๋ฒ”์œ„ )
Data url https://raw.githubusercontent.com/Datamanim/datarepo/main/mobile/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/mobile/train.csv")
df.head()
battery_power blue clock_speed dual_sim fc four_g int_memory m_dep mobile_wt n_cores ... px_height px_width ram sc_h sc_w talk_time three_g touch_screen wifi price_range
0 842 0 2.2 0 1 0 7 0.6 188 2 ... 20 756 2549 9 7 19 0 0 1 1
1 1021 1 0.5 1 0 1 53 0.7 136 3 ... 905 1988 2631 17 3 7 1 1 0 2
2 563 1 0.5 1 2 1 41 0.9 145 5 ... 1263 1716 2603 11 2 9 1 1 0 2
3 615 1 2.5 0 0 0 10 0.8 131 6 ... 1216 1786 2769 16 8 11 1 0 0 2
4 1821 1 1.2 0 13 1 44 0.6 141 2 ... 1208 1212 1411 8 2 15 1 1 0 1

5 rows ร— 21 columns

Question 90

price_range ์˜ ๊ฐ value๋ฅผ ๊ทธ๋ฃนํ•‘ํ•˜์—ฌ ๊ฐ ๊ทธ๋ฃน์˜ n_cores ์˜ ๋นˆ๋„๊ฐ€ ๊ฐ€์žฅ๋†’์€ value์™€ ๊ทธ ๋นˆ๋„์ˆ˜๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer =df[['price_range','n_cores']].groupby(['price_range','n_cores']).size().sort_values(0).groupby(level=0).tail(1)
answer
/var/folders/0n/sq2gtx6s5qldt4nt05wqc3380000gn/T/ipykernel_3974/413588440.py:1: FutureWarning: In a future version of pandas all arguments of Series.sort_values will be keyword-only
  answer =df[['price_range','n_cores']].groupby(['price_range','n_cores']).size().sort_values(0).groupby(level=0).tail(1)
price_range  n_cores
0            2          69
3            5          70
2            4          73
1            1          76
dtype: int64

Question 91

price_range ๊ฐ’์ด 3์ธ ๊ทธ๋ฃน์—์„œ ์ƒ๊ด€๊ด€๊ณ„๊ฐ€ 2๋ฒˆ์งธ๋กœ ๋†’์€ ๋‘ ์ปฌ๋Ÿผ๊ณผ ๊ทธ ์ƒ๊ด€๊ณ„์ˆ˜๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
cordf = df.loc[df['price_range']==3].corr().unstack().sort_values(ascending=False)
answer  = cordf.loc[cordf!=1].reset_index().iloc[1]
print(answer)
level_0          pc
level_1          fc
0          0.635166
Name: 1, dtype: object

๋น„ํ–‰ํƒ‘์Šน ๊ฒฝํ—˜ ๋งŒ์กฑ๋„ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/teejmahal20/airline-passenger-satisfaction?select=train.csv (์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ๋น„ํ–‰ํƒ‘์Šน ๊ฒฝํ—˜ ๋งŒ์กฑ๋„ (satisfaction ์ปฌ๋Ÿผ : โ€˜neutral or dissatisfiedโ€™ or satisfied ) (83123, 24) shape
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/airline/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/airline/train.csv")
df.head()
id Gender Customer Type Age Type of Travel Class Flight Distance Inflight wifi service Departure/Arrival time convenient Ease of Online booking ... Inflight entertainment On-board service Leg room service Baggage handling Checkin service Inflight service Cleanliness Departure Delay in Minutes Arrival Delay in Minutes satisfaction
0 55882 Male Loyal Customer 42 Business travel Business 473 1 1 3 ... 5 5 5 5 3 5 3 3 0.0 satisfied
1 31906 Female Loyal Customer 31 Business travel Business 3271 4 4 4 ... 3 3 5 5 3 4 3 1 5.0 satisfied
2 85481 Male Loyal Customer 51 Personal Travel Eco 214 5 0 5 ... 3 4 4 4 4 4 3 2 0.0 satisfied
3 102134 Male Loyal Customer 44 Business travel Business 3097 1 1 3 ... 4 4 4 4 3 4 5 0 0.0 satisfied
4 31580 Male Loyal Customer 33 Business travel Business 3286 1 1 1 ... 4 2 5 2 5 1 4 0 0.0 satisfied

5 rows ร— 24 columns

Question 92

Arrival Delay in Minutes ์ปฌ๋Ÿผ์ด ๊ฒฐ์ธก์น˜์ธ ๋ฐ์ดํ„ฐ๋“ค ์ค‘ โ€˜neutral or dissatisfiedโ€™ ๋ณด๋‹ค โ€˜satisfiedโ€™์˜ ์ˆ˜๊ฐ€ ๋” ๋†’์€ Class๋Š” ์–ด๋”” ์ธ๊ฐ€?

Hide code cell source
answer =df.loc[df['Arrival Delay in Minutes'].isnull()].groupby(['Class','satisfaction'],as_index=False).size().pivot(index='Class',columns='satisfaction')
result =answer[answer['size']['neutral or dissatisfied'] < answer['size']['satisfied']]
result
size
satisfaction neutral or dissatisfied satisfied
Class
Business 36 76

์ˆ˜์งˆ ์Œ์šฉ์„ฑ ์—ฌ๋ถ€ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/adityakadiwal/water-potability
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์ˆ˜์งˆ ์Œ์šฉ์„ฑ ์—ฌ๋ถ€ (Potablillity ์ปฌ๋Ÿผ : 0 ,1 )
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/waters/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/waters/train.csv")
df.head()
ph Hardness Solids Chloramines Sulfate Conductivity Organic_carbon Trihalomethanes Turbidity Potability
0 7.918150 214.186611 23823.492888 7.290878 341.173322 411.424483 19.585002 25.057375 4.028958 0
1 5.422446 205.266080 18542.957451 5.491963 306.702227 382.080129 10.504023 67.493450 2.911751 1
2 7.341547 187.672402 21273.457066 7.784003 NaN 332.084293 16.842334 55.019151 4.025644 0
3 9.056245 197.666301 17403.532167 7.688917 337.460176 414.766631 15.349869 63.696746 3.319354 0
4 5.039374 142.860598 40829.353167 7.271543 NaN 386.803057 16.823773 52.297113 4.957420 0

Question 93

ph๊ฐ’์€ ์ƒ๋‹นํžˆ ๋งŽ์€ ๊ฒฐ์ธก์น˜๋ฅผ ํฌํ•จํ•œ๋‹ค. ๊ฒฐ์ธก์น˜๋ฅผ ์ œ์™ธํ•œ ๋‚˜๋จธ์ง€ ๋ฐ์ดํ„ฐ๋“ค ์ค‘ ์‚ฌ๋ถ„์œ„๊ฐ’ ๊ธฐ์ค€ ํ•˜์œ„ 25%์˜ ๊ฐ’๋“ค์˜ ํ‰๊ท ๊ฐ’์€?

Hide code cell source
target = df['ph'].dropna()
answer =target.loc[target <= target.quantile(0.25)].mean()
print(answer)
5.057093462441732

์˜๋ฃŒ ๋น„์šฉ ์˜ˆ์ธก ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/mirichoi0218/insurance/code(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์˜๋ฃŒ๋น„์šฉ ์˜ˆ์ธก๋ฌธ์ œ
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/MedicalCost/train.csv

import pandas as pd
train = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/MedicalCost/train.csv")
train.head()
age sex bmi children smoker region charges
0 64 female 39.330 0 no northeast 14901.51670
1 47 female 27.830 0 yes southeast 23065.42070
2 52 female 33.300 2 no southwest 10806.83900
3 33 female 22.135 1 no northeast 5354.07465
4 30 male 31.400 1 no southwest 3659.34600

Question 94

ํก์—ฐ์ž์™€ ๋น„ํก์—ฐ์ž ๊ฐ๊ฐ charges์˜ ์ƒ์œ„ 10% ๊ทธ๋ฃน์˜ ํ‰๊ท ์˜ ์ฐจ์ด๋Š”?

Hide code cell source
high = train.loc[train.smoker =='yes'].charges.quantile(0.9)
high2 = train.loc[train.smoker =='no'].charges.quantile(0.9)
mean_yes = train.loc[(train.smoker =='yes') &(train.charges >=high)].charges.mean()
mean_no = train.loc[(train.smoker =='no') &(train.charges >=high2)].charges.mean()
answer = mean_yes - mean_no
print(answer)
29297.954548156158

ํ‚น์นด์šดํ‹ฐ ์ฃผ๊ฑฐ์ง€ ๊ฐ€๊ฒฉ์˜ˆ์ธก๋ฌธ์ œ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/harlfoxem/housesalesprediction (์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ํ‚น์นด์šดํ‹ฐ ์ฃผ๊ฑฐ์ง€ ๊ฐ€๊ฒฉ ์˜ˆ์ธก๋ฌธ์ œ
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/kingcountyprice/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/kingcountyprice//train.csv")
df.head()
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront view ... grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15
0 8961990160 20150413T000000 567500.0 3 2.5 2080 4556 2.0 0 0 ... 8 2080 0 1999 0 98074 47.6036 -122.014 1530 5606
1 9455200205 20140604T000000 525000.0 3 2.0 1540 7800 1.0 0 0 ... 8 1540 0 2004 0 98125 47.7041 -122.288 1510 7800
2 7853220670 20140918T000000 540000.0 3 2.5 2860 8935 2.0 0 0 ... 8 2860 0 2004 0 98065 47.5336 -121.855 2650 6167
3 3298201170 20141110T000000 350000.0 3 1.0 940 7811 1.0 0 0 ... 6 940 0 1959 0 98008 47.6195 -122.118 1180 7490
4 7972604355 20140521T000000 218000.0 3 1.0 1020 7874 1.0 0 0 ... 7 1020 0 1956 0 98106 47.5175 -122.346 1290 7320

5 rows ร— 21 columns

Question 95

bedrooms ์˜ ๋นˆ๋„๊ฐ€ ๊ฐ€์žฅ ๋†’์€ ๊ฐ’์„ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ๋“ค์˜ price์˜ ์ƒ์œ„ 10%์™€ ํ•˜์œ„ 10%๊ฐ’์˜ ์ฐจ์ด๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer = df.loc[df.bedrooms ==df.bedrooms.value_counts().index[0]].price.quantile(0.9) \
-\
df.loc[df.bedrooms ==df.bedrooms.value_counts().index[0]].price.quantile(0.1)
print(answer)
505500.0

๋Œ€ํ•™์› ์ž…ํ•™๊ฐ€๋Šฅ์„ฑ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/mohansacharya/graduate-admissions(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ๋Œ€ํ•™์› ์ž…ํ•™ ๊ฐ€๋Šฅ์„ฑ ์˜ˆ์ธก
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/admission/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/admission/train.csv")
df.head()
Serial No. GRE Score TOEFL Score University Rating SOP LOR CGPA Research Chance of Admit
0 246 328 110 4 4.0 2.5 9.02 1 0.81
1 100 323 113 3 4.0 4.0 8.88 1 0.79
2 79 296 95 2 3.0 2.0 7.54 1 0.44
3 53 334 116 4 4.0 3.0 8.00 1 0.78
4 444 321 114 5 4.5 4.5 9.16 1 0.87

Question 96

Serial No. ์ปฌ๋Ÿผ์„ ์ œ์™ธํ•˜๊ณ  โ€˜Chance of Admitโ€™์„ ์ข…์†๋ณ€์ˆ˜, ๋‚˜๋จธ์ง€ ๋ณ€์ˆ˜๋ฅผ ๋…๋ฆฝ๋ณ€์ˆ˜๋ผ ํ• ๋•Œ, ๋žœ๋คํฌ๋ ˆ์ŠคํŠธ๋ฅผ ํ†ตํ•ด ํšŒ๊ท€ ์˜ˆ์ธก์„ ํ•  ๋–„ ๋ณ€์ˆ˜์ค‘์š”๋„ ๊ฐ’์„ ์ถœ๋ ฅํ•˜๋ผ (์‹œ๋“œ๊ฐ’์— ๋”ฐ๋ผ ์ˆœ์„œ๋Š” ๋‹ฌ๋ผ์งˆ์ˆ˜ ์žˆ์Œ)

Hide code cell source
from sklearn.ensemble import RandomForestRegressor

df_t = df.drop([df.columns[0]],axis=1)
x = df_t.drop([df.columns[-1]],axis=1)
y = df_t[df.columns[-1]]

ml = RandomForestRegressor()

ml.fit(x,y)

result=pd.DataFrame({'importance':ml.feature_importances_},x.columns).sort_values('importance',ascending=False)
display(result)
importance
CGPA 0.820192
GRE Score 0.080724
TOEFL Score 0.032414
SOP 0.023649
LOR 0.022733
University Rating 0.013037
Research 0.007252

๋ ˆ๋“œ ์™€์ธ ํ€„๋ฆฌํ‹ฐ ์˜ˆ์ธก ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ๋ ˆ๋“œ ์™€์ธ ํ€„๋ฆฌํ‹ฐ ์˜ˆ์ธก๋ฌธ์ œ
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/redwine/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/redwine/train.csv")
df.head()
fixed acidity volatile acidity citric acid residual sugar chlorides free sulfur dioxide total sulfur dioxide density pH sulphates alcohol quality
0 5.0 0.38 0.01 1.6 0.048 26.0 60.0 0.99084 3.70 0.75 14.0 6
1 5.0 0.42 0.24 2.0 0.060 19.0 50.0 0.99170 3.72 0.74 14.0 8
2 7.1 0.36 0.30 1.6 0.080 35.0 70.0 0.99693 3.44 0.50 9.4 5
3 7.6 0.29 0.49 2.7 0.092 25.0 60.0 0.99710 3.31 0.61 10.1 6
4 7.7 0.51 0.28 2.1 0.087 23.0 54.0 0.99800 3.42 0.74 9.2 5

Question 97

quality ๊ฐ’์ด 3์ธ ๊ทธ๋ฃน๊ณผ 8์ธ ๋ฐ์ดํ„ฐ๊ทธ๋ฃน์˜ ๊ฐ ์ปฌ๋Ÿผ๋ณ„ ๋…๋ฆฝ๋ณ€์ˆ˜์˜ ํ‘œ์ค€ํŽธ์ฐจ ๊ฐ’์˜ ์ฐจ์ด๋ฅผ ๊ตฌํ• ๋•Œ ๊ทธ๊ฐ’์ด ๊ฐ€์žฅ ํฐ ์ปฌ๋Ÿผ๋ช…์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer = (df.loc[df.quality ==8].std() -df.loc[df.quality ==3].std()).sort_values().index[-1]
print(answer)
total sulfur dioxide

์•ฝ๋ฌผ ๋ถ„๋ฅ˜ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/prathamtripathi/drug-classification(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ํˆฌ์•ฝํ•˜๋Š” ์•ฝ์„ ๋ถ„๋ฅ˜
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/drug/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/drug/train.csv")
df.head()
Age Sex BP Cholesterol Na_to_K Drug
0 20 F NORMAL NORMAL 9.281 4
1 24 M HIGH NORMAL 9.475 1
2 34 M NORMAL HIGH 22.456 0
3 17 M NORMAL NORMAL 10.832 4
4 72 M LOW HIGH 16.310 0

Question 98

๋‚จ์„ฑ๋“ค์˜ ์—ฐ๋ น๋Œ€๋ณ„ (10์‚ด์”ฉ ๊ตฌ๋ถ„ 0~9์„ธ 10~19์„ธ โ€ฆ) Na_to_K๊ฐ’์˜ ํ‰๊ท ๊ฐ’์„ ๊ตฌํ•ด์„œ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ํ‘œํ˜„ํ•˜์—ฌ๋ผ

Hide code cell source
pre = df.loc[df.Sex=='M']
pre2= pre.copy()
pre2['Age2'] = pre.Age//10 *10

answer =pre2.groupby('Age2').mean()['Na_to_K'].to_frame()
display(answer)
Na_to_K
Age2
10 13.627000
20 18.260769
30 15.244143
40 13.897273
50 14.811273
60 14.284308
70 11.363857

์‚ฌ๊ธฐํšŒ์‚ฌ ๋ถ„๋ฅ˜ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/sid321axn/audit-data(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์‚ฌ๊ธฐํšŒ์‚ฌ ๋ถ„๋ฅ˜
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/audit/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/audit/train.csv")
df.head()
Sector_score LOCATION_ID PARA_A Score_A Risk_A PARA_B Score_B Risk_B TOTAL numbers ... RiSk_E History Prob Risk_F Score Inherent_Risk CONTROL_RISK Detection_Risk Audit_Risk Risk
0 2.72 14 4.53 0.6 2.718 87.93 0.6 52.758 92.46 5.0 ... 0.4 0 0.2 0.0 4.0 108.362 0.4 0.5 21.6724 1
1 3.41 16 12.68 0.6 7.608 41.00 0.6 24.600 53.68 5.5 ... 0.4 0 0.2 0.0 4.4 41.936 0.4 0.5 8.3872 1
2 2.72 11 3.11 0.6 1.866 113.97 0.6 68.382 117.08 5.5 ... 0.4 0 0.2 0.0 4.4 88.832 0.4 0.5 17.7664 1
3 3.41 8 1.12 0.4 0.448 0.00 0.2 0.000 1.12 5.0 ... 0.4 0 0.2 0.0 2.2 1.848 0.4 0.5 0.3696 0
4 55.57 9 1.06 0.4 0.424 0.00 0.2 0.000 1.06 5.0 ... 0.4 0 0.2 0.0 2.2 1.824 0.4 0.5 0.3648 0

5 rows ร— 27 columns

Question 99

๋ฐ์ดํ„ฐ์˜ Risk ๊ฐ’์— ๋”ฐ๋ฅธ score_a์™€ score_b์˜ ํ‰๊ท ๊ฐ’์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer =df.groupby('Risk').mean()[['Score_A','Score_B']]
display(answer)
Score_A Score_B
Risk
0 0.262234 0.225532
1 0.490164 0.444262

์„ผ์„œ๋ฐ์ดํ„ฐ ๋™์ž‘์œ ํ˜• ๋ถ„๋ฅ˜ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/kyr7plus/emg-4(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ์„ผ์„œ๋ฐ์ดํ„ฐ๋กœ ๋™์ž‘ ์œ ํ˜• ๋ถ„๋ฅ˜
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/muscle/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/muscle/train.csv")
df.head()
motion_0 motion_1 motion_2 motion_3 motion_4 motion_5 motion_6 motion_7 motion_8 motion_9 ... motion_55 motion_56 motion_57 motion_58 motion_59 motion_60 motion_61 motion_62 motion_63 pose
0 -6.0 8.0 5.0 6.0 -12.0 -27.0 5.0 24.0 -8.0 -5.0 ... -1.0 -3.0 -6.0 -9.0 -7.0 36.0 68.0 3.0 -3.0 1
1 -32.0 -3.0 -4.0 -5.0 -1.0 16.0 0.0 -5.0 42.0 2.0 ... -17.0 -12.0 -1.0 -1.0 0.0 -26.0 -11.0 -4.0 -9.0 1
2 35.0 -2.0 -1.0 1.0 -25.0 -16.0 -3.0 -20.0 10.0 1.0 ... 6.0 -65.0 6.0 -10.0 -31.0 -19.0 -3.0 -47.0 -17.0 0
3 -15.0 -6.0 -3.0 -5.0 27.0 37.0 -2.0 -2.0 19.0 5.0 ... 2.0 11.0 0.0 -2.0 -4.0 -7.0 15.0 3.0 14.0 1
4 60.0 3.0 0.0 -7.0 -6.0 -10.0 -6.0 7.0 -11.0 -4.0 ... -12.0 17.0 -3.0 1.0 2.0 -2.0 20.0 18.0 10.0 0

5 rows ร— 65 columns

Question 100

pose๊ฐ’์— ๋”ฐ๋ฅธ ๊ฐ motion์ปฌ๋Ÿผ์˜ ์ค‘๊ฐ„๊ฐ’์˜ ๊ฐ€์žฅ ํฐ ์ฐจ์ด๋ฅผ ๋ณด์ด๋Š” motion์ปฌ๋Ÿผ์€ ์–ด๋””์ด๋ฉฐ ๊ทธ๊ฐ’์€?

Hide code cell source
t= df.groupby('pose').median().T
dfs = abs(t[0] - t[1]).sort_values().reset_index()
dfs[dfs[0] ==dfs[0].max()]['index'].values
array(['motion_54', 'motion_62'], dtype=object)

ํ˜„๋Œ€ ์ฐจ๋Ÿ‰ ๊ฐ€๊ฒฉ ๋ถ„๋ฅ˜๋ฌธ์ œ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/mysarahmadbhat/hyundai-used-car-listing(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ํ˜„๋Œ€ ์ฐจ๋Ÿ‰๊ฐ€๊ฒฉ ๋ถ„๋ฅ˜๋ฌธ์ œ
Data url :hhttps://raw.githubusercontent.com/Datamanim/datarepo/main/hyundai/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/hyundai/train.csv")
df.head()
model year price transmission mileage fuelType tax(ยฃ) mpg engineSize
0 Tucson 2016 12795 Manual 36263 Diesel 30 61.7 1.7
1 I10 2012 3995 Manual 22089 Petrol 20 61.4 1.2
2 I30 2019 24000 Manual 555 Petrol 145 34.0 2.0
3 Tucson 2017 12995 Manual 32147 Diesel 30 61.7 1.7
4 Tucson 2018 14991 Semi-Auto 32217 Diesel 150 57.6 1.7

Question 101

์ •๋ณด(row์ˆ˜)๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ์ƒ์œ„ 3์ฐจ์ข…์˜ price๊ฐ’์˜ ๊ฐ ํ‰๊ท ๊ฐ’์€?

Hide code cell source
answer =df.loc[df.model.isin(df.model.value_counts().index[:3])].groupby('model').mean()['price'].to_frame()
display(answer)
price
model
I10 7646.137891
I30 11651.821759
Tucson 15805.441373

๋‹น๋‡จ์—ฌ๋ถ€ํŒ๋‹จ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/pritsheta/diabetes-dataset(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ๋‹น๋‡จ์—ฌ๋ถ€ ํŒ๋‹จํ•˜๊ธฐ
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/diabetes/train.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/diabetes/train.csv")
df.head()
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
0 3 102 74 0 0 29.5 0.121 32 0
1 2 144 58 33 135 31.6 0.422 25 1
2 5 136 82 0 0 0.0 0.640 69 0
3 13 145 82 19 110 22.2 0.245 57 0
4 1 117 60 23 106 33.8 0.466 27 0

Question 102

Outcome ๊ฐ’์— ๋”ฐ๋ฅธ ๊ฐ ๊ทธ๋ฃน์˜ ๊ฐ ์ปฌ๋Ÿผ์˜ ํ‰๊ท  ์ฐจ์ด๋ฅผ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
answer = df.groupby('Outcome').mean().diff().iloc[1,:]
print(answer)
Pregnancies                  1.574159
Glucose                     29.943995
BloodPressure                3.202079
SkinThickness                2.952033
Insulin                     33.398645
BMI                          4.541437
DiabetesPedigreeFunction     0.143646
Age                          5.866939
Name: 1, dtype: float64

๋„ทํ”Œ๋ฆญ์Šค ์ฃผ์‹ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/pritsheta/netflix-stock-data-from-2002-to-2021(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
๋ฐ์ดํ„ฐ ์„ค๋ช… : ๋„ทํ”Œ๋ฆญ์Šค ์ฃผ์‹๋ฐ์ดํ„ฐ
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/nflx/NFLX.csv

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/nflx/NFLX.csv")
df.head()
Date Open High Low Close Adj Close Volume
0 2002-05-23 1.156429 1.242857 1.145714 1.196429 1.196429 104790000
1 2002-05-24 1.214286 1.225000 1.197143 1.210000 1.210000 11104800
2 2002-05-28 1.213571 1.232143 1.157143 1.157143 1.157143 6609400
3 2002-05-29 1.164286 1.164286 1.085714 1.103571 1.103571 6757800
4 2002-05-30 1.107857 1.107857 1.071429 1.071429 1.071429 10154200

Question 103

๋งค๋…„ 5์›”๋‹ฌ์˜ open๊ฐ€๊ฒฉ์˜ ํ‰๊ท ๊ฐ’์„ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์œผ๋กœ ํ‘œํ˜„ํ•˜๋ผ

Hide code cell source
df['Date']  =pd.to_datetime(df['Date'])

target = df.groupby(df['Date'].dt.strftime('%Y-%m')).mean()
answer = target.loc[target.index.str.contains('-05')].Open
print(answer)
Date
2002-05      1.155833
2003-05      1.641497
2004-05      4.261143
2005-05      1.951905
2006-05      4.163571
2007-05      3.159351
2008-05      4.435034
2009-05      5.790571
2010-05     14.417071
2011-05     34.650272
2012-05     10.523247
2013-05     31.936429
2014-05     51.121292
2015-05     85.057429
2016-05     92.705715
2017-05    158.255455
2018-05    329.779541
2019-05    359.664548
2020-05    433.880499
2021-05    496.923996
Name: Open, dtype: float64

220510์ถ”๊ฐ€#

nba ์„ ์ˆ˜ ๋Šฅ๋ ฅ์น˜ ๋ฐ์ดํ„ฐ#

Attention

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜ :https://www.kaggle.com/datasets/vivovinco/nba-player-stats(์ฐธ๊ณ , ๋ฐ์ดํ„ฐ ์ˆ˜์ •)
Data url :https://raw.githubusercontent.com/Datamanim/datarepo/main/nba/nba.csv

pd.set_option('display.max_columns',50)
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/datarepo/main/nba/nba.csv",encoding='latin',sep=';')
df.head()
Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
0 1 Precious Achiuwa C 22 TOR 73 28 23.6 3.6 8.3 0.439 0.8 2.1 0.359 2.9 6.1 0.468 0.486 1.1 1.8 0.595 2.0 4.5 6.5 1.1 0.5 0.6 1.2 2.1 9.1
1 2 Steven Adams C 28 MEM 76 75 26.3 2.8 5.1 0.547 0.0 0.0 0.000 2.8 5.0 0.548 0.547 1.4 2.6 0.543 4.6 5.4 10.0 3.4 0.9 0.8 1.5 2.0 6.9
2 3 Bam Adebayo C 24 MIA 56 56 32.6 7.3 13.0 0.557 0.0 0.1 0.000 7.3 12.9 0.562 0.557 4.6 6.1 0.753 2.4 7.6 10.1 3.4 1.4 0.8 2.6 3.1 19.1
3 4 Santi Aldama PF 21 MEM 32 0 11.3 1.7 4.1 0.402 0.2 1.5 0.125 1.5 2.6 0.560 0.424 0.6 1.0 0.625 1.0 1.7 2.7 0.7 0.2 0.3 0.5 1.1 4.1
4 5 LaMarcus Aldridge C 36 BRK 47 12 22.3 5.4 9.7 0.550 0.3 1.0 0.304 5.1 8.8 0.578 0.566 1.9 2.2 0.873 1.6 3.9 5.5 0.9 0.3 1.0 0.9 1.7 12.9

Question 104

Tm ์ปฌ๋Ÿผ์€ ๊ฐ ํŒ€์˜ ์ด๋ฆ„์„ ์˜๋ฏธํ•œ๋‹ค. TORํŒ€์˜ ํ‰๊ท ๋‚˜์ด๋ฅผ ์†Œ์ˆ˜ 4์งธ ์ž๋ฆฌ๊นŒ์ง€ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = round(df[df.Tm =='TOR'].Age.mean(),4)
print(result)
24.8696

Question 105

Pos ์ปฌ๋Ÿผ์€ ํฌ์ง€์…˜์„ ์˜๋ฏธํ•œ๋‹ค. ์ „์ฒด ์„ ์ˆ˜ ์ค‘ ์ตœ์†Œ๋‚˜์ด๋Œ€์˜ ์„ ์ˆ˜๋“ค์„ ํ•„ํ„ฐํ•˜๊ณ  ๊ทธ๋“ค ์ค‘ ๊ฐ€์žฅ ๋งŽ์€ ํฌ์ง€์…˜์€ ๋ฌด์—‡์ธ์ง€ ํ™•์ธํ•˜๋ผ

Hide code cell source
result = df[df.Age==df.Age.min()].Pos.value_counts().index[0]
print(result)
SG

Question 106

์„ ์ˆ˜๋“ค์˜ ์ด๋ฆ„์€ first_name+ ๊ณต๋ฐฑ + last_name์œผ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ๋‹ค. ๊ฐ€์žฅ ๋งŽ์€ first_name์€ ๋ฌด์—‡์ด๋ฉฐ ๋ช‡ ํšŒ ๋ฐœ์ƒํ•˜๋Š”์ง€ ํ™•์ธํ•˜๋ผ

Hide code cell source
result= df['Player'].str.split().str[0].str.lower().value_counts().head(1)
print(result)
justin    14
Name: Player, dtype: int64

Question 107

PTS์ปฌ๋Ÿผ์€ ๊ฒฝ๊ธฐ๋‹น ํ‰๊ท ๋“์ ์ˆ˜ ์ด๋‹ค. ๊ฐํฌ์ง€์…˜๋ณ„๋กœ ๊ฒฝ๊ธฐ๋‹น ํ‰๊ท ๋“์ ์ˆ˜์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df.groupby('Pos')['PTS'].mean().sort_values()
print(result)
Pos
SF        7.270253
SG-SF     7.360000
C         7.690769
PF        7.737500
SG        7.810553
PG        8.325161
PF-SF     8.500000
SF-SG     8.660000
SG-PG     9.525000
C-PF     12.850000
PG-SG    22.000000
Name: PTS, dtype: float64

Question 108

PTS์ปฌ๋Ÿผ์€ ๊ฒฝ๊ธฐ๋‹น ํ‰๊ท ๋“์ ์ˆ˜ ์ด๋‹ค. ๊ฐํฌ์ง€์…˜๋ณ„๋กœ ๊ฒฝ๊ธฐ๋‹น ํ‰๊ท ๋“์ ์ˆ˜์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Question 109

G์ปฌ๋Ÿผ์€ ์ฐธ์—ฌํ•œ ๊ฒฝ๊ธฐ์˜ ์ˆซ์ž์ด๋‹ค. ๊ฐ ํŒ€๋ณ„๋กœ ๊ฐ€์žฅ ๋†’์€ ๊ฒฝ๊ธฐ์ฐธ์—ฌ ์ˆ˜๋ฅผ ๊ฐ€์ง„ ์„ ์ˆ˜๋“ค์˜ ๊ฒฝ๊ธฐ ์ฐธ์—ฌ ์ˆซ์ž์˜ ํ‰๊ท ์„ ๊ตฌํ•˜์—ฌ๋ผ

Hide code cell source
result = df.sort_values(['Tm','G']).groupby('Tm').tail(1).G.mean()
print(result)
77.51612903225806

Question 110

Tm์˜ ๊ฐ’์ด MIA์ด๋ฉฐ Pos๋Š” C๋˜๋Š” PF์ธ ์„ ์ˆ˜์˜ MP๊ฐ’์˜ ํ‰๊ท ์€?

Hide code cell source
result = df[(df.Tm =='MIA') & (df.Pos.isin(['C','PF']))].MP.mean()
print(result)
16.7875

Question 111

์ „์ฒด ๋ฐ์ดํ„ฐ์ค‘ G์˜ ํ‰๊ท ๊ฐ’์˜ 1.5๋ฐฐ ์ด์ƒ์ธ ๋ฐ์ดํ„ฐ๋“ค๋งŒ ์ถ”์ถœํ–ˆ์„๋•Œ 3P๊ฐ’์˜ ํ‰๊ท ์€?

Hide code cell source
result =  df[df.G >= df.G.mean()*1.5]['3P'].mean()
print(result)
1.3853658536585365

Question 112

Age์˜ ํ‰๊ท  ์ด์ƒ์ธ ๊ทธ๋ฃน๊ณผ ํ‰๊ท  ๋ฏธ๋งŒ์ธ ๊ทธ๋ฃน๊ฐ„์˜ G๊ฐ’์˜ ํ‰๊ท ์˜ ์ฐจ์ด๋Š”?

Hide code cell source
result = df[df.Age >= df.Age.mean()].G.mean() - df[df.Age < df.Age.mean()].G.mean()
print(result)
3.787674551781862

Question 113

ํ‰๊ท ๋‚˜์ด๊ฐ€ ๊ฐ€์žฅ ์ Š์€ ํŒ€์€ ์–ด๋””์ธ๊ฐ€

result = df.groupby('Tm')['Age'].mean().sort_values().index[0]
print(result)
MEM

Question 114

Pos๊ทธ๋ฃน๋ณ„ ํ‰๊ท  MP๊ฐ’์„ ๊ตฌํ•˜์—ฌ๋ผ

result = df.groupby('Pos')['MP'].mean()
print(result)
Pos
C        16.990000
C-PF     25.350000
PF       17.937500
PF-SF    27.300000
PG       19.547742
PG-SG    37.200000
SF       17.514557
SF-SG    20.340000
SG       18.554271
SG-PG    22.950000
SG-SF    19.620000
Name: MP, dtype: float64