4회 기출 변형

4회 기출 변형#

작업 3유형 통계 반영

광고 눌러주세요~!

작업 1유형#

Attention

DataUrl = https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p1_1.csv 월마트 판매량 데이터 : https://www.kaggle.com/datasets/asahu40/walmart-data-analysis-and-forcasting

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p1_1.csv')
df.head(5)

	Store	Date	Weekly_Sales	Holiday_Flag	Temperature	Fuel_Price	CPI	Unemployment
0	1	05-02-2010	1643690.90	0	42.31	2.572	211.096358	8.106
1	1	12-02-2010	1641957.44	1	38.51	2.548	211.242170	8.106
2	1	19-02-2010	1611968.17	0	39.93	2.514	211.289143	8.106
3	1	26-02-2010	1409727.59	0	46.63	2.561	211.319643	8.106
4	1	05-03-2010	1554806.68	0	46.5	2.625	211.350143	8.106

1-1

Temperature컬럼에서 숫자가 아닌 문자들을 제거후 숫자 타입으로 바꾸고 3분위수에서 1분위수의 차이를 소숫점 이하 2자리까지 구하여라

27.48

Attention

DataUrl = https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p1_2.csv 유튜브 영상 통계량 : https://www.kaggle.com/datasets/advaypatil/youtube-statistics?select=videos-stats.csv

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p1_2.csv')
df.head(5)

	Title	Video ID	Published At	Keyword	Likes	Comments	Views
0	Apple Pay Is Killing the Physical Wallet After...	wAZZ-UWGVHI	2022-08-23	tech	3407.0	672.0	135612.0
1	The most EXPENSIVE thing I own.	b3x28s61q3c	2022-08-24	tech	76779.0	4306.0	1758063.0
2	My New House Gaming Setup is SICK!	4mgePWWCAmA	2022-08-23	tech	63825.0	3338.0	1564007.0
3	Petrol Vs Liquid Nitrogen \| Freezing Experimen...	kXiYSI7H2b0	2022-08-23	tech	71566.0	1426.0	922918.0
4	Best Back to School Tech 2022!	ErMwWXQxHp0	2022-08-08	tech	96513.0	5155.0	1855644.0

1-2

Likes를 Comments로 나눈 비율이 20이상이면서 Keyword값이 minecraft인 영상들의 Views값의 평균을 정수로 구하여라

Attention

DataUrl = https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p1_3.csv 넷플릭스 영상 메타정보 : https://www.kaggle.com/datasets/akashguna/netflix-prize-shows-information

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p1_3.csv')
df.head(5)

	show_id	type	title	director	cast	country	date_added	release_year	rating	duration	listed_in	description
0	s1	Movie	Dick Johnson Is Dead	Kirsten Johnson	NaN	United States	September 25, 2021	2020	PG-13	90 min	Documentaries	As her father nears the end of his life, filmm...
1	s2	TV Show	Blood & Water	NaN	Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...	South Africa	September 24, 2021	2021	TV-MA	2 Seasons	International TV Shows, TV Dramas, TV Mysteries	After crossing paths at a party, a Cape Town t...
2	s3	TV Show	Ganglands	Julien Leclercq	Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...	NaN	September 24, 2021	2021	TV-MA	1 Season	Crime TV Shows, International TV Shows, TV Act...	To protect his family from a powerful drug lor...
3	s4	TV Show	Jailbirds New Orleans	NaN	NaN	NaN	September 24, 2021	2021	TV-MA	1 Season	Docuseries, Reality TV	Feuds, flirtations and toilet talk go down amo...
4	s5	TV Show	Kota Factory	NaN	Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...	India	September 24, 2021	2021	TV-MA	2 Seasons	International TV Shows, Romantic TV Shows, TV ...	In a city of coaching centers known to train I...

1-3

date_added가 2018년 1월 이면서 country가 United Kingdom 단독 제작인 데이터의 갯수

작업 2유형#

Attention

유저 분류 : https://www.kaggle.com/datasets/kaushiksuresh147/customer-segmentation train = https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p2_train.csv test = https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p2_test.csv

예측 변수 Segmentation, test.csv에 대해 ID별로 Segmentation의 클래스를 예측해서 저장후 제출, 제출 데이터 컬럼은 ID와 Segmentation 두개만 존재해야함. 평가지표는 macro f1 score

import pandas as pd
train = pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p2_train.csv')
test = pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p2_test.csv')

display(train.head(2))
test.head(2)

	ID	Gender	Ever_Married	Age	Graduated	Profession	Work_Experience	Spending_Score	Family_Size	Var_1	Segmentation
0	464357	Male	No	40	Yes	Artist	7.0	Low	1.0	Cat_6	A
1	459624	Male	No	18	No	Healthcare	NaN	Low	5.0	Cat_4	D

	ID	Gender	Ever_Married	Age	Graduated	Profession	Work_Experience	Spending_Score	Family_Size	Var_1
0	460406	Male	Yes	36	Yes	Healthcare	3.0	Low	2.0	Cat_6
1	466890	Male	Yes	47	Yes	Artist	0.0	Average	6.0	Cat_7

valiation test f1score 0.4107784704089142

작업 3유형#

Attention

어느 대학교의 신입생의 학과별 성별에 대한 데이터이다. DataUrl = https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p3_1_.csv 이 데이터를 바탕으로, 학생들의 학과와 성별이 서로 독립적인지 여부를 확인하기 위해 카이제곱 독립성 검정을 실시 하려한다.

import pandas as pd 
df= pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/krdatacertificate/e4_p3_1_.csv')
df.head()

	학번	학과	성별
0	DFSKJK_1	경영학과	여
1	DFSKJK_2	사회학과	여
2	DFSKJK_3	기계공학과	여
3	DFSKJK_4	경제학과	남
4	DFSKJK_5	기계공학과	여

3-1-a

학과 평균 인원에 대한 값을 소숫점 이하 3자리까지 구하여라

170.333

3-1-b

카이제곱검정 독립성 검정 통계량을 소숫점 이하 3자리까지 구하여라

5.646

3-1-c

카이제곱검정 독립성 검정의 pvalue를 소숫점 이하 3자리까지 구하여라. 유의수준 0.05하에서 귀무가설과 대립가설중 유의한 것을 출력하라

0.3421500573497989
귀무

Attention

어느 학교에서 수학 시험을 본 학생 100명 중 60명이 60점 이상을 받았다. 이 학교의 수학 시험의 평균 점수가 50점 이상인지 95%의 신뢰 수준에서 검정하려한다.

3-1-a

검정 통계량을 소숫점 이하 3자리에서 구하시오

2.0

3-1-b

pvalue를 소숫점 이하 3자리까지 구하고 귀무가설과 대립가설중 유의한 것을 출력하라

0.023
대립

4회 기출 변형

Contents

4회 기출 변형#

작업 1유형#

작업 2유형#

작업 3유형#