Predicting Seoul's Mosquito Occurrence Index
Data sources:
https://data.kma.go.kr/stcs/grnd/grndTaList.do?pgmNo=70 (Korea Meteorological Administration)
https://data.seoul.go.kr/dataList/16/literacyView.do (Seoul Public Data Portal)
Attention
Predict the daily mosquito index for 2016 to 2019 from temperature and precipitation data.
The evaluation metric is the R² score.
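R² compares the model's squared error with the squared error of always predicting the mean: R² = 1 - SS_res / SS_tot, where SS_res = Σ(yᵢ - ŷᵢ)² and SS_tot = Σ(yᵢ - ȳ)². A perfect model scores 1.0. A model that always predicts the mean scores 0.0. The score can be negative for a model that does worse than the mean. The sketch below computes this by hand with NumPy for a single target, so you can see what `r2_score` reports. The helper name `r2_by_hand` is hypothetical and is not part of the original notebook.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot for a single target (undefined if y_true is constant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# A perfect prediction scores 1.0
print(r2_by_hand([5.5, 10.0, 20.0], [5.5, 10.0, 20.0]))  # 1.0
# Here SS_res = 1 and SS_tot = 2, so R² = 0.5
print(r2_by_hand([1, 2, 3], [1, 2, 4]))  # 0.5
```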
DataLoad
Load the data

```python
import pandas as pd

# train_x and test_x have Korean column names, so the files are read as EUC-KR
train_x = pd.read_csv('https://raw.githubusercontent.com/Datamanim/mosquito/main/train_x.csv', encoding='euc-kr')
train_y = pd.read_csv('https://raw.githubusercontent.com/Datamanim/mosquito/main/train_y.csv', encoding='euc-kr')
test_x = pd.read_csv('https://raw.githubusercontent.com/Datamanim/mosquito/main/test_x.csv', encoding='euc-kr')
# Submission template
sub = pd.read_csv('https://raw.githubusercontent.com/Datamanim/mosquito/main/sub.csv')
```
DATA
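Inspect the dataset. The `train_x` columns are the date, precipitation (강수량, mm), mean temperature (평균기온, ℃), minimum temperature (최저기온, ℃), and maximum temperature (최고기온, ℃).

```python
train_x.head()
```

| | date | 강수량(mm) | 평균기온(℃) | 최저기온(℃) | 최고기온(℃) |
|---|---|---|---|---|---|
| 0 | 2019-12-31 | 0.0 | -7.9 | -10.9 | -4.5 |
| 1 | 2019-12-30 | 0.4 | 2.7 | -5.7 | 6.8 |
| 2 | 2019-12-29 | 1.4 | 3.8 | 1.1 | 6.2 |
| 3 | 2019-12-27 | 0.0 | -1.7 | -4.6 | 2.6 |
| 4 | 2019-12-25 | 0.0 | 2.0 | -2.7 | 6.6 |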
```python
train_y.head()
```

| | date | mosquito_ratio |
|---|---|---|
| 0 | 2019-12-31 | 5.5 |
| 1 | 2019-12-30 | 5.5 |
| 2 | 2019-12-29 | 5.5 |
| 3 | 2019-12-27 | 5.5 |
| 4 | 2019-12-25 | 5.5 |
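Both previews share a `date` column, and both start at 2019-12-31. The baseline pairs `train_x` and `train_y` by row position, which is only correct if the two files list the same dates in the same order. The sketch below shows a safer pairing that joins the frames on `date`. It assumes `date` is unique in each file. The helper `align_on_date` and the toy values in the example are hypothetical and are not taken from the dataset.

```python
import pandas as pd

def align_on_date(x_df, y_df, target='mosquito_ratio'):
    """Join features and target on 'date'; return (features, target) in matching row order."""
    # validate='one_to_one' raises an error if a date appears more than once in either frame
    merged = x_df.merge(y_df, on='date', how='inner', validate='one_to_one')
    features = merged.drop(columns=[target])  # keeps 'date' so preprocessing can still use it
    return features, merged[target]

# Toy example: the target rows are in a different order from the feature rows
x_demo = pd.DataFrame({'date': ['2019-12-31', '2019-12-30'], '강수량(mm)': [0.0, 0.4]})
y_demo = pd.DataFrame({'date': ['2019-12-30', '2019-12-31'], 'mosquito_ratio': [6.0, 5.5]})
feats, target = align_on_date(x_demo, y_demo)
print(target.tolist())  # [5.5, 6.0]
```

An inner merge keeps the row order of the left frame, so the target comes back in the same order as the features.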
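Baseline
The baseline trains a random forest and an XGBoost random forest on precipitation, temperature, year, and month. It scores both on a 30% hold-out split. The submission uses the average of the two models' predictions on the test set.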
```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRFRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def preprocessing(df):
    # Work on a copy so the caller's DataFrame is not modified in place
    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])
    # Derive calendar features from the date, then drop the raw date column
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    df = df.drop(['date'], axis=1)
    return df

x = preprocessing(train_x)
y = train_y.drop(['date'], axis=1)

rf = RandomForestRegressor(random_state=12)
xg = XGBRFRegressor(random_state=12)

# Hold out 30% of the training rows for validation
xtr, xt, ytr, yt = train_test_split(x, y, test_size=0.3, random_state=24)
rf.fit(xtr, ytr.values.ravel())
xg.fit(xtr, ytr)

pred = rf.predict(xt)
predxg = xg.predict(xt)
Ans = 'randomforest r2 : ' + str(r2_score(yt, pred)) + ' \nxgboost r2 : ' + str(r2_score(yt, predxg))

# Submission: equal-weight average of the two models' test-set predictions
subDF = preprocessing(test_x)
pred = (rf.predict(subDF) + xg.predict(subDF)) / 2
sub['mosquito_ratio'] = pred
sub.to_csv('submission.csv', index=False)
print(Ans)
```
randomforest r2 : 0.8477788464778293
xgboost r2 : 0.8494664636000008
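The submission averages the two models with equal weight. scikit-learn's `VotingRegressor` packages the same idea: it fits each estimator and returns the unweighted mean of their predictions. The sketch below illustrates this on random synthetic data, not the mosquito dataset. `GradientBoostingRegressor` stands in for `XGBRFRegressor` so the example runs with scikit-learn alone. With xgboost installed, `XGBRFRegressor(random_state=12)` should be able to take its place, because it follows the scikit-learn estimator interface.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor

# Synthetic stand-in data (not the mosquito dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 3 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Equal-weight average of two regressors, like (rf.predict(...) + xg.predict(...)) / 2
vote = VotingRegressor(estimators=[
    ('rf', RandomForestRegressor(random_state=12)),
    ('gb', GradientBoostingRegressor(random_state=12)),  # stand-in for XGBRFRegressor
])
vote.fit(X, y)

# The ensemble prediction equals the mean of the fitted members' predictions
manual = np.mean([est.predict(X) for est in vote.estimators_], axis=0)
print(np.allclose(vote.predict(X), manual))  # True
```

One difference from the baseline: `VotingRegressor.fit` trains fresh clones of the estimators on whatever data you pass in. The baseline, by contrast, reuses models already fitted on the 70% training split.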
Tip
Check the R² score of the submission file against the ground-truth labels.
```python
import pandas as pd
from sklearn.metrics import r2_score

def FinalR2Score():
    # Ground-truth labels for the test period
    y_true = pd.read_csv("https://raw.githubusercontent.com/Datamanim/mosquito/main/result.csv")
    sub = pd.read_csv('./submission.csv')
    # The prediction is the last column of the submission file
    pred = sub.iloc[:, -1].values
    # The metric is R², not MSE
    score = r2_score(y_true['mosquito_ratio'], pred)
    print('submission r2 score :', score)
    return score

final_score = FinalR2Score()
```

submission r2 score : 0.8800627717083699