程式記錄 (Programming Notes): Using sklearn.preprocessing.Imputer

sklearn.preprocessing.Imputer

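Imputes (fills in) missing values.

The fill value is computed with one of three strategies:
1. Mean
2. Median
3. Most frequent value (mode)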
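
As a rough illustration of what the three strategies compute, here is a minimal sketch in plain Python using the standard `statistics` module instead of scikit-learn. The sample column and the helper name `fill_value` are made up for this example, and this is not how Imputer is implemented internally.

```python
import math
import statistics


def fill_value(column, strategy="mean"):
    """Return the value that would replace NaN entries in `column`.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    # Missing entries are ignored when computing the statistic.
    observed = [x for x in column if not math.isnan(x)]
    if strategy == "mean":
        return statistics.mean(observed)
    if strategy == "median":
        return statistics.median(observed)
    if strategy == "most_frequent":
        return statistics.mode(observed)
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up sample column with two missing entries.
column = [float("nan"), 5.0, 4.0, float("nan"), 3.0, 10.0, 3.0]
for strategy in ("mean", "median", "most_frequent"):
    print(strategy, fill_value(column, strategy))
```

The observed values here are 5, 4, 3, 10, 3, so the three strategies give different fill values for the same column.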

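Replaces entries marked as NaN. The default missing_values="NaN" stands for np.nan; the "NaN" strings in the example data become np.nan once the input is converted to floats.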

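The statistic can be computed per column or per row (the axis parameter).
verbose: I am not sure what it does; the documentation only says it controls the verbosity of the imputer.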
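
To make the column/row distinction concrete, here is a small sketch in plain Python. The helper `impute_mean` and the sample table are hypothetical and only mimic the idea behind `axis`; it assumes every column (or row) has at least one observed value.

```python
import math


def impute_mean(rows, axis=0):
    """Replace NaN with the mean along an axis (0 = per column, 1 = per row).

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    # Group the values by column for axis=0, or by row for axis=1.
    if axis == 0:
        groups = [list(col) for col in zip(*rows)]
    else:
        groups = [list(row) for row in rows]

    filled = []
    for group in groups:
        observed = [x for x in group if not math.isnan(x)]
        mean = sum(observed) / len(observed)
        filled.append([mean if math.isnan(x) else x for x in group])

    # For axis=0 the groups are columns, so transpose back to rows.
    if axis == 0:
        return [list(row) for row in zip(*filled)]
    return filled


nan = float("nan")
table = [[nan, 2.0], [5.0, 6.0], [4.0, 8.0]]
print(impute_mean(table, axis=0))  # NaN is filled from its column (5 and 4)
print(impute_mean(table, axis=1))  # NaN is filled from its row (only 2)
```

The same missing cell gets a different value depending on the axis, because a different set of neighbours feeds the mean.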

The default strategy is the mean. For the first column of the example data:
(5+4+3+10)/4 = 5.5

code:

import pandas as pd
import numpy as np

from sklearn.preprocessing import Imputer

data = [["NaN" , 2] , [5 , 6] , [4 , 8] , ["NaN" , 9] , [3,3] , [10,6]]

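# missing_values: the placeholder to replace, default "NaN" (i.e. np.nan)
# strategy: "mean", "median", or "most_frequent" (mode)
# axis: default 0; 0 => impute along columns, 1 => impute along rows
# verbose: controls the verbosity of the imputer (effect not explored here)
# statistics_: fitted attribute holding the fill value for each column (when axis=0)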
imp = Imputer()
imp.fit(data)

print("ONE:")
print("Imputer statistics_ :\n" , imp.statistics_)
print(imp.transform(data))

print("TWO:")
imp = Imputer(strategy ="median")

print(imp.fit_transform(data))

print("THREE:")
data2 = [[1, 2, "NaN", 3, 48]]

imp = Imputer(strategy ="median" , axis = 1)

print(imp.fit_transform(data2))

output:

ONE:
Imputer statistics_ :
[5.5        5.66666667]
[[ 5.5  2. ]
 [ 5.   6. ]
 [ 4.   8. ]
 [ 5.5  9. ]
 [ 3.   3. ]
 [10.   6. ]]
TWO:
[[ 4.5  2. ]
 [ 5.   6. ]
 [ 4.   8. ]
 [ 4.5  9. ]
 [ 3.   3. ]
 [10.   6. ]]
THREE:
[[ 1.   2.   2.5  3.  48. ]]
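
A note for readers on newer scikit-learn: `Imputer` was deprecated in version 0.20 and removed in 0.22, with `sklearn.impute.SimpleImputer` as its replacement. The following is a minimal sketch of the same three cases, assuming scikit-learn 0.20 or later and NumPy are installed. `SimpleImputer` has no `axis` parameter, so the row-wise case is approximated here by transposing; that workaround is an assumption of this sketch, not something from the original post.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same data as above, with np.nan instead of "NaN" strings.
data = np.array(
    [[np.nan, 2], [5, 6], [4, 8], [np.nan, 9], [3, 3], [10, 6]],
    dtype=float,
)

# ONE: mean imputation per column (the default strategy).
mean_imp = SimpleImputer(missing_values=np.nan, strategy="mean")
mean_filled = mean_imp.fit_transform(data)
print(mean_imp.statistics_)
print(mean_filled)

# TWO: median imputation per column.
median_filled = SimpleImputer(strategy="median").fit_transform(data)
print(median_filled)

# THREE: row-wise median. Transpose so the row becomes a column,
# impute, then transpose back (workaround assumed for this sketch).
data2 = np.array([[1, 2, np.nan, 3, 48]], dtype=float)
row_filled = SimpleImputer(strategy="median").fit_transform(data2.T).T
print(row_filled)
```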


Wednesday, May 2, 2018
