Standardization

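Standardization rescales each feature so that it has a mean of 0 and a standard deviation of 1. It changes only the location and scale of the data, not the shape of its distribution, so the result follows a standard normal distribution only if the original data were normally distributed. It is useful for algorithms such as regression analysis and logistic regression. It is computed as in Eq. 1, where μ and σ are the mean and standard deviation of the feature.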

$$x_\text{std}=\frac{x-\mu}{\sigma}$$

(Eq. 1)

Standardization can be applied with the sklearn.preprocessing.StandardScaler class or the scipy.stats.zscore() function.

import numpy as np
import pandas as pd
from sklearn import preprocessing
from scipy import stats

# Fix the seed so the example is reproducible, then draw a 5x3 array of integers in [0, 100)
np.random.seed(0)
x=np.random.randint(0, 100, size=(5,3))
print(x)

[[44 47 64]
 [67 67  9]
 [83 21 36]
 [87 70 88]
 [88 12 58]]

# Learn the mean and standard deviation of each column, then standardize
xStScaler=preprocessing.StandardScaler().fit(x)
xScale3=xStScaler.transform(x)
print(np.around(xScale3, 3))

[[-1.784  0.153  0.486]
 [-0.407  1.004 -1.57 ]
 [ 0.551 -0.953 -0.561]
 [ 0.79   1.131  1.384]
 [ 0.85  -1.335  0.262]]

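# Manual check of Eq. 1 per column; np.std uses the population standard deviation (ddof=0) by default
mu=np.mean(x, axis=0)
sd=np.std(x, axis=0)
print(np.around((x-mu)/sd,3))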

[[-1.784  0.153  0.486]
 [-0.407  1.004 -1.57 ]
 [ 0.551 -0.953 -0.561]
 [ 0.79   1.131  1.384]
 [ 0.85  -1.335  0.262]]
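
The manual calculation above agrees with StandardScaler because both divide by the population standard deviation (ddof=0). The following is a minimal sketch, assuming the same seed and array as above, of how the values shift when the sample standard deviation (ddof=1) is used instead, which is the default of pandas' DataFrame.std(). The names z_pop, z_samp and df are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
x = np.random.randint(0, 100, size=(5, 3))

# Population standard deviation (ddof=0): NumPy's default, also used by StandardScaler
z_pop = (x - x.mean(axis=0)) / x.std(axis=0, ddof=0)

# Sample standard deviation (ddof=1): the default of pandas' DataFrame.std()
df = pd.DataFrame(x)
z_samp = ((df - df.mean()) / df.std()).to_numpy()

# The two results differ only by the constant factor sqrt((n-1)/n)
n = x.shape[0]
print(np.around(z_pop, 3))
print(np.around(z_samp, 3))
print(np.allclose(z_samp, z_pop * np.sqrt((n - 1) / n)))
```

With only five rows the gap between the two conventions is noticeable; it shrinks as the number of rows grows.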

# scipy.stats.zscore standardizes along axis=0 with ddof=0 by default
score=stats.zscore(x)
print(score.round(3))

[[-1.784  0.153  0.486]
 [-0.407  1.004 -1.57 ]
 [ 0.551 -0.953 -0.561]
 [ 0.79   1.131  1.384]
 [ 0.85  -1.335  0.262]]
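
As a sanity check on the three results above, the sketch below confirms that each standardized column has a mean of 0 and a standard deviation of 1, and shows how a fitted scaler can be reused. The row x_new is a hypothetical new observation that is not part of the original example; mean_, scale_ and inverse_transform() are attributes and a method of the fitted StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(0)
x = np.random.randint(0, 100, size=(5, 3))

scaler = StandardScaler().fit(x)
z = scaler.transform(x)

# Each standardized column should have mean 0 and standard deviation 1
col_mean = z.mean(axis=0)
col_std = z.std(axis=0)
print(np.around(col_mean, 3))
print(np.around(col_std, 3))

# Hypothetical new observation, scaled with the statistics learned from x
x_new = np.array([[50, 50, 50]])
z_new = scaler.transform(x_new)
manual = (x_new - scaler.mean_) / scaler.scale_
print(np.allclose(z_new, manual))

# inverse_transform maps the standardized values back to the original scale
x_back = scaler.inverse_transform(z)
print(np.allclose(x_back, x))
```

Reusing the fitted scaler in this way keeps new data on the same scale as the data it was fitted on, which zscore() alone does not provide because it always recomputes the statistics from its input.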
