[data analysis] Normalization

Data Normalization

Overview
L₁ Normalization
L₂ Normalization

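Overview

Normalization is a scaling step that turns each sample into a unit vector. In a data structure made of rows and columns, standardization adjusts the scale column by column (per variable), whereas normalization rescales the values that sit in the same row (per sample).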
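The column-versus-row distinction can be seen in a small sketch. The array X below is made-up toy data, and StandardScaler is used here only as one common way to standardize; neither comes from the original post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

# Hypothetical toy data: 3 samples (rows) x 2 variables (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Standardization works column by column:
# every variable ends up with mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X)
col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)

# Normalization works row by row:
# every sample ends up with Euclidean (L2) length 1.
X_norm = normalize(X, norm="l2")
row_norms = np.linalg.norm(X_norm, axis=1)

print(np.around(col_means, 3), np.around(col_stds, 3))
print(np.around(row_norms, 3))
```

The statistics of the standardized data are checked along axis=0 (columns), while the norms of the normalized data are checked along axis=1 (rows).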

Use the sklearn.preprocessing.normalize(x, norm='l2') function. Passing 'l2' for the norm parameter performs L₂ normalization, and passing 'l1' performs L₁ normalization.

sklearn.preprocessing.Normalizer(norm="l2")

Transforms each sample of the data into a unit vector
The norm argument accepts l1, l2, or max
- norm = "l1" → L₁ normalization, x/(∑|x|)
- norm = "l2" → L₂ normalization, x/(∑x²)^0.5
- norm = "max" → x/(max|x|)
- The default is l2
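The three norm options listed above can be tried with the Normalizer class as in the sketch below. The array X is hypothetical sample data chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical sample data: 2 samples x 3 features.
X = np.array([[-3.0, 4.0, 0.0],
              [1.0, 2.0, -2.0]])

results = {}
for norm in ("l1", "l2", "max"):
    # Normalizer learns nothing from the data, so fit_transform
    # simply rescales every row by that row's own norm.
    results[norm] = Normalizer(norm=norm).fit_transform(X)
    print(norm)
    print(np.around(results[norm], 3))
```

For the first row the divisors are 7 (l1), 5 (l2), and 4 (max), so that row becomes roughly (-0.429, 0.571, 0), (-0.6, 0.8, 0), and (-0.75, 1, 0).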

L₁ Normalization

L₁ normalization rescales the values of the dataset so that the absolute values in each row always sum to 1 (Eq. 1). The L₁ norm used here is the same norm that underlies the least absolute deviations criterion.

$$\begin{align} \text{Vectors}\qquad \qquad& \qquad \qquad \text{Vectors after L}_1\; \text{normalization}\\ \begin{matrix} x_{11}&x_{12}& \cdots &x_{1p}\\ x_{21}&x_{22}& \cdots &x_{2p}\\ \vdots&\vdots& \ddots &\vdots\\ x_{n1}&x_{n2}& \cdots &x_{np}\\ \end{matrix} &\Rightarrow \begin{matrix} \frac{x_{11}}{\sum^p_{i=1}|x_{1i}|}&\frac{x_{12}}{\sum^p_{i=1}|x_{1i}|}& \cdots &\frac{x_{1p}}{\sum^p_{i=1}|x_{1i}|}\\ \frac{x_{21}}{\sum^p_{i=1}|x_{2i}|}&\frac{x_{22}}{\sum^p_{i=1}|x_{2i}|}& \cdots &\frac{x_{2p}}{\sum^p_{i=1}|x_{2i}|}\\ \vdots&\vdots& \ddots &\vdots\\ \frac{x_{n1}}{\sum^p_{i=1}|x_{ni}|}&\frac{x_{n2}}{\sum^p_{i=1}|x_{ni}|}& \cdots &\frac{x_{np}}{\sum^p_{i=1}|x_{ni}|}\\ \end{matrix} \end{align}$$

(Eq. 1)

import numpy as np 
from sklearn import preprocessing

np.random.seed(1)
x=np.random.randint(-100, 100, size=(5,3))
print(x)

[[-63  40 -28]
 [ 37  33 -21]
 [ 92  44  29]
 [-29  34 -75]
 [ 78 -80   1]]

xNorm1=preprocessing.normalize(x, norm='l1')
print(np.around(xNorm1, 3))

[[-0.481  0.305 -0.214]
 [ 0.407  0.363 -0.231]
 [ 0.558  0.267  0.176]
 [-0.21   0.246 -0.543]
 [ 0.491 -0.503  0.006]]

print(np.sum(np.abs(xNorm1), axis=1))

[1. 1. 1. 1. 1.]

xNorm12=x/np.sum(np.abs(x), axis=1).reshape(-1,1) # by Eq. 1
print(np.around(xNorm12, 3))

[[-0.481  0.305 -0.214]
 [ 0.407  0.363 -0.231]
 [ 0.558  0.267  0.176]
 [-0.21   0.246 -0.543]
 [ 0.491 -0.503  0.006]]
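As a quick hand check of the L₁ rule, the first row of x printed above, (-63, 40, -28), can be worked through directly:

$$\sum_{i=1}^{3}|x_{1i}| = |-63| + |40| + |-28| = 131, \qquad \left(\frac{-63}{131},\ \frac{40}{131},\ \frac{-28}{131}\right) \approx (-0.481,\ 0.305,\ -0.214)$$

These values agree with the first row of both printed results, and their absolute values add up to 131/131 = 1.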

L₂ Normalization

As in Eq. 2, L₂ normalization rescales the data so that the square root of the sum of squares of each row is always 1.

$$\begin{align} \text{Vectors}\qquad \qquad& \qquad \qquad \text{Vectors after L}_2\; \text{normalization}\\ \begin{matrix} x_{11}&x_{12}& \cdots &x_{1p}\\ x_{21}&x_{22}& \cdots &x_{2p}\\ \vdots&\vdots& \ddots &\vdots\\ x_{n1}&x_{n2}& \cdots &x_{np}\\ \end{matrix} &\Rightarrow \begin{matrix} \frac{x_{11}}{\sqrt{\sum^p_{i=1}x^2_{1i}}}&\frac{x_{12}}{\sqrt{\sum^p_{i=1}x^2_{1i}} }& \cdots &\frac{x_{1p}}{ \sqrt{\sum^p_{i=1}x^2_{1i}}}\\ \frac{x_{21}}{\sqrt{\sum^p_{i=1}x^2_{2i}} }&\frac{x_{22}}{ \sqrt{\sum^p_{i=1}x^2_{2i}}}& \cdots &\frac{x_{2p}}{\sqrt{\sum^p_{i=1}x^2_{2i}} }\\ \vdots&\vdots& \ddots &\vdots\\ \frac{x_{n1}}{\sqrt{\sum^p_{i=1}x^2_{ni}} }&\frac{x_{n2}}{\sqrt{\sum^p_{i=1}x^2_{ni}} }& \cdots &\frac{x_{np}}{\sqrt{\sum^p_{i=1}x^2_{ni}} }\\ \end{matrix} \end{align}$$

(Eq. 2)

xNorm2=preprocessing.normalize(x, norm='l2')
print(np.around(xNorm2, 3))

[[-0.79   0.502 -0.351]
 [ 0.687  0.613 -0.39 ]
 [ 0.868  0.415  0.274]
 [-0.332  0.389 -0.859]
 [ 0.698 -0.716  0.009]]

xNorm22=x/np.sqrt(np.sum(x**2, axis=1)).reshape(-1,1)
print(np.around(xNorm22, 3))

[[-0.79   0.502 -0.351]
 [ 0.687  0.613 -0.39 ]
 [ 0.868  0.415  0.274]
 [-0.332  0.389 -0.859]
 [ 0.698 -0.716  0.009]]
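The L₁ section confirms its result by summing absolute values per row. A matching check for the L₂ result could look like the sketch below, which rebuilds the same x from the seed used earlier and measures each row's length with np.linalg.norm. The variable name rowNorms is introduced here only for illustration.

```python
import numpy as np
from sklearn import preprocessing

# Rebuild the same data as in the earlier example.
np.random.seed(1)
x = np.random.randint(-100, 100, size=(5, 3))
xNorm2 = preprocessing.normalize(x, norm='l2')

# Euclidean length of every row after L2 normalization;
# each entry should be 1 up to floating-point rounding.
rowNorms = np.linalg.norm(xNorm2, axis=1)
print(rowNorms)
```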
