Softmax Model (Softmax Regression)

Contents

Softmax regression model
Cost function
Building the model

Softmax Regression

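Softmax regression model

Logistic regression, which separates data into two classes, can be generalized to more than two classes with the softmax function. This generalization is called softmax regression or multinomial logistic regression.

For each instance, the model first applies Equation 1 to compute a score for every class. It then applies the softmax function in Equation 2 to estimate the probability that the instance belongs to each class.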

$$\begin{equation}\tag{1}s_k(x)=x^T\theta^{(k)}\end{equation}$$

Each class k has its own weight vector θ^(k). For example, given 3 features (independent variables) and a label (response variable) with 3 classes (A, B, C), the score of each instance (sample) is computed as follows.

$$\begin{align}&\text{- A or not}\\ &\begin{bmatrix} W_{A1} &W_{A2}&W_{A3} \end{bmatrix}\begin{bmatrix} x_1\\x_2\\x_3 \end{bmatrix} = \begin{bmatrix} W_{A1}x_1+W_{A2}x_2+W_{A3}x_3 \end{bmatrix}=s(x)_A \\ &\text{- B or not}\\ &\begin{bmatrix} W_{B1} &W_{B2}&W_{B3} \end{bmatrix} \begin{bmatrix} x_1\\x_2\\x_3 \end{bmatrix} = \begin{bmatrix} W_{B1}x_1+W_{B2}x_2+W_{B3}x_3 \end{bmatrix}=s(x)_B\\ &\text{- C or not} \\ &\begin{bmatrix} W_{C1} &W_{C2}&W_{C3} \end{bmatrix} \begin{bmatrix} x_1\\x_2\\x_3 \end{bmatrix} = \begin{bmatrix} W_{C1}x_1+W_{C2}x_2+W_{C3}x_3 \end{bmatrix}=s(x)_C\end{align}$$

The three expressions above can be combined into a single matrix expression.

$$ \begin{bmatrix} W_{ A1 } & W_{ A2 } & W_{ A3 } \\ W_{ B1 } & W_{ B2 } & W_{ B3 } \\ W_{ C1 } & W_{ C2 } & W_{ C3 } \end{bmatrix} \begin{bmatrix} x_{ 1 } \\ x_{ 2 } \\ x_{ 3 } \end{bmatrix} = \begin{bmatrix} W_{ A1 }x_{ 1 }+W_{ A2 }x_{ 2 }+W_{ A3 }x_{ 3 } \\ W_{ B1 }x_{ 1 }+W_{ B2 }x_{ 2 }+W_{ B3 }x_{ 3 } \\ W_{ C1 }x_{ 1 }+W_{ C2 }x_{ 2 }+W_{ C3 }x_{ 3 } \end{bmatrix} = \begin{bmatrix} s(x)_{ A } \\ s(x)_{ B } \\ s(x)_{ C } \end{bmatrix} $$
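The combined expression can be sketched in NumPy as follows. The weight values and the feature vector are made-up numbers chosen only for illustration; they are not taken from any fitted model.

```python
import numpy as np

# Hypothetical weights: one row per class (A, B, C), one column per feature.
W = np.array([[0.5, -1.0, 2.0],
              [1.5, 0.0, -0.5],
              [-1.0, 1.0, 1.0]])
# Hypothetical instance with three features.
x = np.array([1.0, 2.0, 3.0])

# "A or not", "B or not", "C or not": one dot product per class.
s_rows = np.array([W[k] @ x for k in range(3)])

# The same scores from a single matrix-vector product.
s = W @ x
print(s)
```

Both routes give the same score vector; the matrix form simply stacks the three per-class products.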

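To convert the class scores from Equation 1, that is, from the operation above, into values in [0, 1], Equation 2 exponentiates each score and divides it by the sum of the exponentiated scores over all classes. The result estimates the probability that the instance belongs to each class. In other words, Equation 2 is a normalized version of the scores from Equation 1.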

$$\begin{equation}\tag{2}\hat{p}_k=\sigma(s(x))_k=\frac{\exp(s_k(x))}{\sum^K_{j=1}\exp(s_j(x))}\end{equation}$$

K: the number of classes
s(x): the vector of class scores for instance x
σ(s(x))_k: the estimated probability that instance x belongs to class k
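A minimal NumPy sketch of Equation 2 follows. Subtracting the largest score before exponentiating is an implementation detail that is not part of the text above; it leaves the result unchanged and avoids overflow for large scores. The input scores are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Equation 2: turn a vector (or rows) of class scores into probabilities."""
    scores = np.asarray(scores, dtype=float)
    # Shifting by the maximum does not change the ratio but prevents overflow.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

# Hypothetical scores for classes A, B, C.
p = softmax([4.5, 0.0, 4.0])
print(p, p.sum())
```

Each entry of `p` lies in [0, 1] and the entries sum to 1, so they can be read as class probabilities.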

Finally, as in Equation 3, the model selects the class with the highest probability computed by the softmax function.

$$\begin{align}\tag{3}\hat{y}&=\underset{k}{\text{argmax}}\left(\sigma(s(x))_k\right)\\&=\underset{k}{\text{argmax}}\left((s(x))_k\right)\\&=\underset{k}{\text{argmax}}\left(x^T\theta^{(k)}\right)\end{align}$$

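The argmax in Equation 3 returns the index with the highest probability, that is, the class (y). Because softmax preserves the ordering of the scores, the same class is obtained from the scores directly.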
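The equality of the first two lines of Equation 3 can be checked numerically. The sketch below uses randomly generated scores (a hypothetical batch of 5 instances and 3 classes) and compares the argmax taken over the probabilities with the argmax taken over the raw scores.

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax with the usual max-shift for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 3))  # hypothetical scores, not from a fitted model

from_probs = np.argmax(softmax(scores), axis=1)   # first line of Equation 3
from_scores = np.argmax(scores, axis=1)           # second line of Equation 3
print(from_probs, from_scores)
```

Since the predicted class does not depend on the normalization, the softmax step is only needed when the probabilities themselves are of interest.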

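Cost function

The goal of training is a model that estimates a high probability for the target class (and consequently low probabilities for the other classes). To achieve this, the model is trained to minimize the cost function L(θ) in Equation 4, called the cross entropy, which penalizes the model when it estimates a low probability for the target class.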

$$\begin{equation}\tag{4}L(\theta)=-\frac{1}{m} \sum^m_{i=1}\sum^K_{k=1}y^{(i)}_k\log\left(\hat{p}^{(i)}_k\right)\end{equation}$$

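y⁽ⁱ⁾_k is the target probability that the i-th instance belongs to class k: it is 1 for the true class and 0 otherwise, and m is the number of instances. For example, with y=<0,1,0>, applying the cross-entropy function to three predictions $\hat{y_1},\; \hat{y_2},\; \hat{y_3}$ gives the following.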

$$\begin{align}&y=\begin{bmatrix}0\\1\\0\end{bmatrix} \; \hat{y_1}=\begin{bmatrix}1\\0\\0\end{bmatrix}\; \hat{y_2}=\begin{bmatrix}0\\1\\0\end{bmatrix} \; \hat{y_3}=\begin{bmatrix}0\\0\\1\end{bmatrix}\\\\ &L_1=\begin{bmatrix}0\\1\\0\end{bmatrix} \times -\log\left(\begin{bmatrix}1\\0\\0\end{bmatrix}\right)=\begin{bmatrix}0\\1\\0\end{bmatrix} \times \begin{bmatrix}0\\\infty\\\infty\end{bmatrix}=\begin{bmatrix}0\\\infty\\0\end{bmatrix}= \infty\\\\ &L_2=\begin{bmatrix}0\\1\\0\end{bmatrix} \times -\log\left(\begin{bmatrix}0\\1\\0\end{bmatrix}\right)=\begin{bmatrix}0\\1\\0\end{bmatrix} \times \begin{bmatrix}\infty\\0\\\infty\end{bmatrix}=\begin{bmatrix}0\\0\\0\end{bmatrix}= 0\\\\ &L_3=\begin{bmatrix}0\\1\\0\end{bmatrix} \times -\log\left(\begin{bmatrix}0\\0\\1\end{bmatrix}\right)=\begin{bmatrix}0\\1\\0\end{bmatrix} \times \begin{bmatrix}\infty\\\infty\\0\end{bmatrix}=\begin{bmatrix}0\\\infty\\0\end{bmatrix}= \infty\end{align}$$

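As these results show, the cost of a wrong prediction can grow without bound, while the cost of a correct prediction approaches 0. (In the products above, 0 × ∞ is treated as 0.)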
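The same behaviour can be seen with less extreme predictions. The sketch below is one possible implementation of Equation 4, averaged over the instances. The probability vectors are made up for illustration, and the clipping constant `eps` is an assumption added here so that a predicted probability of exactly 0 yields a large finite cost instead of infinity.

```python
import numpy as np

def cross_entropy(Y, P, eps=1e-12):
    """Mean cross entropy over the rows of Y (one-hot targets) and P (probabilities)."""
    P = np.clip(P, eps, 1.0)  # assumption: avoid log(0) by clipping
    return -np.mean(np.sum(Y * np.log(P), axis=1))

Y = np.array([[0, 1, 0]])  # the true class is the second one

loss_good = cross_entropy(Y, np.array([[0.05, 0.90, 0.05]]))   # confident and right
loss_bad = cross_entropy(Y, np.array([[0.70, 0.20, 0.10]]))    # mostly wrong
loss_hard_wrong = cross_entropy(Y, np.array([[1.0, 0.0, 0.0]]))  # certain and wrong

print(loss_good, loss_bad, loss_hard_wrong)
```

Only the probability assigned to the true class enters the cost, so the cost rises as that probability falls.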

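The ultimate goal is to minimize the cost function, but no closed-form expression for the minimizer of the cross entropy in Equation 4 can be derived. Instead, gradient descent finds the minimum by repeatedly updating the weights with the gradient computed from Equation 5. The weights at that point are the optimized weights and become the parameters of the model.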

$$\begin{equation}\tag{5}\nabla_{\theta^{(k)}}L(\theta)=\frac{1}{m} \sum^m_{i=1}\left(\hat{p}^{(i)}_k -y^{(i)}_k\right)x^{(i)}\end{equation}$$
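As a rough illustration of how Equation 5 drives gradient descent, the sketch below trains a softmax model from scratch on synthetic data. It is not the procedure used by scikit-learn. The data (three Gaussian clusters), the learning rate, the iteration count, and the extra column of ones that acts as an intercept are all assumptions introduced for this example.

```python
import numpy as np

def softmax(S):
    S = S - S.max(axis=1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.1, n_iter=1000):
    """Batch gradient descent on the cross entropy (Equations 4 and 5)."""
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]            # assumption: intercept via a column of ones
    Y = np.eye(n_classes)[y]             # one-hot targets y_k^(i)
    Theta = np.zeros((Xb.shape[1], n_classes))  # column k holds theta^(k)
    losses = []
    for _ in range(n_iter):
        P = softmax(Xb @ Theta)          # Equations 1 and 2 for all instances
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        grad = Xb.T @ (P - Y) / m        # column k is Equation 5 for class k
        Theta -= lr * grad
    return Theta, losses

# Synthetic data: three well-separated clusters, 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

Theta, losses = fit_softmax(X, y, n_classes=3)
pred = np.argmax(np.c_[np.ones(len(X)), X] @ Theta, axis=1)  # Equation 3
accuracy = np.mean(pred == y)
print(losses[0], losses[-1], accuracy)
```

The recorded cost starts at log(3), the value for uniform probabilities over three classes, and decreases as the weights are updated.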

Building the model

As with logistic regression, the softmax model is built with the sklearn.linear_model.LogisticRegression() class. Among its parameters, multi_class and solver need to be set.

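solver: 'liblinear' is a good choice for small datasets, whereas 'sag' and 'saga' are faster for large ones. 'liblinear' supports the l₁ penalty (l₁ regularization) but is limited to one-vs-rest schemes, so it cannot fit the multinomial loss. (See the sklearn documentation on solver.)
multi_class
- With 'ovr', a binary problem is fit for each label.
- With 'multinomial', the loss minimized is the multinomial loss fit across the entire probability distribution, for both binary and multiclass data. 'multinomial' is unavailable when solver='liblinear'.
- 'auto' selects 'ovr' if the data is binary or if solver='liblinear', and otherwise selects 'multinomial'.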

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
font1={'size':11, 'weight':'bold'}

from sklearn import datasets

iris=datasets.load_iris()
iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

iris['feature_names'], iris['target_names']

(['sepal length (cm)',
      'sepal width (cm)',
      'petal length (cm)',
      'petal width (cm)'],
     array(['setosa', 'versicolor', 'virginica'], dtype='<U10'))

X=iris['data'][:,[2,3]]
y=iris['target']

from sklearn.linear_model import LogisticRegression

softmax_reg=LogisticRegression(multi_class="multinomial", solver="lbfgs").fit(X,y)
pre=softmax_reg.predict(X)
pre

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,
           2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

softmax_reg.score(X, y)

0.9666666666666667

W=softmax_reg.coef_
W0=softmax_reg.intercept_
W,W0

(array([[-2.74866104, -1.16890756],
            [ 0.08356447, -0.90803047],
            [ 2.66509657,  2.07693804]]),
     array([ 11.12767979,   3.22717485, -14.35485463]))

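The decision boundaries of the fitted softmax model can be drawn as follows.

The model produces one regression line per class, that is, the line on which that class's score is zero. As the result below shows, the lines for the first and third classes make it possible to separate the classes.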

def col(x):
    if x==0:
        return("blue")
    elif x==1:
        return("red")
    else:
        return("green")

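# Petal length as a column vector
x=X[:,0].reshape(-1,1)
# Solve W[k,0]*x1 + W[k,1]*x2 + W0[k] = 0 for x2; column k of y1 is the line for class k
y1=(-np.dot(x,W[:,0].reshape(1,-1))-W0)/W[:,1]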
plt.figure(figsize=(10, 4), dpi=100)
plt.subplot(1,2,1)
plt.scatter(X[:,0],X[:,1], s=y+5, color=[col(i) for i in y],label='data')
for i in range(y1.shape[1]):
    plt.plot(x, y1[:,i], label=i)
plt.xlabel('petal length', fontdict=font1)
plt.ylabel('petal width', fontdict=font1)
plt.subplot(1,2,2)
plt.scatter(X[:,0],X[:,1], s=y+5, color=[col(i) for i in y],label='data')
for i in range(y1.shape[1]):
    plt.plot(x, y1[:,i], label=i)
plt.legend(bbox_to_anchor=(1,1), prop=font1)
plt.xlabel('petal length', fontdict=font1)
plt.ylim(0, 3.5)
plt.show()

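As the plot shows, the regression line for the second class does not help with the classification. Separating three classes requires only two boundaries, and for this model the first and third regression lines serve as the decision boundaries.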
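A complementary view, offered here as a sketch rather than as the method used above: under Equation 3 the predicted class changes exactly where two class scores are equal, so the set of points with s_j(x) = s_k(x) can also be examined. The code below reuses the coefficients printed earlier; the helper `tie_point` and the chosen petal lengths are hypothetical and introduced only for this illustration.

```python
import numpy as np

# Coefficients and intercepts copied from the fitted model output above.
W = np.array([[-2.74866104, -1.16890756],
              [0.08356447, -0.90803047],
              [2.66509657, 2.07693804]])
W0 = np.array([11.12767979, 3.22717485, -14.35485463])

def softmax(s):
    e = np.exp(s - np.max(s))
    return e / e.sum()

def tie_point(j, k, petal_length):
    """Petal width at which classes j and k receive the same score."""
    dw = W[j] - W[k]
    db = W0[j] - W0[k]
    # Solve dw[0]*length + dw[1]*width + db = 0 for width.
    return -(dw[0] * petal_length + db) / dw[1]

# A point where classes 0 and 1 tie, and one where classes 1 and 2 tie.
x01 = np.array([2.7, tie_point(0, 1, 2.7)])
x12 = np.array([5.0, tie_point(1, 2, 5.0)])
p01 = softmax(W @ x01 + W0)
p12 = softmax(W @ x12 + W0)
print(x01, np.round(p01, 3))
print(x12, np.round(p12, 3))
```

At each such point the two tied classes receive identical probabilities, which is where the argmax in Equation 3 switches from one class to the other.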

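This can be checked with the scores returned by the .decision_function() method. Each score is proportional to the signed distance between a data point and the regression line of the corresponding class shown above, so the points at which the absolute scores are smallest lie closest to the boundaries that determine the classes.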

deci=softmax_reg.decision_function(X)
idxMn=np.argmin(np.abs(deci), axis=0)
X[idxMn, :]

array([[3.7, 1. ],
           [5.7, 2.5],
           [4.2, 1.5]])
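The relationship between a score and a distance can be sketched with NumPy alone. The code assumes that, for this multinomial model, the score of class k is the linear expression W[k]·x + W0[k] built from the coefficients printed above; dividing it by the length of W[k] gives the signed distance from the point to that class's zero-score line.

```python
import numpy as np

# Coefficients and intercepts copied from the fitted model output above.
W = np.array([[-2.74866104, -1.16890756],
              [0.08356447, -0.90803047],
              [2.66509657, 2.07693804]])
W0 = np.array([11.12767979, 3.22717485, -14.35485463])

x = np.array([3.7, 1.0])            # first point returned by X[idxMn, :]
scores = W @ x + W0                 # one linear score per class
norms = np.linalg.norm(W, axis=1)   # length of each weight vector
signed_dist = scores / norms        # signed distance to each zero-score line

# Geometric check for class 0: move x perpendicularly onto the line.
foot = x - (scores[0] / norms[0] ** 2) * W[0]
print(signed_dist, foot)
```

The point `foot` lies on the class 0 line, and its distance from `x` equals the absolute value of `signed_dist[0]`.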

x=X[:,0].reshape(-1,1)
y1=(-np.dot(x,W[:,0].reshape(1,-1))-W0)/W[:,1]
plt.scatter(X[:,0],X[:,1], s=y+5, color=[col(i) for i in y],label='data')
col1=['blue', 'red', 'green']
n=0
for i,j in X[idxMn, :]:
    plt.scatter(i, j, s=100, color="none", edgecolor=col1[n], label=i)
    n +=1
plt.legend(bbox_to_anchor=(1,1), prop=font1)
plt.ylim(0, 3.5)
plt.show()

The predicted probabilities at these boundary points can be checked with .predict_proba().

pre_prob=softmax_reg.predict_proba(X)
np.around(pre_prob[np.argmin(np.abs(deci), axis=0), :], 3)

array([[0.055, 0.939, 0.006],
           [0.   , 0.01 , 0.99 ],
           [0.011, 0.896, 0.093]])
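These probabilities can be reproduced by hand from Equations 1 and 2. The sketch below uses only the coefficients, intercepts, and boundary points printed above, and assumes the model applies softmax to the linear scores, as the multinomial setting implies.

```python
import numpy as np

# Coefficients and intercepts copied from the fitted model output above.
W = np.array([[-2.74866104, -1.16890756],
              [0.08356447, -0.90803047],
              [2.66509657, 2.07693804]])
W0 = np.array([11.12767979, 3.22717485, -14.35485463])

# The three points returned by X[idxMn, :].
points = np.array([[3.7, 1.0],
                   [5.7, 2.5],
                   [4.2, 1.5]])

scores = points @ W.T + W0                              # Equation 1 (with intercept)
shifted = scores - scores.max(axis=1, keepdims=True)    # stability shift
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)  # Equation 2
print(np.around(probs, 3))
```

Rounded to three decimals, the rows agree with the .predict_proba() output shown above.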
