[data analysis] Label encoding

Label encoding

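When the labels in a dataset are fed to a computer, they are typically represented not by the label values themselves but by the index of each label within the set of labels. Converting the data into this machine-readable form is called encoding, and the reverse is called decoding. For example, the values of the object x = [-1, 4, 7] have the indices 0, 1, and 2. The encoding of x is therefore as shown in Eq. A3.2.3, and this is called label encoding.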

Decoded (value)		Encoded	(Eq. A3.2.3)
-1	⇒	0
4	⇒	1
7	⇒	2
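
The mapping in Eq. A3.2.3 can be reproduced by hand. The following is a minimal sketch in plain Python, assuming the indices come from the unique values sorted in ascending order. The names `encode_map` and `decode_map` are illustrative and not part of any library.

```python
# Label encoding by hand for x = [-1, 4, 7]
x = [-1, 4, 7]

# Unique values in ascending order; the position of each value is its label
classes = sorted(set(x))

# Hypothetical helper dictionaries (illustrative names, not a library API)
encode_map = {value: index for index, value in enumerate(classes)}
decode_map = {index: value for index, value in enumerate(classes)}

encoded = [encode_map[v] for v in x]        # encoding: value -> index
decoded = [decode_map[i] for i in encoded]  # decoding: index -> value

print(encoded)  # [0, 1, 2]
print(decoded)  # [-1, 4, 7]
```

Keeping both dictionaries makes the two directions explicit: encoding looks up the index of a value, and decoding looks up the value at an index.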

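As with a class indicator matrix, each label in label encoding is the index of a value among the unique values of the data sorted in ascending order. Both steps, building the labels from these unique values and converting the data to those labels, can be carried out with the LabelEncoder() class. This class takes a one-dimensional vector as its input.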
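
As a rough check of the sorting rule described above, the sketch below compares LabelEncoder with `np.unique(..., return_inverse=True)`, which also returns the unique values in ascending order together with the index of each element. The sample array `data` and the name `check_enc` are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up sample with repeated, unsorted values (illustrative only)
data = np.array([7, -1, 4, 7, -1])

# np.unique sorts the unique values in ascending order;
# return_inverse gives each element's index in that sorted array
uniques, codes = np.unique(data, return_inverse=True)

check_enc = LabelEncoder().fit(data)

print(uniques)                    # [-1  4  7]
print(codes)                      # [2 0 1 2 0]
print(check_enc.transform(data))  # [2 0 1 2 0]
```

The order in which values appear in the data does not matter here; only their rank among the sorted unique values does.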

sklearn.preprocessing.LabelEncoder()

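A class that converts a nominal variable to numeric form by using the index of each class
x=sklearn.preprocessing.LabelEncoder()
- x.fit(object): learns the unique classes from the object
- x.classes_: the class names sorted in ascending order
- x.transform(object): returns the class indices (encoding)
- x.inverse_transform(transformed object): restores the original data (decoding)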

import numpy as np
from sklearn import preprocessing as sklpre

np.random.seed(2)
x=np.random.randint(-10, 10, 10)
print(x)

[-2  5  3 -2  1  8  1 -2 -3 -8]

enc=sklpre.LabelEncoder().fit(x)
print(enc.classes_)

[-8 -3 -2  1  3  5  8]

x1=enc.transform(x)
print(x1)

[2 5 4 2 3 6 3 2 1 0]

print(enc.inverse_transform(x1))

[-2  5  3 -2  1  8  1 -2 -3 -8]
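
Decoding can also be read as plain indexing: since `classes_` holds the sorted unique values, indexing it with the encoded array should give back the original data. The sketch below repeats the example above in a self-contained form to check this round trip. It illustrates the idea only and is not a description of how scikit-learn implements `inverse_transform` internally; the names `by_indexing` and `by_method` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

np.random.seed(2)
x = np.random.randint(-10, 10, 10)

enc = LabelEncoder().fit(x)
x1 = enc.transform(x)

# Indexing classes_ with the encoded values recovers the original values
by_indexing = enc.classes_[x1]
by_method = enc.inverse_transform(x1)

print(np.array_equal(by_indexing, x))  # True
print(np.array_equal(by_method, x))    # True
```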

Because this class maps each value to its index, it can also be used to convert data made up of strings.

label = ['red','black','red','green','black','yellow','white']
enc2=sklpre.LabelEncoder().fit(label)
print(enc2.classes_)

['black' 'green' 'red' 'white' 'yellow']

print(enc2.transform(label))

[2 0 2 1 0 4 3]

print(enc2.transform(['red','white','green']))

[2 3 1]
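
Note that `transform` only knows the labels seen during `fit`. As far as I know, scikit-learn raises a `ValueError` when it meets a label that was not in the fitted data. The sketch below assumes that behavior and uses a made-up unseen label, 'blue'; the variable `unseen_error` is illustrative.

```python
from sklearn.preprocessing import LabelEncoder

label = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']
enc2 = LabelEncoder().fit(label)

# 'blue' was not in the data passed to fit (made-up example)
try:
    enc2.transform(['red', 'blue'])
    unseen_error = None
except ValueError as err:
    unseen_error = err

print(type(unseen_error).__name__)  # ValueError
```

In practice this means the encoder should be fitted on data that contains every label expected later, or the unseen labels should be handled before calling `transform`.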

sons dataStory
