Skewness and Kurtosis
Skewness and kurtosis are also frequently used, in addition to the mean and variance, as statistics that characterize a probability distribution. Skewness measures the degree of asymmetry of the distribution about its center, that is, the mean, and kurtosis measures the sharpness of the distribution's peak. They can be calculated with Equations 1 and 2, respectively. As these equations show, both statistics are expected values of a new random variable obtained by raising the difference between the random variable and its mean to the third and fourth powers; skewness and kurtosis are therefore computed from the third- and fourth-order moments, respectively.
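Written here in the standardized form consistent with the calculations later in this section, Equations 1 and 2 are
$$\begin{align}\tag{1}&\text{skewness}=E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right]=\frac{E\left[(X-\mu)^3\right]}{\sigma^3}\end{align}$$
$$\begin{align}\tag{2}&\text{kurtosis}=E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right]-3=\frac{E\left[(X-\mu)^4\right]}{\sigma^4}-3\end{align}$$
where $\mu$ is the mean and $\sigma^2$ is the variance; the $-3$ in Equation 2 makes the (excess) kurtosis of the normal distribution equal to 0, matching the code below.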
- The skewness of the standard normal distribution is 0
- skewness < 0: the distribution is skewed to the left (its longer tail extends to the left)
- skewness > 0: the distribution is skewed to the right (its longer tail extends to the right)
- The kurtosis of the standard normal distribution is 0
- kurtosis < 0: flatter (less sharply peaked) than the standard normal distribution
- kurtosis > 0: sharper (more peaked, heavier-tailed) than the standard normal distribution
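As a quick check of these sign conventions, the following sketch (an illustrative addition, not part of the original calculation) compares samples from a symmetric normal distribution and a right-skewed exponential distribution using scipy.stats.skew() and scipy.stats.kurtosis(), the latter returning excess kurtosis by default.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=100000)       # symmetric, normal tails
expon_sample = rng.exponential(size=100000)   # long right tail

# both values come out near 0 for the normal sample
print(round(stats.skew(normal_sample), 2), round(stats.kurtosis(normal_sample), 2))
# both values are clearly positive for the exponential sample
print(round(stats.skew(expon_sample), 2), round(stats.kurtosis(expon_sample), 2))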
Consider the binomial distribution, which describes the number of successes when a Bernoulli trial with success probability p is repeated n times.
The binomial distribution gives this probability through a combination (binomial coefficient), as in Equation 3.
$$\begin{align}\tag{3}&f(s)=\binom{n}{s}p^s(1-p)^{n-s}\\&n: \text{total number of trials}\\&s: \text{number of successes}\\&p: \text{probability of success per trial}\end{align}$$
The probability mass function (pmf) of the binomial distribution can be calculated with the pmf() method of the scipy.stats.binom class. For example, calculate the pmf under the conditions $\displaystyle n=100, \; p=\frac{1}{10},\; s=10$.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from scipy.special import comb
# by the combination definition of Equation 3
round(comb(100, 10)*(1/10)**10*(1-1/10)**(100-10), 4)
0.1319
round(stats.binom.pmf(10, 100, 1/10), 4)
0.1319
The expected value, variance, skewness, and kurtosis of the binomial distribution with $n=10$ trials and success probability $\displaystyle \frac{1}{10}$ are calculated as follows.
n=10
p=1/10
X=np.arange(n+1)
pmF=np.array([])   # P(X=i) for each i
Eind=np.array([])  # i*P(X=i), the terms of the expected value
for i in X:
    pmF=np.append(pmF, stats.binom.pmf(i, n, p))
    Eind=np.append(Eind, i*pmF[i])
np.around(pmF, 3)
array([0.349, 0.387, 0.194, 0.057, 0.011, 0.001, 0. , 0. , 0. , 0. , 0. ])
E=np.sum(Eind)
np.around(E, 3)
1.0
var=np.sum(X**2*pmF)-E**2   # Var(X) = E[X^2] - E[X]^2
np.around(var, 3)
0.9
skw=np.sum((X-E)**3*pmF)/var**(3/2)   # third central moment divided by sigma^3
np.around(skw, 3)
0.843
kurt=np.sum((X-E)**4*pmF)/var**2-3    # fourth central moment divided by sigma^4, minus 3 (excess kurtosis)
np.around(kurt, 3)
0.511
The statistics calculated above can also be obtained with the stats(n, p, moments="mvsk") method of the scipy.stats.binom class, passing the statistics to compute as the moments argument: 'm', 'v', 's', and 'k' stand for mean, variance, skewness, and (excess) kurtosis, respectively.
n=10
p=np.array([1/10, 1/2, 8/10])
X=np.arange(n+1)
pmf=stats.binom.pmf(X.reshape(11, 1), n, p)
pd.DataFrame(np.around(pmf, 3), index=X, columns=p)
x | p = 0.1 | p = 0.5 | p = 0.8
---|---|---|---
0 | 0.349 | 0.001 | 0.000
1 | 0.387 | 0.010 | 0.000
2 | 0.194 | 0.044 | 0.000
3 | 0.057 | 0.117 | 0.001
4 | 0.011 | 0.205 | 0.006
5 | 0.001 | 0.246 | 0.026
6 | 0.000 | 0.205 | 0.088
7 | 0.000 | 0.117 | 0.201
8 | 0.000 | 0.044 | 0.302
9 | 0.000 | 0.010 | 0.268
10 | 0.000 | 0.001 | 0.107
re=stats.binom.stats(n, p, moments="mvsk")
re
(array([1., 5., 8.]), array([0.9, 2.5, 1.6]), array([ 0.84327404, 0. , -0.47434165]), array([ 0.51111111, -0.2 , 0.025 ]))
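For reference, these outputs can be cross-checked against the closed-form moments of the binomial distribution, $E(X)=np$, $\text{Var}(X)=np(1-p)$, $\text{skewness}=\dfrac{1-2p}{\sqrt{np(1-p)}}$, and excess $\text{kurtosis}=\dfrac{1-6p(1-p)}{np(1-p)}$. The following sketch, added here as a check with the same n and p as above, reproduces the four arrays.

import numpy as np

n=10
p=np.array([1/10, 1/2, 8/10])
q=1-p
mean=n*p                      # E(X) = np
var=n*p*q                     # Var(X) = np(1-p)
skew=(1-2*p)/np.sqrt(n*p*q)   # (1-2p)/sqrt(np(1-p))
kurt=(1-6*p*q)/(n*p*q)        # (1-6p(1-p))/(np(1-p)), excess kurtosis
np.around(mean, 3), np.around(var, 3), np.around(skew, 3), np.around(kurt, 3)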
Figure 1 visualizes the probability (pmf) over the values of the variable X for each of the three success probabilities.
plt.plot(pmf[:,0], label='p=0.1')
plt.plot(pmf[:,1], label='p=0.5')
plt.plot(pmf[:,2], label='p=0.8')
plt.xlabel('x', size=13, weight="bold")
plt.ylabel('PMF', size=13, weight="bold")
plt.legend(loc="best")
plt.show()
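Since the binomial pmf is defined only at integer values of x, a bar chart is another common choice; the following alternative sketch (reusing X and pmf from the cell above) plots the three distributions side by side.

plt.bar(X-0.25, pmf[:,0], width=0.25, label='p=0.1')
plt.bar(X, pmf[:,1], width=0.25, label='p=0.5')
plt.bar(X+0.25, pmf[:,2], width=0.25, label='p=0.8')
plt.xlabel('x', size=13, weight="bold")
plt.ylabel('PMF', size=13, weight="bold")
plt.legend(loc="best")
plt.show()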