Probability Inequalities & Moment Generating Functions
In statistics it is often necessary to bound the range of probabilities containing the value(s) of interest in a distribution estimated from a statistic, and to establish a confidence interval indicating the degree of confidence in the results of a statistical analysis. The Markov and Chebyshev inequalities are the mathematical expressions underlying the rationale for such probability or confidence intervals.
Probability Inequalities
Markov's inequality
If X is a random variable and g(x) is a nonnegative real-valued function, then Equation 1 holds for any positive real c.
$$\begin{equation} \tag{1} P[g(X) \ge c] \le \frac{E[g(X)]}{c} \end{equation}$$If the event is $A=\{x\,|\,g(x) \ge c\}$, the above expression is proved as follows.
$$ \begin{aligned} E[g(X)]&=\int^\infty_{-\infty} g(x)f(x)\, dx\\&=\int_{A} g(x)f(x)\, dx+\int_{A^c} g(x)f(x)\, dx\\& \ge \int_{A} g(x)f(x)\, dx\\&\ge \int_{A} cf(x)\, dx\\&=cP[X \in A]\\&=cP[g(X) \ge c] \end{aligned}$$
Example 1)
The random variable X follows a binomial distribution with mean np and variance np(1-p). Apply the Markov inequality to determine the upper bound of the probability that satisfies the expression below.
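The specific expression for this example is not reproduced in this text. Assuming, as in Example 2 below, that the event of interest is $P(X \ge \alpha n)$ with $p=\frac{1}{2}$ and $\alpha=\frac{3}{4}$, a minimal sketch of the Markov bound $P(X \ge \alpha n) \le \frac{E(X)}{\alpha n}=\frac{p}{\alpha}$ is:

from sympy import symbols, Rational, simplify
n, p, alpha=symbols("n p alpha", positive=True)
# Markov: P(X >= alpha*n) <= E(X)/(alpha*n), with E(X)=n*p for the binomial
bound=(n*p)/(alpha*n)
simplify(bound.subs({p: Rational(1, 2), alpha: Rational(3, 4)}))

2/3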
Chebyshev's inequality
From the random variable X, define another non-negative random variable $Y=(X-E(X))^2$. Markov inequality can be applied to this variable.
$$\begin{aligned}&P(Y \ge b^2) \le \frac{E(Y)}{b^2}\\ &\begin{aligned} E(Y)&=E[(X-E(X))^2]\\&=Var(X)\end{aligned}\\ &\begin{aligned} P(Y \ge b^2)&=P((X-E(X))^2 \ge b^2)\\&=P(|X-E(X)| \ge b)\end{aligned} \\ &\therefore \; P(|X-E(X)| \ge b) \le \frac{Var(X)}{b^2}\\ &b: \text{positive real number } \end{aligned}$$The above result is called Chebyshev's inequality and is stated generally as Equation 2.
$$\begin{equation} \tag{2} P(|X-E(X)| \ge b) \le \frac{Var(X)}{b^2} \end{equation}$$This inequality means that the probability that a random variable X deviates from its mean E(X) by at least b is bounded by Var(X)/b². Even if the probability distribution of X is unknown, the deviation from the mean can still be bounded by Equation 2.
Example 2)
The random variable X follows a binomial distribution with mean np and variance np(1-p). Apply Chebyshev's inequality to determine an upper bound for the probability below.
$$\begin{aligned} &P(X \ge \alpha n)\\ & p=\frac{1}{2}, \quad \alpha = \frac{3}{4} \end{aligned}$$
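One way to apply Chebyshev's inequality here is the following sketch: since $\alpha > p$, the event $\{X \ge \alpha n\}$ is contained in $\{|X-np| \ge (\alpha-p)n\}$, so Equation 2 gives

$$\begin{aligned} P(X \ge \alpha n) &\le P(|X-np| \ge (\alpha-p)n)\\ &\le \frac{np(1-p)}{(\alpha-p)^2n^2}=\frac{p(1-p)}{(\alpha-p)^2n}\\ &=\frac{(1/2)(1/2)}{(1/4)^2\,n}=\frac{4}{n} \end{aligned}$$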
Example 3)
If the PDF of the random variable X is $f(x) = e^{-x}$ for $x>0$, what are E(X) and Var(X)?
Also, apply the Chebyshev inequality to bound the probability that the difference between the variable and its mean is greater than twice the standard deviation.
- The mean and variance are:
import numpy as np
import pandas as pd
from sympy import *

x=symbols("x")
f=exp(-x)
E=integrate(x*f, (x, 0, oo))
E
1
var=integrate(x**2*f, (x, 0, oo))-E**2
var
1
- Determine the probability of the event $|X-1| \ge 2$ (twice the standard deviation), which for $X>0$ reduces to $X \ge 3$:
N(1-integrate(f, (x, 0, 3)), 3)
0.0498
Applying the above case to the Chebyshev inequality, the upper bound is:
$$P(|X-1| \ge 2) \le \frac{Var(X)}{2^2}=\frac{1}{4}$$Therefore, the exact probability of this event (≈ 0.0498) is indeed smaller than the Chebyshev upper bound.
Moment generating function
A moment is a characteristic quantity from which other statistics can be calculated, and the expected value of a function that generates these moments is called the moment generating function (MGF). Statistics that characterize a probability distribution can be calculated using the moments defined as follows.
The first moment is the expected value $E[X]$, and the second central moment, $E[(X-\mu)^2]$, is the variance of X.
Example 4)
If the PMF of the discrete random variable X is as follows, the moment generating function (MGF) is calculated.
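The PMF for this example is not reproduced in this text. As a purely illustrative sketch, assume a hypothetical PMF $p(0)=\frac{1}{4},\; p(1)=\frac{1}{2},\; p(2)=\frac{1}{4}$; the MGF $M_X(t)=E[e^{tX}]$ is then a finite sum that can be evaluated directly.

from sympy import symbols, exp, Rational
t=symbols("t")
# hypothetical PMF, used only for illustration: p(0)=1/4, p(1)=1/2, p(2)=1/4
pmf={0: Rational(1, 4), 1: Rational(1, 2), 2: Rational(1, 4)}
M=sum(exp(t*k)*pk for k, pk in pmf.items())   # M_X(t)=E[e^(tX)]=1/4+e^t/2+e^(2t)/4
M.diff(t).subs(t, 0)                          # first moment E(X)

1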
The MGF encodes all the moments of a random variable, so the distribution can be determined from this function: for example, if two random variables have the same MGF, they have the same distribution.
The function $e^{tX}$ in the MGF defined above, $M_X(t)=E\left[e^{tX}\right]$, can be expanded as a Taylor series, as in Equation 4.
$$\begin{align}\tag{4}e^x&=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+ \cdots\\ &=\sum^\infty_{k=0} \frac{x^k}{k!} \end{align}$$
A Taylor series can be generated with the series() method of the sympy module or with the fps() function. The O() term in the following output (the letter O) is Landau big-O notation: it indicates the order of the truncated remainder around the expansion point (here $x=0$). Use removeO() to delete this part.
exp(x).series(x)

$$\color{blue}{\quad 1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{6} + \frac{x^{4}}{24} + \frac{x^{5}}{120} + O\left(x^{6}\right)}$$
x=symbols("x")
fps(exp(x))

$$\color{blue}{\quad \left(\sum_{k=1}^{\infty} \begin{cases} \frac{x^{k}}{k!} & \text{for}\: k\bmod{1} = 0 \\0 & \text{otherwise} \end{cases}\right) + 1}$$
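For example, removeO() can be chained onto the truncated series to drop the order term (a minimal illustration):

exp(x).series(x).removeO()   # 1 + x + x**2/2 + x**3/6 + x**4/24 + x**5/120, without the O(x**6) term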
Apply the above Taylor equation to the expansion of the moment generating function (MGF).
$$\begin{aligned} M_X(t)&=E(e^{tX})\\&=E(1+tX+\frac{t^2X^2}{2!}+\frac{t^3X^3}{3!}+\cdots)\\ &=1+tE(X)+\frac{t^2}{2!}E(X^2)+\frac{t^3}{3!}E(X^3)+\cdots \end{aligned}$$Perform the first-order derivative of the expanded MGF with respect to t and substitute zero.
$$ \begin{aligned} \frac{d M_X(t)}{dt}&=E(X)+tE(X^2)+\frac{t^2}{2!}E(X^3)+\cdots\\ \left.\frac{d M_X(t)}{dt}\right|_{t=0}&=E(X) \end{aligned}$$Differentiating the MGF once and substituting t = 0 gives the first moment of the random variable, that is, the expected value. The second derivative works the same way.
$$ \begin{aligned} \frac{d^2 M_X(t)}{dt^2}&=E(X^2)+tE(X^3)+\cdots\\ \left.\frac{d^2 M_X(t)}{dt^2}\right|_{t=0}&=E(X^2) \end{aligned}$$The result is the second moment. Generalizing this process gives Equation 5: the n-th derivative of the moment generating function evaluated at t = 0 yields the n-th moment. Therefore, statistics such as the expected value and the variance can be calculated from the MGF.
$$\begin{equation}\tag{5}\left.\frac{d^n M_X(t)}{dt^n}\right|_{t=0}=E(X^n) \end{equation}$$
Example 5)
A distribution that assigns equal probability density to every value of the random variable is called a uniform distribution. For a random variable X uniform on the interval $a \le x \le b$, the probability density function (pdf) used in the code below is $f(x)=\frac{1}{b-a}$.
Determine the moment generating function and calculate the expected value (the first moment) from it.
For this calculation, the series() method of the sympy module is used to expand the moment generating function as a Taylor series, and integrate() and diff() are used for integration and differentiation, respectively.
a, b, x, t=symbols("a, b, x, t", real=True)
f=1/(b-a)
M=integrate(exp(t*x)*f, (x, a, b))
M

$$\color{blue}{\begin{cases} \frac{e^{a t}}{a t - b t} - \frac{e^{b t}}{a t - b t} & \text{for}\: a t - b t \neq 0 \\\frac{a}{a - b} - \frac{b}{a - b} & \text{otherwise} \end{cases}}$$

M1=M.args[0][0]   # take the branch for a*t - b*t != 0
M1

$$\color{blue}{\displaystyle \frac{e^{a t}}{a t - b t} - \frac{e^{b t}}{a t - b t}}$$

M2=M1.series(t, 0, 4).removeO()
M2

$$\color{blue}{\displaystyle \begin{aligned}&\frac{a}{a - b} - \frac{b}{a - b} \\&+ t^{3} \left(\frac{a^{4}}{24 \left(a - b\right)} - \frac{b^{4}}{24 \left(a - b\right)}\right) \\&+ t^{2} \left(\frac{a^{3}}{6 \left(a - b\right)} - \frac{b^{3}}{6 \left(a - b\right)}\right) \\&+ t \left(\frac{a^{2}}{2 \left(a - b\right)} - \frac{b^{2}}{2 \left(a - b\right)}\right)\end{aligned}}$$

E1=(M2.diff(t)).subs(t, 0)
simplify(E1)

$$\color{blue}{\displaystyle \frac{a}{2} + \frac{b}{2}}$$
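As a follow-up sketch (not in the original text), the second moment can be read from the same truncated expansion M2, and combining it with E1 above recovers the familiar variance of the uniform distribution, $\frac{(b-a)^2}{12}$:

E2=(M2.diff(t, 2)).subs(t, 0)   # second moment E(X^2)
simplify(E2 - E1**2)            # Var(X)=E(X^2)-E(X)^2, which reduces to (a - b)**2/12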
Combination of random variables
In actual data analysis, the relationship between two or more variables is often the subject of analysis; for example, the relationship between cancer and tobacco, or between stock prices and interest rates. In this multivariate situation, the process of calculating probabilities and various statistics is similar to the univariate case introduced in Section 3.3, Probability and Statistics.
Example 6)
Of the 12 students in class A, 3 are soccer players and 4 are baseball players. If three students are selected to play a certain sport against another class, what is the probability that all of them are athletes?
If X is the number of soccer players selected, Y the number of baseball players, and Z the number of the remaining students, the probability for this distribution is calculated as follows.
$$\begin{aligned} &P(X=x, Y=y, Z=z)=\frac{\binom{3}{x} \binom{4}{y} \binom{5}{z}}{\binom{12}{3}}\\ &x+y+z=3 \end{aligned}$$

from scipy import special
total=special.comb(12, 3)
total
220.0
p=pd.DataFrame([[]])
for i in range(4):
    for j in range(5):
        for k in range(5):
            if i+j+k==3:
                x=pd.DataFrame([[i, j, k, special.comb(3, i)*special.comb(4, j)*special.comb(5, k)/total]])
                p=pd.concat([p, x])
p=np.around(p.iloc[1:,:], 3)
p.columns=['x','y','z','P']
p
|   | x | y | z | P |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 3.0 | 0.045 |
| 0 | 0.0 | 1.0 | 2.0 | 0.182 |
| 0 | 0.0 | 2.0 | 1.0 | 0.136 |
| ⁞ | ⁞ | ⁞ | ⁞ | ⁞ |
| 0 | 3.0 | 0.0 | 0.0 | 0.005 |
Since the question asks for selections consisting entirely of athletes, a cross table of only the variables x and y is helpful in understanding the probabilities. This cross table (pivot table) can be created with the pandas pivot_table() method.
p.pivot_table('P','x' , 'y', aggfunc="sum", margins=True )
y      0.0    1.0    2.0    3.0    All
x
0.0  0.045  0.182  0.136  0.018  0.381
1.0  0.136  0.273  0.082    NaN  0.491
2.0  0.068  0.055    NaN    NaN  0.123
3.0  0.005    NaN    NaN    NaN  0.005
All  0.254  0.510  0.218  0.018  1.000
As in the above process, each probability of the joint random variables is the probability of the intersection of the corresponding events. In the cross table above, when both x and y are 0, that is, the case P(0, 0, 3), the probability under the assumption that all random variables are independent would be calculated as follows.
px0=special.comb(3, 0)/special.factorial(3)
round(px0, 3)
0.167
py0=special.comb(4, 0)/special.factorial(4)
round(py0, 3)
0.042
pz3=special.comb(5, 3)/special.factorial(5)
round(pz3, 3)
0.083
p003=px0*py0*pz3
round(p003, 3)
0.001
This result differs from the value of 0.045 shown in the cross table. Therefore, the random variables X, Y, and Z are not independent, and the joint probability must be computed directly from the dependent sampling structure, as in the following formula.
$$\begin{aligned}&f(x, y, z)=\frac{\binom{X}{x}\binom{Y}{y}\binom{Z}{z}}{\binom{X+Y+Z}{x+y+z}}\\ &X, Y, Z: \text{total number in each category}\\&x, y, z: \text{number selected from each category} \end{aligned}$$The above probability mass function is the same as the hypergeometric distribution, which will be discussed later. Applying this function, P(X=0, Y=0, Z=3) is calculated as
p003=special.comb(3, 0)*special.comb(4, 0)*special.comb(5, 3)/special.comb(12, 3)
round(p003, 3)
0.045
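As a cross-check, the same probability can be obtained from SciPy's multivariate hypergeometric distribution (assuming SciPy ≥ 1.4, which provides stats.multivariate_hypergeom):

from scipy.stats import multivariate_hypergeom
# group sizes: 3 soccer, 4 baseball, 5 others; draw 3 without replacement
round(multivariate_hypergeom.pmf([0, 0, 3], [3, 4, 5], 3), 3)

0.045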
For x = 0 in this example, the marginal probability is calculated as
$$P(X=0)=\sum_{j+k=3}\frac{\binom{3}{0}\binom{4}{j}\binom{5}{k}}{\binom{12}{3}}$$The above calculations apply to discrete random variables; for continuous variables, integration is used instead.
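A minimal sketch of this marginal computation in code, summing the joint pmf over all pairs $(j, k)$ with $j+k=3$:

from scipy import special
# P(X=0): sum over y=j, z=3-j for j=0,...,3
pX0=sum(special.comb(3, 0)*special.comb(4, j)*special.comb(5, 3-j) for j in range(4))/special.comb(12, 3)
round(pX0, 3)

0.382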
Example 7)
If the joint probability density function of two continuous random variables X and Y is $f(x, y)=2e^{-x}e^{-2y}$ for $x>0,\; y>0$ (the form used in the code below),
Let's calculate:
- P{X > 1, Y < 1}
x, y=symbols('x y')
f=2*exp(-x)*exp(-2*y)
p=integrate(f, (x, 1, oo), (y, 0, 1))
p

$\quad \color{blue}{\displaystyle - \frac{1}{e^{3}} + e^{-1}}$
N(p, 3)
0.318
- P{X < a}, a > 0
a=symbols("a", positive=True)
p=integrate(f, (x, 0, a), (y, 0, oo))
p

$\quad \color{blue}{\displaystyle 1 - e^{- a}}$
Example 8)
If two random variables X and Y are independent and their probability density functions are $e^{-x}$ and $e^{-y}$ for $x,\; y>0$ (the forms used below), what is the probability density function of the random variable X/Y?
Independence means $f(x, y)=f_X(x)f_Y(y)$. Therefore, the joint probability density function is
$$f(x, y) = \exp(-x)\exp(-y), \quad x,\; y >0$$To find the probability density function of the random variable $\frac{X}{Y}$, write its value as a; this density can be obtained from the joint probability density of X and Y.
$$\begin{aligned}&\frac{X}{Y}\le a \;\Leftrightarrow\; X \le aY,\quad a>0\\ &0 < x < ay, \; 0 < y \end{aligned}$$The probability density function is the derivative of the cumulative distribution function.
To determine the probability density function expressed only in terms of the new variable a, the cumulative distribution function of f(x, y) is calculated first and then differentiated. The cumulative distribution function is:
$$F_{X/Y}(a)=\int^\infty_0 \int^{ay}_0 \exp(-x)\exp(-y)\, dxdy$$

a=symbols("a", positive=True)
x=symbols("x", positive=True)
y=symbols("y", positive=True)
f=exp(-x)*exp(-y)
F=integrate(f, (x, 0, a*y), (y, 0, oo))   # cumulative distribution function of X/Y
fa=F.diff(a)                              # differentiate to obtain the density
simplify(fa)

$\color{blue}{\displaystyle \frac{1}{\left(a + 1\right)^{2}}}$
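As a quick check (not part of the original text), the resulting density integrates to 1 over $a>0$, as a probability density function must (using the symbol a defined above):

integrate(1/(a + 1)**2, (a, 0, oo))

1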
Example 9)
The families in a village are distributed by number of children as follows:
| # of children in the family | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Rate | 15% | 20% | 35% | 30% |
Each child has an equal chance of being a boy or a girl, independently of the other children. If a family is selected at random from this village, what is the probability of each possible combination of the number of boys (B) and the number of girls (G) in that family?
$$\text{PMF} = \text{P(selected family)} \;\times\; \text{P(boy or girl)}$$Here, whether each child is a boy or a girl can be treated as a Bernoulli trial with success probability $\displaystyle \frac{1}{2}$. Expressing each pair as (boys, girls), the total sample space (S) is:
$$S=\{(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (0, 3), (1, 2), (2, 1), (3, 0)\}$$Above (1,1) means (B=1,G=1 | Child=2). Therefore, it is calculated as:
$$\begin{aligned}&P(B=1, G=1|C=2)=\frac{P(B=1, G=1)}{P(C=2)}\\ &\begin{aligned}P(B=1, G=1)&=P(B=1, G=1|C=2) \cdot P(C=2)\\ &=\binom{2}{1}\left(\frac{1}{2}\right)^{1}\left(\frac{1}{2}\right)^{2-1}P(C=2)\end{aligned}\\ & \begin{cases}\text{C}:&\text{Child}\\\text{G} :& \text{Girl}\\\text{B} :& \text{Boy}\end{cases} \end{aligned}$$In the above case, both (B,G) and (G, B) cases are possible, so the permutation of each case must be considered as in the above calculation. The probability mass function that generalizes the above case to this problem is as follows.
$$\begin{aligned}&f(g)=\binom{C}{g}\left(\frac{1}{2}\right)^{C-g}\left(\frac{1}{2}\right)^{g}p(C)\\ & C: \text{number of children} = 0, 1, 2, 3\\ &g: \text{number of girls},\; g= 0, 1, \dots, C \end{aligned} $$Of course, replacing the number of girls (g) with the number of boys (b) gives the same result.
pc={0:0.15, 1:0.2, 2:0.35, 3:0.3}
re=pd.DataFrame()
for i in pc.keys():
    for j in range(i+1):
        re1=pd.DataFrame([[i, j, i-j, special.comb(i, j)*(1/2)**j*(1/2)**(i-j)*pc[i]]])
        re=pd.concat([re, re1])
re.columns=['C', 'G', 'B', 'P']
re
|   | C | G | B | P |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0.1500 |
| 0 | 1 | 0 | 1 | 0.1000 |
| ⁞ | ⁞ | ⁞ | ⁞ | ⁞ |
| 0 | 3 | 2 | 1 | 0.1125 |
| 0 | 3 | 3 | 0 | 0.0375 |
rePivot=re.pivot_table('P','G','B', aggfunc="sum", margins=True)
rePivot
B        0       1       2       3     All
G
0   0.1500  0.1000  0.0875  0.0375  0.3750
1   0.1000  0.1750  0.1125     NaN  0.3875
2   0.0875  0.1125     NaN     NaN  0.2000
3   0.0375     NaN     NaN     NaN  0.0375
All 0.3750  0.3875  0.2000  0.0375  1.0000

$$\begin{aligned}P(G=1)&=P(G=1, B=0)+P(G=1, B=1)\\&+P(G=1, B=2)+P(G=1, B=3)\end{aligned}$$
rePivot.iloc[1, :4].sum()
0.3875

$$P(B=1 | G=1)=\frac{P(B=1, G=1)}{P(G=1)}$$
rePivot.iloc[1,1]/rePivot.iloc[1, :4].sum()
0.4516129032258064
Example 10)
The joint probability density function of the random variables X and Y is $f(x, y)=\frac{12}{5}x(2-x-y)$ for $0<x<1,\; 0<y<1$ (the form used in the code below).
The marginal density of Y, f(y), is the integral of the joint density over x with y held fixed; dividing the joint density by it gives the conditional density of X given Y = y. In other words,
$$\int^1_0f(x, y)\, dx=\int^1_0 \frac{12}{5}x(2-x-y)\, dx$$

x, y=symbols('x y')
f=12/5*x*(2-x-y)
py=f.integrate((x, 0, 1))
py

$\quad \color{blue}{\displaystyle 1.6 -1.2y}$
px_y=f/py   # conditional density of X given Y=y
px_y

$\quad \color{blue}{\displaystyle \frac{2.4 x \left(- x - y + 2\right)}{1.6 - 1.2 y}}$
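As a check (not part of the original text), this conditional density integrates to 1 over x for any fixed y in the support:

simplify(integrate(px_y, (x, 0, 1)))   # 1, up to the floating-point coefficients above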