Bayesian learning

conditional probability

The probability that event A occurs given that event B has occurred:

$$P(A|B)=\frac{P(AB)}{P(B)}$$
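By symmetry $P(B|A)=\frac{P(AB)}{P(A)}$, so the joint probability can be written either way (the product rule); equating the two forms is what yields Bayes' theorem below:

$$P(A|B)P(B)=P(AB)=P(B|A)P(A)$$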

Bayes' theorem

We know $P(A|B)$, but we are more interested in $P(B|A)$:

$$P(B|A)=\frac{P(A|B)P(B)}{P(A)}$$
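As a quick check with hypothetical numbers: if $P(A|B)=0.9$, $P(B)=0.01$ and $P(A)=0.05$, then $P(B|A)=\frac{0.9\times 0.01}{0.05}=0.18$; a strong likelihood can still give a modest posterior when the prior $P(B)$ is small.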

optimal classifier

If $P(y_k|x)=\max\{P(y_1|x),P(y_2|x),\cdots,P(y_K|x)\}$, then $x\in y_k$.

According to Bayes' theorem:

$$P(y_i|x)=\frac{P(x|y_i)P(y_i)}{P(x)}$$

For every $P(y_i|x)$ the denominator $P(x)$ is the same, so only the numerator matters (with the naive Bayes assumption that features are conditionally independent, the class-conditional likelihood factorizes):

$$P(y_i|x)\varpropto P(y_i)P(x|y_i)=P(y_i)\prod_{j}P(x_j|y_i)$$
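A minimal numeric sketch of this decision rule, assuming binary features (all probabilities below are made-up illustration values, not from the text):

```python
import numpy as np

# Hypothetical setup: 3 classes, 2 binary features, observation x = (1, 0).
priors = np.array([0.5, 0.3, 0.2])            # P(y_i)
# P(x_j = 1 | y_i): one row per class, one column per feature (made-up values)
cond = np.array([[0.8, 0.1],
                 [0.4, 0.6],
                 [0.2, 0.9]])
x = np.array([1, 0])

# Naive Bayes numerator: P(y_i) * prod_j P(x_j | y_i)
likelihood = np.prod(np.where(x == 1, cond, 1.0 - cond), axis=1)
scores = priors * likelihood

print("unnormalized posteriors:", scores)
print("predicted class index:", int(np.argmax(scores)))
```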

Beta distribution

A distribution over probabilities, defined on the interval $(0,1)$. Its probability density function is:

$$f(\theta;\alpha,\beta)=\frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}$$

expectation: $\frac{\alpha}{\alpha+\beta}$
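For reference, the normalizing constant above is the Beta function (it reappears below when the posterior is normalized):

$$B(\alpha,\beta)=\int_0^1\theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$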

Binomial distribution: $P(data|\theta)\varpropto \theta^z(1-\theta)^{N-z}$, where $z$ is the number of successes in $N$ trials.

Beta distribution: $Beta(a,b)\varpropto \theta^{a-1}(1-\theta)^{b-1}$

In Bayesian estimation we need to infer $\theta$ given the data. Substituting the Beta distribution for the prior $P(\theta)$ and the binomial distribution for the likelihood $P(data|\theta)$:

$$P(\theta|data)\varpropto P(data|\theta)P(\theta)\varpropto \theta^{z+a-1}(1-\theta)^{N-z+b-1}$$

The resulting posterior again follows a Beta distribution, $Beta(a',b')$ with $a'=a+z$ and $b'=b+N-z$; in other words, the Beta distribution is the conjugate prior of the binomial. Normalizing with the Beta function $B(a',b')$ gives the posterior:

$$P(\theta|data)=\frac{\theta^{z+a-1}(1-\theta)^{N-z+b-1}}{B(a+z,\,b+N-z)}$$
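A small numeric sketch of this conjugate update (the prior parameters and counts below are hypothetical), using scipy.stats.beta:

```python
from scipy.stats import beta

# Hypothetical prior belief about theta and hypothetical observed data.
a, b = 2, 2          # prior Beta(a, b)
N, z = 10, 7         # N trials, z successes

# Conjugacy: the posterior is Beta(a + z, b + N - z); no integration needed.
posterior = beta(a + z, b + N - z)

print("posterior mean:", posterior.mean())               # (a+z)/(a+b+N) = 9/14
print("posterior mode:", (a + z - 1) / (a + b + N - 2))  # MAP estimate, 8/12
```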

  • maximum likelihood estimation (MLE): choose the value of $\theta$ that maximizes the probability of the observed data
  • maximum a posteriori estimation (MAP): choose the value of $\theta$ that is most probable given the observed data and the prior belief (see the Beta-binomial comparison below)
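For the Beta-binomial setting above both estimates have closed forms (standard results, stated here for comparison):

$$\hat\theta_{MLE}=\frac{z}{N},\qquad \hat\theta_{MAP}=\arg\max_\theta P(\theta|data)=\frac{z+a-1}{N+a+b-2}$$

As $N$ grows, the influence of the prior fades and the MAP estimate approaches the MLE.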

Naive Bayes

Naive Bayes assumption: features are conditionally independent given the class.
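Written out, the assumption factorizes the class-conditional likelihood into per-feature terms:

$$P(x_1,x_2,\cdots,x_n|y)=\prod_{i=1}^{n}P(x_i|y)$$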

GNB

GNB: Gaussian Naive Bayes, which is designed for continuous features.

Assumptions

  • $Y$ is boolean, governed by a Bernoulli distribution, with parameter $\pi=P(Y=1)$
  • each $x_i$ is a continuous random variable
  • for each $x_i$, $P(x_i|Y=y_k)$ is a Gaussian distribution of the form $N(\mu_{ik},\sigma_{ik})$
  • the $x_i$ are conditionally independent of one another given $Y$

With the conditional independence assumption:

$$P(Y=1|X)=\frac{\pi\prod_iP(X_i|Y=1)}{\pi\prod_iP(X_i|Y=1)+(1-\pi)\prod_iP(X_i|Y=0)}$$

Define $\theta_{i1}=P(X_i=1|Y=1),\quad \theta_{i0}=P(X_i=1|Y=0)$ (for boolean features $X_i$), and then:

$$P(X_i|Y=1)=\theta_{i1}^{X_i}(1-\theta_{i1})^{1-X_i},\qquad P(X_i|Y=0)=\theta_{i0}^{X_i}(1-\theta_{i0})^{1-X_i}$$

So:

$$P(Y=1|X)=\frac{\pi\prod_i\theta_{i1}^{X_i}(1-\theta_{i1})^{1-X_i}}{\pi\prod_i\theta_{i1}^{X_i}(1-\theta_{i1})^{1-X_i}+(1-\pi)\prod_i\theta_{i0}^{X_i}(1-\theta_{i0})^{1-X_i}}$$
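To make the continuous-feature case concrete, here is a minimal Gaussian Naive Bayes sketch (the class and variable names are my own, not from the text): it estimates a class prior plus a per-feature, per-class mean and variance, then classifies with log posteriors.

```python
import numpy as np

class SimpleGNB:
    """Minimal Gaussian Naive Bayes: one Gaussian per (feature, class)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log P(y_k) + sum_i log N(x_i; mu_ik, var_ik), evaluated for every class
        log_post = []
        for k in range(len(self.classes)):
            log_lik = -0.5 * (np.log(2 * np.pi * self.var[k])
                              + (X - self.mu[k]) ** 2 / self.var[k]).sum(axis=1)
            log_post.append(np.log(self.prior[k]) + log_lik)
        return self.classes[np.argmax(np.column_stack(log_post), axis=1)]

# Tiny synthetic example (hypothetical data): two well-separated 2-D classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(SimpleGNB().fit(X, y).predict(np.array([[0.1, -0.2], [2.8, 3.1]])))  # expect [0 1]
```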