# Understanding Bayes' Theorem from a Machine Learning Perspective

## A Fascinating Interpretation of Bayes’ Theorem

The Bayesian school holds that nothing is truly random: if something appears random, that is due to a lack of information (cf. Shannon's information theory). The Bayesian school in statistics gave rise to Bayesian methods in machine learning.

Bayes’ theorem lets us take the probabilities we held before an event and revise them once the event has occurred (or once evidence has been observed).

An accidental use of Bayes: the joke that water is a deadly poison, because everyone who got cancer drank water. An example of being inadvertently misled by Bayes: a diagnostic test with very high accuracy (99.9%) can still have an extremely high misdiagnosis rate among positive results (>50%), because the prevalence of the disease in the general population is well below 1%.
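The misdiagnosis claim can be checked directly with Bayes' theorem. A minimal numeric sketch, with all numbers (sensitivity, specificity, prevalence) assumed purely for illustration:

```python
# Base-rate fallacy: a 99.9% accurate test for a rare disease.
# All numbers below are illustrative assumptions.
sensitivity = 0.999   # P(positive | disease)
specificity = 0.999   # P(negative | no disease)
prevalence = 0.0005   # P(disease), well below 1%

# Total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes: P(disease | positive)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

Even with a 99.9% accurate test, the posterior probability of actually having the disease given a positive result comes out around one third here, so most positives are false alarms.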

~~Probability and statistics are indeed like a young girl who is ready to be dressed up by anyone.~~

$$P(c|x) = \frac{P(c)P(x|c)}{P(x)}$$

## Understanding Bayes’ Theorem from a Machine Learning Perspective

The same formula as above, but in machine learning it defines a naive Bayes classifier; read $P(c|x)$ as "the probability of $c$ given $x$". The left side is the posterior probability, $P(c)$ is the prior probability, and $P(x|c)$ is the likelihood, which is the main focus of the model’s learning. $P(x)$ is the same for all input samples and serves only for normalization (it is expanded using the total probability formula during computation); the estimation of $P(x|c)$ can use Maximum Likelihood Estimation (see the Watermelon Book, p. 148).
From an intuitive perspective (which may not be entirely rigorous): $P(c)$ is the original probability of an event. After something happens (or we learn it has happened, a point that touches on the divide between the Bayesian and frequentist schools), $P(c|x)$ is the adjusted probability, and the adjustment factor is $\frac{P(x|c)}{P(x)}$.
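The "prior times adjustment factor" reading can be made concrete with a toy two-class example. A minimal sketch, where the class names and all probabilities are assumed for illustration:

```python
# Toy posterior computation: posterior = prior * P(x|c) / P(x).
# Classes and numbers are illustrative assumptions.
priors = {"spam": 0.3, "ham": 0.7}       # P(c)
likelihood = {"spam": 0.8, "ham": 0.1}   # P(x|c): evidence x observed

# P(x) via the total probability formula over all classes
p_x = sum(priors[c] * likelihood[c] for c in priors)

# Posterior for each class; the factor likelihood[c] / p_x adjusts the prior
posteriors = {c: priors[c] * likelihood[c] / p_x for c in priors}
print(posteriors)
```

Note that the evidence $x$ raises the spam probability well above its prior of 0.3: the adjustment factor $P(x|c)/P(x)$ exceeds 1 for the class under which the evidence is more likely.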

~~Too deep; superficially it’s just a formula, but deeper down it’s actually about worldview and methodology. The more you look, the more confused you get.~~

## Some Concepts Derived from Bayes’ Theorem

### Prior Probability

The probability of an event obtained before any experiment or sampling, based solely on past data and experience, is called the prior probability. Like the total probability formula, it usually plays the role of the “cause” in “from cause to effect” problems.

### Posterior Probability and Prior Probability

- Posterior probability: after an event has occurred and its outcome is known, the probability that the event was caused by a particular factor, reasoning from effect to cause. It is calculated from the outcome information; in Bayes’ theorem it corresponds to the “cause” in “seeking the cause from the effect” problems.
- Relationship with the prior probability: the posterior is computed from the prior. If only the outcome of the event is known and no prior is available (no past data or statistics), the posterior cannot be computed. Computing the posterior requires Bayes’ theorem.
- Relationship between the total probability formula, Bayes’ theorem, and the prior/posterior probabilities: the total probability formula combines the probabilities of several causes each leading to the event, going from cause to effect. Bayes’ theorem is applied once the event has occurred, to compute the probability that each cause produced the observed outcome, going from effect to cause; it pairs with the posterior probability. In short, total probability infers results from causes, while Bayes infers causes from results.

## Reference Articles

*Machine Learning*, Zhou Zhihua (the “Watermelon Book”)