# Understanding Bayes' Theorem from a Machine Learning Perspective

## A Fascinating Interpretation of Bayes’ Theorem

The Bayesian school holds that nothing is truly random: if something appears random, that is due to a lack of information (cf. Shannon's information theory). The Bayesian school in statistics gave rise to Bayesian methods in machine learning.

Bayes’ theorem lets us take the probabilities we held before an event and revise them once the event has occurred (or once evidence has been observed).

An accidental use of Bayes: the joke that water is a deadly poison, because everyone who got cancer drank water. An example of being inadvertently misled by Bayes: a diagnostic test with very high accuracy (99.9%) can still have an extremely high misdiagnosis rate among positive results (>50%), because the prevalence of the disease in the general population is well below 1%.
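The misdiagnosis claim can be checked directly with Bayes' theorem. A minimal numeric sketch, with all numbers (sensitivity, specificity, prevalence) assumed purely for illustration:

```python
# Base-rate fallacy: a 99.9% accurate test for a rare disease.
# All numbers below are illustrative assumptions.
sensitivity = 0.999   # P(positive | disease)
specificity = 0.999   # P(negative | no disease)
prevalence = 0.0005   # P(disease), well below 1%

# Total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes: P(disease | positive)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

Even with a 99.9% accurate test, the posterior probability of actually having the disease given a positive result comes out around one third here, so most positives are false alarms.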

~~Probability and statistics are indeed like a young girl who is ready to be dressed up by anyone.~~

$$P(c|x) = \frac{P(c)P(x|c)}{P(x)}$$

## Understanding Bayes’ Theorem from a Machine Learning Perspective

The same formula as above, but in machine learning it defines a naive Bayes classifier; read $P(c|x)$ as "the probability of $c$ given $x$". The left side is the posterior probability, $P(c)$ is the prior probability, and $P(x|c)$ is the likelihood, which is the main focus of the model’s learning. $P(x)$ is the same for all input samples and serves only for normalization (it is expanded using the total probability formula during computation); the estimation of $P(x|c)$ can use Maximum Likelihood Estimation (see the Watermelon Book, p. 148).
From an intuitive perspective (which may not be entirely rigorous): $P(c)$ is the original probability of an event. After something happens (or we learn it has happened, a point that touches on the divide between the Bayesian and frequentist schools), $P(c|x)$ is the adjusted probability, and the adjustment factor is $\frac{P(x|c)}{P(x)}$.
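The "prior times adjustment factor" reading can be made concrete with a toy two-class example. A minimal sketch, where the class names and all probabilities are assumed for illustration:

```python
# Toy posterior computation: posterior = prior * P(x|c) / P(x).
# Classes and numbers are illustrative assumptions.
priors = {"spam": 0.3, "ham": 0.7}       # P(c)
likelihood = {"spam": 0.8, "ham": 0.1}   # P(x|c): evidence x observed

# P(x) via the total probability formula over all classes
p_x = sum(priors[c] * likelihood[c] for c in priors)

# Posterior for each class; the factor likelihood[c] / p_x adjusts the prior
posteriors = {c: priors[c] * likelihood[c] / p_x for c in priors}
print(posteriors)
```

Note that the evidence $x$ raises the spam probability well above its prior of 0.3: the adjustment factor $P(x|c)/P(x)$ exceeds 1 for the class under which the evidence is more likely.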

~~Too deep; superficially it’s just a formula, but deeper down it’s actually about worldview and methodology. The more you look, the more confused you get.~~

## Some Concepts Derived from Bayes’ Theorem

### Prior Probability

The probability of an event obtained before any experiment or sampling, based solely on past data and experience, is called the prior probability. Like the total probability formula, it usually plays the role of the “cause” in “from cause to effect” problems.

### Posterior Probability and Prior Probability

- Posterior probability: after an event has occurred and its outcome is known, the probability that the event was caused by a particular factor, reasoning from effect to cause. It is calculated from the outcome information; in Bayes’ theorem it corresponds to the “cause” in “seeking the cause from the effect” problems.
- Relationship with the prior probability: the posterior is computed from the prior. If only the outcome of the event is known and no prior is available (no past data or statistics), the posterior cannot be computed. Computing the posterior requires Bayes’ theorem.
- Relationship between the total probability formula, Bayes’ theorem, and the prior/posterior probabilities: the total probability formula combines the probabilities of several causes each leading to the event, going from cause to effect. Bayes’ theorem is applied once the event has occurred, to compute the probability that each cause produced the observed outcome, going from effect to cause; it pairs with the posterior probability. In short, total probability infers results from causes, while Bayes infers causes from results.

## Reference Articles

*Machine Learning*, Zhou Zhihua (the “Watermelon Book”)