As a theorem, it is an easy consequence of the definitions of joint probabilities (denoted by , conditional probabilities (denoted by ) and marginal or unconditional probabilities (denoted by ). In words, one could say that the fraction of trials

The theorem acquires its application to statistical inference when we
think of as a hypothesis which is being tested by measuring some data
. In real life, with noisy and incomplete data, we never have the
luxury of measuring directly, but only something depending on it in a
nonunique fashion. If we understand this dependence, i.e understand our
experiment, we know
. If only, (and this is a big IF!), someone gave us , then
we
would be able to compute the dependence of on from Bayes
theorem.

Going from to may not seem to be a big step for a man, but it is a giant step for mankind. It now tells us the probability of different hypotheses being true based on the given data . Remember, this is the real world. More than one hypothesis is consistent with a given set of data, so the best we can do is narrow down the possibilities. (If ``hypothesis'' seems too abstract, think of it as a set of numbers which occur as parameters in a given model of the real world)