Fisher information

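In statistics, the Fisher information I(θ), thought of as the amount of information that an observable random variable X carries about an unobservable parameter θ upon which the probability distribution of X depends, is the variance of the score (the derivative of the log-likelihood with respect to θ). Because the expectation of the score is zero, this may be written as

$$ I(\theta) = \operatorname{E}\left[ \left( \frac{\partial}{\partial\theta} \log f(X;\theta) \right)^{2} \right], $$
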
where f is the probability density function of random variable X. The Fisher information is thus the expectation of the square of the score. A random variable carrying high Fisher information implies that the absolute value of the score is frequently high (remember that the expectation of the score is zero).
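
As a small numerical illustration of this definition, the sketch below takes X to be a single Bernoulli trial with success probability θ (a choice made here only because its two outcomes can be enumerated exactly) and checks that the score has zero expectation and that its variance is 1/(θ(1−θ)). The function names are illustrative, not part of any library.

```python
def score(x, theta):
    """Derivative of log f(x; theta) for one Bernoulli trial, x in {0, 1}."""
    return x / theta - (1 - x) / (1 - theta)


def expect(g, theta):
    """Exact expectation of g(X) when X ~ Bernoulli(theta)."""
    return (1 - theta) * g(0) + theta * g(1)


def score_variance(theta):
    """Fisher information computed as the variance of the score."""
    mean = expect(lambda x: score(x, theta), theta)
    second_moment = expect(lambda x: score(x, theta) ** 2, theta)
    return second_moment - mean ** 2


theta = 0.3
print(expect(lambda x: score(x, theta), theta))  # expectation of the score
print(score_variance(theta), 1 / (theta * (1 - theta)))
```

Because the expectation of the score vanishes, the second moment alone already gives the information; the mean is subtracted only to make the "variance of the score" reading explicit.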

This concept is named in honor of the geneticist and statistician Ronald Fisher.

Note that the information as defined above is not a function of a particular observation, as the random variable X has been averaged out. The concept of information is useful when comparing two methods of observation of some random process.

Under suitable regularity conditions, information as defined above may be written as

$$ I(\theta) = -\operatorname{E}\left[ \frac{\partial^{2}}{\partial\theta^{2}} \log f(X;\theta) \right], $$

and is thus the negative of the expectation of the second derivative of the log of f with respect to θ. Information may thus be seen to be a measure of the "sharpness" of the support curve near the maximum likelihood estimate of θ. A "blunt" support curve (one with a shallow maximum) would have an expected second derivative of small magnitude, and thus low information; while a sharp one would have an expected second derivative of large magnitude, and thus high information.
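
The curvature form can be checked numerically. The sketch below again assumes a single Bernoulli trial with success probability θ, approximates the second derivative of the log-density by a central finite difference (the step size h is an arbitrary choice), and averages it over the two outcomes. It is an illustration under those assumptions rather than a general-purpose routine.

```python
import math


def log_f(x, theta):
    """Log-density of one Bernoulli trial, x in {0, 1}."""
    return x * math.log(theta) + (1 - x) * math.log(1 - theta)


def second_derivative(x, theta, h=1e-4):
    """Central finite-difference estimate of d^2/dtheta^2 log f(x; theta)."""
    return (log_f(x, theta + h) - 2 * log_f(x, theta) + log_f(x, theta - h)) / h ** 2


def information_from_curvature(theta):
    """Negative expected second derivative of the log-density."""
    expected = ((1 - theta) * second_derivative(0, theta)
                + theta * second_derivative(1, theta))
    return -expected


theta = 0.3
print(information_from_curvature(theta), 1 / (theta * (1 - theta)))
```

The two printed values should agree up to the finite-difference error, which is the sense in which a sharper support curve corresponds to more information.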

Information is additive, in the sense that the information gathered by two independent experiments X and Y is the sum of the information of each of them:

$$ I_{X,Y}(\theta) = I_X(\theta) + I_Y(\theta). $$

This is because the score of two independent experiments taken together is the sum of their individual scores, and the variance of the sum of two independent random variables is the sum of their variances. It follows that the information in a random sample of size n is n times that in a sample of size one (if observations are independent).
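
Additivity can be verified by brute force for a small case. The sketch below assumes n independent Bernoulli trials with a common success probability θ, enumerates all 2^n outcomes, and computes the variance of the joint score directly; the helper names are illustrative.

```python
from itertools import product


def score_one(x, theta):
    """Score of a single Bernoulli trial, x in {0, 1}."""
    return x / theta - (1 - x) / (1 - theta)


def joint_information(n, theta):
    """Variance of the joint score over all 2**n outcomes of n independent trials."""
    mean = 0.0
    second_moment = 0.0
    for outcome in product((0, 1), repeat=n):
        prob = 1.0
        for x in outcome:
            prob *= theta if x == 1 else 1 - theta
        # For independent trials the joint score is the sum of per-trial scores.
        s = sum(score_one(x, theta) for x in outcome)
        mean += prob * s
        second_moment += prob * s * s
    return second_moment - mean ** 2


theta = 0.3
print(joint_information(5, theta), 5 * joint_information(1, theta))
```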

The information provided by a sufficient statistic is the same as that of the sample X. This may be seen by using Fisher's factorization criterion for a sufficient statistic. If T(X) is sufficient for θ, then

$$ f(X;\theta) = g(T(X),\theta)\, h(X) $$

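for some functions g and h (see sufficient statistic for a more detailed explanation). The equality of information follows from the fact that

$$ \frac{\partial}{\partial\theta} \log f(X;\theta) = \frac{\partial}{\partial\theta} \log g(T(X),\theta) $$
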
(which is the case because h(X) is independent of θ) and the definition for information given above. More generally, if T=t(X) is a statistic, then

$$ I_T(\theta) \le I_X(\theta), $$

with equality if and only if T is a sufficient statistic.

The Cramér-Rao inequality states that the reciprocal of the Fisher information is a lower bound on the variance of any unbiased estimator of θ.
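
A rough simulation can make the bound concrete. The sketch below assumes n independent Bernoulli trials with success probability θ, for which the information is n/(θ(1−θ)), and uses the sample proportion as the unbiased estimator. The seed, sample size and number of repetitions are arbitrary choices for illustration.

```python
import random


def cramer_rao_bound(theta, n):
    """Reciprocal of the information n / (theta * (1 - theta))."""
    return theta * (1 - theta) / n


def simulated_variance(theta, n, reps, seed=0):
    """Sample variance of the proportion of successes over repeated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < theta)
        estimates.append(successes / n)
    mean = sum(estimates) / reps
    return sum((e - mean) ** 2 for e in estimates) / (reps - 1)


theta, n = 0.3, 50
print(simulated_variance(theta, n, reps=20000), cramer_rao_bound(theta, n))
```

In this particular case the sample proportion attains the bound, so the simulated variance should sit close to it rather than merely above it; other unbiased estimators may have strictly larger variance.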

Example

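The information contained in n independent Bernoulli trials, each with probability of success θ, may be calculated as follows. In the following, A represents the number of successes, B the number of failures, and n = A + B is the total number of trials.

$$
\begin{aligned}
I(\theta) &= -\operatorname{E}\left[ \frac{\partial^{2}}{\partial\theta^{2}} \log f(X;\theta) \right] \\
&= -\operatorname{E}\left[ \frac{\partial^{2}}{\partial\theta^{2}} \log\left( \theta^{A} (1-\theta)^{B} \frac{(A+B)!}{A!\,B!} \right) \right] \\
&= -\operatorname{E}\left[ \frac{\partial^{2}}{\partial\theta^{2}} \bigl( A \log\theta + B \log(1-\theta) \bigr) \right] \\
&= -\operatorname{E}\left[ \frac{\partial}{\partial\theta} \left( \frac{A}{\theta} - \frac{B}{1-\theta} \right) \right] \\
&= \operatorname{E}\left[ \frac{A}{\theta^{2}} + \frac{B}{(1-\theta)^{2}} \right] \\
&= \frac{n\theta}{\theta^{2}} + \frac{n(1-\theta)}{(1-\theta)^{2}} \\
&= \frac{n}{\theta(1-\theta)}
\end{aligned}
$$
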
The first line is just the definition of information; the second uses the fact that the information contained in a sufficient statistic is the same as that of the sample itself; the third line just expands the log term (and drops a constant); the fourth and fifth just differentiate with respect to θ; the sixth replaces A and B with their expectations; and the seventh is algebraic manipulation.

The overall result, viz

$$ I(\theta) = \frac{n}{\theta(1-\theta)}, $$

may be seen to be in accord with what one would expect, since it is the reciprocal of the variance of the mean of the n Bernoulli random variables.
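
The result can also be checked through the sufficient statistic alone. The sketch below assumes that the number of successes A follows a binomial distribution with parameters n and θ, and computes the expected squared score of A exactly by summing over its n + 1 possible values; the function names are illustrative.

```python
from math import comb


def binomial_pmf(a, n, theta):
    """Probability of exactly a successes in n independent trials."""
    return comb(n, a) * theta ** a * (1 - theta) ** (n - a)


def score_of_successes(a, n, theta):
    """Derivative with respect to theta of a*log(theta) + (n - a)*log(1 - theta)."""
    return a / theta - (n - a) / (1 - theta)


def information_from_successes(n, theta):
    """Expected squared score of A; the score has zero mean, so this is its variance."""
    return sum(binomial_pmf(a, n, theta) * score_of_successes(a, n, theta) ** 2
               for a in range(n + 1))


n, theta = 10, 0.3
print(information_from_successes(n, theta), n / (theta * (1 - theta)))
```

Working with A rather than the full sequence of trials needs only n + 1 terms instead of 2^n, which is one practical payoff of the sufficiency argument.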

In case the parameter θ is vector-valued, the information is a positive-semidefinite matrix which, when it is positive definite, defines a metric on the parameter space; consequently differential geometry is applied to this topic. See Fisher information metric.


