
Estimation of covariance matrices


 

In multivariate statistics, the importance of the Wishart distribution stems in part from the fact that it is the probability distribution of the maximum likelihood estimator of the covariance matrix of a multivariate normal distribution. Although no one is surprised that the estimator of the population covariance matrix is simply the sample covariance matrix, the mathematical derivation is perhaps not widely known and is surprisingly subtle and elegant.

1 The multivariate normal distribution

A random vector X ∈ R^{p×1} (a p×1 "column vector") has a multivariate normal distribution with a nonsingular covariance matrix V precisely if V ∈ R^{p×p} is a positive-definite matrix and the probability density function of X is

$$f(x) = (2\pi)^{-p/2} \det(V)^{-1/2} \exp\left(-\frac{1}{2}(x-\mu)^T V^{-1} (x-\mu)\right)$$

where μ ∈ R^{p×1} is the expected value. The matrix V is the higher-dimensional analog of what in one dimension would be the variance.
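
As a small illustration of the density, the following sketch evaluates it in the bivariate case (p = 2) using only the standard library. The function name `mvn_density_2d` and the test inputs are assumptions made for this example; it also assumes that V is symmetric.

```python
import math

def mvn_density_2d(x, mu, V):
    """Density of a bivariate normal at x, for a symmetric positive-definite 2x2 V."""
    (a, b), (c, d) = V
    det = a * d - b * c
    # For a symmetric 2x2 matrix, positive definiteness means a > 0 and det > 0.
    if a <= 0 or det <= 0:
        raise ValueError("V must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T V^{-1} (x - mu).
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    p = 2
    return math.exp(-0.5 * q) / math.sqrt((2 * math.pi) ** p * det)

# Hypothetical check: with V equal to the identity and x = mu,
# the exponent vanishes and the density is 1 / (2 * pi).
peak = mvn_density_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```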

2 Maximum-likelihood estimation

Suppose now that X1, ..., Xn are independent and identically distributed with the distribution above. Based on the observed values x1, ..., xn of this sample, we wish to estimate V (we adhere to the convention of writing random variables as capital letters and data as lower-case letters).

2.1 First steps

It is fairly readily shown that the maximum-likelihood estimate of the expected value μ is the "sample mean"

$$\bar{x} = (x_1 + \cdots + x_n)/n.$$

See the section on estimation in the article on the normal distribution for details; the process here is similar.

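Since the estimate of μ does not depend on V, we can just substitute it for μ in the likelihood function

$$L(\mu, V) \propto \det(V)^{-n/2} \exp\left(-\frac{1}{2} \sum_{i=1}^n (x_i-\mu)^T V^{-1} (x_i-\mu)\right)$$

and then seek the value of V that maximizes this.

We have, writing x̄ for the sample mean,

$$L(\bar{x}, V) \propto \det(V)^{-n/2} \exp\left(-\frac{1}{2} \sum_{i=1}^n (x_i-\bar{x})^T V^{-1} (x_i-\bar{x})\right).$$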

2.2 The trace of a 1 × 1 matrix

Now we come to the first surprising step.

Regard the scalar (x_i − x̄)^T V^{-1} (x_i − x̄) as the trace of a 1×1 matrix!

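This makes it possible to use the identity tr(AB) = tr(BA) whenever A and B are matrices so shaped that both products exist. We get

$$L(\bar{x}, V) \propto \det(V)^{-n/2} \exp\left(-\frac{1}{2} \sum_{i=1}^n \operatorname{tr}\left((x_i-\bar{x})^T V^{-1} (x_i-\bar{x})\right)\right)$$

$$= \det(V)^{-n/2} \exp\left(-\frac{1}{2} \sum_{i=1}^n \operatorname{tr}\left((x_i-\bar{x})(x_i-\bar{x})^T V^{-1}\right)\right)$$

(so now we are taking the trace of a p×p matrix!)

$$= \det(V)^{-n/2} \exp\left(-\frac{1}{2} \operatorname{tr}\left(\sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^T V^{-1}\right)\right)$$

$$= \det(V)^{-n/2} \exp\left(-\frac{1}{2} \operatorname{tr}\left(S V^{-1}\right)\right)$$

where

$$S = \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^T \in R^{p \times p}.$$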
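
The trace step can be checked numerically. The sketch below, in plain Python, forms the sample mean and the matrix S (the sum of the outer products of the centered observations with themselves), then compares the sum of the scalar quadratic forms with tr(S V^{-1}). The data `xs`, the matrix `V_inv`, and all helper names are hypothetical and chosen only for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def sample_mean(xs):
    """Componentwise average of the observations."""
    n, p = len(xs), len(xs[0])
    return [sum(x[j] for x in xs) / n for j in range(p)]

def scatter_matrix(xs, mean):
    """S = sum over i of (x_i - mean)(x_i - mean)^T."""
    p = len(mean)
    S = [[0.0] * p for _ in range(p)]
    for x in xs:
        d = [x[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j]
    return S

def quad_form(d, M):
    """Scalar d^T M d."""
    return sum(d[i] * M[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))

# Hypothetical data (n = 4, p = 2) and a hypothetical positive-definite V^{-1}.
xs = [[1.0, 2.0], [3.0, 1.0], [2.0, 4.0], [4.0, 3.0]]
V_inv = [[2.0, -0.5], [-0.5, 1.0]]

mean = sample_mean(xs)
S = scatter_matrix(xs, mean)

# Sum of the 1x1 "traces" versus the trace of the p x p product.
lhs = sum(quad_form([x[j] - mean[j] for j in range(len(mean))], V_inv) for x in xs)
rhs = trace(matmul(S, V_inv))
```

Both `lhs` and `rhs` should agree up to floating-point rounding, which is what the cyclic property of the trace predicts.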

2.3 Using the spectral theorem

It follows from the spectral theorem of linear algebra that a positive-definite symmetric matrix S has a unique positive-definite symmetric square root S^{1/2}. We can again use the "cyclic property" of the trace to write

$$\det(V)^{-n/2} \exp\left(-\frac{1}{2} \operatorname{tr}\left(S^{1/2} V^{-1} S^{1/2}\right)\right).$$

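Let B = S^{1/2} V^{-1} S^{1/2}. Since det(B) = det(S) det(V)^{-1}, the expression above becomes

$$\det(S)^{-n/2} \det(B)^{n/2} \exp\left(-\frac{1}{2} \operatorname{tr}(B)\right).$$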

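The positive-definite matrix B can be diagonalized, and then the problem of finding the value of B that maximizes

$$\det(B)^{n/2} \exp\left(-\frac{1}{2} \operatorname{tr}(B)\right)$$

reduces, because det(B) is the product and tr(B) the sum of the eigenvalues of B, to the problem of finding the values of the diagonal entries λ1, ..., λp that maximize

$$\lambda_i^{n/2} \exp\left(-\lambda_i/2\right).$$

This is just a calculus problem: setting the derivative of (n/2) log λi − λi/2 to zero gives n/(2λi) = 1/2, so we get λi = n and hence B = n Ip, i.e., n times the p×p identity matrix.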
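
The one-dimensional maximization can also be checked by brute force. The sketch below evaluates the logarithm of λ^{n/2} exp(−λ/2) on a grid and picks the best grid point; the sample size `n = 10`, the grid, and the function name `log_objective` are assumptions made for illustration only.

```python
import math

def log_objective(lam, n):
    """Logarithm of lam**(n/2) * exp(-lam/2), defined for lam > 0."""
    return 0.5 * n * math.log(lam) - 0.5 * lam

n = 10  # hypothetical sample size

# Grid of candidate eigenvalues 0.01, 0.02, ..., 50.00.
grid = [k / 100 for k in range(1, 5001)]

# The grid point with the largest objective value should sit at lam = n.
best = max(grid, key=lambda lam: log_objective(lam, n))
```

Working with the logarithm leaves the maximizer unchanged and avoids overflow for large n.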


