| • Science | • People | • Locations | • Timeline |
| Contents | ||
It is used in video compression.
A video sequence consists of a number of pictures - usually called frames. Subsequent frames are very similar, thus, containing a lot of redundancy. The goal is to remove the redundancy to gain better compression ratios.
A first approach would be to simply subtract a reference frame from a given frame. The difference is then called residual and usually contains less energy (or information) than the original frame. The residual can be encoded at a lower bit-rate with the same quality. The decoder can reconstruct the original frame by adding the reference frame again.
A more sophisticated approach is to approximate the motion of the whole scene and the objects of a video sequence. The motion is described by some parameters that have to be encoded in the bit-stream. The pixels of the predicted frame are approximated by appropriately translated pixels of the reference frame. This gives much better residuals than a simple subtraction. However, the bit-rate occupied by the parameters of the motion model must not become too big.
Usually, the frames are processed in groups. One frame (usually the first) is encoded without motion compensation just as a normal image. This frame is called I-frame (intra-coded frame, MPEG terminology) or I-picture. The other frames are called P-frames or P-pictures and are predicted from the I-frame or P-frame that comes (temporally) immediately before it. The prediction schemes are, for instance, described as IPPPP, meaning that a group consists of one I-frame followed by four P-frames.
Frames can also be predicted from future frames. The future frames have to be encoded before the predicted frames, of course. Thus, the encoding order does not necessarily match the real frame order. Such frames are usually predicted from two direction, i.e. from the I- or P-frames that immediately precede or follow the predicted frame. These bidirectionally predicted frames are called B-frames. A coding scheme could, for instance, be IBBPBBPBBPBB.
In global motion compensation, the motion model basically reflects camera motions such as dolly (forward, backwards), track (left, right), boom (up, down), pan (left, right), tilt (up, down) and roll (along the view axis). It works best for still scenes without moving objects. There are several advantages of global motion compensation:
However, moving objects are not sufficiently represented by global motion compensation. Thus, other methods are preferable.
In block motion compensation (BMC), the frames are partitioned in blocks of pixels (e.g. macroblocks of 16×16 pixels in MPEG). Each block is predicted from a block of equal size in the reference frame. The blocks are not transformed in any way apart from being shifted to the position of the predicted block. This shift is represented by a motion vector.
The motion vectors are the parameters of this motion model and have to be encoded into the bit-stream. As the motion vectors are not independent (e.g. if two neighbouring blocks belong to the same moving object), they are usually encoded differentially to save bit-rate. This means that the difference of the motion vector and the neighbouring motion vector(s) encoded before is encoded. An entropy codec can exploit the resulting statistical distribution of the motion vectors (around the zero vector).
It is possible to shift blocks by non-integer vectors, which is called sub-pixel precision. This is done by interpolating the pixel's values. Usually, the precision of the motion vectors is increased by one bit: half-pixel precision. Of course, the computational expense for sub-pixel precision is much higher since the interpolation functions can be quite consuming.
The big disadvantage of block motion compensation are the discontinuities introduced at block borders (blocking artifacts). They have the form of sharp horizontal and vertical edges which, firstly, are highly visual for the human eye and, secondly, produce ringing effects (big coefficients in high frequency sub-bands) in the Fourier-related transform used for transform coding of the residual frames.