| • Science | • People | • Locations | • Timeline |
A range encoder is an entropy coder that is believed to approach the compression ratio of arithmetic coding, without the patent encumbrances of arithmetic coding. Like arithmetic coding, range encoding conceptually encodes all the symbols of the message into one number, unlike Huffman coding which assigns each symbol a variable-length bit pattern. Thus range encoding can achieve greater compression ratios than the one-bit-per-symbol upper bound on Huffman encoding and it does not suffer the inefficiencies that Huffman does when dealing with probabilities that are not exact powers of two.
The central concept behind range encoding is this: given a large-enough range of integers, and a frequency algorithm that returns the probability of a given symbol in a given context, the initial range can easily be divided into sub-ranges whose sizes are proportional to the probability of the symbol they represent. Each symbol of the message can then be encoded in turn, by reducing the current range down to just that sub-range which corresponds to the next symbol to be encoded.
When all symbols have been encoded, merely identifying the sub-range is enough to communicate the entire message (presuming of course that the decoder is somehow notified when it has extracted the entire message). A single integer is actually sufficient to identify the sub-range, and it may not even be necessary to transmit the entire integer; if there is a sequence of digits such that every integer beginning with that prefix falls within the sub-range, then the prefix alone is all that's needed to identify the sub-range and thus transmit the message.
Suppose we want to encode the message "AABA
Since our first symbol is an A, it reduces our initial range down to
With two symbols encoded, our range is now
This time it is the second of our three choices that represent the message we want to encode, and our range becomes
Finally, with our range narrowed down to
And since
The central problem may appear to be selecting an initial range large enough that no matter how many symbols we have to encode, we will always have a current range large enough to divide into non-zero sub-ranges. In practice, however, this is not a problem, because instead of starting with a very large range and gradually narrowing it down, the encoder works with a smaller range of numbers at any given time. As more symbols are encoded, the upper digits of the final result become established (notice that after encoding just three symbols, we already knew that our final result would start with "2"); we can transmit these digits to the decoder or commit them to storage, and add digits to replace them on the right, making our range large enough once again that it can be split adequately into sub-ranges.