Optical character recognition, usually abbreviated to OCR, involves computer systems designed to translate images of typewritten text (usually captured by a scanner) into machine-editable text, that is, to translate pictures of characters into a standard encoding scheme that represents them, such as ASCII or Unicode. OCR began as a field of research in artificial intelligence and machine vision; though academic research in the field continues, the focus has shifted to implementing proven techniques.

1 Optical vs. Digital Character Recognition

Originally, optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were considered separate fields. Because very few applications that use true optical techniques survive, the term optical character recognition has since been broadened to cover digital character recognition as well.

2 Training

Early systems required "training" (essentially, the provision of known samples of each character) to read a specific font. Today, "intelligent" systems that can recognize most fonts with a high degree of accuracy are common. Some systems can even reproduce formatted output that closely approximates the original scanned page, including images, columns, and other non-textual components.
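To make the idea of "training" concrete, here is a minimal sketch of one simple way to use known samples: a nearest-template classifier. It assumes every glyph has already been isolated and scaled to the same small binary bitmap; the names `train`, `classify`, and the 3x5 `FONT` are hypothetical and chosen for illustration, and this is not a description of how any particular early system worked. Classification picks the stored sample that differs from the input in the fewest pixels (Hamming distance).

```python
from typing import Dict, List

# A glyph is a small binary bitmap: one string per row, '#' = ink, '.' = background.
Bitmap = List[str]


def to_bits(bitmap: Bitmap) -> List[int]:
    """Flatten a bitmap row by row into a list of 0/1 pixel values."""
    return [1 if ch == "#" else 0 for row in bitmap for ch in row]


def train(samples: Dict[str, Bitmap]) -> Dict[str, List[int]]:
    """'Training': store one known sample (a template) for each character."""
    return {label: to_bits(bitmap) for label, bitmap in samples.items()}


def classify(templates: Dict[str, List[int]], glyph: Bitmap) -> str:
    """Return the label whose template differs from the glyph in the fewest pixels."""
    bits = to_bits(glyph)

    def mismatches(label: str) -> int:
        template = templates[label]
        if len(template) != len(bits):
            raise ValueError("glyph and template sizes differ")
        return sum(a != b for a, b in zip(template, bits))

    return min(templates, key=mismatches)


# Hypothetical 3x5 "font" with one known sample per character.
FONT: Dict[str, Bitmap] = {
    "I": [".#.", ".#.", ".#.", ".#.", ".#."],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}
templates = train(FONT)

# A slightly smudged "T" (one extra ink pixel in the bottom row).
print(classify(templates, ["###", ".#.", ".#.", ".#.", "##."]))  # prints T
```

Because the templates come from one font, a character printed in a very different typeface may lie closer to the wrong template, which illustrates why such systems had to be trained for each specific font.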


3 Brief history of OCR

The United States Postal Service has been using OCR machines to sort mail since 1965. Mail sorting now plays only a small role in OCR research, because the systems need only read the postal code on each envelope. After the postal code has been read, a bar code carrying the same information can be printed on the envelope. To avoid interfering with the human-readable address field, which can be located anywhere on the letter, the bar code is printed in a special ink that is clearly visible under UV light and looks orange under normal lighting. Envelopes marked with the machine-readable bar code can then be processed further, since machine-readable codes can be decoded more quickly than human-readable letters and numbers.

4 Typewritten OCR

While the accurate recognition of Latin-script typewritten text is now considered largely a solved problem, recognition of hand printing, handwriting in general, and printed versions of some other scripts (particularly those with a very large number of characters) is still the subject of active research.

5 Hand print OCR

Systems for recognizing hand-printed text on the fly have enjoyed commercial success in recent years. Among these are the input devices for the Palm Pilot and other personal digital assistants; the Apple Newton pioneered this technology. The algorithms used in these devices take advantage of the fact that the order, speed, and direction of individual line segments are known as they are entered. Also, the user can be retrained to use only specific letter shapes. These methods cannot be used in software that scans paper documents, so accurate recognition of hand-printed documents is still largely an open problem. Accuracy rates of 80% to 90% on neat, clean hand-printed characters can be achieved, but that accuracy rate still translates to dozens of errors per page, making the technology useful only in very limited contexts.
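The advantage of knowing stroke order and direction can be illustrated with a small sketch. It assumes a stroke arrives as an ordered list of (x, y) pen positions with y increasing upward, quantizes each movement into one of eight compass directions, and matches the resulting sequence against stored letter shapes using edit distance so that slightly wobbly strokes still match. Speed is ignored here. The shapes in `SHAPES` and the functions `direction_codes`, `edit_distance`, and `recognize` are hypothetical illustrations, not the alphabet or algorithm of the Palm Pilot, the Newton, or any other real device.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) pen position; y is assumed to increase upward


def direction_codes(stroke: List[Point]) -> List[int]:
    """Quantize each pen movement into 8 directions (0 = east, 2 = north, 4 = west, 6 = south)."""
    codes: List[int] = []
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        if (x0, y0) == (x1, y1):
            continue  # the pen did not move, so there is no direction
        angle = math.atan2(y1 - y0, x1 - x0)
        code = round(angle / (math.pi / 4)) % 8
        if not codes or codes[-1] != code:
            codes.append(code)  # merge consecutive moves in the same direction
    return codes


def edit_distance(a: List[int], b: List[int]) -> int:
    """Levenshtein distance between two direction-code sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # drop a code from a
                           cur[j - 1] + 1,           # insert a code from b
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical single-stroke letter shapes, written as direction sequences.
SHAPES: Dict[str, List[int]] = {
    "L": [6, 0],  # down, then right
    "V": [7, 1],  # down-right, then up-right
    "7": [0, 5],  # right, then down-left
}


def recognize(stroke: List[Point]) -> str:
    """Return the shape whose direction sequence is closest to the stroke's."""
    codes = direction_codes(stroke)
    return min(SHAPES, key=lambda label: edit_distance(codes, SHAPES[label]))


print(recognize([(0, 10), (0, 5), (0, 0), (5, 0)]))  # prints L
```

A scanned page offers only the finished ink, with no record of which way or in what order each line was drawn, so a direction-based matcher like this one has nothing to work from in that setting.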


