Google Releases Tesseract OCR as Open Source (September 7, 2006)
Google has released an Optical Character Recognition (OCR) engine as open source. Google's interest in OCR follows from its mission of making information available to users: when that information is in a paper document, OCR is the process that converts the pages of the document into text, which can then be used for indexing.
The OCR engine, called Tesseract, was not originally developed at Google but at Hewlett Packard Laboratories between 1985 and 1995. Google brushed up the OCR software and released it as open source.