DIY Book Scanner/OCR
Tesseract is OCR software that was originally developed by HP, released as open source, and more recently updated by Google. It is free and produces high-quality results, so it is worth noting here. It is command-line driven; an example .bat file in the Windows environment would be:
tesseract image.tif outputbase
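The same call can be scripted for a whole batch of scans. The Python below is a minimal sketch, assuming the `tesseract` executable is on the PATH and the page images are `.tif` files in a single folder. It runs one `tesseract image.tif outputbase` per page, using each image's name without its extension as the output base. The helper names `build_command` and `ocr_folder` are hypothetical and not part of Tesseract.

```python
import subprocess
from pathlib import Path


def build_command(image: Path, extra_args: tuple[str, ...] = ()) -> list[str]:
    # The output base is the image path minus its extension;
    # tesseract adds the ".txt" to it when writing plain-text output.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), *extra_args]


def ocr_folder(folder: Path, extra_args: tuple[str, ...] = ()) -> list[Path]:
    # Assumption: tesseract is on the PATH.
    # On case-sensitive file systems "*.tif" does not match ".TIF" files.
    written = []
    for image in sorted(folder.glob("*.tif")):
        # check=True raises CalledProcessError if tesseract fails on a page.
        subprocess.run(build_command(image, extra_args), check=True)
        written.append(image.with_suffix(".txt"))
    return written
```

For example, `ocr_folder(Path("scans"))` would process every `.tif` in a hypothetical `scans` folder and leave one text file per page next to its image.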
To use a whitelist (here named digits), put the following line in a text file called tessdata/configs/digits:
tessedit_char_whitelist 0123456789
Then run Tesseract with the config names after the output base:
tesseract image.tif outputbase nobatch digits
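Setting up the whitelist can be scripted as well. This is a sketch that assumes you know where your installation's `tessdata` directory is; its location varies by platform and version, so it is passed in as a parameter here. The helpers `write_whitelist_config` and `whitelist_command` are hypothetical. The config file holds the single `tessedit_char_whitelist` line, and the command list keeps the same argument order as the command above.

```python
from pathlib import Path


def write_whitelist_config(tessdata: Path, name: str, chars: str) -> Path:
    # Creates tessdata/configs/<name> with one "variable value" line.
    # Assumption: `tessdata` is the directory your tesseract install reads.
    config = tessdata / "configs" / name
    config.parent.mkdir(parents=True, exist_ok=True)
    config.write_text(f"tessedit_char_whitelist {chars}\n", encoding="utf-8")
    return config


def whitelist_command(image: Path, output_base: str, config_name: str) -> list[str]:
    # Same order as: tesseract image.tif outputbase nobatch digits
    return ["tesseract", str(image), output_base, "nobatch", config_name]
```

The returned list can be handed to `subprocess.run`, in the same way as any other command in a batch script.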