Immediately Locate Any Word Within A Document

Search for words, phrases, or dates and go straight to their location within a document. Full-text search results show where key words are found and how they appear in the context of the document, so you get to the information you want, fast. Avoid the cost of having staff manually re-enter data, and reduce manual errors. Data can also be shared across applications to streamline communications and workflow.

Document Searches That Used to Take Hours Now Take Minutes with OCR

Optical Character Recognition (OCR) converts all or part of a scanned image file into fully searchable text. The image file remains intact and viewable as the original, while the recognised text is mapped onto the image so that it can be searched or repurposed. In document management software, OCR is typically used to improve on searching by metadata alone.
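To illustrate how "mapping the text onto the image" lets a search jump to a word's location, here is a minimal sketch of an inverted index built from OCR output. It assumes the OCR engine reports each word with its page and a pixel bounding box; the names `Hit`, `build_index` and `locate` are hypothetical and not part of any particular product.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One place a word was found: page number plus its box on the page image."""
    page: int
    box: tuple  # (left, top, width, height) in image pixels


def normalise(word):
    # Case-fold and trim surrounding punctuation so "Renewal," matches "renewal"
    return word.lower().strip(".,;:!?\"'()")


def build_index(ocr_pages):
    """Build an inverted index (word -> locations) from OCR output.

    Assumption: ocr_pages is a list of pages, each a list of
    (text, box) tuples as an OCR engine might report them.
    """
    index = defaultdict(list)
    for page_no, words in enumerate(ocr_pages, start=1):
        for text, box in words:
            index[normalise(text)].append(Hit(page_no, box))
    return index


def locate(index, word):
    """Return every (page, box) location of `word`, in document order."""
    return index.get(normalise(word), [])


pages = [
    [("Contract", (50, 40, 120, 22)), ("renewal", (180, 40, 110, 22))],
    [("Renewal", (60, 300, 115, 22)), ("date:", (185, 300, 60, 22))],
]
index = build_index(pages)
print(locate(index, "Renewal"))
# [Hit(page=1, box=(180, 40, 110, 22)), Hit(page=2, box=(60, 300, 115, 22))]
```

A viewer can then highlight each returned box on the original image, which is what lets the scanned page stay untouched while still being searchable.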

There are many flavours, or options, within OCR technology. Here are some of the most common:

  • Full Text Processing: Commonly used in document management software solutions, particularly in Legal/Compliance, Customer Services, and other departments.
    • As the name implies, full-text OCR processing examines and converts every page of a document, looking for words anywhere on the page. These words are then either imported into a searchable database as part of the document management system, or used to create layered output files such as PDFs with hidden text.
  • Zonal OCR: A mechanism for extracting text from fixed areas of a page to populate metadata (i.e., search) fields; common across departments.
    • Very useful when working with structured documents, where the desired search terms can be reliably located in fixed areas of a page. Well suited to structured form recognition in high-volume environments.
  • Forms Recognition and Processing: Advanced processing of structured and unstructured data types, most commonly used in Accounts Payable operations that handle large numbers of invoices, but also applicable to many other hard-copy, form-dependent document management business processes. Technically, these are two different technologies:
    • Forms Recognition is usually the first technology applied, used as a “classification” method to distinguish one form type from another. It can be used when preparing documents for further processing, for example, to make sure that certain invoice formats are routed to the correct invoice processing queues. It can also be used in non-AP environments to much the same effect: getting documents where they need to go next, whether or not full forms processing will take place. It can greatly simplify document preparation in high-volume mailrooms and other distribution departments.
    • Forms Processing is the in-depth analysis and translation of the information on the form, or invoice, into malleable data that can be linked to other systems, or used to start internal and external workflows and other processes. The most common use again centres on the Accounts Payable process, in this case reading the all-important invoice header information, including vendor name, invoice number and date, purchase order information, tax and totals, and more. This can be further enhanced by processing the “line item data” in the invoice body itself: descriptions, model/order numbers, item amounts, and more. Forms processing can be very granular and can be customised to read and import the information your organization needs to make better decisions, faster. The processed data can then interact with ERP and core accounting systems to automate approvals, exchange general ledger codes, and feed downstream processes with valuable data.
  • Intelligent Character Recognition (ICR) is a term typically applied to the recognition of handwriting. It is commonly used in education (for test evaluation) and by marketing companies and departments for automated recognition of forms and surveys. The technology has come a long way in the last decade, in some cases even recognizing cursive script, but it still works best with comb-style or boxed character fields.
  • Magnetic Ink Character Recognition (MICR) is used almost exclusively in the banking industry to read the magnetic-ink characters, printed in special fonts, along the bottom of checks and other financial instruments.
  • Braille and other specialised recognition engines are available for other niche environments.
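Zonal OCR, described in the list above, boils down to keeping only the recognised words that fall inside predefined rectangles on the page. The sketch below assumes the OCR engine has already returned each word with a pixel bounding box; the `Word` and `Zone` classes, the zone names and the coordinates are all hypothetical examples, not a real template format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


@dataclass
class Zone:
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word: Word) -> bool:
        # Test the word's centre point, so a box that slightly overhangs
        # the zone edge is still captured
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        return self.left <= cx <= self.right and self.top <= cy <= self.bottom


def extract_zones(words, zones):
    """Return {zone name: text} for each fixed zone on a page."""
    result = {}
    for zone in zones:
        inside = [w for w in words if zone.contains(w)]
        # Sort top-to-bottom, then left-to-right: a rough reading order
        inside.sort(key=lambda w: (w.y, w.x))
        result[zone.name] = " ".join(w.text for w in inside)
    return result


# Hypothetical OCR output for the top of one invoice page
words = [
    Word("INV-1042", 600, 80, 90, 20),
    Word("2024-03-15", 600, 110, 100, 20),
    Word("Acme", 50, 80, 50, 20),
    Word("Ltd", 105, 80, 30, 20),
]
# Hypothetical template: where each field sits on this invoice layout
zones = [
    Zone("invoice_number", 580, 70, 800, 100),
    Zone("invoice_date", 580, 101, 800, 135),
    Zone("vendor", 40, 70, 300, 100),
]
print(extract_zones(words, zones))
# {'invoice_number': 'INV-1042', 'invoice_date': '2024-03-15', 'vendor': 'Acme Ltd'}
```

This only works because the layout is fixed; forms recognition, as described above, is what would pick the right zone template when several invoice layouts arrive in the same batch.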

OCR and its related technologies are now common in document management solutions because of the powerful search capabilities they provide and their ability to cut costs in essential business processes.

Search for a word or phrase within a scanned document or image with OCR

Easily find files using a word or phrase that you remember. You can extend that search with techniques such as term weighting (giving more weight to certain surrounding words) and fuzzy matching (which handles misspellings and plurals) to make sure you get to the data you need. Search results can be displayed in context, so you can easily sort similar results and get to the data you’re looking for.
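The fuzzy matching, term weighting and in-context results described above can be sketched in a few lines. This is an illustrative assumption, not how any particular product scores results: it uses the character-similarity ratio from Python's standard `difflib` as a stand-in fuzzy measure, and a hypothetical `boosts` dictionary for term weighting. `fuzzy_search`, `threshold` and `window` are likewise hypothetical names.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Lowercased alphanumeric tokens, in reading order
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the strings are identical
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_search(text, term, threshold=0.8, window=3, boosts=None):
    """Find approximate matches for `term` in OCR text.

    Returns (score, snippet) pairs, best first. The snippet is the match
    plus up to `window` words either side. `boosts` maps nearby words to
    extra weight, a simple form of term weighting.
    """
    boosts = boosts or {}
    tokens = tokenize(text)
    term = term.lower()
    hits = []
    for i, tok in enumerate(tokens):
        sim = similarity(tok, term)
        if sim < threshold:
            continue  # not close enough to count as a (mis-spelled) match
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tokens[lo:hi]
        # Add the weight of any boosted word found near the match
        bonus = sum(boosts.get(w, 0.0) for w in context if w != tok)
        hits.append((round(sim + bonus, 3), " ".join(context)))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits


text = "Invoice 1042 from Acme Ltd. Payment of the invoce is due in 30 days. See invoices archive."
for score, snippet in fuzzy_search(text, "invoice", boosts={"due": 0.5}):
    print(score, snippet)
# 1.423 payment of the invoce is due in
# 1.0 invoice 1042 from acme
# 0.933 30 days see invoices archive
```

The OCR-style misspelling "invoce" and the plural "invoices" both clear the similarity threshold, and the boost on "due" lifts the payment-terms passage to the top of the results.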

When implemented in a centralised or distributed capture environment, or in a mailroom, OCR technology can generate fully searchable, bookmarked PDFs directly from the scanned images. Alternatively, you can use the processed data to feed data mining and analysis tools to look for trends, research new ideas, or spot items of concern.

Take OCR to the next level with advanced forms recognition and forms processing technologies from Scanfree.

To learn more about leading document capture software, including OCR, visit our PaperVision Capture page. Or read a success story about a global manufacturer that used workflow to improve customer service and cash flow.

Contact Us now to learn more.
