Optical Character Recognition (OCR)

Technology that converts images of text into machine-readable characters.

What Is OCR?

Optical Character Recognition (OCR) is a technology that analyzes an image containing text and converts it into editable, machine-readable characters. When you scan a paper invoice and a computer can "read" the numbers on it, OCR is what made that possible.

OCR has been around since the 1960s and is now embedded in operating systems, scanners, phones, and most document management platforms. Modern OCR engines like Tesseract and those built into Adobe Acrobat achieve high accuracy on clean, typed text in standard fonts; accuracy drops on low-resolution scans, handwriting, and unusual fonts.

How OCR Works Technically

OCR processes documents in several stages:

Pre-processing — The image is cleaned up: contrast adjusted, noise removed, pages de-skewed, and resolution normalized. This step has a large impact on accuracy for poor-quality scans.

Layout detection — The engine identifies blocks of text, paragraphs, lines, and individual character bounding boxes.

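Character recognition — Older engines compared each segmented character against stored templates or feature patterns of known character shapes. Modern engines use neural networks instead (for example, the LSTM-based recognizer in Tesseract 4 and later), which typically read a whole line of text as a sequence rather than matching one character at a time.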

Post-processing — A language model may be applied to fix common OCR errors (distinguishing "0" from "O", "1" from "l", etc.) based on surrounding context.
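
The post-processing stage can be illustrated with a minimal Python sketch. It is not how any particular OCR engine works: it uses a simple rule (digits outnumber letters in a token) in place of a real language model, and the confusion table and function names are assumptions made for illustration.

```python
import re

# Assumed confusion table: letters that OCR commonly emits in place of digits.
LETTER_TO_DIGIT = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def fix_numeric_token(token: str) -> str:
    """Swap look-alike letters for digits when the token is mostly digits."""
    digits = sum(ch.isdigit() for ch in token)
    letters = sum(ch.isalpha() for ch in token)
    # Rewrite only when digits outnumber letters, so ordinary words are left alone.
    if letters > 0 and digits > letters:
        return token.translate(LETTER_TO_DIGIT)
    return token


def post_process(text: str) -> str:
    """Apply the numeric fix to every whitespace-separated token, keeping spacing."""
    return re.sub(r"\S+", lambda m: fix_numeric_token(m.group()), text)


print(post_process("Total 1,2O9.l0"))  # prints: Total 1,209.10
print(post_process("Oslo office"))     # prints: Oslo office
```

The surrounding characters act as the "context" here: a letter O inside a run of digits is almost certainly a zero, while the same letter in "Oslo" is left untouched. A real engine would use richer context, such as a dictionary or a statistical language model.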

OCR's Role in Invoice Processing

OCR is often the first step in an invoice processing pipeline — it converts a scanned PDF or image into raw text that can then be analyzed. Without OCR, a scanned invoice is just pixels; OCR makes the text accessible.

However, raw OCR output is unstructured. It might correctly identify that "249.00" appears on an invoice but have no idea whether that's the subtotal, the tax, or a line item price. OCR gives you characters; it doesn't give you meaning.
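
To make this concrete, the Python sketch below pulls every money-like number out of a block of raw OCR text. The sample text is invented for illustration, and the regular expression is a simplifying assumption about how amounts are formatted.

```python
import re

# Invented sample of raw OCR output for a single invoice.
RAW_TEXT = """ACME Supplies Ltd
Widget A  2 x 112.50  225.00
Subtotal  225.00
Tax  24.00
Amount Due  249.00"""

# Assumed amount format: digits (with optional commas) followed by two decimals.
AMOUNT = re.compile(r"\d[\d,]*\.\d{2}")


def find_amounts(text: str) -> list:
    """Return every amount-like string in the text, in reading order."""
    return AMOUNT.findall(text)


print(find_amounts(RAW_TEXT))
# prints: ['112.50', '225.00', '225.00', '24.00', '249.00']
```

The result is a flat list of five numbers. Nothing in it says which value is the unit price, the subtotal, the tax, or the total; that labelling has to come from a later step.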

OCR Limitations for Invoices

The critical limitation is that OCR does not understand document structure or context. Vendors arrange their invoices differently: some put the vendor name top-left, others top-center; some label the total "Amount Due," others "Balance," others "Total." OCR alone cannot reliably extract the right field from the right place across diverse vendor formats.

Traditional OCR-based extraction tools work around this with templates — pre-defined rules for each vendor's layout. This requires setup time per vendor and breaks when a vendor changes their invoice design.
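
A template-based extractor can be sketched in a few lines of Python. The vendor names, labels, and regular expressions below are hypothetical; real tools often use page coordinates as well as text patterns, which this sketch leaves out.

```python
import re
from typing import Optional

# Hypothetical per-vendor templates: one hand-written rule per field per vendor.
TEMPLATES = {
    "acme": {"total": re.compile(r"Amount Due\s+\$?([\d,]+\.\d{2})")},
    "globex": {"total": re.compile(r"Balance\s+\$?([\d,]+\.\d{2})")},
}


def extract_total(vendor: str, ocr_text: str) -> Optional[str]:
    """Return the invoice total using the vendor's template, or None on failure."""
    template = TEMPLATES.get(vendor)
    if template is None:
        return None  # no template has been set up for this vendor
    match = template["total"].search(ocr_text)
    return match.group(1) if match else None


print(extract_total("acme", "Amount Due $1,249.00"))  # prints: 1,249.00
print(extract_total("acme", "Total $1,249.00"))       # prints: None
print(extract_total("initech", "Total $1,249.00"))    # prints: None
```

The second call shows the brittleness: once the vendor relabels the field from "Amount Due" to "Total", the existing rule silently stops matching. The third call shows the setup cost: a vendor with no template yields nothing at all.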

How AI Extraction Differs

AI-based invoice extraction uses large language models and vision-language models that can interpret the semantic meaning of invoice fields regardless of layout. Such a model can read "Amount Due: $1,249.00", "Total: $1,249.00", or "Balance Payable: $1,249.00" and identify each as the invoice total, with no per-vendor template required.

AI extraction uses OCR as a component (to get the text) but adds a reasoning layer on top that template-based OCR tools lack.
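
The division of labour between the two layers can be sketched as follows. This is only an outline under stated assumptions: the prompt wording, the JSON reply format, and the `InvoiceFields` record are hypothetical, and the OCR engine and model are passed in as plain functions so that no specific product or API is implied.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class InvoiceFields:
    vendor: str
    total: str


def extract_invoice(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    model: Callable[[str], str],
) -> InvoiceFields:
    """Run OCR, then ask a model to label the fields in the resulting text."""
    text = ocr(image_bytes)  # OCR component: pixels to raw text
    prompt = "Return JSON with keys 'vendor' and 'total' for this invoice:\n" + text
    reply = model(prompt)    # reasoning layer: raw text to labelled fields
    data = json.loads(reply)
    return InvoiceFields(vendor=data["vendor"], total=data["total"])


# Stand-ins for a real OCR engine and a real model, used only to run the sketch.
def fake_ocr(_: bytes) -> str:
    return "ACME Supplies\nBalance Payable: $1,249.00"


def fake_model(_: str) -> str:
    return '{"vendor": "ACME Supplies", "total": "1,249.00"}'


print(extract_invoice(b"", fake_ocr, fake_model))
# prints: InvoiceFields(vendor='ACME Supplies', total='1,249.00')
```

Note that nothing in `extract_invoice` is vendor-specific. The per-vendor knowledge that a template encodes by hand is instead expected to come from the model.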