Text layer

The text layer is the part of a PDF that holds real, machine-readable characters: the content you can select with the cursor, copy, search, and have read aloud. A PDF built from a word processor or page-layout app has this layer natively, with each character mapped to a position and a font.

The contrast is the scanned PDF, which is often just a picture of a page wrapped in PDF packaging. It looks like a document, but there is no text underneath, so a search finds nothing and selection grabs nothing. OCR is what adds the missing layer: it recognises the characters in the image and writes them back as an invisible text layer aligned to the visible pixels, leaving the page looking the same while making it fully searchable.
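To make "aligned to the visible pixels" concrete, here is a minimal sketch of the coordinate step, not any particular OCR tool's implementation. It assumes the scanned image fills the whole page, that the OCR engine reports word boxes in pixels measured from the top-left corner, and that the page uses the default PDF unit of 1/72 inch with its origin at the bottom-left. The names `WordBox` and `to_pdf_rect` are hypothetical.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


@dataclass
class WordBox:
    """One recognised word and its bounding box in image pixels (hypothetical type)."""
    text: str
    left: int    # pixels from the left edge of the image
    top: int     # pixels from the top edge of the image
    width: int
    height: int


def to_pdf_rect(box: WordBox, dpi: int, page_height_px: int) -> tuple[float, float, float, float]:
    """Convert a pixel box (top-left origin) to PDF points (bottom-left origin).

    Returns (x0, y0, x1, y1), where (x0, y0) is the lower-left corner.
    Assumes the image covers the full page at the given scan resolution.
    """
    def px_to_pt(px: int) -> float:
        return px * POINTS_PER_INCH / dpi

    x0 = px_to_pt(box.left)
    x1 = px_to_pt(box.left + box.width)
    # Flip the vertical axis: image rows grow downward, PDF y grows upward.
    y1 = px_to_pt(page_height_px - box.top)
    y0 = px_to_pt(page_height_px - box.top - box.height)
    return (x0, y0, x1, y1)


# A word found one inch in from the top-left of a US Letter page scanned at 300 dpi.
word = WordBox(text="Invoice", left=300, top=300, width=600, height=150)
rect = to_pdf_rect(word, dpi=300, page_height_px=3300)
```

An OCR tool would then write the word at that rectangle, typically using the PDF text rendering mode that draws nothing (mode 3), so the characters are selectable and searchable while the scanned image stays the only thing you see.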

Knowing whether a file has a true text layer explains a lot of everyday frustration, such as why one PDF is searchable and another is not. When you need that layer, generating it locally means the document's words are extracted on your own machine rather than passed to a remote service that could retain them.
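One rough way to check for a text layer locally is to try extracting text from each page and see whether anything meaningful comes back. The sketch below is a heuristic under stated assumptions, not a definitive test: the function names and the `min_chars` threshold are hypothetical, and `extract_page_texts` assumes the third-party `pypdf` package is installed. A page holding only a few stray characters, or a text layer produced by poor OCR, can fool it.

```python
def pages_with_text(page_texts: list[str], min_chars: int = 20) -> list[bool]:
    """Return one flag per page: True if the page appears to carry a text layer.

    Whitespace is ignored so that a page of blank lines does not count as text.
    The min_chars threshold is an arbitrary illustrative value.
    """
    return [len("".join(text.split())) >= min_chars for text in page_texts]


def describe(page_texts: list[str], min_chars: int = 20) -> str:
    """Summarise a document as 'text', 'image-only', 'mixed', or 'empty'."""
    flags = pages_with_text(page_texts, min_chars)
    if not flags:
        return "empty"
    if all(flags):
        return "text"
    if any(flags):
        return "mixed"
    return "image-only"


def extract_page_texts(path: str) -> list[str]:
    """Read each page's extractable text on the local machine.

    Assumes pypdf is installed; it is imported lazily so the helpers above
    work without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

For a file on disk, `describe(extract_page_texts("scan.pdf"))` would report which case applies, where `scan.pdf` is a placeholder path. A result of `mixed` usually means some pages were scanned and others were generated digitally.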
