PDF glossary: terms and formats

What every PDF term and format actually means, in plain language. The jargon you run into, explained.

Formats

PDF (Portable Document Format) is a file format that fixes the exact position of every character, line and image on a page, so a document looks identical whether you open it on a phone, a laptop or a print shop's RIP (raster image processor). Adobe created it in 1993 and handed the specification to ISO in 2008, where it became the open standard ISO 32000. That openness is why so many independent tools can read and write PDF without asking anyone for permission.

PDF/A

PDF/A is the ISO 19005 profile built for long-term archiving. The goal is simple: a document opened in fifty years should render exactly as it does today, with no missing fonts and no dependence on external resources that may have vanished. To guarantee that, the standard bans features that could break over time, such as encryption and fonts that are not embedded in the file.

PDF/UA

PDF/UA (ISO 14289, where UA stands for Universal Accessibility) is the standard that makes a PDF usable by people who rely on assistive technology. A screen reader cannot make sense of ink on a page; it needs a logical structure underneath. PDF/UA defines exactly how that structure must be built.

PDF/X

PDF/X (ISO 15930) is the family of profiles made for professional printing and graphic arts. When a file goes to a commercial press, ambiguity is expensive: a missing font, an RGB image where CMYK was expected, or an undefined trim box can ruin an entire print run. PDF/X removes that ambiguity by forcing every print-critical detail to be explicit.

Concepts

OCR

OCR (Optical Character Recognition) turns a picture of text into actual, selectable characters. A scanned page or a photo of a document is, to a computer, just a grid of pixels: there is no text in it, only an image that happens to look like words. OCR analyses the shapes of the letters and rebuilds the underlying string of characters.

AcroForm

An AcroForm is PDF's native, built-in form technology, the kind of interactive form that has been part of the format since PDF 1.2 in 1996. The fillable fields you see in a tax return or an application form (text boxes, checkboxes, radio buttons, dropdowns and signature fields) are AcroForm objects defined directly in the PDF's object structure.

XFA

XFA (XML Forms Architecture) is Adobe's alternative form technology, in which the form is defined not by native PDF objects but by an XML payload embedded inside the PDF wrapper. It was designed for complex, dynamic forms: layouts that grow as you add rows, fields that appear or disappear based on earlier answers, and tight binding to back-end data schemas.

Metadata

Metadata is the data about your data, the information a PDF carries beyond the visible page content. There are two main stores: the legacy Document Information Dictionary (title, author, subject, keywords, the software that created it, and creation and modification dates) and XMP, an XML-based block that holds the same fields plus richer, extensible properties.
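
As a rough illustration of the XMP side, the sketch below parses a small XMP packet with Python's standard library and pulls out the title and the creating software. The sample packet and the helper name read_xmp_fields are made up for this example; packets found in real files are usually larger and carry many more properties.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly used in XMP packets.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Hand-written sample packet (an assumption, for illustration only).
SAMPLE_XMP = """
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Annual report</rdf:li></rdf:Alt>
      </dc:title>
      <xmp:CreatorTool>Example Writer 1.0</xmp:CreatorTool>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""


def read_xmp_fields(packet: str) -> dict:
    """Hypothetical helper: extract two common fields from an XMP packet."""
    root = ET.fromstring(packet.strip())
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    tool = root.find(".//xmp:CreatorTool", NS)
    return {
        "title": title.text if title is not None else None,
        "creator_tool": tool.text if tool is not None else None,
    }


fields = read_xmp_fields(SAMPLE_XMP)
print(fields)
```

Because XMP is ordinary XML, any XML parser can read it; the legacy Document Information Dictionary, by contrast, needs a PDF-aware parser.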

Compression

Compression is what keeps PDF file sizes manageable, and a single document usually mixes several methods because it mixes several kinds of content. Text and vector drawing instructions compress losslessly with Flate (the same Deflate algorithm behind ZIP), so every character comes back exactly as it went in.
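
The lossless round trip can be tried with Python's built-in zlib module, which implements the same Deflate algorithm. This is only a sketch: the sample bytes imitate drawing instructions and are not taken from a real file.

```python
import zlib

# Repetitive sample data, similar in spirit to PDF drawing instructions.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 50

compressed = zlib.compress(content, level=9)  # Deflate at maximum effort
restored = zlib.decompress(compressed)

print(len(content), "->", len(compressed), "bytes")
assert restored == content  # lossless: every byte comes back unchanged
```

Repetitive input like this shrinks dramatically, which is why text-heavy PDFs stay small.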

Embedded fonts

Embedded fonts are typefaces packaged inside the PDF itself rather than borrowed from the computer that opens it. This is the feature that makes PDF genuinely portable: if the font travels with the document, the text renders identically everywhere, even on a machine that has never had that typeface installed.

Text layer

The text layer is the part of a PDF that holds real, machine-readable characters, the content you can select with the cursor, copy, search and have read aloud. A PDF built from a word processor or page-layout app has this layer natively, with each character mapped to a position and a font.
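
To make "each character mapped to a position and a font" concrete, here is a toy sketch that reads the text out of a tiny, hand-written page content stream. The stream and the function name extract_literal_strings are assumptions for illustration; real files add escapes, hex strings, font encodings and compression that this deliberately ignores.

```python
import re

# BT/ET bracket a text object, Tf selects font and size,
# Td moves the text position, Tj draws a string at that position.
STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Hello) Tj
0 -14 Td
(world) Tj
ET
"""


def extract_literal_strings(stream: bytes) -> list[str]:
    """Return the text of every simple "(text) Tj" operation.

    Toy parser: skips escapes, nested parentheses, hex strings
    and TJ arrays, all of which real PDFs use.
    """
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


print(extract_literal_strings(STREAM))
```

A scanned page has no such operators at all, only an image, which is why there is nothing to select or search until OCR adds a text layer.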

Watermark

A watermark is text or an image laid over a PDF's pages to mark status or ownership: a faint "DRAFT" or "CONFIDENTIAL" running diagonally across the page, a company logo, or a copyright line. It signals intent without obscuring the underlying content, usually by being semi-transparent or sitting behind the main text.

Linearization (Fast Web View)

Linearization, marketed by Adobe as Fast Web View, is a way of reorganising a PDF's internal byte order so it can be displayed before the whole file has arrived. In a normal PDF the cross-reference table that indexes every object sits at the very end, so a viewer technically needs the complete file to know where things are.
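
The sketch below shows both halves of that idea on a hand-built skeleton file: reading the pointer at the end that tells a viewer where the cross-reference table starts, and checking for the /Linearized marker that linearized files place near the beginning. The helper names and the sample bytes are assumptions for this example, and the skeleton is not a complete, valid PDF.

```python
import re


def startxref_offset(pdf: bytes) -> int:
    """Read the byte offset stored after the last startxref keyword."""
    tail = pdf[-1024:]  # the pointer lives at the very end of the file
    found = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not found:
        raise ValueError("no startxref found")
    return int(found[-1])


def looks_linearized(pdf: bytes) -> bool:
    """Linearized files carry a /Linearized dictionary near the start."""
    return b"/Linearized" in pdf[:1024]


# Hand-built skeleton: just enough structure to exercise the helpers.
body = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_at = len(body)
sample = (
    body
    + b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n"
    + b"startxref\n" + str(xref_at).encode() + b"\n%%EOF\n"
)

print(startxref_offset(sample), looks_linearized(sample))
```

A viewer reading a non-linearized file over a slow connection has to wait for those final bytes before it can locate anything, which is exactly the delay linearization is meant to avoid.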

Security

AES encryption

AES (Advanced Encryption Standard) is the block cipher that secures a modern password-protected PDF; older files may use the weaker RC4 stream cipher instead. When you lock a document, the page content streams and strings are encrypted with AES, and the only way back to the readable bytes is to supply the right password and derive the correct key. Without it, the file on disk is just ciphertext.

Electronic signature

An electronic signature is, in the broadest legal sense, any data attached to a document that indicates the signer's intent to agree, from a typed name or a drawn squiggle up to a cryptographically backed seal. The EU's eIDAS regulation sorts these into tiers, and the distinction matters when a signature has to stand up later.

Digital signature

A digital signature is the cryptographic mechanism that proves who signed a PDF and that nobody has altered it since. It is the technical engine that the strongest electronic signatures rely on, and it is built from public-key cryptography rather than any picture of a pen stroke.
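
The principle can be sketched with textbook RSA using nothing but Python's standard library. Everything here is a toy under stated assumptions: the key is built from two small, well-known primes, there is no padding and no certificate, and the function names are invented for this example. Real PDF signatures use much larger keys and a standard signature container, so treat this as an illustration of the idea only.

```python
import hashlib

# Toy key pair from two known Mersenne primes. Far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def digest(data: bytes) -> int:
    """Hash the document and reduce it to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Anyone with the public pair (N, E) can check it."""
    return pow(signature, E, N) == digest(data)


document = b"I agree to the terms."
signature = sign(document)

print(verify(document, signature))                   # unchanged document
print(verify(b"I agree to the terms!", signature))   # one byte altered
```

The second check fails because the signature is tied to a hash of the exact bytes that were signed: change anything and the hash no longer matches.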

Images

Vector graphic

Vector graphics describe an image as mathematics (points, lines, curves and fills) rather than as a fixed grid of coloured dots. A circle is stored as a centre, a radius and a colour, so the computer redraws it at whatever size is asked for. The consequence is the defining property of vector art: it scales to any size with no loss of sharpness.
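
A minimal sketch of that circle, assuming a made-up Circle class: scaling it only multiplies three numbers, so the description stays exact at any size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """Hypothetical vector shape: a centre, a radius and a fill colour."""
    cx: float
    cy: float
    r: float
    fill: str

    def scaled(self, factor: float) -> "Circle":
        # Scaling changes the numbers, never the quality of the shape.
        return Circle(self.cx * factor, self.cy * factor, self.r * factor, self.fill)

    def to_svg(self) -> str:
        # Emit the shape as an SVG element for a renderer to draw.
        return f'<circle cx="{self.cx:g}" cy="{self.cy:g}" r="{self.r:g}" fill="{self.fill}"/>'


small = Circle(10, 10, 5, "red")
poster = small.scaled(100)

print(small.to_svg())
print(poster.to_svg())
```

Both versions take the same handful of bytes to store, whether the circle is drawn at icon size or poster size.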

Raster image

A raster image is a rectangular grid of pixels, each holding a colour value, the model behind every photograph and scan. Unlike a vector, a raster has a fixed native resolution: it stores exactly so many dots across and down, and all its detail is baked into that grid.
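
The fixed-resolution point shows up as soon as you enlarge a grid. The sketch below uses the simplest possible method, repeating each pixel, on a made-up 2 by 2 image; the function name upscale_nearest is an assumption for this example.

```python
def upscale_nearest(pixels: list[list[int]], factor: int) -> list[list[int]]:
    """Enlarge a grid by repeating each pixel `factor` times in both directions."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


tiny = [[0, 255],
        [255, 0]]
big = upscale_nearest(tiny, 2)

for row in big:
    print(row)
```

The enlarged grid has four times as many pixels but exactly the same information: bigger blocks, no new detail. Smarter resampling methods blur the block edges, but they cannot recover detail that was never stored.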

JPG / JPEG

JPG (also written JPEG, after the Joint Photographic Experts Group that defined it) is the lossy raster format built for photographs. It works by transforming the image into frequency components and discarding the fine detail the human eye is least likely to miss, which is how it squeezes a full-colour photo into a small file.
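
The "transform, then discard" idea can be sketched in one dimension with a discrete cosine transform followed by coarse rounding of the coefficients. This is a simplification under stated assumptions: real JPEG works on 8 by 8 blocks in two dimensions, uses a different rounding step for each frequency, and then entropy-codes the result. The sample brightness values and the single step size are invented for this example.

```python
import math


def dct(x: list[float]) -> list[float]:
    """Orthonormal DCT-II: sample values -> frequency coefficients."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(c: list[float]) -> list[float]:
    """Inverse of dct: frequency coefficients -> sample values."""
    n = len(c)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def quantize(c: list[float], step: float) -> list[float]:
    """Round each coefficient to a multiple of step; this is the lossy part."""
    return [round(v / step) * step for v in c]


row = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of brightness values
coarse = quantize(dct(row), step=10)      # small coefficients collapse to 0
restored = idct(coarse)

print([round(v) for v in restored])
```

The transform on its own is reversible; all of the loss comes from the rounding step, and a larger step means a smaller file and a rougher picture.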

PNG

PNG (Portable Network Graphics) is the lossless raster format for graphics with sharp edges and flat colour: screenshots, logos, icons, diagrams and anything containing text. Lossless means it stores the image exactly: re-save it as often as you like and not a single pixel changes, the opposite of JPEG's generational decay.

WebP

WebP is an image format from Google that aims to replace both JPEG and PNG with one container. Its trick is supporting two modes: lossy compression for photographs, like JPEG, and lossless compression for graphics, like PNG, while typically producing smaller files than either at comparable quality.
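
The two modes are visible in the first bytes of the file. WebP uses a RIFF container, and the four-character code of the first chunk says which kind of data follows. The sketch below checks that code on fake 16-byte headers; the function name webp_mode and the sample bytes are assumptions for this example, not real image files.

```python
def webp_mode(data: bytes) -> str:
    """Classify a WebP file by the first chunk in its RIFF container."""
    if len(data) < 16 or data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        return "not webp"
    chunk = data[12:16]
    return {
        b"VP8 ": "lossy",
        b"VP8L": "lossless",
        b"VP8X": "extended",  # extra features; the image data may be either mode
    }.get(chunk, "unknown")


# Fake headers: RIFF tag, a placeholder size, WEBP tag, first chunk code.
size = (4).to_bytes(4, "little")
lossy_header = b"RIFF" + size + b"WEBP" + b"VP8 "
lossless_header = b"RIFF" + size + b"WEBP" + b"VP8L"

print(webp_mode(lossy_header), webp_mode(lossless_header))
```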

TIFF

TIFF (Tagged Image File Format) is the heavyweight raster format used in archiving, scanning and professional imaging. Its name comes from its structure: a flexible set of tags describing the image, which lets a single TIFF hold uncompressed or losslessly compressed data, high bit depths, embedded colour profiles and a great deal of technical metadata.
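
The tag structure is simple enough to read by hand. The sketch below builds a tiny header with two tags and parses it back using Python's struct module. It is a toy under stated assumptions: it handles only little-endian files and only inline 32-bit values, the sample contains no pixel data, and the function name read_tags is invented for this example.

```python
import struct

# Two standard tag numbers and their names.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength"}


def read_tags(data: bytes) -> dict[str, int]:
    """Read the first tag directory of a little-endian TIFF (LONG values only)."""
    order, magic, ifd_offset = struct.unpack_from("<2sHI", data, 0)
    if order != b"II" or magic != 42:
        raise ValueError("not a little-endian TIFF")
    (count,) = struct.unpack_from("<H", data, ifd_offset)
    tags = {}
    for i in range(count):
        # Each entry is 12 bytes: tag number, value type, value count, value.
        tag, kind, n, value = struct.unpack_from("<HHII", data, ifd_offset + 2 + 12 * i)
        if kind == 4 and n == 1:  # type 4 = LONG, stored inline in the entry
            tags[TAG_NAMES.get(tag, str(tag))] = value
    return tags


# Hand-built header plus one directory with two entries.
entries = [(256, 640), (257, 480)]
sample = struct.pack("<2sHI", b"II", 42, 8)   # byte order, magic 42, directory offset
sample += struct.pack("<H", len(entries))
for tag, value in entries:
    sample += struct.pack("<HHII", tag, 4, 1, value)
sample += struct.pack("<I", 0)                # 0 = no further directories

print(read_tags(sample))
```

Because every property is a numbered tag, a reader can skip tags it does not understand, which is what makes the format so extensible.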

SVG

SVG (Scalable Vector Graphics) is an open, XML-based vector format, an image written as readable text describing shapes, paths, colours and text. Because it is vector, it scales to any size with perfectly crisp edges, and because it is XML, it can be styled with CSS, animated, and even searched or edited in a plain text editor.

DPI / PPI

DPI (dots per inch) measures resolution: how many dots of detail are packed into each inch of a print. PPI (pixels per inch) is the same idea applied to the pixels of a digital image or screen, and the two terms are often used interchangeably. The higher the number, the finer the detail and the larger the file. It is the single setting that most often decides whether a scan or an export looks crisp or disappointing.
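
The arithmetic behind that setting is just physical size times resolution. As a worked sketch, here is how many pixels an A4 page (210 by 297 mm) needs at two resolutions; the function name pixels_for is an assumption for this example.

```python
MM_PER_INCH = 25.4


def pixels_for(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions needed to cover a physical size at a given resolution."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


a4_print = pixels_for(210, 297, 300)   # 300 DPI, a common print resolution
a4_screen = pixels_for(210, 297, 72)   # 72 DPI, a low on-screen resolution

print(a4_print, a4_screen)   # (2480, 3508) (595, 842)
```

Going from 72 to 300 DPI multiplies the pixel count by roughly seventeen, which is why high-resolution scans are so much larger.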