Metadata

Metadata is data about your data: the information a PDF carries beyond the visible page content. There are two main stores: the legacy Document Information Dictionary (title, author, subject, keywords, the software that created the file, and creation and modification dates) and XMP (Extensible Metadata Platform), an XML-based block that can hold the same fields plus richer, extensible properties.
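To make the XMP store concrete, here is a minimal sketch that pulls an XMP packet out of raw file bytes and reads two Dublin Core fields from it. It assumes the packet is stored uncompressed and unencrypted, which is common but not guaranteed. The function names and the sample bytes are illustrative, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces used by XMP for Dublin Core fields and RDF containers.
DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_xmp_packet(pdf_bytes):
    """Return the first <x:xmpmeta> element found in the bytes, or None.

    Assumption: the metadata stream is not compressed or encrypted.
    """
    match = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    return match.group(0) if match else None


def read_dublin_core(packet):
    """Collect the rdf:li values under a few dc: fields (hypothetical helper)."""
    root = ET.fromstring(packet)
    fields = {}
    for name in ("title", "creator", "description"):
        element = root.find(f".//{{{DC}}}{name}")
        if element is not None:
            fields[name] = [li.text for li in element.iter(f"{{{RDF}}}li")]
    return fields


# Made-up bytes standing in for a real file.
SAMPLE = b"""%PDF-1.7
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Quarterly report</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Jane Example</rdf:li></rdf:Seq></dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
"""

packet = extract_xmp_packet(SAMPLE)
if packet is not None:
    print(read_dublin_core(packet))
```

For real files, a dedicated PDF library is the safer route, since it also handles compressed streams and encryption.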

Most of this is invisible when you read the document, which is exactly why it surprises people. A PDF exported from an office suite often embeds the author's real name and the original filename; a file generated from a scan may record the device used. None of that appears on the page, yet anyone who inspects the file can read it. If you publish documents, that hidden trail can leak more than you intended.
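How little effort that inspection takes can be shown with a rough sketch that scans raw bytes for Document Information Dictionary entries. It only catches values written as simple literal strings in an uncompressed, unencrypted object. Real files may use hex or UTF-16 strings, object streams, or encryption, so treat this as a demonstration rather than a parser. The function name and sample bytes are made up for illustration.

```python
import re

# Standard keys of the Document Information Dictionary.
INFO_KEYS = (
    b"Title", b"Author", b"Subject", b"Keywords",
    b"Creator", b"Producer", b"CreationDate", b"ModDate",
)

# Matches "/Key (value)" where the value has no nested or escaped parentheses.
INFO_PATTERN = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(([^()\\]*)\)"
)


def scan_info_strings(pdf_bytes):
    """Return the first literal-string value seen for each Info key."""
    found = {}
    for key, value in INFO_PATTERN.findall(pdf_bytes):
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found


# Made-up bytes standing in for a file exported from an office suite.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Title (Budget draft) /Author (Jane Example)\n"
    b"   /Producer (Example Office 1.0) /CreationDate (D:20240131120000Z) >>\n"
    b"endobj\n"
)

for key, value in scan_info_strings(SAMPLE).items():
    print(f"{key}: {value}")
```

To try it on a real file, pass in the result of `open(path, "rb").read()`. An empty result does not mean the file is clean, only that nothing was stored in this simple form.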

Stripping or editing metadata before you share a file is a basic privacy hygiene step. Like any other edit to the document, it is best done on your own machine, where the file already lives, so that no extra copy is created in the process. Good metadata also has value: it makes documents searchable and keeps archives organised, so the goal is control over it, not blind deletion.
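The "control, not blind deletion" idea can be sketched as an allowlist applied to the Info fields before they are written back. The snippet works on a plain dictionary only; reading the fields from a PDF and saving the result would need a PDF library, which is left out here. The allowlist, the function name, and the sample values are assumptions chosen for illustration.

```python
# Fields that help search and archiving, so this example policy keeps them.
KEEP = {"/Title", "/Subject", "/Keywords"}


def apply_metadata_policy(info, keep=KEEP, overrides=None):
    """Return a new dict with only allowlisted keys, plus any overrides.

    `info` maps Info dictionary keys (for example "/Author") to strings.
    The input dict is not modified.
    """
    cleaned = {key: value for key, value in info.items() if key in keep}
    cleaned.update(overrides or {})
    return cleaned


# Made-up metadata, as it might come out of an office-suite export.
original = {
    "/Title": "Budget draft",
    "/Author": "Jane Example",
    "/Keywords": "budget, 2024",
    "/Producer": "Example Office 1.0",
    "/CreationDate": "D:20240131120000Z",
}

shared = apply_metadata_policy(original, overrides={"/Title": "Budget 2024"})
print(shared)
```

Using an allowlist rather than a blocklist means any field the policy has not explicitly approved is dropped by default, which is the safer failure mode when a tool adds keys you did not expect.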
