Document Lexicon

Master the vocabulary of PDF forensics, archival standards, and secure document protocols.

AES-256 (Advanced Encryption Standard)

Security

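The most widely used symmetric encryption standard. AES-256 uses a 256-bit key to turn PDF content into unreadable ciphertext. Trying every possible key is computationally infeasible, so in practice the weak point is a guessable password rather than the cipher itself.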
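To put "infeasible" in numbers, this short estimate divides the AES-256 key space by a guess rate. The rate of 10^18 keys per second is a hypothetical, deliberately generous assumption, not a benchmark of any real hardware.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def brute_force_years(key_bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key of the given size at a fixed rate."""
    keyspace = 2 ** key_bits                 # number of possible keys
    return keyspace / guesses_per_second / SECONDS_PER_YEAR

# Hypothetical attacker: 10**18 key guesses per second.
years = brute_force_years(256, 1e18)
print(f"{years:.2e} years")
```

Even at that rate the search takes on the order of 10^51 years, which is why attackers target the password instead of the key.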

Annotate

Interaction

The process of adding supplementary information (comments, highlights, shapes) to a PDF without changing the underlying static content.

Bicubic Downsampling

Technical

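An image-resampling technique that lowers an image's resolution to shrink PDF file size. Each new pixel is computed by weighting nearby source pixels (a 4×4 neighborhood in the basic form) with a cubic function, which keeps edges and gradients smooth and preserves high visual fidelity.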
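A minimal one-dimensional sketch of bicubic resampling, assuming Keys' cubic convolution kernel with a = -0.5 (a common choice, not necessarily the one any given PDF tool uses). A full 2-D resize applies the same pass to every row and then every column; when shrinking, the kernel is stretched by the scale factor so each output pixel averages all the source pixels it covers.

```python
import math

def cubic_weight(x: float, a: float = -0.5) -> float:
    """Keys' cubic convolution kernel."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def bicubic_resample_row(row: list, new_len: int) -> list:
    """Resample one row of pixel values to new_len samples."""
    n = len(row)
    scale = n / new_len
    stretch = max(scale, 1.0)              # widen the kernel only when shrinking
    support = 2 * stretch                  # kernel radius in source pixels
    out = []
    for i in range(new_len):
        center = (i + 0.5) * scale - 0.5   # align pixel centres
        lo = math.floor(center - support) + 1
        hi = math.floor(center + support)
        total = weight_sum = 0.0
        for j in range(lo, hi + 1):
            w = cubic_weight((j - center) / stretch)
            src = row[min(max(j, 0), n - 1)]   # clamp at the image edges
            total += w * src
            weight_sum += w
        out.append(total / weight_sum)     # normalise so flat areas stay flat
    return out

print(bicubic_resample_row([0, 0, 255, 255, 0, 0, 255, 255], 4))
```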

CMYK (Cyan, Magenta, Yellow, Key/Black)

Printing

The subtractive color model used in color printing. Unlike RGB, an additive model based on emitted light, CMYK describes how much of each ink is laid down on paper.
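The subtractive relationship can be seen in the naive textbook conversion below: each ink absorbs one RGB primary, and black (K) takes over the darkness the three share. This is an illustration only; real print workflows convert through ICC profiles, which this formula ignores.

```python
def rgb_to_cmyk_naive(r: float, g: float, b: float) -> tuple:
    """Naive RGB -> CMYK for values in 0..1 (no ink or paper modelling)."""
    k = 1 - max(r, g, b)          # black covers the common darkness
    if k == 1:                    # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r - k) / (1 - k)     # cyan absorbs red
    m = (1 - g - k) / (1 - k)     # magenta absorbs green
    y = (1 - b - k) / (1 - k)     # yellow absorbs blue
    return (c, m, y, k)

print(rgb_to_cmyk_naive(1.0, 0.0, 0.0))  # pure red -> (0.0, 1.0, 1.0, 0.0)
```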

Cross-Reference Table (XRef)

Architecture

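The internal index of a PDF that maps every object number to the byte offset where that object begins, so a reader can jump straight to any object (PDF 1.5 and later may store it as a compressed cross-reference stream instead). A corrupt or outdated XRef, for example after an interrupted download or save, is a common reason a PDF is reported as "broken" or needs repair.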
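A minimal sketch of how a writer builds a classic cross-reference table: record the byte offset of each object as it is written, then emit one fixed-width 20-byte entry per object. The three-object page tree is a bare-bones example, not a full-featured PDF.

```python
def build_pdf(objects: list) -> bytes:
    """Assemble a minimal PDF; objects[0] becomes object 1, and so on."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))                   # byte position of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                 # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off            # each entry is exactly 20 bytes
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)

pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
])
```

If any byte before an object shifts without the table being rewritten, the recorded offsets point at the wrong place, which is exactly the corruption viewers try to repair.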

Digital Signature

Security

A cryptographic 'seal' on a PDF that shows whether the document has been modified since it was signed. It is created with the signer's private key and verified with the matching public key, which a digital certificate binds to the signer's identity.
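PDF signatures usually cover the file through a /ByteRange array [a b c d]: two byte spans that together include everything except the placeholder holding the signature itself. This sketch shows only the hashing step, assuming SHA-256; the resulting digest is what the private key ultimately signs (typically inside a CMS container), which is omitted here.

```python
import hashlib

def signed_bytes_digest(pdf: bytes, byte_range: list) -> bytes:
    """SHA-256 over the two spans named by a signature's /ByteRange."""
    a, b, c, d = byte_range
    h = hashlib.sha256()
    h.update(pdf[a:a + b])        # everything before the signature value
    h.update(pdf[c:c + d])        # everything after it
    return h.digest()

data = b"AAAA<sig>BBBB"           # toy file: the signature sits at bytes 4..8
print(signed_bytes_digest(data, [0, 4, 9, 4]).hex())
```

Changing any covered byte changes the digest, so verification fails; the signature bytes themselves are the only part left out.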

DPI (Dots Per Inch)

Printing

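The number of dots a printer places per linear inch; the term is also used loosely for image resolution (strictly, pixels per inch or PPI). Higher resolution gives sharper images but makes embedded images, and therefore the PDF, larger.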
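The file-size cost is easy to work out: pixel count grows with the square of the resolution. This sketch assumes a US Letter page (8.5 × 11 in) and uncompressed 8-bit RGB, so real PDFs with compressed images will be smaller.

```python
def raster_size(width_in: float, height_in: float, dpi: int, channels: int = 3):
    """Pixel dimensions and uncompressed bytes (8 bits per channel)."""
    w_px = round(width_in * dpi)
    h_px = round(height_in * dpi)
    return w_px, h_px, w_px * h_px * channels

for dpi in (150, 300, 600):
    w, h, size = raster_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} px, {size / 1e6:.1f} MB uncompressed")
```

Doubling the resolution quadruples the pixel count, and with it the raw image data.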

Embedding (Fonts)

Technical

The process of storing font data (the complete font or a subset of it) inside the PDF file, ensuring the document looks the same on computers that don't have that font installed.

Flattening

Interaction

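The process of merging interactive elements, such as form fields (checkboxes, text fields) and annotations, into the page's static content so they can no longer be edited. Depending on the tool, the result may remain vector text or be converted into a single image.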

Grayscale

Printing

A range of shades of gray from white to black. Grayscale images store one channel instead of three (RGB) or four (CMYK), which decreases file size, and grayscale output avoids using color ink.
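Converting a color pixel to gray means collapsing three channels into one weighted brightness value. This sketch uses the Rec. 601 luma weights, which are one common choice; other tools may use different weights.

```python
def to_gray(r: int, g: int, b: int) -> int:
    """Convert one 8-bit RGB pixel to an 8-bit gray level (Rec. 601 weights)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

pixels = [(255, 0, 0), (0, 128, 255), (255, 255, 255)]
gray = [to_gray(*px) for px in pixels]
print(gray, f"{len(pixels) * 3} bytes -> {len(gray)} bytes")
```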

ICC Profile

Printing

A set of data that characterizes a color input or output device, used to ensure color consistency across different screens and printers.

JBIG2

Technical

A specialized bi-level image compression standard used in PDFs primarily to shrink the size of scanned black-and-white documents.

Linearization (Fast Web View)

An optimization that organizes a PDF file so the first page can be displayed in a browser while the rest of the file continues to download.
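A linearized file announces itself up front: its first object is a linearization parameter dictionary containing /Linearized, placed within the first 1024 bytes. The heuristic check below relies only on that; it does not validate the hint tables or the rest of the layout.

```python
import re

def looks_linearized(pdf: bytes) -> bool:
    """Heuristic: is the first object in the first 1024 bytes a
    linearization parameter dictionary?"""
    head = pdf[:1024]
    match = re.search(rb"\d+\s+\d+\s+obj\s*<<(.*?)>>", head, re.S)
    return bool(match) and b"/Linearized" in match.group(1)

sample = (b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 12345 /O 3 /E 999 "
          b"/N 1 /T 12000 /H [ 500 100 ] >>\nendobj\n")
print(looks_linearized(sample))
```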

Lossless Compression

Technical

A method of reducing file size that allows the original data to be perfectly reconstructed without any loss of quality.
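PDF's FlateDecode filter uses the same zlib/Deflate format as Python's standard zlib module, so a round trip shows the defining property directly: the decompressed bytes are identical to the originals. The repetitive content stream is an arbitrary sample.

```python
import zlib

original = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 50
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

assert restored == original        # byte-for-byte identical: nothing was lost
print(len(original), "->", len(compressed), "bytes")
```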

Lossy Compression

Technical

A method of reducing file size by discarding information that is less perceptible to the human eye, resulting in a permanent loss of detail.
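The "permanent" part comes from quantization. This toy sketch rounds 8-bit samples to a coarser grid, loosely modelling the quantization stage of codecs such as JPEG (which actually quantizes frequency coefficients, not raw pixels). Fewer distinct values compress better, but nearby originals collapse together and cannot be told apart afterwards.

```python
def quantize(values, step):
    """Round each 8-bit sample to the nearest multiple of `step`."""
    return [min(255, step * round(v / step)) for v in values]

samples = [12, 13, 14, 200, 201, 203]
coarse = quantize(samples, 16)
print(samples, "->", coarse)
```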

Metadata

Architecture

Descriptive data stored inside a PDF but not displayed on its pages, such as the author's name, creation date, and the software used to generate the file.
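Classic PDF metadata lives in the document information dictionary, whose dates use the format D:YYYYMMDDHHmmSSOHH'mm (as described in ISO 32000-1; some producers also append a trailing apostrophe). A minimal sketch, with hypothetical author and producer values:

```python
from datetime import datetime, timedelta, timezone

def pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string."""
    minutes = int(dt.utcoffset().total_seconds() // 60)
    sign = "+" if minutes >= 0 else "-"
    hh, mm = divmod(abs(minutes), 60)
    return dt.strftime("D:%Y%m%d%H%M%S") + f"{sign}{hh:02d}'{mm:02d}"

created = datetime(2024, 3, 5, 14, 30, 0, tzinfo=timezone(timedelta(hours=-5)))
info = {
    "/Author": "Jane Doe",                  # hypothetical values
    "/Producer": "ExampleWriter 1.0",
    "/CreationDate": pdf_date(created),
}
print(info)
```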

OCR (Optical Character Recognition)

The use of software, typically machine-learning models in modern engines, to convert an image of text (like a scan) into searchable and selectable digital text.

PDF/A (Archival)

Standards

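An ISO-standardized version of PDF (ISO 19005) specialized for the archiving and long-term preservation of digital documents. It requires everything needed to render the file, such as fonts, to be embedded and prohibits features like encryption that could block future access.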

PDF/X (Exchange)

Standards

A subset of the PDF standard specialized for graphics exchange in print production, ensuring high-fidelity color and font consistency.

PKI (Public Key Infrastructure)

Security

The framework of roles and policies used to create, manage, and revoke digital certificates used for PDF signing and encryption.

Portability

Standards

The ability of the PDF format to maintain its visual appearance regardless of the operating system, device, or software used to open it.

Rasterization

Technical

The process of converting vector graphics (precise lines and curves) into a grid of pixels. Enlarging a rasterized image produces pixelation, because the original geometry is no longer available.
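A classic example of rasterization is Bresenham's line algorithm, sketched below: it turns an exact segment into the set of grid pixels that best approximate it. Once only these integer coordinates remain, scaling them up can only make the pixels bigger.

```python
def rasterize_line(x0: int, y0: int, x1: int, y1: int) -> list:
    """Bresenham's algorithm for any direction: pixels approximating a segment."""
    pixels = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                 # running error term
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:              # step in x
            err += dy
            x0 += sx
        if e2 <= dx:              # step in y
            err += dx
            y0 += sy
    return pixels

print(rasterize_line(0, 0, 6, 2))
```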

Redaction

Security

The permanent and irreversible removal of sensitive information from a PDF. Unlike drawing a black box over the text, which leaves the original text selectable and extractable underneath, true redaction deletes the underlying data.
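Why "blacking out" fails becomes obvious in a page content stream. In this sketch (uncompressed for readability, with a hypothetical value) the text is drawn first and a black rectangle painted over it; a very rough extractor still finds the text. Real extractors also handle fonts, encodings and TJ arrays.

```python
import re

# Text drawn with Tj, then a filled black rectangle (re f) painted on top.
content = b"""BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET
0 0 0 rg 70 695 150 16 re f
"""

def extract_literal_strings(stream: bytes) -> list:
    """Collect literal strings shown with the Tj operator."""
    return re.findall(rb"\((.*?)\)\s*Tj", stream)

print(extract_literal_strings(content))   # the "hidden" text is still there
```

True redaction removes the Tj operation from the stream itself instead of covering it.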

User Password

Security

The password required to open and view a PDF. Setting one encrypts the document's contents, and the decryption key can only be obtained by supplying this password (or the owner password).

Vector Graphics

Architecture

Images defined by mathematical points and paths rather than pixels, allowing for infinite scalability without loss of clarity.

AcroForms

Architecture

The original PDF form technology that uses fixed-position fields and key-value pairs to collect user data within a static layout.

Bleed Box

Printing

A technical boundary in a PDF that defines the area to which images and background colors should extend to ensure no white edges remain after physical trimming.
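Page boxes are given in PDF points (72 per inch), while bleed is usually specified in millimetres. This sketch expands a trim box by a 3 mm bleed, a common allowance that is an assumption here, since each printer specifies its own. The bleed box should still fit inside the media box.

```python
MM_TO_PT = 72 / 25.4          # 72 points per inch, 25.4 mm per inch

def bleed_box(trim_box, bleed_mm=3.0):
    """Expand a [llx, lly, urx, ury] trim box outward on every side."""
    b = bleed_mm * MM_TO_PT
    llx, lly, urx, ury = trim_box
    return [llx - b, lly - b, urx + b, ury + b]

# Hypothetical A4 trim box (210 x 297 mm) placed on a larger media box.
a4_trim = [20, 20, 615.28, 861.89]
print([round(v, 2) for v in bleed_box(a4_trim)])
```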

Bookmark (Outline)

A navigational tool in a PDF that provides a hierarchical table of contents, allowing users to jump to specific sections without scrolling.

Canvas

Architecture

The HTML canvas element: a bitmap drawing surface in the browser onto which PDF pages are rendered using JavaScript and WebAssembly during client-side processing.

Certificate Authority (CA)

Security

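A trusted entity that verifies the identity of a person or organization and then issues them a digital certificate. When a PDF is signed with that certificate, viewers can trace it back to the CA to confirm who signed the document.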

Conformance Level

Standards

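The specific set of requirements within an ISO standard that a PDF meets. For example, PDF/A-1b (Basic) guarantees reliable visual reproduction, while PDF/A-1a (Accessible) additionally requires a tagged logical structure and Unicode text mapping.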

Cos Layer

Architecture

The low-level 'Carousel Object System' that defines the basic syntax of a PDF, including arrays, dictionaries, and streams.
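The COS object syntax is compact enough to serialize by hand. This sketch maps Python values onto COS booleans, null, numbers, names, literal strings, arrays and dictionaries. It assumes, purely for illustration, that a str starting with "/" is a name; streams, indirect references and #xx name escaping are omitted.

```python
def to_cos(value) -> str:
    """Serialize simple Python values into PDF (COS) object syntax."""
    if isinstance(value, bool):                  # check before int: bool is an int
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        return f"{value:f}".rstrip("0").rstrip(".")   # PDF reals have no exponent
    if isinstance(value, str):
        if value.startswith("/"):
            return value                         # name object, e.g. /Type
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"                    # literal string
    if isinstance(value, list):
        return "[" + " ".join(to_cos(v) for v in value) + "]"
    if isinstance(value, dict):
        body = " ".join(f"{k} {to_cos(v)}" for k, v in value.items())
        return f"<< {body} >>"
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(to_cos({"/Type": "/Page", "/MediaBox": [0, 0, 612, 792], "/Title": "Q(1)"}))
```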

Crop Box

The boundary that defines the visible region of a PDF page when displayed in a viewer. Data outside this box is hidden but still exists in the file.

Destructive Edit

Technical

An edit that permanently removes or alters the original binary data of a PDF, such as true redaction or image downsampling.

Document Object Model (DOM)

Architecture

The structural representation of a web page that our client-side tools interact with to display PDF assets and capture user input.

Encryption Dictionary

Security

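The dictionary, referenced by the /Encrypt entry in the file trailer, that holds the parameters needed to decrypt the file, including the security handler (/Filter), the algorithm version and revision (/V, /R), the key length, the password verification values (/O and /U, which for AES-256 also carry random salts), and the permission flags (/P).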

File Trailer

Architecture

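The final section of a PDF file. Its trailer dictionary points to the document catalog (/Root), gives /Size (one more than the highest object number), and holds the unique file identifier (/ID); the startxref line that follows gives the byte offset of the Cross-Reference Table, and %%EOF marks the end of the file.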
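Readers parse a PDF from the end. This sketch finds the last startxref offset and the last trailer dictionary in a file with a classic xref table; files that use cross-reference streams keep the same entries in a stream object instead, which this code does not handle.

```python
import re

def read_trailer(pdf: bytes) -> dict:
    """Return the xref offset and raw trailer text of a classic PDF."""
    tail = pdf[-2048:]                    # the trailer sits at the very end
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", tail)
    if not m:
        raise ValueError("no startxref/%%EOF found at end of file")
    trailer_pos = pdf.rfind(b"trailer")   # last trailer wins after updates
    trailer = pdf[trailer_pos:].split(b"startxref")[0] if trailer_pos != -1 else b""
    return {"xref_offset": int(m.group(1)), "trailer": trailer.decode("latin-1").strip()}

sample = (b"%PDF-1.4\n"
          b"xref\n0 1\n0000000000 65535 f \n"
          b"trailer\n<< /Size 1 /ID [<AB> <AB>] >>\n"
          b"startxref\n9\n%%EOF\n")
print(read_trailer(sample))
```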

Forms Data Format (FDF)

Standards

A lightweight file format used specifically for exporting and importing data from PDF forms without sending the entire document.
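An FDF file is a tiny PDF-like file holding only field names (/T) and values (/V). This sketch writes one; the field names are hypothetical, and the encoding is kept to Latin-1 for simplicity, so non-Latin text would need UTF-16BE strings instead.

```python
def build_fdf(fields: dict) -> bytes:
    """Write a minimal FDF file carrying form-field values."""
    def lit(s: str) -> str:               # PDF literal string with escapes
        return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"
    entries = " ".join(f"<< /T {lit(name)} /V {lit(value)} >>" for name, value in fields.items())
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [ {entries} ] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    ).encode("latin-1")

print(build_fdf({"name": "Jane Doe", "city": "Berlin"}).decode("latin-1"))
```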

Hough Transform

A feature-extraction technique that detects straight lines in an image. OCR preprocessing can use it to estimate the angle of text lines in a crooked scan and rotate the page straight (deskewing).
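A minimal Hough transform for skew estimation: every point votes for all lines rho = x·cos(theta) + y·sin(theta) through it, and the most-voted (theta, rho) bin is the dominant line. The input here is synthetic (points along a baseline tilted by 2 degrees, in a y-up coordinate system); a real deskewer would first extract dark or edge pixels from the scan.

```python
import math
from collections import Counter

def dominant_line_angle(points, angle_step=0.5, rho_step=1.0):
    """Return the normal angle (degrees) of the strongest straight line."""
    votes = Counter()
    for i in range(int(180 / angle_step)):
        theta = math.radians(i * angle_step)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = x * cos_t + y * sin_t
            votes[(i, round(rho / rho_step))] += 1
    (best_i, _), _ = votes.most_common(1)[0]
    return best_i * angle_step

tilt = math.radians(2)
baseline = [(x, round(100 + x * math.tan(tilt))) for x in range(0, 400, 4)]
skew = dominant_line_angle(baseline) - 90   # a horizontal line's normal is at 90 degrees
print(f"estimated skew: {skew:.1f} degrees")
```

Rotating the page by the negative of this angle straightens the text lines.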

Incremental Update

Architecture

A method where changes are appended to the end of a PDF rather than rewriting the whole file, often used to preserve digital signatures.
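An incremental update leaves every existing byte in place and appends a new object version, a small xref section, and a trailer whose /Prev points back to the previous xref. This sketch handles only classic xref tables and takes /Size and /Root as parameters, where a real tool would read them from the previous trailer.

```python
import re

def incremental_update(pdf: bytes, obj_num: int, new_body: bytes,
                       size: int, root: bytes) -> bytes:
    """Append a new revision of object `obj_num` without rewriting the file."""
    prev = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])   # previous xref offset
    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_pos = len(out)
    out += b"%d 0 obj\n" % obj_num + new_body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_pos)   # one-entry subsection
    out += (b"trailer\n<< /Size %d /Root %s /Prev %d >>\nstartxref\n%d\n%%%%EOF\n"
            % (size, root, prev, xref_pos))
    return bytes(out)
```

Because the original bytes form an untouched prefix, a signature's /ByteRange over them still verifies.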

Interpolation

Technical

A mathematical method for creating new data points between known values, used to smooth out pixelated images when scaling a PDF.

Key Pair

Security

A set of two mathematically related cryptographic keys: the private key, kept secret, creates a signature, and the public key, shared freely, verifies the authenticity of the signed document.
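The private-signs, public-verifies relationship can be shown with textbook RSA and tiny primes. This is an insecure toy for illustration only: real signatures use large keys and padding schemes, and the hash is reduced into a modulus so small that collisions are easy.

```python
import hashlib

p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+)

def toy_digest(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    return pow(toy_digest(message), d, n)              # PRIVATE key (d, n)

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, e, n) == toy_digest(message)  # PUBLIC key (e, n)

sig = sign(b"contract v1")
print(verify(b"contract v1", sig))
```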

Logical Structure

Standards

The hierarchy of tags in a PDF (H1, P, Table) that allows screen readers to interpret the content in a meaningful order.

LZW Compression

Technical

A lossless data compression algorithm (Lempel-Ziv-Welch), applied through the LZWDecode filter in early PDF files and largely replaced by Flate (zlib/Deflate), which was introduced in PDF 1.2.
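A textbook LZW sketch: the encoder emits a code for the longest byte sequence it has already seen and adds that sequence plus one byte to its table. PDF's LZWDecode additionally packs codes into 9 to 12 bit fields, reserves 256 (clear table) and 257 (end of data), and limits the table size; that bit-level packing and the limit are omitted here, though new codes start at 258 to mirror it.

```python
def lzw_encode(data: bytes) -> list:
    table = {bytes([i]): i for i in range(256)}
    next_code, current, codes = 258, b"", []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate               # keep extending the match
        else:
            codes.append(table[current])
            table[candidate] = next_code      # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes: list) -> bytes:
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 258
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:               # code defined by this very step
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid code {code}")
        out += entry
        table[next_code] = prev + entry[:1]   # rebuild the encoder's table
        next_code += 1
        prev = entry
    return bytes(out)

codes = lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT")
print(codes)
```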

Media Box

Printing

The largest boundary in a PDF, defining the physical dimensions of the 'paper' on which the document is conceptually printed.

Module Thinning

Printing

A technical adjustment in QR code and barcode generation that slightly shrinks the dark modules (bars or squares) to compensate for the ink spread, or 'ink bleed', that occurs during printing.

Owner Password

Security

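The password that controls a PDF's permission settings. Entering it lifts restrictions such as printing, editing, or copying text, and allows the security settings to be changed. If no user password is set, anyone can still open the file, and the restrictions are enforced only by compliant viewer software.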
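The restrictions the owner password unlocks are stored as bit flags in the /P entry of the encryption dictionary, a signed 32-bit integer. This sketch decodes the main bits for security handler revision 3 and later; the descriptions are abbreviated.

```python
PERMISSION_BITS = {             # 1-based bit positions in /P
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations, fill forms",
    9: "fill existing form fields",
    10: "extract for accessibility",
    11: "assemble (insert, rotate, delete pages)",
    12: "high-quality print",
}

def decode_permissions(p: int) -> dict:
    """Which operations a /P value allows (mask to the unsigned bit pattern)."""
    bits = p & 0xFFFFFFFF
    return {name: bool((bits >> (pos - 1)) & 1) for pos, name in PERMISSION_BITS.items()}

print(decode_permissions(-8))   # bit 3 cleared: printing not allowed
```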

PDF/E (Engineering)

Standards

An ISO-standardized version of PDF designed for engineering and technical documentation, typically including 3D model support.

PostScript

Architecture

The page description language developed by Adobe that served as the architectural foundation for the PDF format.

Preflight

Printing

The process of checking a PDF for errors, missing fonts, or low-resolution images before it is sent to a professional printer.
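A typical preflight check computes each image's effective resolution: its pixel width divided by the width it occupies on the page (72 PDF points per inch). The 300 ppi threshold below is a common rule of thumb, not a universal requirement, and the image dimensions are hypothetical.

```python
def effective_ppi(pixel_width: int, placed_width_pt: float) -> float:
    """Resolution an image will actually print at."""
    return pixel_width / (placed_width_pt / 72)

ppi = effective_ppi(1200, 432)   # 1200 px placed 6 in (432 pt) wide
if ppi < 300:
    print(f"warning: image prints at only {ppi:.0f} ppi")
```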

Sanitization

Security

The process of removing hidden data, comments, and metadata from a PDF to ensure it is safe for public distribution.

Subset (Fonts)

Technical

Embedding only the specific characters (glyphs) used in a document rather than the complete font, significantly reducing file size.
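A subset font is marked in the PDF by prefixing its name with a tag of six uppercase letters and a plus sign. This sketch collects the characters a text needs and builds such a name; the font name "NotoSans-Regular" is only an example, and the actual glyph extraction is left out.

```python
import random
import string

def subset_font_name(base_name: str, rng: random.Random) -> str:
    """Build a subset name such as 'EOODIA+NotoSans-Regular'."""
    tag = "".join(rng.choice(string.ascii_uppercase) for _ in range(6))
    return f"{tag}+{base_name}"

text = "Invoice 2024"
glyphs_needed = sorted(set(text))      # only these characters go into the subset
name = subset_font_name("NotoSans-Regular", random.Random(0))
print(name, glyphs_needed)
```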

Tesseract.js

A JavaScript and WebAssembly port of Tesseract, the widely used open-source OCR engine, allowing us to perform text recognition entirely in your browser.

Trim Box

Printing

The boundary that indicates the intended dimensions of the finished page after it has been cut from a larger sheet of paper.

Unicode

Standards

The universal encoding standard that assigns a unique number (code point) to every character. When a PDF maps its text to Unicode, the text can be searched and copied correctly in any language.
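For text strings such as a document title, PDF stores Unicode as UTF-16BE preceded by the byte-order mark FE FF (PDF 2.0 also permits UTF-8 with its own marker). Page text is a separate case that relies on mappings such as ToUnicode CMaps. A minimal sketch of the text-string encoding:

```python
def pdf_text_string(text: str) -> bytes:
    """Encode text as UTF-16BE with a leading byte-order mark."""
    return b"\xfe\xff" + text.encode("utf-16-be")

def hex_string(data: bytes) -> str:
    return "<" + data.hex().upper() + ">"   # PDF hexadecimal string syntax

print(hex_string(pdf_text_string("Ωmega")))
```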

WebAssembly (WASM)

Architecture

A portable, low-level binary instruction format that lets code written in languages like C++ run at near-native speed. We use it to run high-performance PDF engines directly in your browser.

XMP (Extensible Metadata Platform)

Architecture

An ISO standard (ISO 16684-1) for embedding metadata in files as XML packets, allowing for detailed tracking of authoring and modification history inside a PDF.

ZUGFeRD

Standards

A German standard for electronic invoicing, aligned with the European norm EN 16931 and technically equivalent to the French Factur-X, that embeds an XML data file inside a PDF/A-3 document for automatic machine processing.