Document Lexicon.
Master the vocabulary of PDF forensics, archival standards, and secure document protocols.
AES-256 (Advanced Encryption Standard)
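Security: The widely adopted symmetric encryption standard, here used with a 256-bit key to turn PDF data into unreadable ciphertext. The keyspace is so large that brute-forcing the key itself is considered infeasible; in practice, a weak password is the easier target.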
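Here is a rough calculation behind the brute-force claim. The attacker speed of 10^18 guesses per second is a hypothetical, deliberately generous figure chosen for illustration. It is not a benchmark.

```python
# Back-of-the-envelope brute-force estimate for a 256-bit key.
# GUESSES_PER_SECOND is a hypothetical, deliberately generous attacker speed.
KEY_BITS = 256
GUESSES_PER_SECOND = 10**18
SECONDS_PER_YEAR = 365.25 * 24 * 3600

keyspace = 2**KEY_BITS
# On average an exhaustive search hits the key after trying half the keyspace.
expected_years = (keyspace / 2) / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"keyspace: {keyspace:.3e} keys")
print(f"expected search time: {expected_years:.3e} years")
```

Even at that imagined rate, the expected search time exceeds the age of the universe by more than 40 orders of magnitude. That is why attacks target passwords rather than the key.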
Annotate
Interaction: The process of adding supplementary information (comments, highlights, shapes) to a PDF without changing the underlying static content.
Bicubic Downsampling
Technical: An image processing technique that reduces an image's resolution, computing each new pixel as a cubic-weighted average of neighboring source pixels, to shrink PDF file size while maintaining high visual fidelity.
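The sketch below shows cubic downsampling in one dimension. It assumes the Keys cubic convolution kernel with a = -0.5, a kernel stretched by the shrink factor, and clamping at the image edges. These are common choices, not necessarily what any particular PDF tool uses, and the helper names are hypothetical.

```python
import math

def cubic_weight(x: float, a: float = -0.5) -> float:
    """Keys cubic convolution kernel; a = -0.5 is a common sharpness setting."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def downsample_row(row: list[float], factor: int) -> list[float]:
    """Shrink one row of pixel values by an integer factor with a cubic kernel."""
    out = []
    for i in range(len(row) // factor):
        center = (i + 0.5) * factor - 0.5  # source position of output pixel i
        total = weight_sum = 0.0
        for j in range(math.floor(center - 2 * factor), math.ceil(center + 2 * factor) + 1):
            # Stretching the kernel by `factor` makes it average the pixels being merged.
            w = cubic_weight((j - center) / factor)
            total += w * row[min(max(j, 0), len(row) - 1)]  # clamp at the edges
            weight_sum += w
        out.append(total / weight_sum)
    return out

# A 2-D image is downsampled separably: run this over every row, then every column.
print(downsample_row([0, 0, 0, 0, 255, 255, 255, 255], 2))
```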
CMYK (Cyan, Magenta, Yellow, Key/Black)
Printing: The subtractive color model used in color printing. Additive RGB mixes light on screens. CMYK instead describes how inks absorb light on paper.
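The naive, uncalibrated conversion below shows the idea of moving from light (RGB) to ink (CMYK). Black ink (K) replaces the darkness shared by all three channels. It is an illustration only: real print workflows convert through ICC profiles, and the function name is hypothetical.

```python
def rgb_to_cmyk_naive(r: float, g: float, b: float) -> tuple[float, float, float, float]:
    """Uncalibrated RGB -> CMYK with full gray-component replacement (values 0..1)."""
    k = 1 - max(r, g, b)          # the darkness common to all channels becomes black ink
    if k == 1:                    # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk_naive(1, 0, 0))        # red needs magenta + yellow ink
print(rgb_to_cmyk_naive(0.5, 0.5, 0.5))  # mid gray is printed with black ink only
```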
Cross-Reference Table (XRef)
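Architecture: The internal index of a PDF that maps each object number to the byte offset where that object begins in the file. Since PDF 1.5 it may also be stored as a compressed cross-reference stream. A corrupt or inaccurate XRef is one of the most common causes of "broken" PDFs.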
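The following sketch writes and then reads a classic XRef table from hand-assembled bytes. The helper names are hypothetical. The file is skeletal, with no content stream or resources, so treat it as a structure demo. The reader handles only a single classic subsection, not the cross-reference streams found in many modern files.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a tiny one-page PDF and its classic XRef table by hand."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_offset = len(out)
    size = len(objects) + 1
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)

def read_xref(pdf: bytes) -> dict[int, int]:
    """Parse a single classic XRef subsection into {object number: byte offset}."""
    start = int(pdf.rsplit(b"startxref", 1)[1].split()[0])
    lines = pdf[start:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("expected a classic xref table at startxref")
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _gen, kind = lines[2 + i].split()
        if kind == b"n":  # in-use object; "f" marks a free entry
            table[first + i] = int(offset)
    return table

pdf = build_minimal_pdf()
for num, off in read_xref(pdf).items():
    print(num, off, pdf[off:off + 7])  # each offset lands on "N 0 obj"
```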
Digital Signature
Security: A cryptographic "seal" on a PDF. The signer's private key signs a digest of the document bytes, so any later change is detectable. The accompanying certificate identifies who signed.
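A PDF signature covers the whole file except a placeholder that later holds the signature value. The /ByteRange entry lists the covered ranges. The sketch below shows only the hashing step, using a made-up byte string. In a real signature, the digest is then bound into a CMS (PKCS #7) signature made with the signer's private key, and that signature is stored in the placeholder. The helper name is hypothetical.

```python
import hashlib

def byte_range_digest(pdf: bytes, byte_range: list[int]) -> str:
    """SHA-256 over the ranges listed as [start1, len1, start2, len2, ...]."""
    h = hashlib.sha256()
    for start, length in zip(byte_range[0::2], byte_range[1::2]):
        h.update(pdf[start:start + length])
    return h.hexdigest()

# Hypothetical document: the <...> hex string stands in for the /Contents placeholder.
doc = b"%PDF-1.7 ... /Contents <" + b"0" * 16 + b"> ... rest of file %%EOF"
gap_start = doc.index(b"<")
gap_end = doc.index(b">") + 1
byte_range = [0, gap_start, gap_end, len(doc) - gap_end]

signed = byte_range_digest(doc, byte_range)
tampered = doc.replace(b"rest of file", b"REST OF FILE")
print(signed == byte_range_digest(tampered, byte_range))  # False: the edit is detected
```

Filling the placeholder does not change the digest. That is why the signature value can be written into the file after the digest has been computed.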
DPI (Dots Per Inch)
Printing: The number of printed dots per linear inch. In PDF workflows the term is also used loosely for image resolution (strictly PPI, pixels per inch). Higher DPI yields sharper images but larger PDF files.
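This small calculation shows how DPI drives pixel counts. It assumes the PDF default of 72 points per inch and a US Letter page (612 x 792 points). Raw pixel data grows with the square of the DPI, so doubling the resolution roughly quadruples the uncompressed image size. The function name is hypothetical.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch

def pixels_for_page(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions needed to render or scan a page of this size at `dpi`."""
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))

for dpi in (72, 150, 300, 600):
    w, h = pixels_for_page(612, 792, dpi)  # US Letter, 8.5 x 11 in
    # Uncompressed 24-bit RGB: 3 bytes per pixel.
    print(dpi, (w, h), f"{w * h * 3 / 1e6:.1f} MB raw")
```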
Embedding (Fonts)
Technical: The process of including font data inside the PDF file, either the full font or only a subset. This ensures the document looks the same on computers that don't have that font installed.
Flattening
Interaction: The process of merging interactive elements such as form fields, checkboxes, and annotations into the page's static content so they can no longer be edited. Some tools go further and rasterize the whole page into a single image.
Grayscale
Printing: A range of shades of gray from white to black. Converting a PDF to grayscale reduces color ink usage. It also decreases file size, because each pixel needs one channel instead of three (RGB) or four (CMYK).
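The sketch below converts RGB to grayscale using the familiar 0.30/0.59/0.11 luminance weights, the same approximation the PDF specification gives for DeviceRGB to DeviceGray. Tools may use other weightings or an ICC-based conversion, and the function name is hypothetical.

```python
def to_gray(r: int, g: int, b: int) -> int:
    """Collapse an 8-bit RGB pixel (3 bytes) into one 8-bit gray value (1 byte)."""
    # Weights reflect the eye's higher sensitivity to green than to blue.
    return round(0.30 * r + 0.59 * g + 0.11 * b)

row = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
gray_row = [to_gray(*px) for px in row]
print(gray_row, f"{len(row) * 3} bytes -> {len(gray_row)} bytes")
```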
ICC Profile
Printing: A set of data that characterizes a color input or output device, used to ensure color consistency across different screens and printers.
JBIG2
Technical: A specialized bi-level image compression standard used in PDFs primarily to shrink the size of scanned black-and-white documents.
Linearization (Fast Web View)
UX: An optimization that organizes a PDF file so the first page can be displayed in a browser while the rest of the file continues to download.
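This heuristic check for Fast Web View relies on two rules: the linearization dictionary appears within the first 1024 bytes of the file, and its /L entry equals the file length. It is a sketch, not a validator, because the hint tables and the first-page layout are not inspected. The function name is hypothetical.

```python
import re

def looks_linearized(pdf: bytes) -> bool:
    """Heuristic: a linearization dictionary near the start whose /L matches the file length.

    If the file was modified after linearization (for example by an incremental
    update), /L no longer matches and the linearization information is stale.
    """
    m = re.search(rb"/Linearized\s.*?/L\s+(\d+)", pdf[:1024], re.S)
    return m is not None and int(m.group(1)) == len(pdf)
```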
Lossless Compression
Technical: A method of reducing file size that allows the original data to be perfectly reconstructed without any loss of quality.
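A round trip through zlib shows the defining property of lossless compression: the decompressed bytes equal the input exactly. zlib implements the Deflate algorithm behind PDF's /FlateDecode filter. The sample content stream is made up.

```python
import zlib

# A made-up, text-like content stream: repetitive data compresses well.
original = b"BT /F1 12 Tf 72 720 Td (Hello, lexicon) Tj ET\n" * 200
compressed = zlib.compress(original, level=9)  # Deflate, as used by /FlateDecode
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
print(restored == original)  # True: every byte comes back exactly
```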
Lossy Compression
Technical: A method of reducing file size by discarding information that is less perceptible to the human eye, resulting in a permanent loss of detail.
Metadata
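Architecture: Descriptive data stored in a PDF but not shown on the page, such as the author's name, creation date, and the software used to generate the file. It lives in the document information dictionary, an XMP metadata stream, or both.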
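Dates in the document information dictionary (/CreationDate, /ModDate) use a compact D:YYYYMMDDHHmmSS format followed by a time-zone offset. XMP stores ISO 8601 dates instead. The minimal parser below assumes well-formed input and accepts omitted trailing fields, as the format allows. The function name is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?"
)

def parse_pdf_date(value: str) -> datetime:
    """Parse an Info-dictionary date such as D:20240131120000+01'00'."""
    m = _PDF_DATE.match(value)
    if not m:
        raise ValueError(f"not a PDF date: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    tz = None  # no offset given: return a naive datetime
    if sign == "Z":
        tz = timezone.utc
    elif sign is not None:
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz)

print(parse_pdf_date("D:20240131120000+01'00'").isoformat())
```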
OCR (Optical Character Recognition)
AI: The use of pattern recognition, in modern engines machine-learning models, to convert an image of text (such as a scan) into searchable and selectable digital text.
PDF/A (Archival)
Standards: An ISO-standardized version of PDF specialized for use in the archiving and long-term preservation of digital documents.
PDF/X (Exchange)
Standards: A subset of the PDF standard specialized for graphics exchange in print production, ensuring high-fidelity color and font consistency.
PKI (Public Key Infrastructure)
Security: The framework of roles and policies used to create, manage, and revoke digital certificates used for PDF signing and encryption.
Portability
Standards: The ability of the PDF format to maintain its visual appearance regardless of the operating system, device, or software used to open it.
Rasterization
Technical: The process of converting vector graphics (resolution-independent lines and curves) into a grid of pixels. Enlarging a rasterized image beyond its native resolution results in pixelation.
Redaction
Security: The permanent and irreversible removal of sensitive information from a PDF. Drawing a black box over content leaves the text extractable underneath. True redaction deletes the underlying data.
User Password
Security: Also called the open password, this credential is required to open and view an encrypted PDF. The reader uses it to unlock the key that decrypts the file's contents.
Vector Graphics
Architecture: Images defined by mathematical points and paths rather than pixels, allowing for infinite scalability without loss of clarity.
AcroForms
Architecture: The original PDF form technology that uses fixed-position fields and key-value pairs to collect user data within a static layout.
Bleed Box
Printing: A technical boundary in a PDF that defines the area to which images and background colors should extend to ensure no white edges remain after physical trimming.
Bookmark (Outline)
UX: A navigational tool in a PDF that provides a hierarchical table of contents, allowing users to jump to specific sections without scrolling.
Canvas
Architecture: The virtual drawing area in a browser where PDF pages are rendered using JavaScript and WebAssembly during client-side processing.
Certificate Authority (CA)
Security: A trusted entity that verifies the identity of a person or organization and issues a digital certificate binding that identity to a public key. This lets readers confirm who signed a PDF.
Conformance Level
Standards: A measure of how strictly a PDF adheres to a specific ISO standard, such as PDF/A-1a (Accessible) vs PDF/A-1b (Basic).
Cos Layer
Architecture: The low-level "Carousel Object System" that defines the basic syntax of a PDF, including arrays, dictionaries, and streams.
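The sketch below produces Cos syntax from plain Python values. The type mapping and the `Name` helper class are assumptions of this example. Streams, indirect references, hex strings, and the #xx escaping that some names need are left out.

```python
class Name(str):
    """Marks a Python string as a PDF name object (written with a leading slash)."""

def to_cos(obj) -> str:
    """Serialize simple Python values into PDF (Cos) object syntax."""
    if isinstance(obj, Name):          # check before str: Name subclasses str
        return "/" + obj
    if obj is None:
        return "null"
    if isinstance(obj, bool):          # check before int: bool subclasses int
        return "true" if obj else "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):         # PDF reals have no exponent notation
        return f"{obj:f}".rstrip("0").rstrip(".")
    if isinstance(obj, str):           # literal string: escape \ ( )
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(obj, (list, tuple)):
        return "[" + " ".join(to_cos(x) for x in obj) + "]"
    if isinstance(obj, dict):
        items = " ".join(f"/{key} {to_cos(value)}" for key, value in obj.items())
        return f"<< {items} >>"
    raise TypeError(f"cannot serialize {type(obj).__name__}")

print(to_cos({"Type": Name("Page"), "MediaBox": [0, 0, 612, 792], "Rotate": 0}))
```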
Crop Box
UX: The boundary that defines the visible region of a PDF page when displayed in a viewer. Data outside this box is hidden but still exists in the file.
Destructive Edit
Technical: An edit that permanently removes or alters the original binary data of a PDF, such as true redaction or image downsampling.
Document Object Model (DOM)
Architecture: The structural representation of a web page that our client-side tools interact with to display PDF assets and capture user input.
Encryption Dictionary
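Security: The dictionary referenced by the /Encrypt entry in the file trailer. It records how the file is protected: the security handler, algorithm version and revision, key length, permission flags (/P), and the values used to validate the user and owner passwords.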
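The /P entry of the encryption dictionary is a 32-bit integer whose bits switch individual permissions on or off. These are the restrictions an owner password protects, and conforming viewers are expected to honor them. The sketch below decodes the commonly documented bits, numbered from 1 at the low-order end. The labels are paraphrases and the function name is hypothetical.

```python
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract content",
    6: "annotate and fill forms",
    9: "fill existing form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "high-quality print",
}

def decode_permissions(p: int) -> dict[str, bool]:
    """Decode the signed 32-bit /P value into named permission flags."""
    p &= 0xFFFFFFFF  # reinterpret a negative /P as unsigned 32-bit
    return {name: bool((p >> (bit - 1)) & 1) for bit, name in PERMISSION_BITS.items()}

print(decode_permissions(-4))  # -4 sets every permission bit
```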
File Trailer
Architecture: The final section of a PDF file. Its startxref line gives the byte offset of the Cross-Reference Table. The trailer dictionary points to the document catalog (/Root) and carries the file identifier (/ID).
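The sketch below shows how a reader can find the trailer facts by working backward from the end of a classic PDF. It assumes the file contains a `trailer` keyword. Files that use cross-reference streams keep these entries in the stream dictionary instead, and this helper would raise an error on them. The function name and sample bytes are made up.

```python
import re

def read_trailer(pdf: bytes) -> dict:
    """Pull startxref, the /Root reference, and the first /ID string from a classic PDF."""
    tail = pdf[-2048:]  # readers scan backward from the end of the file
    startxref = int(re.findall(rb"startxref\s+(\d+)", tail)[-1])
    trailer = pdf[pdf.rindex(b"trailer"):]  # ValueError if there is no trailer keyword
    root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", trailer)
    file_id = re.search(rb"/ID\s*\[\s*<([0-9A-Fa-f]*)>", trailer)
    return {
        "startxref": startxref,
        "root": (int(root.group(1)), int(root.group(2))) if root else None,
        "id": file_id.group(1).decode() if file_id else None,
    }

sample = (b"%PDF-1.7\n% body omitted\n"
          b"trailer\n<< /Size 4 /Root 1 0 R /ID [<ABCD1234><ABCD1234>] >>\n"
          b"startxref\n123\n%%EOF\n")
print(read_trailer(sample))
```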
Forms Data Format (FDF)
Standards: A lightweight file format used specifically for exporting and importing data from PDF forms without sending the entire document.
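The sketch below builds a minimal FDF file carrying two field values with a hypothetical helper. It assumes Latin-1 text and omits the optional /F entry that names the associated PDF.

```python
def make_fdf(fields: dict[str, str]) -> bytes:
    """Build a minimal FDF file that carries form-field values but no PDF pages."""
    def lit(s: str) -> str:
        # Escape the characters that are special inside PDF literal strings.
        return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"

    entries = " ".join(f"<< /T {lit(name)} /V {lit(value)} >>" for name, value in fields.items())
    body = (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [ {entries} ] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    return body.encode("latin-1")  # assumption: Latin-1 text only

print(make_fdf({"name": "Ada", "city": "London"}).decode("latin-1"))
```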
Hough Transform
AI: A feature extraction technique for detecting straight lines. Some OCR pre-processing pipelines use it to measure the angle of text lines so the engine can deskew (straighten) a crooked scan.
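The small Hough transform below estimates skew from dark-pixel coordinates along a text baseline. It assumes mathematical (y-up) coordinates, a search window of 80 to 100 degrees in 0.5-degree steps, and 1-pixel distance bins. Production OCR engines tune these settings and may deskew by other means. The function name is hypothetical.

```python
import math
from collections import Counter

def estimate_skew(points, angle_range=(80.0, 100.0), step=0.5):
    """Estimate text-line skew in degrees with a Hough transform over (theta, rho).

    A horizontal line has its normal at theta = 90 degrees, so skew = theta - 90.
    """
    votes = Counter()
    n_steps = int(round((angle_range[1] - angle_range[0]) / step)) + 1
    for k in range(n_steps):
        theta_deg = angle_range[0] + k * step
        t = math.radians(theta_deg)
        cos_t, sin_t = math.cos(t), math.sin(t)
        for x, y in points:
            rho = round(x * cos_t + y * sin_t)  # signed distance of the line from the origin
            votes[(theta_deg, rho)] += 1        # collinear points pile into one bin
    (best_theta, _rho), _count = votes.most_common(1)[0]
    return best_theta - 90.0

# Synthetic baseline tilted by 2 degrees.
baseline = [(x, 50 + x * math.tan(math.radians(2.0))) for x in range(500)]
print(estimate_skew(baseline))
```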
Incremental Update
Architecture: A method where changes are appended to the end of a PDF rather than rewriting the whole file, often used to preserve digital signatures.
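The simplified sketch below performs an incremental update. The original bytes are left untouched. The update appends three things: a new version of one object, a one-entry XRef section, and a trailer whose /Prev points at the previous XRef. It assumes a classic XRef table and reuses an existing object number. The function names are hypothetical.

```python
def append_incremental_update(pdf: bytes, obj_num: int, new_body: bytes,
                              size: int, root_ref: bytes) -> bytes:
    """Append a new version of one object without modifying the existing bytes.

    `size` (/Size) and `root_ref` (e.g. b"1 0 R") are copied from the old trailer.
    """
    prev_xref = int(pdf.rsplit(b"startxref", 1)[1].split()[0])
    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + new_body + b"\nendobj\n"
    xref_offset = len(out)
    # A one-entry XRef subsection: only the changed object is listed.
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    # /Prev chains back to the previous XRef so unchanged objects stay reachable.
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root_ref, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)

def count_revisions(pdf: bytes) -> int:
    """Rough revision count: each saved revision ends with its own %%EOF marker."""
    return pdf.count(b"%%EOF")
```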
Interpolation
Technical: A mathematical method for estimating new data points between known values, used to smooth out pixelated images when they are scaled up in a PDF.
Key Pair
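Security: A set of two related cryptographic keys: a private key used to sign (or decrypt) and a public key used to verify (or encrypt). In PDF signing, the private key signs the document and the public key verifies its authenticity.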
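Python's standard library has no public-key signing. This sketch therefore uses textbook RSA with tiny primes to show how a key pair splits the work: the private exponent signs, and the public pair (N, E) verifies. It is deliberately insecure. Real PDF signing uses large keys, padding schemes, and a vetted cryptography library.

```python
import hashlib

# Toy key pair from tiny textbook primes. INSECURE: illustration only.
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+ modular inverse)

def digest_int(message: bytes) -> int:
    # Reduce the SHA-256 digest modulo N so it fits the toy key (real schemes pad instead).
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    return pow(digest_int(message), D, N)               # needs the private key

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == digest_int(message)  # needs only the public key

sig = sign(b"contract v1")
print(verify(b"contract v1", sig))
```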
Logical Structure
Standards: The hierarchy of tags in a PDF (H1, P, Table) that allows screen readers to interpret the content in a meaningful order.
LZW Compression
Technical: The Lempel-Ziv-Welch lossless compression algorithm (the LZWDecode filter), used in early PDF versions and largely replaced by Flate (zlib/Deflate) in modern files.
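This sketch shows the LZW dictionary idea in its simplest form. Repeated byte sequences are replaced by codes for dictionary entries that both sides build as they go. It skips codes 256 (clear table) and 257 (end of data), which PDF reserves. It does not implement the 9-to-12-bit code packing or the EarlyChange behavior of the real LZWDecode filter. The function names are hypothetical.

```python
def lzw_compress(data: bytes) -> list[int]:
    table = {bytes([i]): i for i in range(256)}
    next_code = 258  # 256 and 257 are reserved in PDF's LZW
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])        # emit the longest known sequence
            table[wc] = next_code       # learn the new, one-byte-longer sequence
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 258
    w = table[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:         # code defined by this very step
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        table[next_code] = w + entry[:1]  # rebuild the encoder's table in step
        next_code += 1
        w = entry
    return bytes(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
print(len(sample), "bytes ->", len(lzw_compress(sample)), "codes")
```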
Media Box
Printing: The largest boundary in a PDF, defining the physical dimensions of the "paper" on which the document is conceptually printed.
Module Thinning
Printing: A technical adjustment in QR code and barcode generation that slightly shrinks each module (bar or cell) to compensate for the ink spread that occurs during printing.
Owner Password
Security: Also called the permissions password. It grants full access to a restricted PDF, including changing its security settings and lifting restrictions on printing, editing, or extracting text.
PDF/E (Engineering)
Standards: An ISO-standardized version of PDF designed for engineering and technical documentation, typically including 3D model support.
PostScript
Architecture: The page description language developed by Adobe that served as the architectural foundation for the PDF format.
Preflight
Printing: The process of checking a PDF for errors, missing fonts, or low-resolution images before it is sent to a professional printer.
Sanitization
Security: The process of removing hidden data, comments, and metadata from a PDF to ensure it is safe for public distribution.
Subset (Fonts)
Technical: Embedding only the specific characters used in a document rather than the entire font, significantly reducing file size.
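This sketch covers the bookkeeping behind subsetting. It collects the characters a document actually uses and names the embedded font with a six-uppercase-letter tag plus "+", the PDF convention for subset fonts (as in ABCDEF+Helvetica). Deriving the tag from a hash is this example's own choice, no glyph data is actually extracted, and the function name is hypothetical.

```python
import hashlib
import string

def subset_font(base_font: str, text: str) -> tuple[str, list[int]]:
    """Return a subset-style font name and the sorted code points to keep."""
    used = sorted({ord(ch) for ch in text})
    seed = base_font.encode("utf-8") + b"".join(cp.to_bytes(4, "big") for cp in used)
    digest = hashlib.sha256(seed).digest()
    tag = "".join(string.ascii_uppercase[b % 26] for b in digest[:6])  # six A-Z letters
    return f"{tag}+{base_font}", used

name, keep = subset_font("Helvetica", "Hello, lexicon")
print(name, len(keep), "characters kept")
```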
Tesseract.js
AI: A JavaScript and WebAssembly port of Tesseract, the widely used open-source OCR engine, allowing us to perform text recognition entirely in your browser.
Trim Box
Printing: The boundary that indicates the intended dimensions of the finished page after it has been cut from a larger sheet of paper.
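This sketch resolves the page boxes using the PDF defaults. A missing CropBox falls back to the MediaBox, and a missing BleedBox, TrimBox, or ArtBox falls back to the CropBox. The sample page (a US Letter trim with a 9-point, or 1/8-inch, bleed) is made up. Clipping each box to the MediaBox is omitted, and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """A PDF rectangle in points: lower-left (llx, lly) to upper-right (urx, ury)."""
    llx: float
    lly: float
    urx: float
    ury: float

    def contains(self, other: "Box") -> bool:
        return (self.llx <= other.llx and self.lly <= other.lly
                and self.urx >= other.urx and self.ury >= other.ury)

def resolve_page_boxes(page: dict) -> dict[str, Box]:
    media = Box(*page["MediaBox"])
    crop = Box(*page["CropBox"]) if "CropBox" in page else media
    boxes = {"MediaBox": media, "CropBox": crop}
    for name in ("BleedBox", "TrimBox", "ArtBox"):
        boxes[name] = Box(*page[name]) if name in page else crop
    return boxes

page = {"MediaBox": [0, 0, 648, 828], "BleedBox": [9, 9, 639, 819], "TrimBox": [18, 18, 630, 810]}
boxes = resolve_page_boxes(page)
trim = boxes["TrimBox"]
print("finished size:", (trim.urx - trim.llx) / 72, "x", (trim.ury - trim.lly) / 72, "in")
```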
Unicode
Standards: The universal encoding standard that assigns a unique number (code point) to every character. When a PDF maps its glyphs to Unicode, its text can be searched and copied in any language.
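This sketch shows code points and one way a PDF stores non-ASCII text strings such as /Title: UTF-16BE bytes prefixed by the FE FF byte order mark, written as a hex string. The function name is hypothetical. PDF text strings may also use PDFDocEncoding or, in PDF 2.0, UTF-8 with a byte order mark.

```python
def pdf_text_string(text: str) -> str:
    """Encode text as a PDF hex string: UTF-16BE preceded by the FE FF byte order mark."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"

for s in ("A", "Ü", "日本", "🙂"):
    # Characters beyond U+FFFF (like the emoji) become a UTF-16 surrogate pair.
    print(s, [f"U+{ord(c):04X}" for c in s], pdf_text_string(s))
```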
WebAssembly (WASM)
Architecture: A portable, low-level binary instruction format that lets us run high-performance C++ PDF engines at near-native speed in your browser.
XMP (Extensible Metadata Platform)
Architecture: An ISO standard for embedding metadata in files, allowing for advanced tracking of authoring and modification history inside a PDF.
ZUGFeRD
Standards: A German hybrid e-invoicing standard, published jointly with France as Factur-X, that embeds a machine-readable XML invoice inside a PDF/A-3 document for automatic processing.