no point in using {{dead link}} when archive is already provided; marking these as unfit was the right call to avoid linking to spam site |
(76 intermediate revisions by 41 users not shown) | |||
Line 1:
{{Short description|Portable Document Format, a digital file format}}
{{Other uses}}
{{pp-pc}}
{{Use American English|date=January 2023}}
{{Use mdy dates|date=January 2023}}
{{Infobox file format
| name = Portable Document Format
| icon = PDF_file_icon.svg
| icon_size = 121px
| iconcaption = Adobe PDF icon
| screenshot =
| extension = <code>.pdf</code>
| _noextcode = no
| mime = {{Plainlist|
* <code>application/pdf</code>,<ref name="rfc8118">{{Cite IETF |title=The application/pdf Media Type |rfc=8118 |sectionname= |section= |page= |last1=Hardy|first1=M.|last2=Masinter|first2=L.|last3=Markovic|first3=D.|last4=Johnson|first4=D.|last5=Bailey|first5=M.|date=March 2017|publisher=[[Internet Engineering Task Force|IETF]]|doi=10.17487/RFC8118 }}</ref>
* <code>application/x-pdf</code>
* <code>application/x-bzpdf</code>
* <code>application/x-gzpdf</code>
}}
| _nomimecode = true
| uniform type = com.adobe.pdf
| magic = <code>%PDF</code>
| owner = [[Adobe Inc.]] (1991–2008)<br />
[[ISO]] (2008–)
| genre =
| released = {{Start date and age|1993|6|15}}
| latest release version = 2.0
| latest release date = <!-- {{Start date and age|YYYY|mm|dd}} -->
| container for =
| contained by =
| extended from =
| extended to = [[PDF/A]], [[PDF/E]], [[PDF/UA]], [[PDF/VT]], [[PDF/X]]
| standard = ISO 32000-2
| open = Yes
| url = {{URL|https://iso.org/standard/75839.html}}
| image =
| typecode = <code>PDF </code><ref name="rfc8118" /> (including a single trailing space)
}}
'''Portable Document Format''' ('''PDF'''), standardized as '''ISO 32000''', is a [[file format]] developed by [[Adobe Inc.|Adobe]] in 1992 to present [[document]]s, including text formatting and images, in a manner independent of [[application software]], [[Computer hardware|hardware]], and [[operating system]]s.<ref name="pdf-ref-1.7">{{cite web|author=Adobe Systems Incorporated|url=https://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf|title=PDF Reference|date=November 2006|edition=6th|version=1.7|url-status=dead|archiveurl=https://web.archive.org/web/20081001170454/https://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf|archivedate=October 1, 2008|accessdate=January 12, 2023}}</ref><ref>{{Cite web|last=Warnock|first=J.|url=https://www.pdfa.org/norm-refs/warnock_camelot.pdf|title=The Camelot Project|date=14 October 2004<!--dates from PDF source-->|orig-date=Original date 5 May 1995|archive-url=https://web.archive.org/web/20110718230852/http://www.planetpdf.com/planetpdf/pdfs/warnock_camelot.pdf|archive-date=July 18, 2011|url-status=live}}</ref> Based on the [[PostScript]] language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, [[font]]s, [[vector graphics]], [[raster images]] and other information needed to display it. PDF has its roots in "The Camelot Project" initiated by Adobe co-founder [[John Warnock]] in 1991.<ref>{{Cite web|title=What is a PDF? Portable Document Format {{!}} Adobe Acrobat DC|url=https://www.adobe.com/acrobat/about-adobe-pdf.html|access-date=January 12, 2023|publisher=Adobe Systems Inc.|language=en|archive-date=January 30, 2023|archive-url=https://web.archive.org/web/20230130032548/https://www.adobe.com/acrobat/about-adobe-pdf.html|url-status=live}}</ref>
PDF was standardized as ISO 32000 in 2008.<ref>{{cite web |url = http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf |title = ISO 32000-1:2008 |archive-url=https://web.archive.org/web/20180726064724/http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf | archive-date=July 26, 2018|url-status=dead}}</ref> The latest edition, ISO 32000-2:2020, was published in December 2020.
PDF files may contain a variety of content besides flat text and graphics, including logical structuring elements, interactive elements such as annotations and form-fields, layers, [[rich media]] (including video content), three-dimensional objects using [[U3D]] or [[PRC (file format)|PRC]], and various other [[File format|data formats]]. The PDF specification also provides for encryption and [[digital signature]]s, file attachments, and [[metadata]] to enable [[workflow]]s requiring these features.
== History ==
{{Main|History of PDF}}
The development of PDF began in 1991 when [[John Warnock]] wrote a paper for a project then code-named Camelot, in which he proposed the creation of a simplified version of PostScript called Interchange PostScript (IPS).<ref name="Pfiffner_Page_137">{{cite book |last1=Pfiffner |first1=Pamela |title=Inside the Publishing Revolution: The Adobe Story |date=2003 |publisher=Peachpit Press |location=Berkeley |isbn=0-321-11564-3 |page=137}}</ref> Unlike traditional PostScript, which was tightly focused on rendering [[print job]]s to output devices, IPS would be optimized for displaying pages to any screen and any platform.<ref name="Pfiffner_Page_137" />
[[Adobe Systems]] made the PDF specification available free of charge in 1993. In its early years, PDF was popular mainly in [[desktop publishing]] workflows and competed with several other formats, including [[DjVu]], [[Envoy (WordPerfect)|Envoy]], Common Ground Digital Paper, Farallon Replica and even Adobe's own PostScript format.
Line 30 ⟶ 56:
ISO published ISO 32000-2 in 2017, available for purchase, replacing the free specification provided by Adobe.<ref name=nowfree/> In December 2020, the second edition of PDF 2.0, ISO 32000-2:2020, was published, with clarifications, corrections, and critical updates to normative references<ref>{{Cite web |url=https://www.pdfa.org/iso-32000-22020-is-now-available/ |title=ISO 32000-2:2020 is now available |publisher=PDFA |date=December 14, 2020 |access-date=February 3, 2021 |archive-date=December 4, 2022 |archive-url=https://web.archive.org/web/20221204112238/https://www.pdfa.org/iso-32000-22020-is-now-available/ |url-status=live }}</ref> (ISO 32000-2 does not include any proprietary technologies as normative references).<ref name=":0">{{cite web|url=https://www.iso.org/standard/75839.html|title=ISO 32000-2 – Document management — Portable document format — Part 2: PDF 2.0|date=January 5, 2021 |publisher=ISO|access-date=February 3, 2021|archive-date=January 28, 2021|archive-url=https://web.archive.org/web/20210128003836/https://www.iso.org/standard/75839.html|url-status=live}}</ref>
In April 2023 the PDF Association made ISO 32000-2 available for download free of charge.<ref name=nowfree>{{cite press release| title=Announcing no-cost access to the latest PDF standard: ISO 32000-2 (PDF 2.0)| publisher=PDF Association| url=https://pdfa.org/sponsored-standards
==
A PDF file is often a combination of [[vector graphics]], text, and [[bitmap graphics]]. The basic types of content in a PDF are:
Line 38 ⟶ 64:
* Typeset text stored as content streams (i.e., not encoded in [[plain text]]);
* Vector graphics for illustrations and designs that consist of shapes and lines;
* Raster graphics for photographs and other types of images; and
*
In later PDF revisions, a PDF document can also support links (within the document or to a web page), forms, JavaScript (initially available as a plugin for Acrobat 3.0), or any other type of embedded content that can be handled using plug-ins.
Line 50 ⟶ 76:
=== PostScript language ===
[[PostScript]] is a [[page description language]] run in an [[Interpreter (computing)|interpreter]] to generate an image.<ref name="Pfiffner_Page_137" /> It can handle graphics and has standard features of [[programming language]]s such as [[conditional (computer programming)|branching]] and [[loop (computing)|looping]].<ref name="Pfiffner_Page_137" /> PDF is a subset of PostScript, simplified to remove such
PostScript was originally designed for a drastically different [[use case]]: transmission of one-way linear print jobs in which the PostScript interpreter would collect a series of commands until it encountered the <code>showpage</code> command, then execute all the commands to render a page as a raster image to a printing device.<ref name="Pfiffner_Page_139">{{cite book |last1=Pfiffner |first1=Pamela |title=Inside the Publishing Revolution: The Adobe Story |date=2003 |publisher=Peachpit Press |location=Berkeley |isbn=0-321-11564-3 |page=139}}</ref> PostScript was not intended for long-term storage and real-time interactive rendering of [[electronic document]]s to [[computer monitor]]s, so there was no need to support anything other than consecutive rendering of pages.<ref name="Pfiffner_Page_139" /> If there was an error in the final printed output, the user would correct it at the application level and send a new print job in the form of an entirely new PostScript file. Thus, any given page in a PostScript file could be accurately rendered only as the cumulative result of executing all preceding commands to draw all previous pages—any of which could affect subsequent pages—plus the commands to draw that particular page, and there was no easy way to bypass that process to skip around to different pages.<ref name="Pfiffner_Page_139" />
As a document format, PDF has several advantages over PostScript:
* PDF contains only static [[Declarative programming|declarative]] PostScript code
* Like [[Display PostScript]],
* PDF enforces the rule that the code for
* All data required for rendering is included
Its disadvantages are:
*
* A (sometimes much) larger file size.<ref>{{cite web |last1=Anton Ertl |first1=Martin |title=What is the PDF format good for? |url=https://www.complang.tuwien.ac.at/anton/why-not-pdf.html |website=complang.tuwien.ac.at |publisher=Vienna University of Technology |access-date=8 April 2024|archive-url=https://web.archive.org/web/20240404031526/https://www.complang.tuwien.ac.at/anton/why-not-pdf.html|archive-date=4 April 2024|url-status=live}}</ref>
PDF since v1.6 supports embedding of interactive 3D documents: 3D drawings can be embedded using [[U3D]] or [[PRC (file format)|PRC]] and various other data formats.<ref name="3d#1">{{cite web |url=https://www.adobe.com/manufacturing/resources/3dformats/ |title=3D supported formats |publisher=Adobe Systems Inc. |date=July 14, 2009 |access-date=February 21, 2010
Line 87 ⟶ 115:
Objects may be either ''direct'' (embedded in another object) or ''indirect''. Indirect objects are numbered with an ''object number'' and a ''generation number'' and defined between the <code>obj</code> and <code>endobj</code> keywords if residing in the document root. Beginning with PDF version 1.5, indirect objects (except other streams) may also be located in special streams known as ''object streams'' (marked <code>/Type /ObjStm</code>). This technique enables non-stream objects to have standard stream filters applied to them, reduces the size of files that have large numbers of small indirect objects and is especially useful for ''Tagged PDF''. Object streams do not support specifying an object's ''generation number'' (other than 0).
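The <code>obj</code>/<code>endobj</code> framing of indirect objects described above can be illustrated with a minimal sketch. It is not a real PDF parser: it assumes uncompressed objects in the file body, ignores object streams, and could be fooled by binary stream data that happens to contain the same byte patterns. The function name <code>list_indirect_objects</code> and the sample bytes are hypothetical, chosen only for illustration.

```python
import re

# Header of an indirect object: "<object number> <generation number> obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")

def list_indirect_objects(data: bytes):
    """Return (object_number, generation_number, body) for each obj ... endobj span.

    Naive scan for illustration only; it does not consult the cross-reference table.
    """
    found = []
    for match in OBJ_HEADER.finditer(data):
        end = data.find(b"endobj", match.end())
        if end == -1:
            break  # no closing keyword: stop rather than guess
        body = data[match.end():end].strip()
        found.append((int(match.group(1)), int(match.group(2)), body))
    return found

# Hypothetical two-object fragment, not a complete PDF file.
sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
)
objects = list_indirect_objects(sample)
for number, generation, body in objects:
    print(number, generation, body)
```

Note that the reference <code>2 0 R</code> inside the first object is not mistaken for an object header, because the pattern requires the <code>obj</code> keyword.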
An index table, also called the cross-reference table, is located near the end of the file and gives the byte offset of each indirect object from the start of the file.<ref>Adobe Systems, PDF Reference, pp. 39–40.</ref> This design allows for efficient [[random access]] to the objects in the file, and also allows for small changes to be made without rewriting the entire file (''incremental update''). Before PDF version 1.5, the table would always be in a special ASCII format, be marked with the <code>xref</code> keyword, and follow the main body composed of indirect objects. Version 1.5 introduced optional ''cross-reference streams'', which have the form of a standard stream object, possibly with filters applied. Such a stream may be used instead of the ASCII cross-reference table and contains the offsets and other information in binary format. The format is flexible in that it allows for integer width specification (using the <code>/W</code> array), so that for example, a document not exceeding 64 [[KiB]] in size may dedicate only 2
At the end of a PDF file is a footer containing
Line 102 ⟶ 130:
PDF files come in two layouts: non-linearized (not "optimized") and linearized ("optimized"). Non-linearized PDF files can be smaller than their linearized counterparts, though they are slower to access because portions of the data required to assemble pages of the document are scattered throughout the file. Linearized PDF files (also called "optimized" or "web optimized" PDF files) are constructed so that they can be read in a Web browser plugin without waiting for the entire file to download, since all objects required to display the first page are organized at the start of the file.<ref name="pdf-ref">{{cite web|url=https://www.adobe.com/devnet/pdf/pdf_reference.html|title=Adobe Developer Connection: PDF Reference and Adobe Extensions to the PDF Specification|publisher=Adobe Systems Inc.|access-date=December 13, 2010|url-status=dead|archive-url=https://web.archive.org/web/20061115132507/https://www.adobe.com/devnet/pdf/pdf_reference.html|archive-date=November 15, 2006}}</ref> PDF files may be optimized using [[Adobe Acrobat]] software or [[QPDF]].
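Whether a file claims to be linearized can be checked roughly by looking for a <code>/Linearized</code> dictionary near the start of the file. The sketch below is a heuristic under stated assumptions: it assumes the linearization dictionary lies within the first 1024 bytes, that the file begins directly with the <code>%PDF-</code> header, and it does not verify that the rest of the file actually follows the linearized layout. The function name <code>looks_linearized</code> and both sample byte strings are hypothetical.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristic check for a /Linearized dictionary near the start of the file.

    Assumes the dictionary sits within the first 1024 bytes; it does not
    validate hint tables or the actual object ordering.
    """
    head = data[:1024]
    return head.startswith(b"%PDF-") and b"/Linearized" in head

# Hypothetical file beginnings, truncated for illustration.
linear_head = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
plain_head = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(looks_linearized(linear_head))
print(looks_linearized(plain_head))
```

A file that passes this check may still have been modified by an incremental update after linearization, so a dedicated tool is the safer way to confirm the layout.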
Page dimensions are not limited by the format itself. However, Adobe Acrobat imposes a limit of 15 million by 15 million inches, or 225 trillion in<sup>2</sup> (145,161
== Imaging model ==
Line 120 ⟶ 148:
=== Raster images ===
* ''ASCII85Decode'', a filter used to put the stream into 7-bit ASCII,
Line 166 ⟶ 194:
== Additional features ==
=== Logical structure and accessibility<span class="anchor" id="Tagged PDF"></span> ===
{{See also|PDF/A-1|PDF/UA}}
A
Tagged PDF is not required in situations where a PDF file is intended only for print. Since the feature is optional, and since the rules for
An ISO-standardized subset of PDF specifically targeted at accessibility, [[PDF/UA]], was first published in 2012.
Line 234 ⟶ 263:
=== Forms ===
''Interactive Forms'' is a mechanism to add forms to the PDF file format. PDF currently supports two different methods for integrating data and PDF forms. Both formats today coexist in the PDF specification:<ref name="iso32000">{{cite web |url=https://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf |title=Document Management – Portable Document Format – Part 1: PDF 1.7, First Edition |author=Adobe Systems Inc.|date=July 1, 2008|access-date=January 12, 2023|url-status=dead|archive-url=https://web.archive.org/web/20081203002256/https://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf|archive-date=December 3, 2008}}</ref><ref>{{cite web |url=http://gnupdf.org/Forms_Data_Format |title=Gnu PDF – PDF Knowledge – Forms Data Format |archive-url=https://web.archive.org/web/20130101054615/http://www.gnupdf.org/Forms_Data_Format |archive-date=January 1, 2013 |access-date=January 12, 2023|url-status=
* AcroForms (also known as Acrobat forms), introduced in the PDF 1.2 format specification and included in all later PDF specifications.
Line 264 ⟶ 293:
=== Malware vulnerability ===
PDF files can be infected with viruses, Trojans, and other malware. They can contain hidden JavaScript code that might exploit vulnerabilities in a PDF, or hidden objects that are executed when the file that hides them is opened; less commonly, a malicious PDF can launch malware.<ref>{{cite web | title=Can PDFs have viruses? Keep your files safe | publisher=Adobe | url=https://www.adobe.com/acrobat/resources/can-pdfs-contain-viruses.html | access-date=3 October 2023 | archive-date=October 4, 2023 | archive-url=https://web.archive.org/web/20231004120143/https://www.adobe.com/acrobat/resources/can-pdfs-contain-viruses.html | url-status=live }}</ref>
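A first-pass triage for embedded scripts or launch actions can be sketched as a raw byte scan for a few name objects. This is only an illustration, not a malware detector: the list of names (<code>/JavaScript</code>, <code>/JS</code>, <code>/OpenAction</code>, <code>/Launch</code>) is an assumption about what is worth flagging, the function name <code>count_risky_names</code> is hypothetical, and the scan misses names that are hex-escaped or stored inside compressed streams.

```python
# Names chosen for illustration; an assumption, not an exhaustive or official list.
RISKY_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch")

def count_risky_names(data: bytes) -> dict:
    """Count raw occurrences of each name in the file bytes.

    A nonzero count is only a hint to look closer; a zero count proves nothing,
    since names can be obfuscated or hidden in compressed object streams.
    """
    return {name.decode("ascii"): data.count(name) for name in RISKY_NAMES}

# Hypothetical action dictionary, not taken from a real file.
sample_action = b"<< /Type /Action /S /JavaScript /JS (app.alert(1);) >>"
report = count_risky_names(sample_action)
print(report)
```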
PDF attachments carrying viruses were first discovered in 2001. The virus, named ''OUTLOOK.PDFWorm'' or ''Peachy'', used [[Microsoft Outlook]] to send itself as an attached Adobe PDF file. It was activated with Adobe Acrobat, but not with Acrobat Reader.<ref>Adobe Forums, [https://forums.adobe.com/thread/302989 Announcement: PDF Attachment Virus "Peachy"] {{Webarchive|url=https://web.archive.org/web/20150904151955/https://forums.adobe.com/thread/302989 |date=September 4, 2015 }}, August 15, 2001.</ref>
Line 279 ⟶ 308:
Many PDF viewers are provided free of charge from a variety of sources. Programs to manipulate and edit PDF files are available, usually for purchase.
There are many software options for creating PDFs, including the PDF printing capabilities built into [[macOS]], [[iOS]],<ref>{{Cite web|url=https://ijunkie.com/how-to-create-pdf-web-page-safari-iphone-ipad-ios-11/|title=How to Create a PDF from Web Page on iPhone and iPad in iOS 11|last=Pathak|first=Khamosh|date=October 7, 2017|website=iJunkie|access-date=January 12, 2023|archive-date=January 12, 2023|archive-url=https://web.archive.org/web/20230112153246/https://ijunkie.com/how-to-create-pdf-web-page-safari-iphone-ipad-ios-11/|url-status=live}}</ref> and most [[Linux]] distributions. Much document processing software including
The [[Free Software Foundation]]
The [[Apache PDFBox]] project of the [[Apache Software Foundation]] is an open source Java library, licensed under the [[Apache License]], for working with PDF documents.<ref>{{cite web|url=http://pdfbox.apache.org/|url-status=live|title=The Apache PDFBox project- Apache PDFBox 3.0.0 released|date=August 17, 2023|archive-date=January 7, 2023|archive-url=https://web.archive.org/web/20230107234923/https://pdfbox.apache.org/}} Updated for new releases.</ref>
Line 289 ⟶ 318:
[[Raster image processor]]s (RIPs) are used to convert PDF files into a [[raster graphics|raster format]] suitable for imaging onto paper and other media in printers, digital production presses and [[prepress]] in a process known as [[rasterization]]. RIPs capable of processing PDF directly include the Adobe PDF Print Engine<ref>{{cite web|url=https://www.adobe.com/products/pdfprintengine/overview.html|title=Adobe PDF Print Engine|publisher=Adobe Systems Inc.|access-date=August 20, 2014|archive-date=August 22, 2013|archive-url=https://web.archive.org/web/20130822034446/http://www.adobe.com/products/pdfprintengine/overview.html|url-status=live}}</ref> from Adobe Systems and Jaws<ref>{{cite web|url=http://www.globalgraphics.com/products/jaws_rip/|title=Jaws® 3.0 PDF and PostScript RIP SDK|work=globalgraphics.com|access-date=November 26, 2010|archive-date=March 5, 2016|archive-url=https://web.archive.org/web/20160305090728/http://globalgraphics.com/products/jaws_rip|url-status=dead}}</ref> and the [[Harlequin RIP]] from [[Global Graphics]].
In 1993, the Jaws raster image processor from Global Graphics became the first shipping prepress RIP that interpreted PDF natively without conversion to another format. The company released an upgrade to
[[Agfa-Gevaert]] introduced and shipped Apogee, the first prepress workflow system based on PDF, in 1997.
Line 301 ⟶ 330:
=== Native display model ===
{{unreferenced section|date=November 2023}}
PDF was selected as the "native" [[metafile]] format for [[macOS]] (originally called Mac OS X), replacing the [[PICT]] format of the earlier [[classic Mac OS]]. The imaging model of the [[Quartz (graphics layer)|Quartz]] graphics layer is based on the model common to [[Display PostScript]] and PDF, leading to the nickname ''Display PDF''. The [[Preview (macOS)|Preview]] application can display PDF files, as can version 2.0 and later of the [[Safari (web browser)|Safari]] web browser. System-level support for PDF allows
=== Annotation ===
Line 310 ⟶ 339:
There are also [[web annotation]] systems that support annotation in PDF and other document formats. In cases where PDFs are expected to have all of the functionality of paper documents, ink annotation is required.
=== Conversion and information extraction ===
PDF's emphasis on preserving the visual appearance of documents across different software and hardware platforms poses challenges to the conversion of PDF documents to other [[file format]]s and the targeted [[Information extraction|extraction of information]], such as text, images, tables, [[Bibliography|bibliographic information]], and document [[metadata]]. Numerous tools and source code libraries support these tasks. Several labeled [[dataset]]s for testing PDF conversion and information extraction tools exist and have been used for benchmark evaluations of the tools' performance.<ref>{{Citation |last=Meuschke |first=Norman |title=A Benchmark of PDF Information Extraction Tools Using a Multi-task and Multi-domain Evaluation Framework for Academic Documents |date=2023 |work=Information for a Better World: Normality, Virtuality, Physicality, Inclusivity |volume=13972 |pages=383–405 |editor-last=Sserwanga |editor-first=Isaac |url=https://link.springer.com/10.1007/978-3-031-28032-0_31 |place=Cham |publisher=Springer Nature Switzerland |language=en |doi=10.1007/978-3-031-28032-0_31 |isbn=978-3-031-28031-3 |last2=Jagdale |first2=Apurva |last3=Spinde |first3=Timo |last4=Mitrović |first4=Jelena |last5=Gipp |first5=Bela |editor2-last=Goulding |editor2-first=Anne |editor3-last=Moulaison-Sandy |editor3-first=Heather |editor4-last=Du |editor4-first=Jia Tina|arxiv=2303.09957 }}</ref>
== Alternatives ==
Line 318 ⟶ 350:
[[MODCA|Mixed Object: Document Content Architecture]] is a competing format. MO:DCA-P is a part of [[Advanced Function Presentation]].
== See also ==
* [[Web page]]
Line 329 ⟶ 362:
== Further reading ==
* {{cite
* PDF 2.0 {{cite web |url = https://www.iso.org/standard/75839.html |title=ISO 32000-2:2020(en), Document management — Portable document format — Part 2: PDF 2.0 |website = International Organization for Standardization |language = English |access-date = December 16, 2020 }}
* PDF 2.0 {{cite web |url = https://www.iso.org/standard/63534.html |title=ISO 32000-2:2017(en), Document management — Portable document format — Part 2: PDF 2.0 |website = International Organization for Standardization |date=August 3, 2017 |language = English |access-date = January 31, 2019 }}