What is a PDF?

We all take for granted that everyone knows what a PDF is, but there is a lot more to a PDF than many people realize. PDF stands for “Portable Document Format.” In its most basic form, a PDF is a standardized file format used for exchanging documents with others. Because the format is based on a published standard, any conforming program should be able to open any PDF document, as long as the document follows the rules laid out in the standard.

The PDF standard is normally referred to as the “PDF specification.” It evolved from a project at Adobe Systems whose goal was to create a standard format for exchanging documents across diverse computer platforms. Originally, the PDF specification was proprietary to and owned by Adobe, but Adobe later made it publicly available and relinquished control of it. In 2008, the International Organization for Standardization (ISO) published it as the standard ISO 32000-1.

The specification has been enhanced over the years to include new features. The first version was designated 1.0. Subsequent versions have been numbered by adding .1 for minor revisions or 1 for major revisions. Version 1.7 is the version that ISO standardized in 2008; ISO has since published PDF 2.0 as ISO 32000-2. Adobe has complicated matters by adding its own proprietary extensions to version 1.7, which are numbered using an “Extension Level” scheme. Each version of the specification has been designed to be compatible with all of the previous versions, so a PDF document created using the 1.0 specification can still be displayed by software that implements the 1.7 specification. The reverse is not true: software written for an older version may not correctly display a document that relies on features added later.
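To make the versioning concrete: a PDF file begins with a short header of the form `%PDF-1.7` that names the specification version it was written against. The sketch below reads that header and applies the compatibility rule described above. The function names `pdf_header_version` and `reader_can_display` are made up for illustration, and the check is deliberately simplistic: a document can also declare a newer version inside the file itself, which this sketch ignores.

```python
import re


def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF-M.m header, or None if not found."""
    # The header belongs at the very start of the file. Searching the first
    # 1024 bytes is an assumption made here to tolerate leading junk.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def reader_can_display(reader_version, file_version):
    """Newer readers handle older files; older readers may not handle newer ones."""
    return file_version <= reader_version


if __name__ == "__main__":
    sample = b"%PDF-1.4\n% rest of the file would follow here\n"
    version = pdf_header_version(sample)
    print("header version:", version)
    print("readable by a 1.7 reader:", reader_can_display((1, 7), version))
    print("readable by a 1.0 reader:", reader_can_display((1, 0), version))
```

To try it on a real file, read the first kilobyte with `open(path, "rb").read(1024)` and pass the result to `pdf_header_version`.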

PDF documents can include text, images, and shapes such as lines, circles, or squares, as well as other information. It is best to think of a PDF document as a set of rules which, when followed in order, produce a desired result on the screen or on a printed page. The rules might specify that a word be placed at a particular position on a page of a certain size (maybe 8.5 inches by 11 inches), and that the word be underlined and colored in a bright shade of red. Subsequent rules might specify that other items be placed at specific positions, such as a large circle with a circumference of four inches centered at the exact center of the page. Further rules might then specify that the page be reduced in size and rotated by 90 degrees. Although the actual details involved in describing a page are somewhat more complex than this example, it is easy to see how a set of rules could be used to describe a PDF page. Some of the intricacies of the PDF specification also make it possible to avoid duplicating data within a PDF. If a picture is displayed more than once, it may be stored in the PDF only once and then referenced each time it is used.
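As a rough illustration of these rules, the sketch below builds a tiny one-page PDF by hand. The page content is a short list of drawing instructions: set the color to red, place a word at a position, draw a line under it, and outline a square at the center of the page. The helper names `build_pdf` and `build_minimal_pdf` are invented for this example, the underline length is only an estimate of the word's width, and a square stands in for the circle because curves take more instructions. Treat it as a sketch of the idea rather than a complete PDF writer.

```python
def build_pdf(objects):
    """Assemble numbered objects, a cross-reference table, and a trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def build_minimal_pdf():
    # The page's rules, carried out top to bottom. Units are points
    # (1/72 inch), measured from the bottom-left corner of the page.
    content = b"\n".join([
        b"1 0 0 rg",            # fill color: bright red
        b"1 0 0 RG",            # stroke (line) color: bright red
        b"BT",                  # begin text
        b"/F1 24 Tf",           # use font F1 at 24 points
        b"72 700 Td",           # move to x=72, y=700
        b"(Hello) Tj",          # show the word
        b"ET",                  # end text
        b"72 696 m",            # underline: start just below the word
        b"127 696 l",           # x=127 is an estimate of where the word ends
        b"S",                   # stroke the line
        b"256 346 100 100 re",  # 100-point square centered on the page
        b"S",                   # stroke the square
    ])
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 612 x 792 points is 8.5 x 11 inches.
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    return build_pdf(objects)


if __name__ == "__main__":
    print(len(build_minimal_pdf()), "bytes generated")
```

Writing the result to disk with `open("hello.pdf", "wb").write(build_minimal_pdf())` should give a file that a PDF viewer can open. Notice that the page does not contain the font itself; it refers to it by number (`5 0 R`). The same referencing mechanism is what lets a picture be stored once and used on many pages.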

You may be asking, “If the PDF format is so perfect for exchanging documents, why aren’t all documents just stored in PDF format in the first place?” The reason is that many documents involve a lot more than the visible portion that people want to exchange with others. Think of a spreadsheet, for example. The spreadsheet contains columns of cells with numbers and text, but there are also formulas within some of the cells. The person who receives a monthly budget report may not need to know how each cell was calculated. They just want to see the budget dollars and categories. Saving the spreadsheet in PDF format eliminates the underlying formulas and other complexities such as macros. The PDF that is produced then becomes a snapshot of the data contained in the spreadsheet at a moment in time. The person who receives the PDF cannot change the budget categories and numbers in the spreadsheet, since they don’t have the original spreadsheet. It may be possible, however, to edit the PDF directly using a program such as Adobe Acrobat. That brings up another aspect of PDFs: security. A PDF may be secured so that a password is required to open it, or so that certain operations are prohibited. The operations that can be restricted include modifying the document, copying its contents, adding comments, printing the document, and adding a digital signature, among others. To begin becoming familiar with the security features that are available, open a PDF in Adobe Acrobat and look at the document security settings under the document properties.
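Behind the scenes, the standard password-based security scheme records these restrictions as individual on/off bits packed into a single integer, stored in the PDF under the name `/P`. The sketch below decodes such an integer. The bit positions follow the permission table in the PDF 1.7 specification as best I understand it, so check the specification before relying on them. The labels in `PERMISSION_BITS` and the function name `decode_permissions` are my own, and the two sample values are illustrative rather than taken from a real document.

```python
# Bit positions count from 1 at the lowest-order bit, as the specification does.
# The labels are informal names chosen for this example.
PERMISSION_BITS = {
    "print": 3,
    "modify_contents": 4,
    "copy_or_extract": 5,
    "annotate_or_comment": 6,
    "fill_forms": 9,
    "assemble_document": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict:
    """Translate a /P value into a dictionary of allowed operations."""
    # /P is stored as a signed 32-bit integer, so it is usually negative.
    # Masking converts it to the equivalent unsigned bit pattern.
    flags = p & 0xFFFFFFFF
    return {
        name: bool(flags & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    for sample in (-44, -3904):
        print(sample, decode_permissions(sample))
```

With these bit positions, a value of -44 allows printing and copying but not modifying or commenting, while -3904 allows none of the listed operations. Keep in mind that these flags are requests that PDF software is expected to honor; they describe what a well-behaved viewer will permit.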

One advanced feature of PDFs is the ability to embed other types of information within a PDF. For example, a video could be embedded within a PDF, as could a spreadsheet, an email, or even another PDF. This makes it possible to include not only the information that others need to see, such as a budget report, but also the information that was used to create the visible document. Adobe Acrobat refers to PDF documents that contain other documents within them as “PDF Portfolio” documents. Other programs may simply show that a PDF contains an “attachment” and not use the Portfolio terminology. It is also possible to embed structured data such as XML files in PDF documents. This ability allows PDFs to be used to power complex business processes that take place between companies. One example would be an XML file embedded within a PDF invoice, so that the recipient’s software can read the invoice data directly. Not all programs that can display PDF documents are able to display or extract the files that have been embedded within them.

The PDF document specification provides a useful means for storing and exchanging information in a standard format that everyone can access. Future articles will focus on additional advanced features of PDFs such as PDF forms and special PDF archival formats such as PDF/A.
