Remediating Inaccessible Static PDF Documents
Automated Accessibility Checking
Many PDF authoring and editing tools include a built-in accessibility checker that flags issues such as images without image descriptions, missing tags, and insufficient color contrast. Before doing anything else, run the checker to see what it finds. Keep in mind that automated tools do not catch every problem, so manual testing is still required throughout remediation. It also helps to rerun the checker periodically as you work. If the checker reports color contrast issues, those need to be fixed in the source document.
Tagging PDF Documents
Before looking at the standard tags you can apply to documents, it’s important to understand that while the way you apply these tags depends on the software, their function and semantics stay the same. The PDF standard defines a set of tags you can apply to content to make your document accessible.
Now that we’ve got your expectations aligned, what exactly is tagging? Tagging is the process of specifying the structure and semantic intent of the document’s content, so it can be properly interpreted and communicated to assistive technology.
The Root of the Document
When tagging a document, we need to have a root for what is called the structure tree. The root of this tree must always be the “document” tag. This tag will contain all other content tags for the document, including paragraphs, headings and images.
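To make the idea of a structure tree concrete, here is a minimal sketch in Python. The `Tag` class and both helper functions are hypothetical stand-ins written for this guide, not part of any PDF library, and the tag names follow the spelling used in this guide rather than any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for one node in a PDF structure tree."""
    name: str
    text: str = ""
    children: list[Tag] = field(default_factory=list)


def has_valid_root(root: Tag) -> bool:
    """Return True when the tree is rooted at a "document" tag."""
    return root.name.lower() == "document"


def outline(tag: Tag, depth: int = 0) -> list[str]:
    """Return the tag names as an indented outline, one line per tag."""
    lines = ["  " * depth + tag.name]
    for child in tag.children:
        lines.extend(outline(child, depth + 1))
    return lines


# A small tree: the "document" root contains every other content tag.
tree = Tag("document", children=[
    Tag("h1", "Annual Report"),
    Tag("p", "This report summarizes the year."),
    Tag("figure"),
])

print(has_valid_root(tree))
print("\n".join(outline(tree)))
```

The outline shows every content tag nested under the single root, which is the shape a tagging tool's tag panel typically displays.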
Headings
Headings play a crucial role in logically structuring documents. Screen reader users often navigate a document by its headings to find the sections relevant to them. To allow this in a PDF document, you need to use heading tags. There are six heading tags, each representing a different section level in the document’s hierarchical structure. Not all documents use all six. The important thing is to order them logically and, ideally, not skip a level. Without properly tagged headings, a screen reader presents the document as an unstructured block of text.
-
Heading Level 1, Heading 1 or “h1”
- Like all other headings, the terminology is different depending on the software being used, but the underlying semantics are the same. The “h1” tag signifies the title or main topic of the document. You can also think of it like the title of a paper or report.
-
Heading Level 2, Heading 2 or “h2”
- The “h2” tag signifies a major section topic in the document hierarchy. Think of “h2” headings as the main sections of a paper, such as the introduction, the main body, and the conclusion.
-
Heading Level 3, Heading 3 or “h3”
- The “h3” tag signifies a subsection under a heading level 2 in the document hierarchy. Think of “h3” headings as the points that support each main section of a paper. The heading levels beyond this point can be thought of as progressively more detailed breakdowns.
-
Heading Level 4, Heading 4 or “h4”
- The “h4” tag is a subsection of an “h3” in the document hierarchy.
-
Heading Level 5, Heading 5 or “h5”
- The “h5” tag is a subsection of an “h4” in the document hierarchy.
-
Heading Level 6, Heading 6 or “h6”
- The “h6” tag is the lowest, most nested heading level in the document. Heading levels need to be ordered sequentially and should not skip levels unnecessarily. Skipped levels break the logical structure of the document and confuse screen reader users.
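The rule about not skipping levels can be sketched as a small check. This is an illustrative Python function, not a feature of any tagging tool. It assumes you have already listed the document's heading levels in reading order, and it treats a document that does not start at level 1 as a skip, which is an assumption on our part.

```python
def find_skipped_levels(levels: list[int]) -> list[tuple[int, int, int]]:
    """Find headings that jump more than one level deeper than the previous one.

    Returns a list of (position, previous_level, level) tuples.
    Moving back up the hierarchy (for example h3 to h2) is always allowed.
    """
    problems = []
    previous = 0  # no heading seen yet, so the first heading is expected to be level 1
    for position, level in enumerate(levels):
        if level > previous + 1:
            problems.append((position, previous, level))
        previous = level
    return problems


# h1, h2, h3, back to h2, then h3 and h4: a logical order with no skips.
print(find_skipped_levels([1, 2, 3, 2, 3, 4]))

# h1 followed directly by h3: level 2 is skipped.
print(find_skipped_levels([1, 3]))
```

An empty result means no level was skipped. Any tuple in the result points at a heading worth reviewing manually.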
Lists, Figures, Links, and Text Content
Most government documents consist of images, lists and generic text content. The tags you use for this are the following:
-
“p” or Paragraph
- This is for standard body text of a document. Only continuous body text can be inside the paragraph tag. Doing otherwise is a common source of issues.
-
Link
- The link tag represents a hyperlink to external resources and other sections of the same document, similar to links on a webpage. A link is categorized as continuous body text content, so it can be nested inside a paragraph. When crafting the text for the link, make sure it contains enough detail to be meaningful in describing the destination of the link. As an example, link text such as “here”, “click here” and “learn more” are insufficient because they don’t properly describe the link’s destination.
-
Figure
- The figure tag represents a graphic in the document. When you tag an image with it, be sure to include an image description, also known as alternative text or alt text. Make the description meaningful by describing the important parts of the image. If the image contains text that does not also appear in the body of the document, the image description should reproduce that text word-for-word. If the text in the image already appears in the body of the document, consider marking the image as decorative by classifying it as an artifact. Also avoid phrases such as “A photo of” or “A portrait of” because a screen reader already announces that the element is a graphic. Such phrases are redundant and create auditory clutter.
-
“artifact”
- This is not technically a tag but a classification. It is used to mark an image as decorative, or to mark other content that assistive technology can safely ignore.
-
“l” (lowercase “L”) or list
- The “l” tag is the main container for lists. The tag itself does not say whether the list is numbered or bulleted. Use this tag instead of relying on plain text to mimic a list.
-
“li” or List Item
- The “li” tag is the container for a single item in a list. It should be a direct child of the “l” tag.
-
“lbl” or label
- The “lbl” tag represents the list marker of a list item. It is a direct child of the “li” tag. Just like with the “l” tag, you can’t specify if the marker is a number or a bullet list marker. That is determined by the content of the “lbl” tag.
-
“lbody” or list body
- The “lbody” tag is the actual content of the list item. It is, just like the “lbl” tag, supposed to be a direct child of the “li” tag.
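The nesting rules and content checks described above can be sketched as a small validator. This Python example is only an illustration: it models the tag tree as plain dictionaries with hypothetical `tag`, `text`, `alt` and `children` keys, and it uses this guide's tag spellings. Real tools store this information differently.

```python
from typing import Optional

# Link text called out above as too vague to describe a destination.
VAGUE_LINK_TEXT = {"here", "click here", "learn more"}

# Required direct parent for each list-related tag, as described above.
REQUIRED_PARENT = {"li": "l", "lbl": "li", "lbody": "li"}


def check_content_tags(node: dict, parent: Optional[str] = None) -> list[str]:
    """Walk the tag tree and collect list, figure and link problems."""
    problems = []
    name = node["tag"]

    expected = REQUIRED_PARENT.get(name)
    if expected is not None and parent != expected:
        problems.append(f'"{name}" should be a direct child of "{expected}"')

    if name == "figure" and not node.get("alt", "").strip():
        problems.append("figure has no image description")

    if name == "link" and node.get("text", "").strip().lower() in VAGUE_LINK_TEXT:
        problems.append(f'link text "{node["text"]}" does not describe its destination')

    for child in node.get("children", []):
        problems.extend(check_content_tags(child, name))
    return problems


sample = {"tag": "document", "children": [
    {"tag": "p", "children": [{"tag": "link", "text": "click here"}]},
    {"tag": "figure", "alt": ""},
    {"tag": "l", "children": [
        {"tag": "li", "children": [
            {"tag": "lbl", "text": "1."},
            {"tag": "lbody", "text": "Run the accessibility checker"},
        ]},
    ]},
    {"tag": "lbody", "text": "A stray list body outside any list item"},
]}

for problem in check_content_tags(sample):
    print(problem)
```

The sample deliberately contains a vague link, a figure without a description, and a list body outside a list item. The correctly nested list produces no messages. A check like this cannot judge whether a description is meaningful, so it complements manual review rather than replacing it.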
Tables
Use tables when your document presents tabular data. When tagging tables in your document, there are a few tags that you’ll be using:
-
“table”
- This is the root tag of a table structure in the document.
-
“tr” or Table Row
- This is a structural tag used in the construction of a table. The table row tag is meant to be a direct child of the “table” tag. A table row can contain table data tags and table header tags.
-
“td” or Table Data
- This is a structural tag representing a single data cell, which sits at the intersection of a row and a column in the table. This tag is intended to be a direct child of a table row.
-
“th” or Table Header
- This is a structural tag for constructing tables. Table headers can be scoped to a column or a row. They tell a screen reader user what kind of data a row or column contains.
-
“caption”
- This tag represents a caption associated with a table or a figure.
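As a rough illustration of how these table tags fit together, the following Python sketch checks a table modelled as nested dictionaries. The dictionary layout and the `check_table` function are hypothetical and follow the parent and child relationships as this guide describes them. Treating a table with no header cells as a problem is our own assumption, based on the role of headers described above.

```python
def check_table(table: dict) -> list[str]:
    """Collect structural problems in a table tag and its children."""
    if table["tag"] != "table":
        return ['the root of a table structure should be "table"']

    problems = []
    header_count = 0
    for child in table.get("children", []):
        if child["tag"] == "caption":
            continue  # a caption may accompany the table
        if child["tag"] != "tr":
            problems.append(f'"{child["tag"]}" found directly under "table"; expected "tr"')
            continue
        for cell in child.get("children", []):
            if cell["tag"] == "th":
                header_count += 1
            elif cell["tag"] != "td":
                problems.append(f'"{cell["tag"]}" found directly under "tr"; expected "th" or "td"')

    if header_count == 0:
        problems.append("table has no header cells")
    return problems


good_table = {"tag": "table", "children": [
    {"tag": "caption", "text": "Permits issued per year"},
    {"tag": "tr", "children": [
        {"tag": "th", "scope": "column", "text": "Year"},
        {"tag": "th", "scope": "column", "text": "Permits"},
    ]},
    {"tag": "tr", "children": [
        {"tag": "td", "text": "2023"},
        {"tag": "td", "text": "412"},
    ]},
]}

bad_table = {"tag": "table", "children": [
    {"tag": "td", "text": "2023"},  # a data cell that is not inside a row
    {"tag": "tr", "children": [{"tag": "td", "text": "412"}]},
]}

print(check_table(good_table))
print(check_table(bad_table))
```

The well-formed table returns an empty list, while the second table is flagged both for the loose data cell and for the missing headers.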
Reading Order
Now that we understand tagging and the various tags we can apply to a PDF document, the next crucial concept is the reading order of the document. Even if the content looks correct visually, the order in which the tagged content is read may not match it. After tagging, you must ensure that the actual reading order matches the intended reading order and remains logically consistent. Software used to apply tags to PDF documents often provides the ability to view and modify the reading order. However, adjusting the reading order in software isn’t enough. It must also be tested with assistive technology, such as screen readers.
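One way to think about reading order is as the sequence you get by walking the tag tree from top to bottom. The Python sketch below assumes that assistive technology follows the order of the tags, which is generally how tagged PDFs are read but should still be confirmed with a real screen reader. The dictionary layout and both function names are hypothetical.

```python
from typing import Optional


def reading_order(node: dict) -> list[str]:
    """Flatten a tag tree into the text sequence produced by a depth-first walk."""
    items = []
    if node.get("text"):
        items.append(node["text"])
    for child in node.get("children", []):
        items.extend(reading_order(child))
    return items


def first_mismatch(actual: list[str], intended: list[str]) -> Optional[int]:
    """Return the first position where the two orders differ, or None if they match."""
    for position, (got, wanted) in enumerate(zip(actual, intended)):
        if got != wanted:
            return position
    if len(actual) != len(intended):
        return min(len(actual), len(intended))
    return None


# A sidebar that was tagged first, even though it should be read last.
tagged = {"tag": "document", "children": [
    {"tag": "p", "text": "Sidebar: contact the office"},
    {"tag": "h1", "text": "Annual Report"},
    {"tag": "p", "text": "This report summarizes the year."},
]}

intended = [
    "Annual Report",
    "This report summarizes the year.",
    "Sidebar: contact the office",
]

actual = reading_order(tagged)
print(actual)
print(first_mismatch(actual, intended))
```

Here the mismatch is reported at position 0 because the sidebar comes first in the tag tree. Moving that tag to the end of the tree would bring the actual order in line with the intended one.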