4. Document Type Definitions

(Note: to keep the explanation simple, most of this section is going to tell some lies, mainly by omitting a lot of history. Truthfulness will be fully restored in a following section.)

DocBook is a structural-level markup language. Specifically, it is a dialect of XML. A DocBook document is a hunk of XML that uses XML tags for structural markup.
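To make "structural markup" concrete, here is a minimal sketch of what a small DocBook document might look like. The DocBook version named in the DOCTYPE line (4.1.2) and the sample text are assumptions chosen for illustration; the section itself does not specify a version.

```xml
<?xml version="1.0"?>
<!-- Assumption: DocBook XML V4.1.2; substitute whichever version you use. -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<book>
  <title>An Example Book</title>
  <chapter>
    <title>First Chapter</title>
    <!-- The tags say what each piece of text is, not how it should look. -->
    <para>This paragraph belongs to the first chapter.</para>
  </chapter>
</book>
```

Notice that nothing here mentions fonts, margins, or page breaks; those decisions are left to the stylesheet.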

In order for a document formatter to apply a stylesheet to your document and make it look good, it needs to know things about the overall structure of your document. For example, it needs to know that a book manuscript normally consists of front matter, a sequence of chapters, and back matter in order to physically format chapter headers properly. In order for it to know this sort of thing, you need to give it a Document Type Definition or DTD. The DTD tells your formatter what sorts of elements can be in the document structure, and in what order they can appear.
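As a rough illustration of how a DTD expresses such rules, here is a toy DTD written into a document's internal subset. This is not the DocBook DTD; the element names (manuscript, frontmatter, backmatter, and so on) are hypothetical, invented to mirror the book-manuscript example above.

```xml
<?xml version="1.0"?>
<!-- Toy DTD for illustration only; these are not DocBook element names. -->
<!DOCTYPE manuscript [
  <!-- Front matter, then one or more chapters, then back matter, in that order. -->
  <!ELEMENT manuscript  (frontmatter, chapter+, backmatter)>
  <!ELEMENT frontmatter (title)>
  <!ELEMENT chapter     (title, para+)>
  <!ELEMENT backmatter  (para*)>
  <!ELEMENT title       (#PCDATA)>
  <!ELEMENT para        (#PCDATA)>
]>
<manuscript>
  <frontmatter><title>An Example Manuscript</title></frontmatter>
  <chapter>
    <title>First Chapter</title>
    <para>Chapter text goes here.</para>
  </chapter>
  <backmatter><para>Closing notes.</para></backmatter>
</manuscript>
```

Each ELEMENT declaration says what an element may contain: a comma means "followed by", `+` means "one or more", and `*` means "zero or more". If you moved the chapter after the backmatter element, or left the chapter out altogether, the document would no longer conform to this DTD.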

What we mean by calling DocBook an `application' of XML is that DocBook is a DTD, and a rather large one, with somewhere around 400 tags in it.

Lurking behind DocBook is a kind of program called a validating parser. When you format a DocBook document, the first step is to pass it through a validating parser (the front end of the DocBook formatter). This program checks your document against the DocBook DTD to make sure you aren't breaking any of the DTD's structural rules (otherwise the back end of the formatter, the part that applies your stylesheet, might become quite confused).

The validating parser will either bomb out, giving you error messages about places where the document structure is broken, or translate the document into a stream of formatting events, which the formatter's back end combines with the information in your stylesheet to produce formatted output.

Here is a diagram of the whole process:

The part of the diagram inside the dotted box is your formatting software, or toolchain. Besides the obvious and visible input to the formatter (the document source) you'll need to keep the two `hidden' inputs of the formatter (DTD and stylesheet) in mind to understand what follows.