DITA

DITA—What's all the Fuss About?

Intro

As technical writers, what is our job?

To enable the reader to find the information he needs as quickly and easily as possible so he can get on with his real work.

As writers, we have to do this in as short a time as possible, with as few resources as possible.

That is what DITA can help us to do.

Now, before we really get started, a word about questions. If you have a question, please jump right in and ask. I’d prefer that to waiting until the end, when we would have to go back to something we talked about an hour earlier.

Current Tools

First, I’d like to just get an idea of the tools you guys are currently using. I’m guessing most of you are using Frame. [Ask for show of hands for Frame and Word.]

As we go on, I’ll draw some parallels between the way you do things now in Frame (and maybe Word) and the way you’d do the same things in DITA.

What is DITA?

DITA is an XML dialect. So let’s take a step backwards and ask the question “What is XML?”

XML is the Extensible Markup Language. When you look at it, it looks a lot like HTML. For example, HTML has a <body> element, <p> elements for paragraphs, <h1>, <h2>, <h3>, and so on for headings, <a> for links, and so on.

It has elements, which are denoted by words in angle brackets. [SLIDE] Elements have opening and closing tags. [SLIDE] Or they are self-closing. These elements can contain other elements [SLIDE]. They can also have attributes [SLIDE].
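
As a rough sketch of those four ideas, here is a tiny XML file. The element and attribute names (manual, chapter, para, image, lang, href) are made up for illustration and do not come from any particular DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vocabulary, for illustration only -->
<manual lang="en">                     <!-- opening tag, with an attribute -->
  <chapter>                            <!-- an element nested inside another -->
    <para>Some text.</para>            <!-- opening and closing tags -->
    <image href="screenshot.png"/>     <!-- self-closing element -->
  </chapter>
</manual>                              <!-- closing tag -->
```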

It is most commonly used as a meta-language. What does this mean? It means that it is a language for defining other languages. A DTD (Document Type Definition) or XML schema sets out rules for the elements that can be used, which elements can be nested within other elements, which elements are mandatory and which are optional, mandatory and optional attributes, and so on.

Before DITA, there were already several ways you could write documentation in XML:

  • DocBook is quite popular, but it has a huge tag set and is book-oriented. It is good for producing large, monolithic documents.
  • S1000D is used primarily in aerospace and military settings (where in many cases, it is mandatory, I believe). It also has a very large tag set.
  • Roll-your-own — some companies have written their own XML dialect for documentation. This is very expensive and time-consuming.

DITA is another such language, or XML dialect. XHTML is another. (It is HTML implemented in XML.) So each of the DITA DTDs (there are several—we will get to this in a second) sets out the rules for which elements are allowed where, attributes, etc.

I mentioned that there are several DITA DTDs. There are five main ones, in fact. There is one base DTD called topic [SLIDE]. This is generally not used in practice. It is just used as an abstract information type, on which the other three types are based.

[SLIDE] The other three topic types are concept, task, and reference. Each of these is actually a specialization of the topic type. We will talk about specialization a bit more later on, but all we need to know about it right now is that it lets you create new types of topic from existing ones.

So topic lets you use any DITA element, while the task type, for example, is much more rigid. It only lets you use a subset of elements. Just ones that make sense in a topic that explains a procedure.

The basic idea is that you write self-contained topics, each of which does one of the following: explains what something is or how it works (a concept); explains how to do something (a task); or lists reference information, like you would typically find in a document’s appendixes (a reference).

Creating topics to be self-contained means that they can be re-used in different contexts [SLIDE]. For example, you could have a concept topic that explains what a particular feature does that you could use in both a user guide and an administrator guide. Or in the user guides for two different versions of a product.

[SLIDE] There is a fifth type of file called a DITA map. This collects topics together for a particular deliverable. It also establishes how the topics are organized into a hierarchy.

It’s a bit like a FrameMaker book file, except that it lets you nest topics one inside the other instead of just being a list.
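
To give you a feel for it, here is a minimal sketch of a map. The title and file names are invented for the example. The nesting of the topicref elements is what sets up the hierarchy.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Title and file names are hypothetical -->
<map title="Widget User Guide">
  <topicref href="about_widget.dita">
    <!-- Nested topicrefs become child topics in the hierarchy -->
    <topicref href="installing_widget.dita"/>
    <topicref href="configuring_widget.dita">
      <topicref href="config_file_reference.dita"/>
    </topicref>
  </topicref>
</map>
```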

And at the end of it all, you can generate a variety of deliverables from your DITA source files: XHTML, HTML Help (CHM), PDF, EclipseHelp, and so on.

Benefits

[SLIDE]

So now we have some sort of an idea of what DITA is, let’s take a look at the benefits. Why are more and more organizations switching from Frame or Word to a DITA-based system? And why is it good for us writers as well?

I have split the benefits up into business benefits and benefits for the writer, but as you will see, some of the things that are good for business make our lives easier too.

Business Benefits

  • Content can be reused easily. This saves time and money.
  • Content is worth more, as it can be used in more ways.
  • It is tool-agnostic. There is no vendor lock-in. Switching authoring tools and content management systems is relatively easy.
  • [SLIDE] DITA is modular. This means cheaper, faster translation:
    • Because it is modular, you can send the completed topics for translation while you continue working on the rest, instead of having to wait until the whole thing is finished.
    • When the documentation is updated, you only have to send the updated topics to be translated.
  • The generation of deliverables can be automated. This saves time (and therefore money).
  • Topic (and more granular) reuse means that content only has to be updated in one place. With the old copy/paste way of doing things, you have to try to remember where content is reused or search for it, and hope you do not miss anything.
  • The inflexible structure of topics means that the content has a consistent structure. So the reader can learn what to expect. This makes finding things easier.
  • Improved information completeness — as a writer, you know you will most likely have to write the following for each feature:
    • At least one conceptual topic
    • At least one task topic
    • At least one reference topic
  • [SLIDE] Specialization means that if out-of-the-box DITA does not suit your needs, you can customize it so that it does. This is much cheaper than starting from scratch. There may even be an existing specialization that fits the bill. For example, there is an existing specialization for API documentation.
  • Faster implementation time — DITA can be used out-of-the-box. And as I already mentioned, if you need to specialize, it is much easier, faster, and cheaper than starting from scratch.
  • Because DITA is a standard, it means that you can exchange information easily. For example, let’s say that in a large corporation, one department wants to take another department’s product and integrate it into their own product. If both departments are using DITA, they can integrate the documentation much more easily than they could otherwise.
  • Because documentation is broken down into small units, it means that if you are working in a team of writers, they can work on different topics at the same time. This can increase productivity and reduce delivery times.
  • Compared to other XML dialects, such as DocBook or S1000D, the tag set is quite small. This means that there is less for writers to learn, so they can get up to speed quickly.
  • The cost of entry is low. There are lots of different tools that you can use for authoring. Some are free. The open-source DITA Open Toolkit was donated by IBM and can convert from DITA XML files to XHTML, HTML Help, EclipseHelp, PDF, and more. It has an extensible architecture that allows plugins to be written to support new output types. For example, the supplied PDF transformation uses an open-source tool for creating PDFs. This tool is not very good. But there is a free plugin that uses a commercial converter that does a much better job. The Open Toolkit is used by many authoring tools and some content management systems.
  • DITA is becoming a standard. If a client chooses DITA for a new project, they will save the cost of converting from a legacy format further down the road.

Benefits for the Writer

  • Because there is separation of content from presentation, it means you save time. You don’t have to futz around with fonts and sizes, or with space before and after paragraphs. You don’t have to go through documents to check for headings with only one short paragraph following them and add page breaks. You don’t have to massage tables to get them to fit. (You can have hours of fun with this in Word…)
  • The fact that you can reuse almost anything in DITA offers a number of benefits as compared to the way we are used to doing things. In Frame, if we want to reuse something smaller than a chapter, we have to use text insets. These have a number of limitations. (If you know of any workarounds, I’d be happy to hear them.):
    • Searching — if you are searching in a chapter file that contains text insets, Frame won’t search inside the insets.
    • Cross-references — if you have a cross-reference inside a text inset, when you save as a PDF, the cross-references are not converted into links. Sure, you can use FrameScript to convert the insets to text before you convert (and then close without saving), but it is a pain.
    • Contextual formatting — let’s say you’ve got a bulleted list that you want to reuse somewhere else. In its original context [SLIDE], it is a nested list, with a bigger indent than the “parent” list. But in the place where you want to reuse it, you don’t want it to be nested inside another list. If you use an inset, your list will have the same indent as the original [SLIDE]. OK, you could use different definitions of the same paragraph style, but this makes things kind of messy.
  • Less thinking required — the strict topic structure guides you. This means you can concentrate on writing rather than on thinking about the topic structure.

Who Is Using DITA?

Among others [SLIDE]:

  • Adobe
  • Autodesk (the AutoCAD people)
  • Freescale
  • IBM (of course)
  • GE
  • Schlumberger
  • Boeing
  • Lucent
  • Nokia
  • Sun
  • HP
  • Sybase
  • Oracle

Writing in DITA

Now to the meat of the presentation. [SLIDE] How do you write in DITA?

Before we start, a word about authoring tools. There are lots of different tools that you can use to write in DITA. You can use a simple text editor. That’s not much fun, though. You can use Frame 8. It has built-in support for DITA. XMetaL has a DITA version of their authoring tool. And so on and so on.

I use an editor called the XMLMind XML Editor (or “XXE”), for the simple reason that at the time, it was free, even for commercial use, and quite full-featured and easy to use. Now it costs $250, which is still a lot cheaper than most of the other tools.

Anyway, the point is that here I am not going to concentrate on how you write in DITA using a specific authoring application. Instead, I will talk about when and where to use the various elements in each of the topic types, how to create links, how to reuse content, etc. Exactly how you do this will be a little different in each tool, but the principles are the same.

(The screenshots in some of the examples that I will show you are from XXE.)

Topic Types

Topic

[SLIDE]

Don’t use it. The only time you might need to use it is if you need to lump two or more different types of topic together into one topic. Maybe the client wants a single page for a certain feature that includes a concept topic and a task topic.

The only other time you might want to use it is as a storage container for content references (or “conrefs”). These are bits of content that are smaller than a topic that you can reuse in multiple locations. We will get to this a bit later on.

Concept

[SLIDE]

According to the DITA Language Specification:

The <concept> element is the top-level element for a topic that answers the question ‘what is?’ Concepts provide background information that users must know before they can successfully work with a product or interface. Often, a concept is an extended definition of a major abstraction such as a process or function. It might also have an example or a graphic, but generally the structure of a concept is fairly simple.

So what can we put in a concept topic? Or maybe we should start this from a different angle… Let’s say we’ve identified the need for a concept topic, and we know it will have some sub-sections. In Frame or Word you might have a chapter that explains a particular feature, with a bunch of different heading levels with content under each one.

Obviously, this doesn’t apply just to concept-type topics. The same is true for the other types too.

In DITA, there are several ways we can do this:

  • We can write one topic containing multiple sections [SLIDE]. The disadvantage of this is that in DITA, the <section> element cannot be nested — a <section> can’t contain another <section>. So this might be OK in some situations, but not in others.
  • We can write each sub-section as a separate topic and then arrange them as we want in the map [SLIDE]. This works well and there is no limit to how deep you can nest the topics. But, if it is important to have certain sub-topics on the same actual page, rather than separated into separate pages, this approach is not so good. (This only applies to HTML deliverables. In a PDF, they will all be presented one after the other anyway.)
  • We can nest topics one inside the other [SLIDE]. This lets us nest the topics as deep as we like, and they all end up on the same page (in HTML). But it doesn’t make reuse so easy. If we want to reuse one of the nested topics in a different context in a different manual, it’s much easier if that topic is in a separate file of its own.
  • We can write each sub-section as a separate topic and use the content reference mechanism (“conrefs”) to include them in the parent topic file [SLIDE]. This means we can reuse the sub-topics easily (as they are in separate files), but we can also make sure that they appear on the same page (if that’s what we want). They can also be nested to any depth.
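
To make the third approach (nesting topics one inside the other) a bit more concrete, here is a rough sketch of one concept nested inside another in a single file. The IDs, titles, and text are invented. Note that the nested topic comes after the parent's body, not inside it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- IDs, titles, and text are hypothetical -->
<concept id="backups">
  <title>Backups</title>
  <conbody>
    <p>A backup is a copy of your data that you can restore later.</p>
  </conbody>
  <!-- The nested topic follows the parent's body -->
  <concept id="incremental_backups">
    <title>Incremental backups</title>
    <conbody>
      <p>An incremental backup copies only what has changed since the last backup.</p>
    </conbody>
  </concept>
</concept>
```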

In practice, you will probably find that you will end up using a combination of these approaches. Most of the time, I use separate topics organized in a map. But occasionally, I use the conref mechanism to nest topics inside a parent topic. So far, I have never found the need for more than one level of nesting in the same page. I may have many levels in my map, but in a topic that is displayed on a single HTML page, I never have more than one nesting level.

What Can <concept> Contain?

Now, in each of these sections, I’m going to explain the most important elements. There are others, which you can find in the DITA Language Specification. (At the end, I will give you the URL of the online version of this presentation. The Resources section at the end has a link to DitaInfocenter, which includes the Language Specification and lots of other useful stuff.)

First off, the <concept> element itself. [SLIDE] If a concept topic lives in its own file, you must assign an ID to the <concept> element.

[SLIDE] The <concept> element can contain:

  • <title> (mandatory) — the topic’s title
  • <shortdesc> — [SLIDE] a short description of the topic. I try to make a habit of always writing a shortdesc for each topic, and of giving a little more information about the topic rather than just repeating the title. This is because the shortdesc helps the reader decide whether the link he’s about to click is actually going to help him find what he wants, or whether he should keep looking. In HTML output, the shortdesc is used like this:
    • [SLIDE] When you have nested topics in a map, the parent topic contains a list of links to its child topics, including each one’s shortdesc.
    • [SLIDE] In a topic’s related links section, if you hover over a link, a tooltip appears, containing the topic’s shortdesc.
  • <conbody> — this contains the actual content of the topic.
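
Put together, a bare-bones concept topic might look something like this. Treat it as a sketch: the ID, title, and text are made up, and the DOCTYPE assumes the standard OASIS concept DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- ID, title, and text are hypothetical -->
<concept id="about_scheduler">
  <title>About the scheduler</title>
  <!-- Says more than the title, so the reader can decide whether to click -->
  <shortdesc>The scheduler runs backup and cleanup jobs automatically at times that you choose.</shortdesc>
  <conbody>
    <p>The scheduler checks its job list once a minute.</p>
  </conbody>
</concept>
```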

Note that <conbody> is not mandatory. “OK, what’s the point of that?”, you might be thinking.

Well, like we saw a minute ago [SLIDE], you can have a topic that is just the parent for other topics. The alternative to this approach is to use a <topichead> element in your map that contains the <topicref>s that point to the child topics.

This will give you an item in the TOC that isn’t actually a link. But this also means that it cannot have a shortdesc. So if you want to be consistent and have shortdescs for everything, you will need a title-only topic (that is, one with no content).
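
Here is a sketch of the two alternatives side by side in one map. The file names and titles are invented, and in real life you would pick one approach or the other for a given heading.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Title and file names are hypothetical -->
<map title="Widget Admin Guide">
  <!-- Option 1: a TOC heading that is not a link and cannot have a shortdesc -->
  <topichead navtitle="Maintenance">
    <topicref href="backing_up.dita"/>
    <topicref href="restoring.dita"/>
  </topichead>

  <!-- Option 2: a title-only topic as the parent, which can have a shortdesc -->
  <topicref href="monitoring.dita">
    <topicref href="viewing_logs.dita"/>
    <topicref href="setting_alerts.dita"/>
  </topicref>
</map>
```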

<conbody> can contain the following (again, not an exhaustive list):

  • <p> — [SLIDE] a paragraph (which can contain various inline elements, more on that later)
  • <note> — [SLIDE] a note (note, tip, caution, danger, etc.). (The type is indicated in the type attribute.)
  • <dl> — [SLIDE] a definition list. This is useful for things like lists of concepts [SLIDE] or for a list of columns in a database table and their meanings. Optionally, you can add headings for the terms and the descriptions.
  • <parml> — a parameter list. This is used in things like API manuals for listing the parameters that a function takes and what they mean. It looks very similar to a <dl>.
  • <ul> — [SLIDE, SLIDE] an unordered list (i.e., a bulleted list). Note that these can be nested. (A <ul> can’t contain a <ul>, but its child element, <li> can.)
  • <ol> — [SLIDE, SLIDE] an ordered list (i.e., a numbered list). Like <ul>, these can be nested.
  • <sl> — a simple list (no bullets or numbers)
  • <pre> — [SLIDE, SLIDE] “preformatted” — preserves line breaks and spacing. It’s usually formatted with a monospaced font.
  • <codeblock> — for lines of program code. It is handled and formatted like <pre>.
  • <msgblock> — for messages generated by an application (such as a command-line application). Again, it is similar to <pre>.
  • <screen> — [SLIDE, SLIDE] for representing a text-based interface, like you sometimes see in Unix or Linux systems.
  • <fig> — [SLIDE, SLIDE] a figure with a caption. This usually consists of a title and an image, but it can also include codeblocks, dls, and so on.
  • <image> — [SLIDE, SLIDE] an image without a caption. It can be either inline with text or not, and supports the most common image types (PNG, GIF, JPG, etc.) There is also a plugin for the Open Toolkit that lets you use SVG images (Scalable Vector Graphics). What’s nice about these is that they look great in PDFs (because they remain as vector graphics) and the plugin can convert them to PNGs for HTML output.
  • <syntaxdiagram> — [SLIDE] a syntax diagram. It is used for representing a statement from a programming language, and it has child elements for representing the various parts of such a statement.
  • <imagemap> — [SLIDE] like an HTML imagemap, where the image is divided into regions, each of which links to a different topic or URL.
  • <object> — [SLIDE] this lets you embed multimedia objects, like video or Flash.
  • <table> — [SLIDE, SLIDE] a table with a caption. Use this type of table if you need to do complex things like spanning.
  • <simpletable> — [SLIDE, SLIDE] a simple table without a caption. This does not support spanning and other complicated stuff.
  • <section> — [SLIDE, SLIDE] a sub-section. The Language Specification states that “Multiple sections within a single topic do not represent a hierarchy, but rather peer divisions of that topic.” They cannot be nested. They usually have a title, but it is not mandatory. A section can contain most of the elements that we have already covered.
  • <example> — [SLIDE, SLIDE] an example. It can contain the same child elements as <section>. (I can’t remember ever having used an example in a concept topic. But I guess there must be situations where you would need it.)
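
To show how a few of these fit together, here is a sketch of a concept body that uses some of the elements from the list. The content is invented. Note that the section comes last, after the paragraph-level elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- All content is hypothetical -->
<concept id="scheduler_overview">
  <title>Scheduler overview</title>
  <conbody>
    <p>The scheduler runs jobs at fixed times.</p>
    <note type="tip">Schedule large jobs outside working hours.</note>
    <dl>
      <dlentry>
        <dt>Job</dt>
        <dd>A unit of work that the scheduler runs.</dd>
      </dlentry>
    </dl>
    <ul>
      <li>Daily jobs
        <!-- The nested list goes inside the li, not directly inside the ul -->
        <ul>
          <li>Backup</li>
        </ul>
      </li>
    </ul>
    <fig>
      <title>The scheduler window</title>
      <image href="scheduler.png"/>
    </fig>
    <simpletable>
      <sthead><stentry>Column</stentry><stentry>Meaning</stentry></sthead>
      <strow><stentry>job_id</stentry><stentry>Unique ID of the job</stentry></strow>
    </simpletable>
    <section>
      <title>Limitations</title>
      <p>Only one job can run at a time.</p>
    </section>
  </conbody>
</concept>
```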

An aside:

We just saw that some elements are very similar. <pre> and <codeblock>, for example. In the final output, they look exactly the same. So why bother having both?

It’s for semantic markup. We are assigning meaning to our content. Let’s take a simple example. <b> and <uicontrol> are both rendered as bold text in the output. So why not just use <b> for buttons, menu items, etc.?

What happens if the client asks us to use a different font for buttons and menu items in the documentation? If we’d used <uicontrol>, our HTML would have hooks in it for modifying the way that <uicontrol> elements are displayed. We can add one line to our CSS stylesheet and be done with it. (For PDF, we can dig in the XSL to do something similar.)

But if we used <b>, we’ll have to start searching for every instance of <b> and change the ones that represent parts of the user interface into <uicontrol> elements instead.

It’s better to use the most appropriate element from the start.

Task

[SLIDE]

According to the Language Specification:

The <task> element is the top-level element for a task topic. Tasks are the main building blocks for task-oriented user assistance. They generally provide step-by-step instructions that will enable a user to perform a task. A task answers the question of “how to?” by telling the user precisely what to do and the order in which to do it. Tasks have the same high-level structure as other topics, with a title, short description and body.

What Can <task> Contain?

[SLIDE] Just like with <concept>, the <task> element must have an ID if it lives in its own file.

[SLIDE] And in the same way that <concept> contains a <title>, a <shortdesc>, and a <conbody>, <task> contains a <title>, a <shortdesc>, and a <taskbody>.

Just like with <concept>, <taskbody> is not mandatory. Remember we talked about having a title-only topic with other topics nested within it in the map file?

Well, with tasks, we can do the same thing. And we can do something even more clever. If we add an attribute to the parent topic in the map, instead of just a list of links, this list of links will be numbered. And in each individual topic, we will get next and previous links. This is handy for when we have a number of procedures that must be performed in order, but we do not want to put them all in one topic.

<taskbody> can contain:
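
The talk does not name the attribute here. As far as I know, the one that does this is collection-type with a value of sequence, so take the following as a sketch on that assumption. The file names are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- File names are hypothetical; collection-type="sequence" is my assumption
     for the attribute mentioned above -->
<map title="Widget Installation Guide">
  <!-- installing.dita is a title-only task that acts as the parent -->
  <topicref href="installing.dita" collection-type="sequence">
    <topicref href="unpacking.dita"/>
    <topicref href="running_setup.dita"/>
    <topicref href="activating_license.dita"/>
  </topicref>
</map>
```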

  • <prereq> — [SLIDE, SLIDE] prerequisites, all the things that the reader must know or do before performing this procedure
  • <context> — [SLIDE, SLIDE] background information for the task — what the task is for, what the reader will gain by performing the task
  • <steps> or <steps-unordered> — [SLIDE, SLIDE] the actual steps of the task (more about this in a minute). The language spec says you should use <steps-unordered> when “the order of steps may vary from one situation to another”. I only use it for procedures with a single step. [SLIDE]
  • <result> — [SLIDE, SLIDE] the result of performing the procedure. This lets the reader know whether they have performed the procedure correctly.
  • <example> — [SLIDE, SLIDE] an example. You can only have one <example> element per task. But there are lots of elements that you can use inside <example>, so if you really need to include more than one example, you can.
  • <postreq> — [SLIDE] things the user must do after performing the procedure.

Note that all of these elements are optional.

Now, the <steps> element. The <steps> element contains one or more <step> elements.

[SLIDE, SLIDE] Each <step> contains:

  • <cmd> (mandatory) — a command. The action that the reader must perform.
  • <info> — additional information about the step. This is useful for adding notes. <info> is pretty flexible. There are lots of elements you can put in there. You can also use <info> to cheat. For example, sometimes I have to document steps like “add the following block to such-and-such configuration file”. The ideal element for this multi-line block is a <pre>. But <pre> isn’t allowed in <cmd>. But it is allowed in <info>. So that’s where I put it.
  • <substeps> — sub-steps. The content model is exactly the same as for <steps>, that is, each <substep> contains <cmd>, <info>, <stepxmp>, and <stepresult>. The only difference is that sub-steps can’t have sub-steps. You just have two levels. That’s it. The logic is that if you need more levels, you should think about breaking your one big, complicated procedure down into multiple smaller, simpler procedures.
  • <choices> — [SLIDE, SLIDE] where the reader has to do one thing from a list of possible things. This is rendered as a bulleted list.
  • <choicetable> — like choices, but in the form of a table instead of a list. It has two columns — option and description. The table can have a heading row.
  • <stepxmp> — an example for the step. Only one is allowed.
  • <stepresult> — the result of performing this step. This helps to reassure the user that they are on track.

Note that except for <cmd>, all these elements are optional. They can also come in any order, except for <stepresult>, which has to come at the end.

You should also note that if a step is optional, you can set the importance attribute to optional for the <step> element.
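
Here is a sketch of a short task that pulls most of this together, including the trick of putting a <pre> inside <info> and an optional step. The file name, settings, and wording are all invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- File name, settings, and text are hypothetical -->
<task id="enabling_logging">
  <title>Enabling logging</title>
  <shortdesc>Turn on logging so that you can diagnose startup problems.</shortdesc>
  <taskbody>
    <prereq>You must have write access to the configuration directory.</prereq>
    <context>Logging is off by default.</context>
    <steps>
      <step>
        <cmd>Add the following block to the configuration file.</cmd>
        <!-- pre is not allowed in cmd, so it goes in info -->
        <info>
          <pre>[logging]
enabled = true</pre>
        </info>
      </step>
      <step importance="optional">
        <cmd>Choose how much detail to log.</cmd>
        <choices>
          <choice>For errors only, set the level to 1.</choice>
          <choice>For full detail, set the level to 3.</choice>
        </choices>
      </step>
      <step>
        <cmd>Restart the service.</cmd>
        <!-- stepresult comes last within the step -->
        <stepresult>A new log file appears in the log directory.</stepresult>
      </step>
    </steps>
    <result>Startup messages are now written to the log file.</result>
    <postreq>Turn logging off again when you have finished troubleshooting.</postreq>
  </taskbody>
</task>
```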

That’s it for task.

Reference

[SLIDE]

According to the Language Specification:

The <reference> element defines a top-level container for a reference topic. Reference topics document programming constructs or facts about a product. Examples of reference topics include language elements, class descriptions, commands, functions, statements, protocols, types, declarators, operands, and API information, which provide quick access to facts, but no explanation of concepts or procedures. Reference topics have the same high-level structure as any other topic type, with a title, short description, and body. Within the body, reference topics are typically organized into one or more sections, property lists, and tables. The reference topic type provides general rules that apply to all kinds of reference information, using elements like <refsyn> for syntax or signatures, and <properties> for lists of properties and values.

What does this mean? It means that it’s for the kind of stuff we usually put in appendixes.

What Can <reference> Contain?

[SLIDE] Like the other topic types, it has a <title>, then a <shortdesc>, then a <refbody>. And here too, <refbody> is optional.

<refbody> can contain one or more of the following, in any order:

  • <section> — [SLIDE] a section can contain just about anything you like, except other sections. So it’s pretty flexible.
  • <refsyn> — [SLIDE] this is intended for documenting APIs and commands that you would type at the command line.
  • <example> — [SLIDE] an example
  • <table> — [SLIDE] a table, with a caption
  • <simpletable> — [SLIDE] a simple table, with no caption
  • <properties> — [SLIDE] a special kind of table, with three columns — type, value, and description. (All these are optional.) This is more limited in application than a regular table, but it does have one very useful feature. For a particular type, you can have more than one value (and description, if needed). Doing this in a regular table is complicated. But generally, I tend to avoid <properties> and use <simpletable> instead. Why? Because <properties> can only be used in reference topics. If you want to reuse the whole table or just bits of it in a task topic, for example, you can’t. With tables, you can.
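
As a sketch, a small reference topic with a <refsyn> and a <properties> table might look like this. The command, settings, and descriptions are invented, and the column headings are optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd">
<!-- Command name, settings, and text are hypothetical -->
<reference id="widget_settings">
  <title>Widget settings</title>
  <shortdesc>The settings that you can pass to the widget command.</shortdesc>
  <refbody>
    <refsyn>
      <codeblock>widget --config FILE</codeblock>
    </refsyn>
    <properties>
      <prophead>
        <proptypehd>Setting</proptypehd>
        <propvaluehd>Value</propvaluehd>
        <propdeschd>Description</propdeschd>
      </prophead>
      <property>
        <proptype>enabled</proptype>
        <propvalue>true</propvalue>
        <propdesc>Turns logging on.</propdesc>
      </property>
    </properties>
  </refbody>
</reference>
```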

That’s pretty much it for the three topic types. Now let’s move on to talk about inline elements.

Inline Elements

[SLIDE]

An inline element is one that can be used inside a paragraph-type element. For example, in a <p>, you can have a word or phrase within a <b> element. This word will be displayed in bold.

There are several different categories of inline elements. Some of them are only used within certain specialized elements (for example, in a <syntaxdiagram>), so I won’t waste your time by describing every single one. I’ll just cover the more commonly used ones.

The first is typographic elements [SLIDE]. You should only use these if there isn’t a more semantically specialized element that you can use. This is what we were talking about with <uicontrol> and <b>, if you remember. So try to avoid using them.

  • <b> — bold
  • <i> — italic
  • <u> — underline
  • <tt> — “teletype” (monospace)
  • <sup> — superscript
  • <sub> — subscript

Programming elements:

[SLIDE]

  • <synph> — a syntax phrase. It’s a bit like <syntaxdiagram>, but less complex, and for inline use. Some of the elements that it can contain can be used on their own, others cannot. See the Language Specification. It can contain:
    • Text
    • <codeph> — a code phrase — a snippet of code
    • <option> — a command option
    • <parmname> — a parameter name
    • <var> — a variable
    • <kwd> — a keyword
    • <oper> — an operator (e.g., +, -, &&)
    • <delim> — a delimiter
    • <sep> — a separator
    • <apiname> — an API name (such as the name of a class or method)
    • <synph> — yes, it can be nested.

Software elements (These seem to be mostly concerned with command-line programs…):

[SLIDE]

  • <msgph> — a message phrase. It contains the text output from a program. It can include some of the following elements.
  • <msgnum> — a message number
  • <cmdname> — the name of a command
  • <varname> — a variable
  • <filepath> — a file path
  • <userinput> — text that is typed by the user
  • <systemoutput> — any kind of computer output

User interface elements:

[SLIDE]

  • <uicontrol> — used for names of buttons, menu items, text boxes, etc.
  • <wintitle> — used for names of windows, dialog boxes, panes, etc.
  • <shortcut> — a keyboard shortcut (i.e., a single key). It is often used to denote the key that you press together with Alt to activate a menu item.
  • <menucascade> — [SLIDE] the series of menu items that have to be selected in turn to get to the item you want. For example, in Word, to insert an image, you have to do Insert > Picture > From File…. In DITA, you would represent this as <menucascade><uicontrol>Insert</uicontrol><uicontrol>Picture</uicontrol><uicontrol>From File...</uicontrol></menucascade>.

Other inline elements:

  • <cite> — [SLIDE] a bibliographic citation
  • <q> — [SLIDE] content that is quoted from another source
  • <tm> — [SLIDE] a trademark. Its tmtype attribute indicates whether it is a trademark, a registered trademark, or a service mark.
  • <fn> — [SLIDE] a footnote. We’ll talk about footnotes later.
  • <xref> — [SLIDE] a cross-reference. We will talk about linking in a little while, but for now, you should know that an xref can link to:
    • A different location in the same topic
    • Another topic in the same Help system
    • A location within another topic in the same Help system
    • An external source (e.g., a web page)
  • <ph> — [SLIDE] a phrase. This does not affect the way that what it contains is presented. Instead, it is used to:
    • Mark content for reuse (more about that soon)
    • Mark content for conditional processing (like conditional text in Frame)
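
Here is a sketch of the four kinds of xref target and the two uses of <ph> in one small topic. All the file names and IDs are invented, and the audience value is just an example of a conditional attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- File names, IDs, and text are hypothetical -->
<concept id="installing">
  <title>Installation overview</title>
  <conbody>
    <p id="prereqs">You need 200 MB of free disk space.</p>
    <ul>
      <!-- Same topic: topic ID, a slash, then the element ID -->
      <li><xref href="#installing/prereqs">Disk space</xref></li>
      <!-- Another topic -->
      <li><xref href="configuring.dita"/></li>
      <!-- A location within another topic -->
      <li><xref href="configuring.dita#configuring/ports"/></li>
      <!-- An external source -->
      <li><xref href="http://www.example.com/" format="html" scope="external">Example site</xref></li>
    </ul>
    <!-- ph for reuse: pulls in a phrase kept in another file -->
    <p>Welcome to <ph conref="shared_phrases.dita#shared/product_name"/>.</p>
    <!-- ph for conditional processing: included only for some audiences -->
    <p>Click <uicontrol>Next</uicontrol>. <ph audience="administrator">You can also run the installer silently.</ph></p>
  </conbody>
</concept>
```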

Linking

[SLIDE]

There are a number of ways you can create links:

  • [SLIDE] You can use the <xref> element within a topic. The link appears wherever you put it.
  • [SLIDE] You can add a <related-links> element at the end of a topic. The links appear at the end of the topic.
  • [SLIDE] You can use a relationship table (<reltable>) to define how topics are related to each other. Then refer to this table in your map. Again, the links appear at the end of the topic.

So which do we use when?

Generally, the accepted best practice in this area is as follows:

<xref>

Use <xref> only when you know that the target topic will always be in the deliverable with the topic that contains the <xref>. If it isn’t, you can end up with a link that points to nothing. (I think that it will build OK, but the contents of the xref will not be a link.)

Note that if <xref> is empty, it pulls in the title of whatever it points to as the link text. So if it points to a topic or a section or a figure within a topic, you can (and probably should) leave it empty. That way, if the title changes, you’re safe.

But if it is empty and the target does not exist, then you end up with nothing in your output.
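
To make the difference concrete, here is a small sketch of an empty xref next to one with its own text. The target file name is hypothetical.

```xml
<!-- Empty xref: the link text is pulled from the target's title at build time -->
<p>For details, see <xref href="adding_feeder.dita"/>.</p>

<!-- xref with content: this text is used as the link text,
     so it can go stale if the target's title changes -->
<p>For details, see <xref href="adding_feeder.dita">Adding a Feeder</xref>.</p>
```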

<related-links>

I never use this. It has all of the disadvantages of <xref> and none of the advantages of <reltable>. What do I mean by this? I mean that it still depends on the target topics being in the deliverable, so it depends on the context that the topic is being used in.

For example, let’s say that you have a topic that is used in both a user guide and an administrator guide.

If you add a bunch of links to topics in the user guide, then the topic won’t have any related links if you use it in the admin guide. OK, you could add more links for the admin guide. But what happens in another three months when you’re asked to produce a third guide, which will also contain this topic? You will have to go back to this topic (and every other topic that will be in the new guide) and add more links.

<reltable>

The preferred option, then, is to use relationship tables. These are usually specific to a map. So you might have a map and a relationship table for your user guide and a different map and reltable for your admin guide.

So what does a relationship table look like? How do we use it to specify how one topic links to another?

[SLIDE]

Trying to make sense of a relationship table in a text editor is a pain, as you can see. XXE has a decent reltable editor. I understand that XMetaL does too.

[SLIDE] As you can see, a relationship table consists of three columns, one for each topic type.

There are various ways of specifying relationships. The simplest is the relationship between topics (topicrefs, actually) in the same row [SLIDE]. If we have one topic in each cell of a row, then we will get links from each one to the other two. Simple.

If you have more than one topic in the same cell [SLIDE], they will get links to and from the other topics in the row, but not to each other.

To specify that the topics in a cell are related [SLIDE], you have to add the collection-type="family" attribute to the <relcell> element that contains the topicrefs.

You can also specify a one-way relationship. You can add one of the following attributes to a topicref:

  • linking="targetonly"
  • linking="sourceonly"

These are fairly self-explanatory, I think.
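
For readers following along without the slides, here is a minimal sketch of a relationship table that uses each of the features described above. All the file names are invented for the example.

```xml
<!-- Hypothetical topic file names, for illustration only -->
<map>
  <reltable>
    <relheader>
      <relcolspec type="concept"/>
      <relcolspec type="task"/>
      <relcolspec type="reference"/>
    </relheader>
    <relrow>
      <!-- One topic in the cell: linked to and from the other cells in this row -->
      <relcell>
        <topicref href="about_feeders.dita"/>
      </relcell>
      <!-- collection-type="family": the two tasks also link to each other -->
      <relcell collection-type="family">
        <topicref href="adding_feeder.dita"/>
        <topicref href="removing_feeder.dita"/>
      </relcell>
      <!-- linking="targetonly": other topics link to this one,
           but it gets no links back to them -->
      <relcell>
        <topicref href="menu_reference.dita" linking="targetonly"/>
      </relcell>
    </relrow>
  </reltable>
</map>
```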

PDF Note

By default, related links are not displayed in PDFs. You can switch them on, but the way they are displayed is not wonderful. So if you will be generating PDFs, you may want to use xrefs anyway, despite the issues we talked about.

Also, for PDF output, links usually have “on page x” appended to them. You can turn this off if you need to, though.

Conrefs

[SLIDE]

DITA allows you to reuse content at a more granular level than topics. You can reuse pretty much anything. A table. A paragraph. An example.

You can reuse any element that is allowed in the new context. So you can reuse a paragraph (<p>) from a concept topic in the <context> element in a task. But you can’t reuse a <properties> element from a reference topic in a task, because <properties> is not allowed in a task.

Here’s a use for the <ph> element that we saw earlier. If you want to reuse something from within an element without using the whole element, let’s say, one sentence from a paragraph, you can wrap that sentence in a <ph> element and reference that.
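
A minimal sketch of that idea, assuming a source topic stored in a file called reuse.dita with the topic ID reuse (both names are made up):

```xml
<!-- In the source topic (hypothetical file reuse.dita, topic id="reuse"):
     wrap the sentence you want to reuse in a <ph> with an ID -->
<p>Feeders supply data to the application.
  <ph id="restart_sentence">Restart the service after you add a feeder.</ph>
</p>

<!-- In any other topic: pull in just that sentence -->
<p><ph conref="reuse.dita#reuse/restart_sentence"/></p>
```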

The only other thing that you can’t do is reuse ranges of elements, for example, the first three rows of a ten-row table. The only way to do it right now is to reference each of the three rows separately. This is being discussed at the moment, and you will almost certainly be able to do it in the future. But for now, you can’t.

OK, on to some examples.

Let’s say you are writing the user guide for a GUI application. So you have got all kinds of screenshots, buttons, icons, and so on. You are probably going to want to use some of these in more than one place.

There are buttons on the toolbar that you might want to show in a task. “Click the <image> button to do so-and-so.” You will probably want to use the same image in a reference topic, in a table that explains what each button does.

The application may even use the same icon in the menu option for performing the same action as the toolbar button. Maybe in a context menu too.

What do you do in Frame? You would use a single image file that you import by reference everywhere you want to use it.

In DITA, you can do the same thing. Anyway, the image file is external, so it’s not a problem. If the button changes, you just replace the old image with the new one, making sure that the filename is the same.

But what about the menu label? What about the description? What if you need to change those?

I’ll show you how I use DITA’s content reference, or conref, mechanism to do this.

In one of the applications that I am documenting, there is a function called “Add External Feeder” [SLIDE]. There is a toolbar button and a menu item that uses the same image as the button.

I need to document this function in four places. I mention it in an existing concept topic that talks about feeders. I have a task called “Adding an External Feeder”. And I have reference topics — one that explains what all the menu items do and one that explains what all the toolbar buttons do.

I have a topic that contains all the content that I know I will want to reuse. You can put all kinds of stuff in here. Notes and warnings that appear in multiple locations. Boilerplate text. Whatever.

[SLIDE] In this file I have a <simpletable>, with three columns. One column for the image. One for the menu item text, and one for the description.

Now, to be able to reuse anything, it has to have an ID attribute. Otherwise, there’s no way to find it.

So I have assigned IDs to four elements here:

  • I’ve assigned an ID to the row, because in the reference topic that explains the menus, I will be using it as-is.
  • I’ve assigned an ID to the image [SLIDE], so that I can reuse it on its own in my task topic.
  • And I’ve assigned IDs to two cells: the one that contains the image [SLIDE], and the one that contains the description [SLIDE]. This is because in the reference topic that explains the toolbar buttons, I only need the image and the description.

So here [SLIDE], in the reference topic that explains what the toolbar buttons do, each of the two cells (that is <stentry> elements) in the row references one of the cells in the table that I assigned IDs to (the image and description cells).

The syntax of the reference looks complicated, but don’t worry about that too much. Most editors have a way of inserting a conref without having to know this stuff.

[SLIDE] In the reference topic that explains the menu items, the table row references the whole row in the source table. Note that the row element still has to contain the right number of cell (<stentry>) elements, otherwise the XML is invalid.

[SLIDE] And finally, in the task topic, the image element references the <image> element in the source table. Note that the <image> element has an href attribute with a meaningless value. This is because the element itself must have this attribute. If it doesn’t, it’s not valid XML.
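
Since the slides are not reproduced here, the following is a sketch of roughly what the source table and the three kinds of reference could look like. The file name reuse.dita, the topic ID, the element IDs, the image path, and the description text are all assumptions made for the example, not the actual markup from my project.

```xml
<!-- SOURCE: hypothetical file reuse.dita, topic id="reuse" -->
<simpletable>
  <strow id="add_feeder_row">
    <stentry id="add_feeder_icon">
      <image id="add_feeder_img" href="images/add_feeder.png"/>
    </stentry>
    <stentry>Add External Feeder</stentry>
    <stentry id="add_feeder_desc">Description of the function goes here.</stentry>
  </strow>
</simpletable>

<!-- Toolbar reference topic: reuse only the image cell and the description cell -->
<strow>
  <stentry conref="reuse.dita#reuse/add_feeder_icon"/>
  <stentry conref="reuse.dita#reuse/add_feeder_desc"/>
</strow>

<!-- Menu reference topic: reuse the whole row; the placeholder cells
     are there to satisfy the point about cell count made above -->
<strow conref="reuse.dita#reuse/add_feeder_row">
  <stentry/>
  <stentry/>
  <stentry/>
</strow>

<!-- Task topic: reuse the image alone; href is required on <image>,
     so it carries a throwaway value that the conref replaces -->
<cmd>Click <image conref="reuse.dita#reuse/add_feeder_img" href="placeholder"/>.</cmd>
```

The general pattern of a conref value is file#topicid/elementid, the same addressing scheme that xref uses.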

Conditional Processing

[SLIDE]

In Frame, you can mark content as conditional. Then you can show or hide this conditional content before you generate your deliverables.

In DITA, you can do something similar. In DITA, you do it by setting the value of an attribute on the element that you want to make conditional.

Built-in Attributes

There are several built-in attributes for common filtering criteria:

  • [SLIDE] The audience attribute lets you specify the audience, for example, user or administrator.
  • [SLIDE] The platform attribute lets you specify the platform, for example, Windows or Mac OS.
  • [SLIDE] The product attribute lets you specify the product. So if you are producing documentation for multiple products from the same sources, you can use this to specify which ones a particular element or topic belongs to.

You can assign multiple values to these attributes as a space-delimited list.

Custom Attributes

There is another attribute called otherprops. If you have some attribute that you want to filter on other than the built-in ones, you can use otherprops.
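
Here is a short sketch of what these attributes look like on ordinary elements. The attribute values (and the text) are examples only; you choose your own values.

```xml
<!-- Attribute values below are illustrative; they are not predefined by DITA -->
<p audience="administrator">Only administrators can change this setting.</p>

<!-- Multiple values: a space-delimited list -->
<li platform="windows macosx">Double-click the installer.</li>

<p product="standard professional">This feature is available in both editions.</p>

<!-- otherprops: for any criterion that the built-in attributes do not cover -->
<note otherprops="beta">This feature is still in beta.</note>
```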

Conditional Topics

If you want to filter in or out whole topics, there are two ways to do it:

  • You can assign values to attributes in the <topicref> that points to the topic inside the map.
  • You can use different maps for different deliverables.

OK, but when should you use each of these approaches?

Generally, I would say:

  • Use different maps for deliverables that have a significant number of non-shared topics. For example, use one map for the user guide and one for the administrator guide.
  • Use the same map and assign values to the appropriate attributes when the content is substantially the same. For example, if you have a user guide for two versions of the same product, and one version has a few features that the other hasn’t.
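
As a sketch of the second approach, a single map might mark the few version-specific topics with an attribute on the topicref. The file names and the product values are hypothetical.

```xml
<!-- Hypothetical map shared by two versions of the same product -->
<map title="User Guide">
  <topicref href="overview.dita"/>
  <topicref href="adding_feeder.dita"/>
  <!-- Filtered out of the build for the version that lacks this feature -->
  <topicref href="reporting.dita" product="professional"/>
</map>
```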

What Happens at Build Time?

When the time comes to build a deliverable, you can specify which values of which attributes are to be included or excluded. For example, if you are generating your user guide, you would specify that everything with audience="administrator" should be excluded and everything with audience="user" should be included. In this case anything with no value for audience is automatically included.

But what happens when an element has values for more than one attribute, and you filter on both? [SLIDE] Let’s say you have an element that has audience="user" and platform="linux". The rule that applies here is that if one is excluded, the element is excluded.

So in this example, if you are generating a user guide for the Windows version of your product, you would:

  • Include audience="user".
  • Exclude audience="administrator".
  • Include platform="windows".
  • Exclude platform="linux".
  • Exclude platform="macosx".

So our element with audience="user" and platform="linux" would be filtered out because we are excluding platform="linux". Which is the way we want it.

[SLIDE] If you’ve specified multiple values for an attribute, all the values must be excluded for the element to be excluded. For example, if an element has platform="windows macosx" and you exclude macosx and include windows, you want this element to be included. And so it is.

I haven’t mentioned specifically how you actually do the filtering when you generate your deliverables. That’s because it depends on the system you’re using. If you are using a content management system, you will probably have build profiles that you create by selecting or de-selecting check boxes that correspond to attribute values. If you are using the DITA Open Toolkit, you define the filtering criteria in an XML file (a .ditaval file) that you reference from your build file. And so on and so on.
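
For the Open Toolkit case, the filter file for the Windows user guide example above would look something like this. Treat it as a sketch: the attribute values are the ones from the example, and how you point your build at the file depends on your Toolkit version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Filter file for the Windows user guide example -->
<val>
  <prop att="audience" val="user" action="include"/>
  <prop att="audience" val="administrator" action="exclude"/>
  <prop att="platform" val="windows" action="include"/>
  <prop att="platform" val="linux" action="exclude"/>
  <prop att="platform" val="macosx" action="exclude"/>
</val>
```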

Footnotes

[SLIDE]

You can add a footnote by simply adding an <fn> element where you want it. [SLIDE]

If you want to use the same footnote more than once, you can put the footnote somewhere else and add <xref> elements that point to it. I find myself doing this quite a bit. For example, in one of the applications that I document, you can select the columns that are displayed. [SLIDE] So in my reference topic that explains what the various columns are, I have a footnote that says “Displayed by default”. Then in the cell for each column that is displayed by default, I have an <xref> to the footnote. [SLIDE]

When you generate your deliverables, these are numbered automatically. If you don’t want numbers (maybe you only have one footnote and you want to use a symbol instead), [SLIDE] you can set the value of the <fn>’s callout attribute to the symbol you want to use.
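
A sketch of the reusable footnote, assuming a reference topic with the ID columns (the IDs and column names are invented):

```xml
<!-- A footnote with an ID is not rendered where it is defined;
     it appears wherever an xref points to it -->
<fn id="default_col" callout="*">Displayed by default.</fn>

<simpletable>
  <strow>
    <!-- type="fn" tells the processor that the target is a footnote -->
    <stentry>Name<xref href="#columns/default_col" type="fn"/></stentry>
    <stentry>The name of the feeder.</stentry>
  </strow>
  <strow>
    <stentry>Status<xref href="#columns/default_col" type="fn"/></stentry>
    <stentry>The current status of the feeder.</stentry>
  </strow>
</simpletable>
```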

Comments

[SLIDE]

Often I find while I’m writing that I don’t have all the information I need. I thought I did, but as I was writing, it became obvious that a piece of the puzzle was missing.

In Word, you might select the offending word or paragraph and insert a comment. In Frame you might type in some text as your comment and mark it with the “Comment” condition.

In DITA, there are a couple of elements that fill this role:

  • The <required-cleanup> element is designed for marking content that has been imported from a non-DITA source and that needs some work. For example, it might be some legacy content that has been converted from Word, and it’s just a bunch of <p> elements with some <b> elements in there, that needs some manual work to make the markup more semantically useful. Or it might not have any tags at all, and maybe needs all the tags to be added to make it valid XML.
  • [SLIDE] The <draft-comment> element is for comments. You can insert it pretty much anywhere, with a few minor exceptions.

Most editors display these elements in some way that makes them stand out from the surrounding text.

You can also enable them to be displayed in your deliverables. This can be useful when sending things to SMEs for review.
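
For example, the two elements might be used something like this. The author initials and the comment text are placeholders.

```xml
<!-- A question for a reviewer, dropped in right where the gap is -->
<p>The feeder connects to the server on port
  <draft-comment author="MP">Which port? Check with the developers.</draft-comment>.
</p>

<!-- Converted legacy content that still needs proper semantic markup -->
<required-cleanup>
  <p><b>Note:</b> Restart the service after you add a feeder.</p>
</required-cleanup>
```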

Indexing

[SLIDE]

To indicate that an element should be indexed, add an <indexterm> element [SLIDE].

You can nest these to create multi-level indexes.

The latest version of DITA, v1.1, includes three additional indexing elements:

  • <index-see> — self-explanatory
  • <index-see-also> — self-explanatory
  • <index-sort-as> — this lets you specify a different string for sorting than how the item is displayed in the index. For example, if you have something that will appear in the index as <data> (that is, with the angle brackets), you might want this to appear under D, and not under symbols in the index. So you would add <index-sort-as>data</index-sort-as> within the <indexterm>.
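
Put together, the indexing markup might look like this sketch. The index entries themselves are invented.

```xml
<!-- Nested indexterms give a two-level entry: "feeders", then "adding" -->
<indexterm>feeders
  <indexterm>adding</indexterm>
</indexterm>

<!-- Displayed as <data> in the index, but sorted under D -->
<indexterm>&lt;data&gt;<index-sort-as>data</index-sort-as></indexterm>

<!-- "external sources, see feeders" -->
<indexterm>external sources<index-see>feeders</index-see></indexterm>

<!-- "inputs, see also feeders" -->
<indexterm>inputs<index-see-also>feeders</index-see-also></indexterm>
```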

Maps

[SLIDE]

We mentioned maps earlier, but there are a few more things that are worth mentioning.

[SLIDE] As we already mentioned, a map organizes topics into a hierarchy. This determines what the table of contents looks like in both HTML and PDF output.

And in the PDF, it also determines what the headings look like. For example, if a topic is nested within another topic in the map, its heading will use a smaller font size than the parent in the PDF.

We also mentioned relationship tables, and how these can live in separate files that you reference from your map file.

Well, you can do exactly the same thing with other maps. What this means is that as well as reusing individual topics, we can reuse sets of topics.

For example, you could have a map that contains all the topics that relate to a particular feature or module, which you could then reuse in maps for different products.
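
As a sketch, a product map that pulls in a feature map could look like this. The file names are hypothetical.

```xml
<!-- Hypothetical product map that reuses a feature-level map -->
<map title="Product A User Guide">
  <topicref href="overview.dita"/>
  <!-- format="ditamap" marks the target as a map rather than a topic -->
  <topicref href="feeders.ditamap" format="ditamap"/>
</map>
```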

One other thing I think I should mention is something called a bookmap.

This is like a map, but it can also contain extra stuff that you would expect to find in printed documentation, such as:

  • <notices> — for legal and copyright stuff
  • <preface> — preface
  • <trademarklist> — list of trademarks used
  • <amendments> — list of changes

It also supports metadata for describing authors.

When you generate a PDF from a bookmap, it does a few nice additional things [SLIDE]:

  • Chapters are called chapters and have numbers.
  • Appendixes are called appendixes and have letters.
  • The first page of each chapter and appendix has a mini-TOC.

Note that in the same way that you can reference a map from within a map, so you can in a bookmap. You have to use <chapter> and <appendix> elements, but within these you can have <topicref> elements that point to regular maps.
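
Here is a minimal sketch of a bookmap that uses some of these elements and pulls a regular map into a chapter. The title and all the file names are made up.

```xml
<!-- Hypothetical bookmap; file names are for illustration only -->
<bookmap>
  <booktitle>
    <mainbooktitle>Product A User Guide</mainbooktitle>
  </booktitle>
  <frontmatter>
    <notices href="legal_notices.dita"/>
    <preface href="preface.dita"/>
  </frontmatter>
  <chapter href="getting_started.dita">
    <!-- A regular map, reused inside a chapter -->
    <topicref href="feeders.ditamap" format="ditamap"/>
  </chapter>
  <appendix href="menu_reference.dita"/>
  <backmatter>
    <amendments href="changes.dita"/>
    <booklists>
      <trademarklist/>
    </booklists>
  </backmatter>
</bookmap>
```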

[THANK EVERYONE]

[“FIN” SLIDE]

[URL SLIDE]

Resources

DITA Infocenter
DITA.xml.org
dita-users Yahoo Group

Downloads

XMLMind XML Editor (“XXE”)
DITA Open Toolkit

Note: XXE includes a minimal version of the Open Toolkit.

Comments and Questions?

Please feel free to leave comments and questions below, or contact me:

martin.polley@gmail.com
052-3864280
Skype me
MSN Messenger — sagipolley@hotmail.com

Lenya + BXE + DITA = ?

There’s got to be something good in this combination, if it can be made to work. I will report back as (if) I progress.

Displaying Reltable linking and collection-type Attributes in XXE

I have discovered that you can hack XMLMind XML Editor’s CSS stylesheet for DITA to display attributes. This solves what is, to my mind, one of the major deficiencies in its relationship table editor.

Here’s what I did.

In the file called C:\Documents and Settings\<username>\Application Data\XMLmind\XMLEditor\addon\dita_dtd_config\css\dita_map.css I added the following:


relcell:before {
    display: inline;
    content: attributes();
}

relcell::attribute(collection-type) {
    attribute-content-left: "Collection type: ";
    attribute-content-middle: label(attribute, collection-type, color, blue);
    show-attribute: always;
    font-size: smaller;
}

topicref:after {
    display: block;
    content: attributes();
}

topicref::attribute(linking) {
    attribute-content-left: "Linking: ";
    attribute-content-middle: label(attribute, linking, color, red);
    show-attribute: when-added;
    font-size: smaller;
    text-indent: 0.5cm;
}

You can change this so that it displays radio buttons or a drop-down list instead of just displaying the value of the attribute, but for me, this is enough.

DITA--Outline Numbering in PDFs

Hah! Finally! With help from one of the experts (France Baril) on the dita-users list, I now have outline numbering in my PDFs (in the titles themselves, in the TOC, and in the bookmarks pane).

Here’s the thread.

Transforming DITA to RTF (Word)

The DITA Open Toolkit can transform DITA XML to RTF (so you can open it in MS Word). But trust me, you wouldn't want to. The output is ugly, ugly, ugly. It looks like the worst kind of written-by-an-engineer Word document that you could imagine. (No disrespect to engineers...)

This should have tipped me off (from Known Limitations):

You can change the styles of the output file by using tools in Microsoft® Word rather than specifying the styles before transforming.

Now why would I need to do that, exactly?

Customizing PDF Output 2--Images in Headers

To add a graphic to the headers of your PDFs:

  1. Resize the image that you want to include so that it is 60 pixels or less in height. Copy this image into the directory where you keep your XML source files.
  2. Edit <open toolkit>/xsl/dita2fo-shell.xsl.
  3. Locate these lines:
     <fo:region-body margin-bottom="36pt" margin-top="12pt"></fo:region-body>
     <fo:region-before extent="12pt"></fo:region-before>
  4. In the first line, change margin-top="12pt" to margin-top="70pt".
  5. In the second line, change extent="12pt" to extent="60pt". This makes enough space in the header for an image up to 60 pixels in height.
  6. Locate the block that begins with this line:
     <xsl:template name="generated-frontmatter">
  7. Within this block, locate this line:
     <xsl:value-of select="$booktitle"></xsl:value-of>
  8. Immediately before the opening angle bracket, add this:
     <fo:external-graphic src="url(logo.gif)"/>
     (where logo.gif is the image that you want to add).
  9. Locate this block:
     <xsl:template name="main-doc3">
  10. Repeat Steps 7 and 8 in this block.
  11. Make sure that your build file copies the image file to your output directory.

Generate the PDF and the image should appear in the header of each page. Instead of the fo:external-graphic element, you can insert some SVG to generate the image.

Customizing PDF Output 1--Cover Page Art

The standard PDFs generated by the DITA Open Toolkit have a placeholder on the cover page for an image. I have never (yet) seen a sample PDF that actually has an image there.

So here’s how to add one:

  1. Edit <open toolkit>/xsl/dita2fo-shell.xsl.
  2. Locate this line:
     <fo:inline color="purple" font-weight="bold">[cover art/text goes here]</fo:inline>
  3. Replace it with something like this:
     <fo:external-graphic src="url(logo.gif)"/>
  4. Make sure that your build file copies the image file to your output directory.

Generate the PDF and the graphic should appear on the cover. Instead of the fo:external-graphic element, you can insert some SVG to generate the image.

Thanks to RenderX (makers of the XEP FO to PDF converter) for this excellent FO tutorial that showed me how to do this.

Javascript TOC for DITA XHTML Output

I had been pondering how it would be possible to get a nice looking TOC for XHTML generated from DITA sources. One of the options would be to keep FrameMaker and WebWorks Publisher in the toolchain, but this seemed like using a very large hammer to crack a very small nut.

Fortuitously, I noticed this post on the dita-users list. Shawn McKenzie has created a plugin for the Open Toolkit that is based on the treeview code from the Yahoo! UI Library (open licensed). (Get it here.)

It works like a charm:

Strong work, Shawn. Good on you!

PDF Output and Column Widths

If you use Apache FOP to generate PDFs, you must specify column widths for all tables. If you do not, they are not processed and do not appear in the final PDF.

So for every column, you must add a <colspec/> element after the opening <tgroup> tag. This must include a colwidth attribute.

But what values are you meant to plug in there? The example in the language reference uses values of 121* and 76* (for a two-column table):

<tgroup cols="2">
   <colspec colname="COLSPEC0" colwidth="121*"/>
   <colspec colname="COLSPEC1" colwidth="76*"/>
   .
   .
   .

What does this mean? Simply that the numbers are proportions. You could use 5* and 10* to specify that the columns should take up one third and two thirds of the available space, respectively. Specifying 50* and 100* has exactly the same effect.

And it seems that absolute and percentage values are ignored.

DITA Tip—Where to Put Your Files

If you are using the DITA Open Toolkit to process your DITA sources, I strongly recommend that you put your source files in a folder under the OT. For example, if you have installed the Toolkit in C:\DITA-OT1.2.2, put your source files in C:\DITA-OT1.2.2\mystuff.

This makes it a lot easier to use Ant to generate your output.

For example, let’s say all your source files are in C:\DITA-OT1.2.2\mystuff, and you want the map file called mymap.ditamap to be processed and the output placed in C:\DITA-OT1.2.2\mystuff\out.

Just add this target to build.xml:

<target name="mymap" depends="use-init">
	<antcall target="dita2xhtml">
		<param name="args.input" value="mystuff${file.separator}mymap.ditamap"/>
		<param name="output.dir" value="mystuff${file.separator}out"/>
		<param name="args.xhtml.toc" value="toc"/>
		<param name="transtype" value="xhtml"/>
	</antcall>
	<copy todir="mystuff${file.separator}out">
		<fileset dir="${dita.resource.dir}" includes="index.html"/>
	</copy>
</target>

Then just type ant mymap at the command line and away you go!

Or if you want a really easy life, just use ant prompt. That way you don’t need to get your hands dirty fiddling with build.xml at all.