Betsy Rolland
LIS 600 Independent Study

Transforming XML to Microsoft XML
Terry Brooks
Winter 2006

Working with XML and MS Word

Of the three MS Office applications with XML support, Microsoft Word offers the greatest flexibility for working with your XML data. You can open your data and view it "as-is", in which case your XML hierarchy shows up in the document as tags.

Alternatively, you can have Word apply an XSLT stylesheet when the file is opened. The stylesheet can display your data either as a Word document (transforming your plain XML into Word's own WordprocessingML) or as XML transformed into some other structure.
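Word does this transformation with XSLT. To show roughly what such a stylesheet does, here is a Python sketch that uses the standard library's xml.etree.ElementTree in place of XSLT. It is a different technique with a comparable effect, not what Word runs internally. The sketch maps each record of some plain XML to one WordprocessingML paragraph. The <books>/<book> data and the function name books_to_wordml are hypothetical, invented for this example. For brevity, the sketch leaves out the XML declaration and processing instruction that a complete file would start with.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace, serialized with the "w" prefix.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


# Hypothetical plain XML data; these element names are invented for the example.
PLAIN_XML = """<books>
  <book><title>Moby-Dick</title><author>Herman Melville</author></book>
  <book><title>Emma</title><author>Jane Austen</author></book>
</books>"""


def books_to_wordml(plain_xml):
    """Map each <book> record to one w:p, much as an XSLT template might."""
    source = ET.fromstring(plain_xml)
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    for book in source.iter("book"):
        p = ET.SubElement(body, w("p"))   # one paragraph per record
        r = ET.SubElement(p, w("r"))      # one run inside it
        t = ET.SubElement(r, w("t"))      # the run's text
        t.text = f"{book.findtext('title')} by {book.findtext('author')}"
    return root


if __name__ == "__main__":
    print(ET.tostring(books_to_wordml(PLAIN_XML), encoding="unicode"))
```

An XSLT version would express the same mapping as a template matching book that outputs a w:p. The Python form just makes each step explicit.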

When it's time to save your data, Word gives you several options. First, you can save "data only", which saves just your XML data inside its tags, without any Word-specific formatting. Second, you can have an XSLT stylesheet transform the data before it is saved. (You can also combine these two options.) Third, you can save your document as WordprocessingML, Word's own XML format.

WordprocessingML at its most basic requires the structure described below.

After the standard XML declaration comes a processing instruction, <?mso-application progid="Word.Document"?>, that associates the file with Microsoft Word. Next, the root element, w:wordDocument, begins, along with several namespace declarations. The only one this basic file really needs is the "w" namespace (http://schemas.microsoft.com/office/word/2003/wordml). The attribute xml:space="preserve" is useful because it keeps the whitespace in your document's text intact.
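To make the skeleton concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It emits the pieces described in this section:

- the XML declaration
- the mso-application processing instruction
- the w:wordDocument root with the "w" namespace and xml:space="preserve"
- a single w:body whose paragraphs each hold one run and one text element (covered in the next paragraph)

The function name minimal_document is made up for this illustration. The extra namespace declarations that Word itself writes are left out.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "w" prefix in Word 2003 WordprocessingML.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
# The reserved xml: namespace; ElementTree serializes it as "xml:".
XML_NS = "http://www.w3.org/XML/1998/namespace"

ET.register_namespace("w", W_NS)


def w(tag):
    """Return a Clark-notation name such as '{...wordml}p' for ElementTree."""
    return f"{{{W_NS}}}{tag}"


def minimal_document(paragraphs):
    """Build a bare-bones WordprocessingML document, one w:p per string."""
    root = ET.Element(w("wordDocument"))
    # xml:space="preserve" keeps leading/trailing spaces inside w:t text.
    root.set(f"{{{XML_NS}}}space", "preserve")
    body = ET.SubElement(root, w("body"))   # exactly one body per document
    for text in paragraphs:
        p = ET.SubElement(body, w("p"))     # paragraph
        r = ET.SubElement(p, w("r"))        # run
        t = ET.SubElement(r, w("t"))        # text
        t.text = text
    # XML declaration plus the processing instruction that associates the file with Word.
    header = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
    )
    return header + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(minimal_document(["Hello, Word.", "A second paragraph."]))
```

Saving the printed output with an .xml extension should give a file that Word 2003 recognizes as a Word document, though this sketch has not been checked against every Word version.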

Next, the body of the document (w:body) begins. A document can have only one body, but any number of paragraphs (w:p). In the simplest case, each paragraph contains one run (w:r), and the run contains one text (w:t) element. A paragraph is not limited to a single run, however. It can hold several runs in sequence, and that is how formatting changes within a paragraph are handled.

Formatting is applied to a run as a whole, through its run properties (w:rPr), so each stretch of text with different formatting needs its own run. Mixed content is not allowed: you cannot nest a bold or italic element inside a text element the way you would in HTML. For example, suppose you want a bold sentence with italics in the middle. You would need three separate runs: (1) the bold words at the beginning of the sentence, (2) the bold and italicized words in the middle, and (3) the bold words at the end. The sentence below would therefore be built from three runs inside a single paragraph.
The quick brown fox jumped over the lazy dog.
Formatting by segment: (1) bold, (2) bold and italics, (3) bold
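Here is a Python sketch of that three-run paragraph, again built with xml.etree.ElementTree. Which words fall in the italic middle segment is an assumption, since the original formatting of the example sentence did not survive. Each run carries its own w:rPr, which comes before its w:t, containing w:b (bold) and, for the middle run, w:i (italic). The helper names run and mixed_paragraph are hypothetical. Two segments begin or end with a space, so in a full document this paragraph relies on xml:space="preserve" on the root element.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace, serialized with the "w" prefix.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def run(text, bold=False, italic=False):
    """One w:r with optional run properties (w:rPr) and a single w:t."""
    r = ET.Element(w("r"))
    if bold or italic:
        rpr = ET.SubElement(r, w("rPr"))   # properties precede the text
        if bold:
            ET.SubElement(rpr, w("b"))
        if italic:
            ET.SubElement(rpr, w("i"))
    t = ET.SubElement(r, w("t"))
    t.text = text
    return r


def mixed_paragraph(segments):
    """Build one w:p from (text, bold, italic) tuples, one run per segment."""
    p = ET.Element(w("p"))
    for text, bold, italic in segments:
        p.append(run(text, bold, italic))
    return p


# Hypothetical split of the example sentence; the original formatting was lost.
SEGMENTS = [
    ("The quick brown fox ", True, False),
    ("jumped over", True, True),
    (" the lazy dog.", True, False),
]

if __name__ == "__main__":
    print(ET.tostring(mixed_paragraph(SEGMENTS), encoding="unicode"))
```

The formatting is repeated on every run rather than inherited from an enclosing element. That is the practical consequence of the "no mixed content" rule described above.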

Any styles used in the document need to be defined near the beginning of the XML file, in a w:styles element that comes before w:body. For more information about styles in WordprocessingML, please see this tutorial.
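As a rough Python sketch of how a style definition and its use fit together, the code below builds the following:

- a w:styles block ahead of w:body, defining one paragraph style that makes text bold
- a paragraph whose w:pPr/w:pStyle points at that style's w:styleId

The style ID "BookTitle", its display name, and the function name document_with_style are hypothetical. A real Word file would normally also define Word's built-in styles, which this sketch omits.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace, serialized with the "w" prefix.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def document_with_style(style_id, style_name, text):
    """Define one paragraph style in w:styles, then use it in the body."""
    root = ET.Element(w("wordDocument"))
    # Style definitions live in w:styles, which precedes w:body.
    styles = ET.SubElement(root, w("styles"))
    style = ET.SubElement(
        styles, w("style"), {w("type"): "paragraph", w("styleId"): style_id}
    )
    ET.SubElement(style, w("name"), {w("val"): style_name})
    rpr = ET.SubElement(style, w("rPr"))
    ET.SubElement(rpr, w("b"))             # text in this style is bold
    body = ET.SubElement(root, w("body"))
    p = ET.SubElement(body, w("p"))
    ppr = ET.SubElement(p, w("pPr"))       # paragraph properties
    ET.SubElement(ppr, w("pStyle"), {w("val"): style_id})  # refer to styleId
    r = ET.SubElement(p, w("r"))
    ET.SubElement(r, w("t")).text = text
    return root


if __name__ == "__main__":
    doc = document_with_style("BookTitle", "Book Title", "Moby-Dick")
    print(ET.tostring(doc, encoding="unicode"))
```

The paragraph refers to the style by its w:styleId, not its display name. Defining the style in one place lets every paragraph that uses it pick up the same formatting.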