Betsy Rolland
LIS 600 Independent Study

Transforming XML to Microsoft XML
Terry Brooks
Winter 2006

Working with XML and MS Excel

Excel's support for XML is, in my opinion, complicated and disappointing. While opening an XML file in Word gives you the option of either displaying the XML file "as-is" or applying an XSLT stylesheet, Excel has neither of those options. When you open an XML file in Excel, you have three options:

  1. Open as an XML list
  2. Open as a read-only workbook
  3. Use the XML source taskpane

Each of these has its advantages and disadvantages, as discussed below. One of the most interesting things about working with XML data in Excel is that, while it's relatively easy to get the data into your spreadsheet, there is no way to save the data only if your XML file has more than two levels of elements. When Excel opens an XML file, it "flattens" it into a list format rather than maintaining the schema's hierarchy. Thus, when it's time to export back to XML, Excel won't allow you to export something like this:

Your only option appears to be to save the data as SpreadsheetML, then use XSLT to transform the data back to straight XML, which seems like a huge waste of effort. At the bottom of the XML Source Task Pane, there's a button to "Verify Map for Export." Clicking this button will tell you if your XML can be exported or not. On the XML structure above, verifying the map for export gives this error message:

Because the XML data we're working with allows multiple instances of "snack food," Excel can't export it. If I edit the XML file and ensure each "person" has only one "snack food", the XML is exportable.

Another issue with repeating elements is how Excel displays them. Because the fundamental structure of Excel is the table, Excel displays the XML data in the form of columns and rows. If an element has a repeating element, Excel repeats all the data for its parent and sibilings, too, like this, obscuring the fundamental structure of the XML:

Each of the three "open as" options behaves differently.

  1. Open as an XML list
    This option builds a list based on the schema (either loaded by you as a "map" or inferred by Excel from the XML data) and imports the data into the list. The element and attribute names are used as headers for the row, as shown in the screenshot above. At this point, you can use the data to do whatever sort of calculations you want. If the underlying XML data changes, the spreadsheet will not update automatically. Data can be exported from this view option.
  2. Open as a read-only workbook
    This option displays information about the source, including XPath-like headers. The "read-only" part means any changes you make to the spreadsheet cannot be exported back to the original XML. Saving saves the spreadsheet as an Excel spreadsheet. This option looks like this:

  3. Use the XML source taskpane
    The main advantage of this option seems to be that you can control what information gets displayed in your list. When you select this option, the XML source taskpane appears with the schema (again, either loaded by you as a map or inferred by Excel from the XML data) and you can drag and drop your elements onto the spreadsheet. The result looks like this:


    You can drop as many or as few of your elements onto the spreadsheet as you want, within Excel's predefined limits of how many columns are allowed. This is the best way to get around the Excel limitations for schemas that have too many levels of hierarchy. You can make separate lists with your various levels.

    In addition to the XML source taskpane, the "List" toolbar shows up. Clicking on the "Refresh XML Data" button fills the list with the data from your XML file:


    If you have stayed within Excel's hierarchy limitations, you can export this data by simply right-clicking on the spreadsheet, selecting XML->Export.