Tuesday, 8 July 2008

OpenXML Packages

I am taking the various files that users refer to when documenting the process flow in their organization and putting them all in an OpenXML package (using System.IO.Packaging).

Office provides an additional layer around these packages to make it a bit easier to produce Office documents (which are just packages with a particular internal structure). See here for the SDK.

The Office SDK makes some things a little easier, although some of the names had me a bit confused at first (CustomXmlPart, for example, has nothing XML-ish about it; it's just a nice way to put things in the place in a .docx file where Word expects the custom XML to go). But with that extra layer you lose some of the functionality of the underlying System.IO.Packaging. For example, the Office SDK provides some nice functions to add images, but you don't get to specify the compression. One JPG file ended up going from 276,216 bytes to 403,084 bytes, i.e. actually getting bigger. Using the underlying System.IO.Packaging, I am able to specify no compression for JPGs. They are already compressed, so there is almost no redundancy left to squeeze out, and all you do is add the overhead of the ZIP housekeeping bits.
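To illustrate why compressing an already-compressed image buys nothing, here is a rough sketch. It is not my packaging code: it uses Python's zipfile module as a stand-in (an OpenXML package is a ZIP archive underneath), random bytes as a stand-in for JPG data, and a made-up part name and helper function.

```python
import io
import os
import zipfile


def packed_size(payload: bytes, compression: int) -> int:
    """Write payload into an in-memory ZIP and return the size of its stored entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        # "media/image1.jpg" is a hypothetical part name, used only for this demo.
        archive.writestr("media/image1.jpg", payload)
    with zipfile.ZipFile(buffer) as archive:
        return archive.getinfo("media/image1.jpg").compress_size


# Assumption: random bytes approximate the statistics of compressed JPG data.
jpeg_like = os.urandom(64 * 1024)
# Highly repetitive text, for contrast.
text_like = b"process flow document " * 3000

stored = packed_size(jpeg_like, zipfile.ZIP_STORED)
deflated = packed_size(jpeg_like, zipfile.ZIP_DEFLATED)
text_deflated = packed_size(text_like, zipfile.ZIP_DEFLATED)

print("jpeg-like stored:  ", stored)
print("jpeg-like deflated:", deflated)
print("text deflated:     ", text_deflated, "of", len(text_like))
```

The stored entry is exactly the size of the input, the deflated one is no smaller, and only the repetitive text shrinks. In System.IO.Packaging the equivalent choice is made per part through the CompressionOption passed when the part is created.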

Curiously, with both the Office SDK and the underlying classes you have to specify the image type yourself. I couldn't find anything in the .NET framework classes that would tell me the image type given a stream, or even given a file name, so I had to bake something trivial. Fortunately, a lot of the time I know exactly what the image type is anyway.
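For what it's worth, a "something trivial" along these lines can be sketched by looking at the first few bytes of the stream. This is an illustration in Python rather than the code I actually wrote; the function name and the table are made up, and it only covers a handful of common formats using their well-known file signatures.

```python
import io

# Well-known leading byte signatures mapped to content types.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"BM", "image/bmp"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def sniff_image_type(stream):
    """Return a content type guessed from the stream's first bytes, or None.

    The stream must be seekable; its position is restored before returning.
    """
    start = stream.tell()
    header = stream.read(8)
    stream.seek(start)
    for magic, content_type in SIGNATURES:
        if header.startswith(magic):
            return content_type
    return None


# Usage with an in-memory stream holding the start of a JPG file.
sample = io.BytesIO(b"\xff\xd8\xff\xe0" + b"\x00" * 16)
print(sniff_image_type(sample))
```

Guessing from the file extension is even simpler, but the leading bytes are more trustworthy when files have been renamed.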
