APIs for manipulating various file formats based on Microsoft's OLE 2 Compound Document format, in pure Java. Most MS Office files (such as .doc, .xls and .ppt) use this OLE2 compound document format. [Open Source, BSD-like]
An XML/SAX-compliant, 100% pure Java Microsoft Word document content parser with easy DOM/XSLT integration. It also exposes document structure, style information, and even graphics. [Commercial, demos]