Processes streams in the Horrible Property Set Format (HPSF) in POI filesystems. Microsoft Office documents, i.e. POI filesystems, usually contain metadata such as author, title, and last save time. These items are called properties and are stored in property set streams along with the document itself. These streams are commonly named \005SummaryInformation and \005DocumentSummaryInformation. However, a POI filesystem may contain further property sets with other names or types.
In order to extract the properties from a POI filesystem, a property set
stream's contents must be parsed into a
PropertySet instance. Its subclasses
SummaryInformation and DocumentSummaryInformation deal with the well-known
property set streams \005SummaryInformation and
\005DocumentSummaryInformation. (However, the streams' names are
irrelevant. What counts is the format ID of the property set's first section.)
The factory method PropertySetFactory.create() creates a
PropertySet instance. This method
always returns the most specific property set: if it
identifies the stream data as a Summary Information or as a Document
Summary Information, it returns an instance of the corresponding class;
otherwise it returns a general PropertySet.
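The idea of choosing the most specific type from the first section's format ID can be sketched without any POI classes. The following is a minimal illustration, not POI's implementation: the class and method names (FormatIdSketch, classify) are hypothetical, and the byte offsets assume the property set stream layout published by Microsoft (a 28-byte header followed by one 16-byte format ID and a 4-byte offset per section).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch: classifies a property set stream by its first section's format ID. */
class FormatIdSketch {
    // {F29F85E0-4FF9-1068-AB91-08002B27B3D9}, in the byte order used inside the stream
    static final byte[] SUMMARY_INFORMATION = bytes(
            0xE0, 0x85, 0x9F, 0xF2, 0xF9, 0x4F, 0x68, 0x10,
            0xAB, 0x91, 0x08, 0x00, 0x2B, 0x27, 0xB3, 0xD9);
    // {D5CDD502-2E9C-101B-9397-08002B2CF9AE}, in the byte order used inside the stream
    static final byte[] DOCUMENT_SUMMARY_INFORMATION = bytes(
            0x02, 0xD5, 0xCD, 0xD5, 0x9C, 0x2E, 0x1B, 0x10,
            0x93, 0x97, 0x08, 0x00, 0x2B, 0x2C, 0xF9, 0xAE);

    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    /** Returns the name of the most specific kind of property set the bytes represent. */
    static String classify(byte[] stream) {
        // 28-byte header + 16-byte format ID + 4-byte section offset
        if (stream.length < 48) {
            throw new IllegalArgumentException("too short for a property set stream");
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        if ((buf.getShort(0) & 0xFFFF) != 0xFFFE) {
            throw new IllegalArgumentException("missing byte-order mark");
        }
        if (buf.getInt(24) < 1) {
            throw new IllegalArgumentException("no sections");
        }
        byte[] formatId = Arrays.copyOfRange(stream, 28, 44);
        if (Arrays.equals(formatId, SUMMARY_INFORMATION)) {
            return "SummaryInformation";
        }
        if (Arrays.equals(formatId, DOCUMENT_SUMMARY_INFORMATION)) {
            return "DocumentSummaryInformation";
        }
        return "PropertySet"; // anything else is only a general property set
    }

    public static void main(String[] args) {
        byte[] stream = new byte[48];
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(0, (short) 0xFFFE); // byte-order mark
        buf.putInt(24, 1);               // one section
        buf.putInt(44, 48);              // offset of that section
        System.arraycopy(SUMMARY_INFORMATION, 0, stream, 28, 16);
        System.out.println(classify(stream)); // prints "SummaryInformation"
    }
}
```

Note that the stream's name never enters the decision; only the bytes do.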
A PropertySet contains a list of
Sections, which can be retrieved with PropertySet.getSections(). Each
Section contains a
Property array, which can be retrieved with
Section.getProperties(). Since the vast majority of
PropertySets contain only a single
Section, the convenience method
PropertySet.getProperties() returns the properties of a
PropertySet's only Section (throwing a
NoSingleSectionException if the
PropertySet contains more or fewer than exactly one Section).
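The nesting described above (a stream holds sections, a section holds properties) and the single-section shortcut can be illustrated on raw bytes. This is a rough sketch rather than POI's code: SingleSectionSketch and propertyIds are hypothetical names, IllegalStateException merely stands in for NoSingleSectionException, and the offsets assume Microsoft's published layout, where a section starts with its size and property count, followed by one (ID, offset) pair per property. No bounds checking is attempted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch: lists the property IDs of a stream that has exactly one section. */
class SingleSectionSketch {

    static long[] propertyIds(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int sectionCount = buf.getInt(24); // last field of the 28-byte header
        if (sectionCount != 1) {
            // stand-in for the NoSingleSectionException mentioned in the text
            throw new IllegalStateException("expected exactly one section, found " + sectionCount);
        }
        int sectionOffset = buf.getInt(28 + 16);           // follows the 16-byte format ID
        int propertyCount = buf.getInt(sectionOffset + 4); // follows the section size
        long[] ids = new long[propertyCount];
        for (int i = 0; i < propertyCount; i++) {
            // each directory entry is a 4-byte ID followed by a 4-byte offset
            ids[i] = buf.getInt(sectionOffset + 8 + 8 * i) & 0xFFFFFFFFL;
        }
        return ids;
    }

    public static void main(String[] args) {
        byte[] stream = new byte[72];
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(0, (short) 0xFFFE); // byte-order mark
        buf.putInt(24, 1);               // one section
        buf.putInt(44, 48);              // the section starts at byte 48
        buf.putInt(48, 24);              // section size
        buf.putInt(52, 2);               // two properties
        buf.putInt(56, 2);               // first property ID
        buf.putInt(64, 4);               // second property ID
        System.out.println(Arrays.toString(propertyIds(stream))); // prints "[2, 4]"
    }
}
```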
Each Property has an ID, a
type, and a value, which can be retrieved with
Property.getID(), Property.getType(), and
Property.getValue(), respectively. The value's class
depends on the property's type. The current implementation
does not yet support all property types and restricts the values' classes
to String, Integer and Date. A value of a not-yet-supported type is returned as a byte array
containing the value's original bytes from the property set stream.
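How a value's Java class can follow from its variant type is sketched below. This is an assumption-laden illustration, not the HPSF implementation: VariantValueSketch and decode are hypothetical names, strings are assumed to be single-byte (ISO-8859-1) rather than decoded with the section's actual code page, and only three variant types are handled (VT_I4 = 3, VT_LPSTR = 30, VT_FILETIME = 64). Everything else falls back to the raw bytes, mirroring the behaviour described above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Date;

/** Hypothetical sketch: maps a typed property value to a Java object. */
class VariantValueSketch {
    static final int VT_I4 = 3;        // 32-bit signed integer
    static final int VT_LPSTR = 30;    // length-prefixed code page string
    static final int VT_FILETIME = 64; // 100 ns ticks since 1601-01-01 UTC

    // milliseconds between 1601-01-01 and 1970-01-01
    static final long EPOCH_DIFF_MILLIS = 11_644_473_600_000L;

    /** Decodes a 4-byte type field followed by the value's bytes. */
    static Object decode(byte[] typedValue) {
        ByteBuffer buf = ByteBuffer.wrap(typedValue).order(ByteOrder.LITTLE_ENDIAN);
        int type = buf.getShort(0) & 0xFFFF; // the upper two bytes are padding
        switch (type) {
            case VT_I4:
                return buf.getInt(4); // boxed to Integer
            case VT_LPSTR: {
                int length = buf.getInt(4); // byte count, including the terminating NUL
                String s = new String(typedValue, 8, length, StandardCharsets.ISO_8859_1);
                int nul = s.indexOf('\0');
                return nul >= 0 ? s.substring(0, nul) : s;
            }
            case VT_FILETIME: {
                long ticks = buf.getLong(4);
                return new Date(ticks / 10_000L - EPOCH_DIFF_MILLIS);
            }
            default:
                // unsupported type: hand back the value bytes untouched
                return Arrays.copyOfRange(typedValue, 4, typedValue.length);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, VT_I4);
        buf.putInt(4, 42);
        System.out.println(decode(buf.array())); // prints "42"
    }
}
```

Returning Object keeps the sketch close to the shape of Property.getValue(); callers are expected to check the runtime class before casting.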
The SummaryInformation and
DocumentSummaryInformation classes provide convenience
methods for retrieving well-known properties. For example, an application
that wants to retrieve a document's title string just calls
SummaryInformation.getTitle() instead of going through
the hassle of first finding out what the title's property ID is and then
using this ID to get the property's value.
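What such a convenience method saves the caller can be shown with a small stand-in. The class below is purely hypothetical (it is not SummaryInformation, and a plain map stands in for a parsed section); the only facts taken from the format are the well-known property IDs 2 for the title and 4 for the author in a Summary Information property set.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in showing what a getTitle()-style convenience method wraps. */
class SummaryLookupSketch {
    static final long PID_TITLE = 2;  // well-known ID of the title property
    static final long PID_AUTHOR = 4; // well-known ID of the author property

    private final Map<Long, Object> values;

    SummaryLookupSketch(Map<Long, Object> values) {
        this.values = new HashMap<>(values);
    }

    /** Generic access: the caller has to know the property ID. */
    Object getProperty(long id) {
        return values.get(id);
    }

    /** Convenience access: the ID and the cast are hidden from the caller. */
    String getTitle() {
        Object value = getProperty(PID_TITLE);
        return value instanceof String ? (String) value : null;
    }

    String getAuthor() {
        Object value = getProperty(PID_AUTHOR);
        return value instanceof String ? (String) value : null;
    }

    public static void main(String[] args) {
        Map<Long, Object> section = new HashMap<>();
        section.put(PID_TITLE, "Quarterly report");
        SummaryLookupSketch summary = new SummaryLookupSketch(section);
        System.out.println(summary.getTitle()); // prints "Quarterly report"
    }
}
```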
Public documentation from Microsoft can be found in the appropriate section of the MSDN Library.
The following is still left to be implemented. Sponsorship could speed up work on these items considerably.
Convenience methods for setting summary information and document summary information properties
Better codepage support
Support for more property (variant) types
Class Summary
Class: Description
Array
Blob
ClassID: Represents a class ID (16 bytes).
ClipboardData
CodePageString
Currency
CustomProperties
CustomProperty: This class represents custom properties in the document summary information stream.
Date
Decimal
DocumentSummaryInformation: Convenience class representing a Document Summary Information stream in a Microsoft Office document.
Filetime: The Windows FILETIME structure holds a date and time associated with a file.
GUID
HPSFPropertiesOnlyDocument: A version of POIDocument which allows access to the HPSF Properties, but no other document contents.
IndirectPropertyName
Property
PropertySet: Represents a property set in the Horrible Property Set Format (HPSF).
PropertySetFactory
Section: Represents a section in a PropertySet.
SummaryInformation: Convenience class representing a Summary Information stream in a Microsoft Office document.
Thumbnail: Class to manipulate data in the Clipboard Variant (VT_CF) format.
TypedPropertyValue
UnicodeString
Variant: The Variant types as defined by Microsoft's COM.
VariantBool
VariantSupport: Supports reading and writing of variant data.
Vector: Holder for vector-type properties.
VersionedStream
Enum Summary
Enum: Description
ClassIDPredefined
Exception Summary
Exception: Description
HPSFException: This exception is the superclass of all other checked exceptions thrown in this package.
HPSFRuntimeException: This exception is the superclass of all other unchecked exceptions thrown in this package.
IllegalPropertySetDataException: This exception is thrown when there is an illegal value set in a PropertySet.
IllegalVariantTypeException: This exception is thrown if HPSF encounters a variant type that is illegal in the current context.
MarkUnsupportedException
MissingSectionException
NoFormatIDException
NoPropertySetStreamException: This exception is thrown if a format error in a property set stream is detected or when the input data do not constitute a property set stream.
NoSingleSectionException
ReadingNotSupportedException: This exception is thrown when HPSF tries to read a (yet) unsupported variant type.
UnexpectedPropertySetTypeException: This exception is thrown if a certain type of property set is expected (e.g. ...).
UnsupportedVariantTypeException: This exception is thrown if HPSF encounters a variant type that isn't supported yet.
VariantTypeException: This exception is thrown if HPSF encounters a problem with a variant type.
WritingNotSupportedException: This exception is thrown when trying to write a (yet) unsupported variant type.