Package org.apache.poi.hpsf
Processes streams in the Horrible Property Set Format (HPSF) in POI filesystems. Microsoft Office documents, i.e. POI filesystems, usually contain metadata such as author, title, last save time, etc. These items are called properties and are stored in property set streams along with the document itself. These streams are commonly named \005SummaryInformation and \005DocumentSummaryInformation. However, a POI filesystem may contain further property sets of other names or types.
In order to extract the properties from a POI filesystem, a property set stream's contents must be parsed into a PropertySet instance. Its subclasses SummaryInformation and DocumentSummaryInformation deal with the well-known property set streams \005SummaryInformation and \005DocumentSummaryInformation. (However, the streams' names are irrelevant. What counts is the format ID of the property set's first section; see below.)
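The point about format IDs can be illustrated without POI. The following is a minimal sketch, assuming the stream header layout described in Microsoft's [MS-OLEPS] specification: a 2-byte byte order mark (0xFFFE), a 2-byte version, a 4-byte system identifier, a 16-byte class ID and a 4-byte section count, followed by the first section's 16-byte format ID at offset 28. The class FormatIdSketch and its methods are hypothetical illustrations, not part of HPSF; in practice PropertySet performs this parsing for you.

```java
/**
 * Hypothetical helper (not an HPSF class): reads the first section's
 * format ID from the raw bytes of a property set stream.
 */
final class FormatIdSketch {

    // Assumed layout: byte order (2) + version (2) + system identifier (4)
    // + CLSID (16) + section count (4) = 28 bytes before the first FMTID.
    static final int FIRST_FMTID_OFFSET = 28;

    // Standard format IDs of the two well-known property sets.
    static final String SUMMARY_INFORMATION = "{F29F85E0-4FF9-1068-AB91-08002B27B3D9}";
    static final String DOCUMENT_SUMMARY_INFORMATION = "{D5CDD502-2E9C-101B-9397-08002B2CF9AE}";

    /** Returns the first section's format ID, whatever the stream is called. */
    static String firstFormatId(byte[] stream) {
        if (stream.length < FIRST_FMTID_OFFSET + 16) {
            throw new IllegalArgumentException("too short for a property set header");
        }
        int byteOrder = (stream[0] & 0xFF) | ((stream[1] & 0xFF) << 8);
        if (byteOrder != 0xFFFE) {
            throw new IllegalArgumentException("not a property set stream");
        }
        return formatGuid(stream, FIRST_FMTID_OFFSET);
    }

    /** Formats 16 GUID bytes; the first three fields are stored little-endian. */
    static String formatGuid(byte[] b, int off) {
        StringBuilder sb = new StringBuilder("{");
        appendReversed(sb, b, off, 4);      // Data1, little-endian
        sb.append('-');
        appendReversed(sb, b, off + 4, 2);  // Data2, little-endian
        sb.append('-');
        appendReversed(sb, b, off + 6, 2);  // Data3, little-endian
        sb.append('-');
        appendInOrder(sb, b, off + 8, 2);   // Data4[0..1], as stored
        sb.append('-');
        appendInOrder(sb, b, off + 10, 6);  // Data4[2..7], as stored
        return sb.append('}').toString();
    }

    private static void appendReversed(StringBuilder sb, byte[] b, int off, int len) {
        for (int i = len - 1; i >= 0; i--) {
            sb.append(String.format("%02X", b[off + i] & 0xFF));
        }
    }

    private static void appendInOrder(StringBuilder sb, byte[] b, int off, int len) {
        for (int i = 0; i < len; i++) {
            sb.append(String.format("%02X", b[off + i] & 0xFF));
        }
    }
}
```

Because only the bytes at offset 28 are inspected, a stream renamed from \005SummaryInformation to anything else would still report the Summary Information format ID.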
The factory method PropertySetFactory.create(org.apache.poi.poifs.filesystem.DirectoryEntry, java.lang.String) creates a PropertySet instance. This method always returns the most specific property set: if it identifies the stream data as a Summary Information or a Document Summary Information, it returns an instance of the corresponding class; otherwise it returns a general PropertySet.
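The factory's dispatch can be pictured roughly as follows. This is a simplified sketch, not POI's implementation: BasicPropertySet, SummaryInfo, DocumentSummaryInfo and PropertySetFactorySketch are hypothetical stand-ins for PropertySet, SummaryInformation, DocumentSummaryInformation and PropertySetFactory, and the decision is based only on the first section's format ID in GUID string notation. The two GUIDs are the standard format IDs of the Summary Information and Document Summary Information property sets.

```java
/** Hypothetical stand-in for the general PropertySet. */
class BasicPropertySet {
    final String formatId;

    BasicPropertySet(String formatId) {
        this.formatId = formatId;
    }
}

/** Hypothetical stand-in for SummaryInformation. */
final class SummaryInfo extends BasicPropertySet {
    SummaryInfo(String formatId) {
        super(formatId);
    }
}

/** Hypothetical stand-in for DocumentSummaryInformation. */
final class DocumentSummaryInfo extends BasicPropertySet {
    DocumentSummaryInfo(String formatId) {
        super(formatId);
    }
}

/** Hypothetical factory: picks the most specific class for a format ID. */
final class PropertySetFactorySketch {
    static final String SUMMARY_INFORMATION = "{F29F85E0-4FF9-1068-AB91-08002B27B3D9}";
    static final String DOCUMENT_SUMMARY_INFORMATION = "{D5CDD502-2E9C-101B-9397-08002B2CF9AE}";

    static BasicPropertySet create(String firstSectionFormatId) {
        // Compare case-insensitively; GUID strings may use either case.
        switch (firstSectionFormatId.toUpperCase(java.util.Locale.ROOT)) {
            case SUMMARY_INFORMATION:
                return new SummaryInfo(firstSectionFormatId);
            case DOCUMENT_SUMMARY_INFORMATION:
                return new DocumentSummaryInfo(firstSectionFormatId);
            default:
                // Unknown format ID: fall back to the general type.
                return new BasicPropertySet(firstSectionFormatId);
        }
    }
}
```

Returning the general type when nothing more specific matches means callers can always fall back to generic section and property access.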
A PropertySet contains a list of Sections, which can be retrieved with PropertySet.getSections(). Each Section contains a Property array, which can be retrieved with Section.getProperties(). Since the vast majority of PropertySets contain only a single Section, the convenience method PropertySet.getProperties() returns the properties of a PropertySet's Section (throwing a NoSingleSectionException if the PropertySet contains more or fewer than exactly one Section).
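The relationship between property sets, sections and properties, together with the single-section rule of the convenience method, can be modeled in a few lines. This sketch is hypothetical: SketchPropertySet, SketchSection, SketchProperty and NotExactlyOneSectionException only mirror the roles of PropertySet, Section, Property and NoSingleSectionException and are not HPSF classes.

```java
import java.util.List;

/** Hypothetical exception playing the role of NoSingleSectionException. */
class NotExactlyOneSectionException extends RuntimeException {
    NotExactlyOneSectionException(int count) {
        super("expected exactly one section, found " + count);
    }
}

/** Hypothetical property: an ID, a variant type and a value. */
final class SketchProperty {
    final long id;
    final long type;
    final Object value;

    SketchProperty(long id, long type, Object value) {
        this.id = id;
        this.type = type;
        this.value = value;
    }
}

/** Hypothetical section holding an array of properties. */
final class SketchSection {
    private final SketchProperty[] properties;

    SketchSection(SketchProperty... properties) {
        this.properties = properties.clone();
    }

    SketchProperty[] getProperties() {
        return properties.clone();
    }
}

/** Hypothetical property set holding a list of sections. */
final class SketchPropertySet {
    private final List<SketchSection> sections;

    SketchPropertySet(List<SketchSection> sections) {
        this.sections = List.copyOf(sections);
    }

    List<SketchSection> getSections() {
        return sections;
    }

    /** Convenience accessor: only valid when there is exactly one section. */
    SketchProperty[] getProperties() {
        if (sections.size() != 1) {
            throw new NotExactlyOneSectionException(sections.size());
        }
        return sections.get(0).getProperties();
    }
}
```

Code that may meet multi-section property sets (a Document Summary Information stream with custom properties, for example) would iterate over getSections() rather than rely on the convenience accessor.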
Each Property has an ID, a type, and a value, which can be retrieved with Property.getID(), Property.getType(), and Property.getValue(), respectively. The value's class depends on the property's type. The current implementation does not yet support all property types and restricts the values' classes to String, Integer and Date. A value of a not yet supported type is returned as a byte array containing the value's original bytes from the property set stream.
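To illustrate how a value's class follows from its type, here is a minimal decoder sketch for the three value classes named above plus the byte-array fallback. It assumes the variant type codes VT_I4 (3), VT_LPSTR (30) and VT_FILETIME (64) and their little-endian encodings from Microsoft's property set specification, and it simplifies string handling by decoding VT_LPSTR bytes as ISO-8859-1 instead of honoring the section's code page. VariantDecoderSketch is a hypothetical name; HPSF's own type handling lives in classes such as VariantSupport.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Date;

/** Hypothetical decoder (not an HPSF class): maps a variant type to a value class. */
final class VariantDecoderSketch {
    static final long VT_I4 = 3;        // 32-bit signed integer
    static final long VT_LPSTR = 30;    // length-prefixed 8-bit string
    static final long VT_FILETIME = 64; // 100-ns ticks since 1601-01-01 UTC

    // Milliseconds between 1601-01-01 and 1970-01-01 (UTC).
    private static final long EPOCH_DIFF_MILLIS = 11_644_473_600_000L;

    /** Decodes the value bytes that follow the 4-byte type field. */
    static Object decode(long type, byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (type == VT_I4) {
            return buf.getInt(); // boxed to Integer
        } else if (type == VT_LPSTR) {
            int len = buf.getInt(); // byte count, including the terminating NUL
            byte[] chars = new byte[len];
            buf.get(chars);
            int end = len;
            while (end > 0 && chars[end - 1] == 0) {
                end--; // strip trailing NUL bytes
            }
            // Simplification: assumes a single-byte Latin-1 code page.
            return new String(chars, 0, end, StandardCharsets.ISO_8859_1);
        } else if (type == VT_FILETIME) {
            long ticks = buf.getLong();
            return new Date(Math.floorDiv(ticks, 10_000L) - EPOCH_DIFF_MILLIS);
        }
        // Unsupported type: hand back a copy of the original bytes.
        return Arrays.copyOf(data, data.length);
    }
}
```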
To retrieve the value of a specific Property, use Section.getProperty(long) or Section.getPropertyIntValue(long).
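Retrieving a value by property ID amounts to a keyed lookup within one section. The sketch below is hypothetical (IdLookupSection is not an HPSF class) and assumes that a missing ID yields null and that an integer accessor falls back to a caller-supplied default; the real Section methods may report absent properties differently, so consult their Javadoc.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical section modeled as a map from property ID to value. */
final class IdLookupSection {
    private final Map<Long, Object> valuesById = new LinkedHashMap<>();

    void put(long id, Object value) {
        valuesById.put(id, value);
    }

    /** Returns the value for the given property ID, or null if absent. */
    Object getProperty(long id) {
        return valuesById.get(id);
    }

    /** Returns an Integer-valued property, or the default if absent or of another class. */
    int getIntProperty(long id, int defaultValue) {
        Object v = valuesById.get(id);
        return (v instanceof Integer) ? (Integer) v : defaultValue;
    }
}
```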
The SummaryInformation and DocumentSummaryInformation classes provide convenience methods for retrieving well-known properties. For example, an application that wants to retrieve a document's title string just calls SummaryInformation.getTitle() instead of going through the hassle of first finding out the title's property ID and then using this ID to get the property's value.
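A convenience getter such as getTitle() essentially hides a well-known property ID. The sketch below assumes the IDs 2 (PIDSI_TITLE) and 4 (PIDSI_AUTHOR) from Microsoft's Summary Information property set definition; the class SummaryInfoSketch and its constants are hypothetical and are not the real SummaryInformation class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical convenience wrapper over a Summary Information section. */
final class SummaryInfoSketch {
    // Well-known property IDs (PIDSI_TITLE, PIDSI_AUTHOR); names are hypothetical.
    static final long PID_TITLE = 2;
    static final long PID_AUTHOR = 4;

    private final Map<Long, Object> section;

    SummaryInfoSketch(Map<Long, Object> section) {
        this.section = new HashMap<>(section);
    }

    /** Callers need not know that the title has property ID 2. */
    String getTitle() {
        return asString(section.get(PID_TITLE));
    }

    String getAuthor() {
        return asString(section.get(PID_AUTHOR));
    }

    // Returns null when the property is absent or not a String.
    private static String asString(Object value) {
        return (value instanceof String) ? (String) value : null;
    }
}
```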
Public documentation from Microsoft can be found in the appropriate section of the MSDN Library.
History
- 2003-09-11: PropertySetFactory.create(InputStream) no longer throws an UnexpectedPropertySetTypeException.
To Do
The following is still left to be implemented. Sponsoring could speed up work on these issues considerably.
- Convenience methods for setting summary information and document summary information properties
- Better codepage support
- Support for more property (variant) types
Class Summary
- Array
- Blob
- ClassID: Represents a class ID (16 bytes).
- ClipboardData
- CodePageString
- Currency
- CustomProperties: Maintains the instances of CustomProperty that belong to a DocumentSummaryInformation.
- CustomProperty: This class represents custom properties in the document summary information stream.
- Date
- Decimal
- DocumentSummaryInformation: Convenience class representing a Document Summary Information stream in a Microsoft Office document.
- Filetime: The Windows FILETIME structure holds a date and time associated with a file.
- GUID
- HPSFPropertiesOnlyDocument: A version of POIDocument which allows access to the HPSF Properties, but no other document contents.
- IndirectPropertyName
- Property: A property in a Section of a PropertySet.
- PropertySet: Represents a property set in the Horrible Property Set Format (HPSF).
- PropertySetFactory: Factory class to create instances of SummaryInformation, DocumentSummaryInformation and PropertySet.
- Section: Represents a section in a PropertySet.
- SummaryInformation: Convenience class representing a Summary Information stream in a Microsoft Office document.
- Thumbnail: Class to manipulate data in the Clipboard Variant (VT_CF) format.
- TypedPropertyValue
- UnicodeString
- Variant: The Variant types as defined by Microsoft's COM.
- VariantBool
- VariantSupport: Supports reading and writing of variant data.
- Vector: Holder for vector-type properties.
- VersionedStream
Enum Summary
- ClassIDPredefined
Exception Summary
- HPSFException: This exception is the superclass of all other checked exceptions thrown in this package.
- HPSFRuntimeException: This exception is the superclass of all other unchecked exceptions thrown in this package.
- IllegalPropertySetDataException: This exception is thrown when there is an illegal value set in a PropertySet.
- IllegalVariantTypeException: This exception is thrown if HPSF encounters a variant type that is illegal in the current context.
- MarkUnsupportedException: This exception is thrown if an InputStream does not support the InputStream.mark(int) operation.
- MissingSectionException: This exception is thrown if one of the PropertySet's convenience methods does not find a required Section.
- NoFormatIDException: This exception is thrown if a PropertySet is to be written but does not have a formatID set (see Section.setFormatID(ClassID) or Section.setFormatID(byte[])).
- NoPropertySetStreamException: This exception is thrown if a format error in a property set stream is detected or when the input data do not constitute a property set stream.
- NoSingleSectionException: This exception is thrown if one of the PropertySet's convenience methods that require a single Section is called and the PropertySet does not contain exactly one Section.
- ReadingNotSupportedException: This exception is thrown when HPSF tries to read a (yet) unsupported variant type.
- UnexpectedPropertySetTypeException: This exception is thrown if a certain type of property set is expected (e.g. …
- UnsupportedVariantTypeException: This exception is thrown if HPSF encounters a variant type that isn't supported yet.
- VariantTypeException: This exception is thrown if HPSF encounters a problem with a variant type.
- WritingNotSupportedException: This exception is thrown when trying to write a (yet) unsupported variant type.