Package com.adobe.internal.pdftoolkit.core.cos

The Gibson cos layer model is based on the PDF 1.7 spec. It provides complete support for the spec.

1. Basic Cos Object Types

There are two categories of cos object types: atomic and collection. Atomic types correspond to kinds of values, and collection types correspond to collections of cos objects. All cos object types are descendants of CosObject. All CosObjects share the following attributes:

  • A CosObjectID (identifier consisting of an object number and a generation value) which can be queried or set
  • A status within a CosDocument context (direct or indirect - see Section 2.1) which can be queried
  • A value which can be accessed (via getValue()) or written as a String (toString())
  • A clone method which will produce a copy of the object preserving its closure
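The closure-preserving clone can be pictured with a small, self-contained model. `Node` and `ClosureCloner` below are hypothetical stand-ins, not classes from this package. The key piece is the identity-keyed memo map: it copies each object exactly once, so shared references stay shared and reference cycles terminate.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a cos collection: a label plus contained or
// referenced objects. Not a class from this package.
final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label) {
        this.label = label;
    }
}

final class ClosureCloner {
    // original -> copy, compared by identity rather than equals()
    private final Map<Node, Node> copies = new IdentityHashMap<>();

    Node cloneClosure(Node original) {
        Node existing = copies.get(original);
        if (existing != null) {
            return existing; // already copied: reuse it so shared references stay shared
        }
        Node copy = new Node(original.label);
        copies.put(original, copy); // register before recursing so cycles terminate
        for (Node child : original.children) {
            copy.children.add(cloneClosure(child));
        }
        return copy;
    }
}
```

For example, if two pages reference the same font object, cloning the page tree yields two copied pages that both reference a single copied font.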

1.1 Atomic types

1.1.1 CosScalar

Each of these types encapsulates a primitive value or an Adobe primitive value. These types are descendants of CosScalar.

  • CosBoolean - boolean
  • CosNumeric - Number (int, double, long)
  • CosName - ASName
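Scalar values must eventually be written in PDF syntax. The helper below is a hypothetical sketch of that mapping, based on the rules of the PDF 1.7 spec rather than on this package's code. Booleans are keywords. Reals are written without exponents, because PDF has no exponential number format. Names get a leading solidus, and delimiters, `#`, and bytes outside `!`..`~` are written as `#xx` escapes. Encoding name text as UTF-8 is an assumption of the sketch.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing how scalar values map to PDF syntax
// (PDF 1.7, sections 7.3.2, 7.3.3 and 7.3.5). Not this package's API.
final class ScalarSyntax {
    static String writeBoolean(boolean b) {
        return b ? "true" : "false";
    }

    static String writeNumber(Number n) {
        if (n instanceof Integer || n instanceof Long) {
            return n.toString();
        }
        // PDF reals have no exponent form, so avoid Double.toString's "1.0E10".
        // Throws NumberFormatException for NaN/Infinity, which PDF cannot express.
        return BigDecimal.valueOf(n.doubleValue()).stripTrailingZeros().toPlainString();
    }

    static String writeName(String name) {
        StringBuilder sb = new StringBuilder("/");
        for (byte raw : name.getBytes(StandardCharsets.UTF_8)) { // UTF-8 is an assumption
            int c = raw & 0xFF;
            boolean regular = c >= 0x21 && c <= 0x7E && "()<>[]{}/%#".indexOf(c) < 0;
            if (regular) {
                sb.append((char) c);
            } else {
                sb.append('#').append(String.format("%02X", c)); // escape as #xx
            }
        }
        return sb.toString();
    }
}
```

For example, `writeName("Lime Green")` returns `/Lime#20Green`, and `writeNumber(2.0)` returns `2`.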

1.1.2 CosString

This type encapsulates a string as a byte array, which may be encoded either as a string according to the platform-specific encoding or as a hex string. The type supports encryption based on parameters provided by the CosDocument context.
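The same byte array can be written in either of the two string forms defined in section 7.3.4 of the PDF 1.7 spec. The hypothetical helper below shows both; it is not this package's code. For simplicity it always escapes parentheses, although balanced parentheses in a literal string do not strictly need escaping.

```java
// Hypothetical sketch, not this package's API: the two PDF string forms
// (PDF 1.7, section 7.3.4) for the same byte array.
final class StringSyntax {
    // Hex form: every byte as two hex digits between angle brackets.
    static String writeHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('>').toString();
    }

    // Literal form: parentheses; backslash, '(' and ')' are escaped, and
    // bytes outside printable ASCII are written as 3-digit octal escapes.
    static String writeLiteral(byte[] bytes) {
        StringBuilder sb = new StringBuilder("(");
        for (byte b : bytes) {
            int c = b & 0xFF;
            if (c == '(' || c == ')' || c == '\\') {
                sb.append('\\').append((char) c);
            } else if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\%03o", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.append(')').toString();
    }
}
```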

1.1.3 CosNull

This type contains a null value.

1.2 Collection types

Each of these types has attributes for managing a collection of CosObjects. All of them are descendants of CosContainer.

1.2.1 CosArray

This collection supports the behavior of ArrayList (an indexed, ordered list). The elements must be of type CosObject.

1.2.2 CosDictionary

This collection supports the behavior of HashMap. The keys must be of type ASName, and the values must be of type CosObject.
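Because CosArray behaves like an ArrayList and CosDictionary like a HashMap, their PDF serialization can be sketched over plain Java collections. This is a hypothetical sketch, not this package's API. It assumes leaf values are already-serialized tokens (such as `/Pages` or `3 0 R`) and that keys are plain strings that need no escaping.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not this package's API: a CosArray modeled as a List
// and a CosDictionary as a Map keyed by name, serialized to PDF syntax.
final class CollectionSyntax {
    static String write(Object value) {
        if (value instanceof List<?>) {
            List<?> list = (List<?>) value;
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(write(list.get(i)));
            }
            return sb.append(']').toString();
        }
        if (value instanceof Map<?, ?>) {
            StringBuilder sb = new StringBuilder("<<");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                // keys are assumed to need no #xx escaping
                sb.append(" /").append(e.getKey()).append(' ').append(write(e.getValue()));
            }
            return sb.append(" >>").toString();
        }
        return String.valueOf(value); // leaf token, already in PDF syntax
    }
}
```

Passing a LinkedHashMap keeps the output order deterministic. A plain HashMap also produces valid output, because entry order carries no meaning in a PDF dictionary.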

1.2.3 CosStream

This is a child class of CosDictionary because a CosStream is composed of a dictionary part and a data stream part; the dictionary part is a CosDictionary. There are additional attributes for managing the data stream. The data stream may be encrypted using the CosDocument encryption parameters or its own set, and it may additionally be encoded using filters defined as entries in the object's dictionary.
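Here is a sketch of the dictionary-plus-data structure, using `java.util.zip.Deflater` to produce FlateDecode data. PDF's Flate filter uses zlib-format data, which is `Deflater`'s default. `StreamSketch` and its methods are hypothetical, and encryption is omitted. Note that `/Length` records the length of the encoded data, not of the original data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not this package's API: a stream object is a
// dictionary part plus a data part. Here the data is Flate-encoded and
// the dictionary records the filter and the encoded length.
final class StreamSketch {
    static byte[] flate(byte[] data) {
        Deflater deflater = new Deflater(); // zlib format, as FlateDecode expects
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported Flate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Serializes "objNum gen obj << ... >> stream ... endstream endobj".
    static byte[] writeStreamObject(int objNum, int gen, byte[] rawData) {
        byte[] encoded = flate(rawData);
        byte[] head = (objNum + " " + gen + " obj\n"
                + "<< /Length " + encoded.length + " /Filter /FlateDecode >>\n"
                + "stream\n").getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "\nendstream\nendobj\n".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(encoded, 0, encoded.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }
}
```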

1.2.4 CosObjectStream

This is a child class of CosStream. Its data stream contains compressed objects: cos objects that are written into this data stream rather than directly into the CosDocument. There are attributes for managing the collection of compressed objects associated with this object.
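Section 7.5.7 of the PDF 1.7 spec defines the layout of an object stream's decoded data. It starts with a header of object-number/offset pairs, followed by the objects themselves. `/N` gives the object count, and `/First` gives the offset of the first object. The builder below is a hypothetical sketch of that layout. It assumes ASCII-only object text, so that character counts equal byte offsets.

```java
import java.util.Map;

// Hypothetical sketch, not this package's API: lays out the decoded data of
// an object stream (PDF 1.7, section 7.5.7) from already-serialized objects.
// Assumes ASCII text, so character counts equal byte offsets.
final class ObjectStreamLayout {
    final String dictionary; // /Length and /Filter would be added once the data is encoded
    final String data;       // "objNum offset" pairs, then the objects themselves

    private ObjectStreamLayout(String dictionary, String data) {
        this.dictionary = dictionary;
        this.data = data;
    }

    // 'objects' maps object number to the object's text, written without
    // "n g obj"/"endobj" (compressed objects implicitly have generation 0).
    static ObjectStreamLayout build(Map<Integer, String> objects) {
        StringBuilder header = new StringBuilder();
        StringBuilder body = new StringBuilder();
        for (Map.Entry<Integer, String> e : objects.entrySet()) {
            // each offset is relative to the first object, i.e. to /First
            header.append(e.getKey()).append(' ').append(body.length()).append(' ');
            body.append(e.getValue()).append('\n');
        }
        String headerText = header.toString();
        String dict = "<< /Type /ObjStm /N " + objects.size()
                + " /First " + headerText.length() + " >>";
        return new ObjectStreamLayout(dict, headerText + body);
    }
}
```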

2. Cos Document

CosDocument defines the context within which a collection of CosObjects exists. A CosDocument instance manages the creation of, access to, and deletion of all CosObjects in its collection. In addition, a CosDocument instance can write its collection of objects out in PDF file format (as specified by the PDF 1.7 spec); producing this output is called a save.

2.1 Direct and Indirect Cos Objects

According to the PDF 1.7 spec, most CosObjects can be either direct or indirect. There are exceptions: CosStream objects can only be indirect. There are also some implementation-dependent exceptions for CosObjects used in particular roles within the CosDocument.

All CosObject creation methods are provided by a singleton CosObjectFactory instance, whose methods can create either direct or indirect CosObjects explicitly.

An indirect object can be referenced by its CosObjectID, which is unique within the CosDocument context. Only indirect objects have entries in the xref section of a CosDocument. A direct object cannot be referenced this way; all direct objects have a null CosObjectID (object number and generation number both 0).
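Below is a hypothetical sketch of object IDs and of the number-reuse rule described in this section. An ID is an object number plus a generation, and direct objects carry the null ID (0, 0). A freed number is reused only with an incremented generation. Neither class comes from this package, and reusing freed numbers in FIFO order is an arbitrary choice of the sketch. The cap at 65535 follows the PDF spec, which forbids reusing an entry whose generation has reached that value.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not this package's API: an object ID plus an
// allocator that reuses freed numbers with an incremented generation.
final class ObjectId {
    static final ObjectId NULL = new ObjectId(0, 0); // the ID of every direct object

    final int number;
    final int generation;

    ObjectId(int number, int generation) {
        this.number = number;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ObjectId)) {
            return false;
        }
        ObjectId other = (ObjectId) o;
        return number == other.number && generation == other.generation;
    }

    @Override
    public int hashCode() {
        return 31 * number + generation;
    }

    @Override
    public String toString() {
        return number + " " + generation + " R"; // PDF indirect-reference syntax
    }
}

final class IdAllocator {
    private int nextNumber = 1; // object 0 is reserved as the free-list root
    private final Deque<Integer> freeNumbers = new ArrayDeque<>();
    private final Map<Integer, Integer> liveGeneration = new HashMap<>();
    private final Map<Integer, Integer> reuseGeneration = new HashMap<>();

    ObjectId allocate() {
        Integer reused = freeNumbers.pollFirst(); // FIFO reuse is an arbitrary choice
        int number = (reused != null) ? reused : nextNumber++;
        Integer saved = reuseGeneration.remove(number);
        int generation = (saved != null) ? saved : 0;
        liveGeneration.put(number, generation);
        return new ObjectId(number, generation);
    }

    void free(ObjectId id) {
        Integer live = liveGeneration.get(id.number);
        if (live == null || live != id.generation) {
            throw new IllegalArgumentException("not an in-use ID: " + id);
        }
        liveGeneration.remove(id.number);
        int next = id.generation + 1;
        if (next < 65535) { // a number whose generation reaches 65535 is never reused
            reuseGeneration.put(id.number, next);
            freeNumbers.addLast(id.number);
        }
    }
}
```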

CosDocument manages direct and indirect objects according to the following rules:

  • Direct objects must be contained within an indirect collection. It is possible to create a direct CosArray or CosDictionary (though not a direct CosStream or CosObjectStream), but it must ultimately be contained within an indirect collection in order to be written out in a save.
  • If a CosObject is cloned from another document, the clone's direct or indirect status is also copied.
  • When a collection type CosObject is cloned, its closure is preserved. The closure is defined as the set of all objects directly contained within or referenced by the collection. Thus, the cloning process produces a copy of the collection (including its references) plus copies of all of the objects being referenced. The cloned indirect objects (and references to them) are expressed in terms of the target document but preserve the semantics of the template document.
  • Each indirect object will be written out during a save, but each reference to the object is represented (and written out) as a CosObjectRef. A CosObjectRef is also a CosObject but will return the attributes of its referent. It is handled by CosDocument as a direct object.
  • When a new indirect object is created within a document, it is assigned a unique object number. The PDF spec says that the only restriction on object number assignment is that each object have a unique number within the document. When an indirect object is deleted, it is marked "free" within the document. The object number then becomes available for assignment to another object, but the generation number is incremented so that the reassigned number is part of a different CosObjectID. CosDocument manages both free and in-use indirect objects (see the description of xrefs below).
  • Encryption uses the CosObjectID as a seed for encryption parameters when encryption is applied to CosString, CosStream, or CosObjectStream objects. CosString objects are commonly direct objects, which means they do not have their own CosObjectID. A CosString therefore contains the CosObjectID of the indirect collection that ultimately encloses it (its "parent"), and this ID is used for encryption. Because a collection may itself be direct, each CosContainer has an attribute for the parent CosObjectID (its own ID if the collection is indirect), which can be assigned to each CosContainer or CosString being added to the collection. If a direct object that already has a parent CosObjectID is added to a collection, a copy of it is made, and the copy is assigned the parent ID of the collection to which it is being added.
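To illustrate why the parent ID matters, the sketch below follows Algorithm 1 in section 7.6.2 of the PDF 1.7 spec. That algorithm derives a per-object key for RC4 and 128-bit AES encryption from the document's file key and an object number and generation; it does not cover 256-bit AES. The class and method names are hypothetical, and the text above does not state that the package computes keys exactly this way. Because the derived key depends on the ID, a direct string placed under a different parent must be encrypted with that parent's ID. This is consistent with the copy rule above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Sketch of PDF 1.7 Algorithm 1 (section 7.6.2): deriving a per-object key
// from the file key and an object ID. Class/method names are hypothetical.
final class ObjectKeys {
    static byte[] objectKey(byte[] fileKey, int objNum, int gen, boolean aes) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to provide MD5
            throw new IllegalStateException("MD5 unavailable", e);
        }
        md5.update(fileKey);
        // low-order 3 bytes of the object number, low-order byte first
        md5.update((byte) objNum);
        md5.update((byte) (objNum >>> 8));
        md5.update((byte) (objNum >>> 16));
        // low-order 2 bytes of the generation number, low-order byte first
        md5.update((byte) gen);
        md5.update((byte) (gen >>> 8));
        if (aes) {
            md5.update(new byte[] {0x73, 0x41, 0x6C, 0x54}); // "sAlT"
        }
        byte[] digest = md5.digest();
        return Arrays.copyOf(digest, Math.min(fileKey.length + 5, 16));
    }
}
```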
2.2 Save

A CosDocument can perform either an incremental save or a full save of the state of the document at the time it receives the save request via its API. The save will produce a byte stream output in PDF file format.

2.2.1 Incremental save

This applies only to CosDocument instances created from PDF file input. The incremental save reproduces the original input and appends update information for the "dirty" cos objects: those that are new, modified, or deleted. The update consists of all dirty indirect objects plus an xref for them. If an incrementally saved file is opened in Acrobat, the modified document is shown. An incremental save does not alter a document's output file attributes (for example, xref style) or its encryption.
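An incremental update leaves the original bytes untouched. It appends the dirty objects, an xref covering only those objects, and a trailer whose `/Prev` entry points back to the previous xref. The hypothetical sketch below shows that layout. It assumes all dirty objects have generation 0 and ignores deleted objects, which would need free entries. The caller must supply `/Size` and the `/Root` reference.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical sketch, not this package's API: appends an update section
// (dirty objects, an xref for them, and a trailer with /Prev) after the
// untouched original bytes. Assumes generation 0 and ignores deletions.
final class IncrementalUpdate {
    static byte[] append(byte[] original, long prevXrefOffset, int size,
                         String rootRef, SortedMap<Integer, String> dirtyObjects) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(original, 0, original.length);
        if (original.length > 0 && original[original.length - 1] != '\n') {
            write(out, "\n");
        }
        StringBuilder xref = new StringBuilder("xref\n");
        for (Map.Entry<Integer, String> e : dirtyObjects.entrySet()) {
            int offset = out.size(); // byte offset of "n 0 obj"
            write(out, e.getKey() + " 0 obj\n" + e.getValue() + "\nendobj\n");
            // a one-entry subsection per object keeps the sketch short
            xref.append(e.getKey()).append(" 1\n")
                .append(String.format("%010d 00000 n\r\n", offset));
        }
        int xrefOffset = out.size();
        xref.append("trailer\n<< /Size ").append(size)
            .append(" /Root ").append(rootRef)
            .append(" /Prev ").append(prevXrefOffset).append(" >>\n")
            .append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        write(out, xref.toString());
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(b, 0, b.length);
    }
}
```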

2.2.2 Full save

This applies to any CosDocument instance. In a full save, all of the objects are written out and a new xref is created. It is possible to specify output file attributes such as xref style or encryption.

2.2.2.1 Xref

All of a document's indirect objects (free and in use) have an entry in the document's xref. There are 3 styles of xref which are supported in a full save: xref table, xref stream, and hybrid.

2.2.2.1.1 Xref Table

This is the original xref style of the PDF file format. Each xref entry is a line in a table. Free objects form a linked list in which each entry points to the next free object, and the list is anchored by the free object root (object 0, generation 65535). The entries are not numbered, so they are expected to be sequential. Each break in the sequence starts a new xref subsection, which is given a header with its starting object number and its size (the number of sequential entries).
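The table layout can be sketched as follows. `XrefTableWriter` is hypothetical. It links free entries in ascending order; the spec requires a linked list, but this particular order is a choice of the sketch. Each entry is written in the fixed 20-byte form `nnnnnnnnnn ggggg n`, followed by a two-character end of line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not this package's API: writes a classic xref table,
// linking free entries into a list anchored at object 0 (generation 65535)
// and starting a new subsection at each break in the object numbering.
final class XrefTableWriter {
    static final class Entry {
        final long offset;    // byte offset for in-use entries; ignored for free ones
        final int generation;
        final boolean inUse;

        Entry(long offset, int generation, boolean inUse) {
            this.offset = offset;
            this.generation = generation;
            this.inUse = inUse;
        }
    }

    static String write(Map<Integer, Entry> entries) {
        TreeMap<Integer, Entry> all = new TreeMap<>(entries);
        all.put(0, new Entry(0, 65535, false)); // head of the free list

        // Link free entries in ascending order; the last one points back to 0.
        List<Integer> free = new ArrayList<>();
        for (Map.Entry<Integer, Entry> e : all.entrySet()) {
            if (!e.getValue().inUse) {
                free.add(e.getKey());
            }
        }
        Map<Integer, Integer> nextFree = new HashMap<>();
        for (int i = 0; i < free.size(); i++) {
            nextFree.put(free.get(i), i + 1 < free.size() ? free.get(i + 1) : 0);
        }

        StringBuilder sb = new StringBuilder("xref\n");
        List<Integer> numbers = new ArrayList<>(all.keySet());
        int start = 0;
        while (start < numbers.size()) {
            int end = start; // extend the subsection while numbers are consecutive
            while (end + 1 < numbers.size()
                    && numbers.get(end + 1).intValue() == numbers.get(end).intValue() + 1) {
                end++;
            }
            sb.append(numbers.get(start)).append(' ').append(end - start + 1).append('\n');
            for (int i = start; i <= end; i++) {
                int num = numbers.get(i);
                Entry e = all.get(num);
                long first = e.inUse ? e.offset : nextFree.get(num);
                // each entry is exactly 20 bytes, including the 2-byte end of line
                sb.append(String.format("%010d %05d %c\r\n", first, e.generation, e.inUse ? 'n' : 'f'));
            }
            start = end + 1;
        }
        return sb.toString();
    }
}
```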

2.2.2.1.2 Xref Stream

This applies only to PDF version 1.5 and later. The xref is represented as a CosStream; the xref entries (including the entry for the xref stream itself) are written to the data stream in a format specified in the PDF spec. The data stream is Flate encoded.

The only way to produce xref entries for compressed objects is as entries in an xref stream.
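In an xref stream, each entry is a fixed-width binary row whose first field is the entry type. Type 0 marks a free object. Type 1 marks an object at a byte offset. Type 2 marks an object stored in an object stream, identified by the object stream's number and the object's index within it. Type 2 rows have no counterpart in an xref table, which is why compressed objects need an xref stream. The sketch below is hypothetical. The widths `/W [1 4 2]` are one possible choice, and they would be recorded in the stream's dictionary alongside `/Type /XRef` and `/Size`.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not this package's API: binary rows of an xref stream
// (PDF 1.7, section 7.5.8) using field widths /W [1 4 2]. The resulting bytes
// would then be Flate-encoded as the stream's data.
final class XrefStreamRows {
    static final int[] W = {1, 4, 2};

    static final class Row {
        final int type;    // 0 = free, 1 = in use, 2 = compressed
        final long field2; // next free number / byte offset / object stream number
        final long field3; // generation / generation / index within the object stream

        Row(int type, long field2, long field3) {
            this.type = type;
            this.field2 = field2;
            this.field3 = field3;
        }
    }

    static Row free(int nextFree, int generation) { return new Row(0, nextFree, generation); }
    static Row inUse(long offset, int generation) { return new Row(1, offset, generation); }
    static Row compressed(int objStmNumber, int index) { return new Row(2, objStmNumber, index); }

    static byte[] encode(Row... rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Row r : rows) {
            put(out, r.type, W[0]);
            put(out, r.field2, W[1]);
            put(out, r.field3, W[2]);
        }
        return out.toByteArray();
    }

    // Writes 'value' as 'width' bytes, high-order byte first.
    private static void put(ByteArrayOutputStream out, long value, int width) {
        if (width < 8 && (value >>> (8 * width)) != 0) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " bytes");
        }
        for (int i = width - 1; i >= 0; i--) {
            out.write((int) (value >>> (8 * i)) & 0xFF);
        }
    }
}
```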

2.2.2.1.3 Hybrid

This option produces output for documents that contain compressed objects but must also be readable by readers that predate PDF 1.5. An xref table is produced for all the indirect objects, but the entries for compressed object streams and the objects they contain are marked as free (with a generation of 65535). The compressed object streams are then written to the file as an update, along with an xref stream for those objects, and a special xref table is constructed for the update. A reader for version 1.5 or later can access the compressed objects, while a pre-1.5 reader still sees a valid file in which the compressed objects are invisible.

2.3 CosDocument Creation

An initialized CosDocument instance can be created in 3 ways: "empty" (a catalog and trailer, but no other cos objects); as a clone of another document (a copy of that document's objects, but not its output file attributes); and from an input PDF file (PDF and FDF formats are accepted).

If a CosDocument is instantiated from an input PDF file, it does not instantiate all of the objects in the input file upon creation. It only instantiates the objects that are "dirty" or that are referenced.

If possible, the full save method uses an optimization ("express save") that simply copies the bytes of uninstantiated objects from the input stream to the output stream.
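Lazy instantiation and express save can be sketched together. The hypothetical `LazyDocument` below records where each object's bytes lie in the input and instantiates an object only when it is accessed or replaced. At save time, it copies the raw bytes of every object that was never instantiated. A `String` stands in for a parsed object, and locating the objects (normally done through the input file's xref) is left to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeSet;

// Hypothetical sketch, not this package's API. Each object is located by an
// {offset, length} span of its full "n g obj ... endobj" text in the input;
// a String stands in for a parsed CosObject.
final class LazyDocument {
    private final byte[] input;
    private final SortedMap<Integer, int[]> locations;
    private final Map<Integer, String> instantiated = new HashMap<>();

    LazyDocument(byte[] input, SortedMap<Integer, int[]> locations) {
        this.input = input;
        this.locations = locations;
    }

    // Instantiates the object on first access only.
    String getObject(int number) {
        return instantiated.computeIfAbsent(number, n -> {
            int[] span = locations.get(n);
            return span == null ? null
                    : new String(input, span[0], span[1], StandardCharsets.ISO_8859_1);
        });
    }

    // Replacing an object instantiates it and makes it "dirty".
    void setObject(int number, String text) {
        instantiated.put(number, text);
    }

    int instantiatedCount() {
        return instantiated.size();
    }

    // Writes all objects in number order and records their new offsets for
    // the xref. Uninstantiated objects take the express path: a raw byte copy.
    byte[] save(Map<Integer, Integer> newOffsets) {
        TreeSet<Integer> numbers = new TreeSet<>(locations.keySet());
        numbers.addAll(instantiated.keySet());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int number : numbers) {
            newOffsets.put(number, out.size());
            String text = instantiated.get(number);
            if (text == null) {
                int[] span = locations.get(number);
                out.write(input, span[0], span[1]);
            } else {
                byte[] b = text.getBytes(StandardCharsets.ISO_8859_1);
                out.write(b, 0, b.length);
            }
        }
        return out.toByteArray();
    }
}
```

In this sketch, any object that has been accessed is re-serialized, even if it is unchanged. Tracking which instantiated objects are unchanged would let more of them be copied verbatim; the text above does not say whether the package does this.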