Class ExternalEmbedder

  • All Implemented Interfaces:
    java.io.Serializable, Embedder

    public class ExternalEmbedder
    extends java.lang.Object
    implements Embedder
    Embedder that uses an external program (like sed or exiftool) to embed text content and metadata into a given document.
    Since:
    Apache Tika 1.3
    See Also:
    Serialized Form
    • Method Summary

      Modifier and Type Method Description
      static boolean check(java.lang.String[] checkCmd, int... errorValue)
      Checks whether the command can be run.
      static boolean check(java.lang.String checkCmd, int... errorValue)
      Checks whether the command can be run.
      void embed(Metadata metadata, java.io.InputStream inputStream, java.io.OutputStream outputStream, ParseContext context)
      Executes the configured external command on the given document stream and writes the resulting document, with the metadata embedded, to the given output stream.
      java.lang.String[] getCommand()
      Gets the command to be run.
      java.lang.String getCommandAppendOperator()
      Gets the operator used to append to, rather than replace, a value for the command line tool, e.g. "+=".
      java.lang.String getCommandAssignmentDelimeter()
      Gets the delimiter for multiple assignments for the command line tool, e.g. ", ".
      java.lang.String getCommandAssignmentOperator()
      Gets the assignment operator for the command line tool, e.g. "=".
      java.util.Map<Property,java.lang.String[]> getMetadataCommandArguments()
      Gets the map of Metadata keys to command line parameters.
      java.util.Set<MediaType> getSupportedEmbedTypes()  
      java.util.Set<MediaType> getSupportedEmbedTypes(ParseContext context)
      Returns the set of media types supported by this embedder when used with the given parse context.
      boolean isQuoteAssignmentValues()
      Gets whether or not to quote assignment values, as in tag='value'.
      void setCommand(java.lang.String... command)
      Sets the command to be run.
      void setCommandAppendOperator(java.lang.String commandAppendOperator)
      Sets the operator used to append to, rather than replace, a value for the command line tool, e.g. "+=".
      void setCommandAssignmentDelimeter(java.lang.String commandAssignmentDelimeter)
      Sets the delimiter for multiple assignments for the command line tool, e.g. ", ".
      void setCommandAssignmentOperator(java.lang.String commandAssignmentOperator)
      Sets the assignment operator for the command line tool, e.g. "=".
      void setMetadataCommandArguments(java.util.Map<Property,java.lang.String[]> arguments)
      Sets the map of Metadata keys to command line parameters.
      void setQuoteAssignmentValues(boolean quoteAssignmentValues)
      Sets whether or not to quote assignment values, as in tag='value'.
      void setSupportedEmbedTypes(java.util.Set<MediaType> supportedEmbedTypes)
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • METADATA_COMMAND_ARGUMENTS_TOKEN

        public static final java.lang.String METADATA_COMMAND_ARGUMENTS_TOKEN
        Token to be replaced with a String array of metadata assignment command arguments
        See Also:
        Constant Field Values
      • METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN

        public static final java.lang.String METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN
        Token to be replaced with the metadata assignment command arguments serialized into a single String
        See Also:
        Constant Field Values
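The difference between the two tokens is easiest to see on a command template. The sketch below is hypothetical: the token strings are passed in as parameters because their actual values are not listed on this page, the first token is assumed to occupy a whole command segment, and the serialized form is assumed to join the arguments with single spaces. It illustrates the idea and is not Tika's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: token values are supplied by the caller because the
// real constant values are not shown on this page.
final class TokenExpansionSketch {

    /**
     * A segment equal to argsToken is replaced by every metadata argument as a
     * separate element; any occurrence of serializedToken inside a segment is
     * replaced by all arguments joined with single spaces (an assumed format).
     */
    static List<String> expand(String[] template, String argsToken,
                               String serializedToken, List<String> metadataArgs) {
        List<String> cmd = new ArrayList<>();
        for (String segment : template) {
            if (segment.equals(argsToken)) {
                cmd.addAll(metadataArgs);               // one element per argument
            } else if (segment.contains(serializedToken)) {
                cmd.add(segment.replace(serializedToken, String.join(" ", metadataArgs)));
            } else {
                cmd.add(segment);                       // copied unchanged
            }
        }
        return cmd;
    }
}
```

The array form suits tools invoked directly, where each assignment must be its own argument; the serialized form suits a template that is handed to a shell as one string, such as the third element of sh -c.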
    • Constructor Detail

      • ExternalEmbedder

        public ExternalEmbedder()
    • Method Detail

      • getSupportedEmbedTypes

        public java.util.Set<MediaType> getSupportedEmbedTypes(ParseContext context)
        Description copied from interface: Embedder
        Returns the set of media types supported by this embedder when used with the given parse context.

        The name deliberately differs from Parser.getSupportedTypes(ParseContext) so that parser implementations can also implement this interface.

        Specified by:
        getSupportedEmbedTypes in interface Embedder
        Parameters:
        context - parse context
        Returns:
        immutable set of media types
      • getSupportedEmbedTypes

        public java.util.Set<MediaType> getSupportedEmbedTypes()
      • setSupportedEmbedTypes

        public void setSupportedEmbedTypes(java.util.Set<MediaType> supportedEmbedTypes)
      • getCommandAssignmentOperator

        public java.lang.String getCommandAssignmentOperator()
        Gets the assignment operator for the command line tool, e.g. "=".
        Returns:
        the assignment operator
      • setCommandAssignmentOperator

        public void setCommandAssignmentOperator(java.lang.String commandAssignmentOperator)
        Sets the assignment operator for the command line tool, e.g. "=".
        Parameters:
        commandAssignmentOperator - the assignment operator
      • getCommandAssignmentDelimeter

        public java.lang.String getCommandAssignmentDelimeter()
        Gets the delimiter for multiple assignments for the command line tool, e.g. ", ".
        Returns:
        the assignment delimiter
      • setCommandAssignmentDelimeter

        public void setCommandAssignmentDelimeter(java.lang.String commandAssignmentDelimeter)
        Sets the delimiter for multiple assignments for the command line tool, e.g. ", ".
        Parameters:
        commandAssignmentDelimeter - the assignment delimiter
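As a purely hypothetical illustration of what a delimiter such as ", " is for, the sketch below folds several values of one field into a single assignment argument. The tag name "-Keywords" in the test is an illustrative exiftool-style parameter, and whether ExternalEmbedder applies the delimiter in exactly this way is an assumption.

```java
import java.util.List;

// Hypothetical sketch, not Tika's code: several values of one field are
// combined into a single "param<operator>v1<delimiter>v2" argument.
final class DelimiterSketch {

    static String joinedAssignment(String cliParam, String assignmentOperator,
                                   String delimiter, List<String> values) {
        // e.g. "-Keywords" + "=" + "tika, java"
        return cliParam + assignmentOperator + String.join(delimiter, values);
    }
}
```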
      • getCommandAppendOperator

        public java.lang.String getCommandAppendOperator()
        Gets the operator used to append to, rather than replace, a value for the command line tool, e.g. "+=".
        Returns:
        the append operator
      • setCommandAppendOperator

        public void setCommandAppendOperator(java.lang.String commandAppendOperator)
        Sets the operator used to append to, rather than replace, a value for the command line tool, e.g. "+=".
        Parameters:
        commandAppendOperator - the append operator
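The following sketch assumes one plausible rule for choosing between the assignment and append operators: a single value is written with the assignment operator (replacing any existing value), while each value of a multi-valued field gets its own argument using the append operator. The class and method names are hypothetical, and this is not Tika's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: "=" replaces a value, "+=" appends one value per argument.
final class AppendOperatorSketch {

    static List<String> segments(String cliParam, List<String> values,
                                 String assignmentOperator, String appendOperator) {
        List<String> out = new ArrayList<>();
        if (values.size() == 1) {
            out.add(cliParam + assignmentOperator + values.get(0));   // replace
        } else {
            for (String v : values) {
                out.add(cliParam + appendOperator + v);               // append each value
            }
        }
        return out;
    }
}
```

With "=" and "+=", a single-valued title yields -Title=Report, while a two-valued keyword field yields -Keywords+=a and -Keywords+=b, which matches the exiftool convention for adding items to list-type tags.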
      • isQuoteAssignmentValues

        public boolean isQuoteAssignmentValues()
        Gets whether or not to quote assignment values, as in tag='value'. The default is false.
        Returns:
        whether or not to quote assignment values
      • setQuoteAssignmentValues

        public void setQuoteAssignmentValues(boolean quoteAssignmentValues)
        Sets whether or not to quote assignment values, as in tag='value'.
        Parameters:
        quoteAssignmentValues - true to quote assignment values
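A minimal, hypothetical sketch of what the quoting switch changes in each assignment, using single quotes as in the tag='value' form described above. The class name is invented for illustration.

```java
// Hypothetical sketch: wraps the value in single quotes when quoting is on.
// Quotes already present inside the value are NOT escaped.
final class QuoteSketch {

    static String assignment(String tag, String operator, String value, boolean quote) {
        String v = quote ? "'" + value + "'" : value;
        return tag + operator + v;
    }
}
```

Quoting mostly matters when the command line is interpreted by a shell (for example through sh -c). When arguments are passed as an array straight to ProcessBuilder, no shell is involved and the quote characters reach the program literally, which is presumably why quoting defaults to false.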
      • getMetadataCommandArguments

        public java.util.Map<Property,java.lang.String[]> getMetadataCommandArguments()
        Gets the map of Metadata keys to command line parameters.
        Returns:
        the map of Metadata keys to command line parameters
      • setMetadataCommandArguments

        public void setMetadataCommandArguments(java.util.Map<Property,java.lang.String[]> arguments)
        Sets the map of Metadata keys to command line parameters. Set this to null to disable Metadata embedding.
        Parameters:
        arguments - the map of Metadata keys to command line parameters, or null to disable Metadata embedding
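The role of this map, including the null case, can be sketched as follows. This is a hypothetical stand-in: plain String keys replace Tika's Property objects, a Map<String, String> replaces Metadata (single values only), and the class name is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch. String keys stand in for Property; Map<String, String>
// stands in for Metadata.
final class MetadataArgumentsSketch {

    static List<String> segments(Map<String, String[]> arguments,
                                 Map<String, String> metadata, String operator) {
        List<String> out = new ArrayList<>();
        if (arguments == null) {
            return out;                       // null map: metadata embedding disabled
        }
        for (Map.Entry<String, String> field : metadata.entrySet()) {
            String[] params = arguments.get(field.getKey());
            if (params == null) {
                continue;                     // field has no CLI mapping; skip it
            }
            for (String param : params) {     // one field may map to several params
                out.add(param + operator + field.getValue());
            }
        }
        return out;
    }
}
```

With the real class, the keys would be Property constants such as TikaCoreProperties.TITLE, and metadata fields without an entry in the map are simply not passed to the tool.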
      • embed

        public void embed(Metadata metadata,
                          java.io.InputStream inputStream,
                          java.io.OutputStream outputStream,
                          ParseContext context)
                   throws java.io.IOException,
                          TikaException
        Executes the configured external command on the given document stream and writes the resulting document, with the metadata embedded, to the given output stream. Metadata is only embedded if setMetadataCommandArguments(Map) has been called to set arguments.
        Specified by:
        embed in interface Embedder
        Parameters:
        metadata - document metadata (input and output)
        inputStream - the document stream (input)
        outputStream - the output stream to write the metadata embedded data to
        context - parse context
        Throws:
        java.io.IOException - if the document stream could not be read
        TikaException - if the document could not be parsed
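The stream handling behind embed(...) can be sketched with ProcessBuilder. This hypothetical helper pipes the input document to the command's standard input and copies the command's standard output to the output stream. It assumes the external tool reads from stdin and writes the modified document to stdout, and it leaves out temporary-file handling and the metadata arguments.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Hypothetical sketch of the piping only; not Tika's implementation.
final class EmbedPipeSketch {

    /** Runs the command, streams in -> stdin and stdout -> out, returns the exit code. */
    static int run(List<String> command, InputStream in, OutputStream out)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.DISCARD)   // ignore stderr
                .start();
        // Feed stdin on a separate thread so a full stdout pipe cannot deadlock us.
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                in.transferTo(stdin);
            } catch (IOException ignored) {
                // The command closed stdin early; its exit code reports any failure.
            }
        });
        writer.start();
        try (InputStream stdout = process.getInputStream()) {
            stdout.transferTo(out);
        }
        writer.join();
        return process.waitFor();
    }
}
```

The input is written on its own thread because a tool that emits output before consuming all of its input would otherwise block once the stdout pipe buffer fills.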
      • check

        public static boolean check(java.lang.String checkCmd,
                                    int... errorValue)
        Checks whether the command can be run. Typically used with something like "myapp --version" to verify that "myapp" is installed and on the path.
        Parameters:
        checkCmd - the check command to run
        errorValue - the exit values that indicate an error
        Returns:
        whether or not the check completed without error
      • check

        public static boolean check(java.lang.String[] checkCmd,
                                    int... errorValue)
        Checks whether the command can be run. Typically used with something like "myapp --version" to verify that "myapp" is installed and on the path.
        Parameters:
        checkCmd - the check command to run
        errorValue - the exit values that indicate an error
        Returns:
        whether or not the check completed without error
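A check of this kind can be sketched as below. It is hypothetical: only the String[] form is shown, and treating 127 as the error value when none is given is an assumption of the sketch (127 is the exit status POSIX shells use for "command not found").

```java
import java.io.IOException;

// Hypothetical sketch: a command "can be run" if it starts and its exit code
// is not one of the listed error values.
final class CheckSketch {

    static boolean check(String[] checkCmd, int... errorValue) {
        if (errorValue.length == 0) {
            errorValue = new int[] {127};             // assumed default
        }
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectErrorStream(true)         // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int result = process.waitFor();
            for (int err : errorValue) {
                if (result == err) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            return false;                              // not found or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

For example, CheckSketch.check(new String[] {"exiftool", "-ver"}) returns false on a machine where exiftool is not on the path, because starting the process fails with an IOException.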