Class WordToTextConverter

    • Constructor Detail

      • WordToTextConverter

        public WordToTextConverter()
                            throws javax.xml.parsers.ParserConfigurationException
        Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.
        Throws:
        javax.xml.parsers.ParserConfigurationException - if an internal DocumentBuilder cannot be created
      • WordToTextConverter

        public WordToTextConverter(org.w3c.dom.Document document)
        Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.
        Parameters:
        document - XML DOM Document used as storage for text pieces
      • WordToTextConverter

        public WordToTextConverter(TextDocumentFacade textDocumentFacade)
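
A minimal sketch of combining several documents into one text output, as the constructor descriptions suggest. It assumes that `processDocument(HWPFDocumentCore)`, inherited from `AbstractWordConverter` and not listed in this section, is the call that feeds each document into the converter. The class name `MergeDocsToText` and its `merge` helper are hypothetical, used only for illustration.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToTextConverter;

public class MergeDocsToText {

    // Hypothetical helper: renders every .doc file in 'paths' into one text string.
    public static String merge(String... paths) throws Exception {
        // One converter instance accumulates the text of all documents.
        WordToTextConverter converter = new WordToTextConverter();

        for (String path : paths) {
            try (InputStream in = new FileInputStream(path);
                 HWPFDocument doc = new HWPFDocument(in)) {
                // Assumption: processDocument is inherited from AbstractWordConverter.
                converter.processDocument(doc);
            }
        }

        // The instance-level getText() serializes everything collected so far.
        return converter.getText();
    }
}
```

Reusing a single converter instance is what lets the output of several documents end up in the same underlying text document. The static `getText` overloads, by contrast, each take a single input.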
    • Method Detail

      • getText

        public static java.lang.String getText(DirectoryNode root)
                                        throws java.lang.Exception
        Throws:
        java.lang.Exception
      • getText

        public static java.lang.String getText(java.io.File docFile)
                                        throws java.lang.Exception
        Throws:
        java.lang.Exception
      • getText

        public static java.lang.String getText(HWPFDocumentCore wordDocument)
                                        throws java.lang.Exception
        Throws:
        java.lang.Exception
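
The static `getText` overloads can be sketched as one-call conversions. The example below is illustrative only: the class `StaticGetTextExample` and its method names are hypothetical, and opening the file through `POIFSFileSystem` to obtain a `DirectoryNode` is an assumption about how a caller might reach the `getText(DirectoryNode)` overload.

```java
import java.io.File;

import org.apache.poi.hwpf.converter.WordToTextConverter;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class StaticGetTextExample {

    // Converts a .doc file on disk to plain text in a single call.
    public static String fromFile(File docFile) throws Exception {
        return WordToTextConverter.getText(docFile);
    }

    // Converts via the root directory of the OLE2 container.
    // Assumption: the file is opened read-only through POIFSFileSystem.
    public static String fromDirectoryNode(File docFile) throws Exception {
        try (POIFSFileSystem fs = new POIFSFileSystem(docFile, true)) {
            return WordToTextConverter.getText(fs.getRoot());
        }
    }
}
```

Because every overload declares `throws java.lang.Exception`, callers need to handle or propagate a broad exception type. A missing or unreadable input file, for example, surfaces as an exception rather than as an empty string.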
      • main

        public static void main(java.lang.String[] args)
                         throws java.lang.Exception
        Java main() interface to interact with WordToTextConverter

        Usage: WordToTextConverter infile outfile

        Where infile is an input .doc file (Word 95-2007) that is rendered as plain text into outfile.
        Throws:
        java.lang.Exception
      • getText

        public java.lang.String getText()
                                 throws java.lang.Exception
        Throws:
        java.lang.Exception
      • isOutputSummaryInformation

        public boolean isOutputSummaryInformation()
      • setOutputSummaryInformation

        public void setOutputSummaryInformation(boolean outputDocumentInformation)
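
The two summary-information accessors carry no description here. Judging by their names, they toggle whether document summary information is included in the text output; that reading is an assumption. The sketch below only shows setting and reading the flag. The class `SummaryInfoToggle` and its `newConverter` helper are hypothetical.

```java
import javax.xml.parsers.ParserConfigurationException;

import org.apache.poi.hwpf.converter.WordToTextConverter;

public class SummaryInfoToggle {

    // Hypothetical factory: creates a converter with the summary flag preset.
    public static WordToTextConverter newConverter(boolean includeSummary)
            throws ParserConfigurationException {
        WordToTextConverter converter = new WordToTextConverter();
        // Set the flag before any document is processed.
        converter.setOutputSummaryInformation(includeSummary);
        return converter;
    }
}
```

Setting the flag before processing any document keeps the behavior consistent for every document fed into the same converter instance.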