Package org.apache.tika.parser.microsoft
Class AbstractOfficeParser
- java.lang.Object
-
- org.apache.tika.parser.AbstractParser
-
- org.apache.tika.parser.microsoft.AbstractOfficeParser
-
- All Implemented Interfaces:
java.io.Serializable, Parser
- Direct Known Subclasses:
OfficeParser, OOXMLParser, Word2006MLParser
public abstract class AbstractOfficeParser extends AbstractParser
Intermediate layer to set OfficeParserConfig uniformly.
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructor    Description
AbstractOfficeParser()
-
Method Summary
Modifier and Type    Method    Description
void    configure(ParseContext parseContext)    Checks to see if the user has specified an OfficeParserConfig.
boolean    getExtractAllAlternativesFromMSG()
boolean    getExtractMacros()
boolean    getIncludeDeletedContent()
boolean    getIncludeMoveFromContent()
boolean    getUseSAXDocxExtractor()
void    setByteArrayMaxOverride(int maxOverride)    WARNING: this sets a static variable in POI.
void    setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
void    setDateFormatOverride(java.lang.String format)
void    setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)    Some .msg files can contain body content in HTML, RTF and/or text.
void    setExtractMacros(boolean extractMacros)
void    setIncludeDeletedContent(boolean includeDeletedConent)
void    setIncludeMoveFromContent(boolean includeMoveFromContent)
void    setIncludeShapeBasedContent(boolean includeShapeBasedContent)
void    setUseSAXDocxExtractor(boolean useSAXDocxExtractor)
void    setUseSAXPptxExtractor(boolean useSAXPptxExtractor)
-
Methods inherited from class org.apache.tika.parser.AbstractParser
parse
-
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.tika.parser.Parser
getSupportedTypes, parse
-
-
-
-
Method Detail
-
configure
public void configure(ParseContext parseContext)
Checks to see if the user has specified an OfficeParserConfig. If so, no changes are made; if not, one is added to the context.
- Parameters:
parseContext - the parse context to check
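The check described above can be sketched as follows. This is a minimal illustration, not Tika's source: SketchConfig and SketchContext are hypothetical stand-ins for OfficeParserConfig and ParseContext, and the single extractMacros field is only there to make the config distinguishable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for OfficeParserConfig; one illustrative option only. */
final class SketchConfig {
    boolean extractMacros; // false unless the user sets it
}

/** Hypothetical stand-in for a type-keyed parse context. */
final class SketchContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        // cast(null) returns null, so a missing entry yields null
        return key.cast(entries.get(key));
    }
}

final class ConfigureSketch {
    /** Leaves a user-supplied config untouched; otherwise adds a default one. */
    static void configure(SketchContext context) {
        if (context.get(SketchConfig.class) == null) {
            context.set(SketchConfig.class, new SketchConfig());
        }
    }
}
```

In this sketch a config placed in the context before configure is called always wins, which matches the "no changes are made" wording above.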
-
getIncludeDeletedContent
public boolean getIncludeDeletedContent()
- Returns:
- See Also:
OfficeParserConfig.getIncludeDeletedContent()
-
getIncludeMoveFromContent
public boolean getIncludeMoveFromContent()
- Returns:
- See Also:
OfficeParserConfig.getIncludeMoveFromContent()
-
getUseSAXDocxExtractor
public boolean getUseSAXDocxExtractor()
- Returns:
- See Also:
OfficeParserConfig.getUseSAXDocxExtractor()
-
getExtractMacros
public boolean getExtractMacros()
- Returns:
- whether or not to extract macros
- See Also:
OfficeParserConfig.getExtractMacros()
-
setIncludeDeletedContent
@Field public void setIncludeDeletedContent(boolean includeDeletedConent)
-
setIncludeMoveFromContent
@Field public void setIncludeMoveFromContent(boolean includeMoveFromContent)
-
setIncludeShapeBasedContent
@Field public void setIncludeShapeBasedContent(boolean includeShapeBasedContent)
-
setUseSAXDocxExtractor
@Field public void setUseSAXDocxExtractor(boolean useSAXDocxExtractor)
-
setUseSAXPptxExtractor
@Field public void setUseSAXPptxExtractor(boolean useSAXPptxExtractor)
-
setExtractMacros
@Field public void setExtractMacros(boolean extractMacros)
-
setConcatenatePhoneticRuns
@Field public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
-
setExtractAllAlternativesFromMSG
@Field public void setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)
Some .msg files can contain body content in HTML, RTF and/or text. The default behavior is to pick the first non-null value and include only that. To extract all non-null body content, which is likely duplicative, set this value to true.
- Parameters:
extractAllAlternativesFromMSG - whether or not to extract all alternative parts from msg files
- Since:
- 1.17
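The difference between the two settings can be sketched as below. MsgBodySketch and selectBodies are hypothetical names, and the HTML, RTF, text priority order is an assumption taken from the order in the sentence above, not from the parser's source.

```java
import java.util.ArrayList;
import java.util.List;

final class MsgBodySketch {
    /**
     * Returns the body alternatives that would be extracted.
     * Candidates are checked in the order html, rtf, text (assumed order).
     */
    static List<String> selectBodies(boolean extractAll, String html, String rtf, String text) {
        List<String> selected = new ArrayList<>();
        for (String body : new String[] {html, rtf, text}) {
            if (body == null) {
                continue; // this alternative is absent from the message
            }
            selected.add(body);
            if (!extractAll) {
                break; // default: keep only the first non-null alternative
            }
        }
        return selected;
    }
}
```

With extractAll set to false the result holds at most one entry; with true it holds every non-null alternative, which is where the duplicated content comes from.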
-
getExtractAllAlternativesFromMSG
public boolean getExtractAllAlternativesFromMSG()
-
setByteArrayMaxOverride
@Field public void setByteArrayMaxOverride(int maxOverride)
WARNING: this sets a static variable in POI. This allows users to override POI's protection of the allocation of overly large byte arrays. Use carefully, and please open issues on POI's Bugzilla to bump values for specific records.
- Parameters:
maxOverride - the override value for POI's maximum byte array size
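The reason for the warning is that a static variable is shared by every parser instance loaded by the same class loader. The sketch below illustrates that effect with a hypothetical LimitSketch class; the -1 "no override" sentinel and the effectiveMax helper are assumptions for illustration, not POI's implementation.

```java
final class LimitSketch {
    // Static: a single value shared by all instances (assumed sentinel: -1 = no override)
    private static int byteArrayMaxOverride = -1;

    /** Instance setter that nevertheless changes shared, process-wide state. */
    void setByteArrayMaxOverride(int maxOverride) {
        byteArrayMaxOverride = maxOverride;
    }

    /** Returns the override if one is set, otherwise the per-record default. */
    int effectiveMax(int recordDefault) {
        return byteArrayMaxOverride > 0 ? byteArrayMaxOverride : recordDefault;
    }
}
```

Calling the setter on one instance changes the limit seen by every other instance, so an override intended for one document type applies to all parsing in the same JVM.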
-
setDateFormatOverride
@Field public void setDateFormatOverride(java.lang.String format)
-
-