Package org.apache.tika.parser.microsoft
Class AbstractOfficeParser
- java.lang.Object
-
- org.apache.tika.parser.AbstractParser
-
- org.apache.tika.parser.microsoft.AbstractOfficeParser
-
- All Implemented Interfaces:
java.io.Serializable, Parser
- Direct Known Subclasses:
OfficeParser, OOXMLParser, Word2006MLParser
public abstract class AbstractOfficeParser extends AbstractParser
Intermediate layer to set OfficeParserConfig uniformly.
- See Also:
- Serialized Form
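The "intermediate layer" idea can be illustrated with a minimal sketch: an abstract base class owns one configuration object and exposes setters that forward to it, so every subclass is configured the same way. All class names below (IntermediateLayerSketch, Config, AbstractOfficeLikeParser, DocLikeParser, SlidesLikeParser) are hypothetical stand-ins, not the Tika classes, and the false defaults are assumptions made for the illustration.

```java
final class IntermediateLayerSketch {

    /** Hypothetical stand-in for a shared configuration object. */
    static final class Config {
        private boolean extractMacros;          // assumed default: false
        private boolean includeDeletedContent;  // assumed default: false

        boolean getExtractMacros() { return extractMacros; }
        void setExtractMacros(boolean value) { extractMacros = value; }
        boolean getIncludeDeletedContent() { return includeDeletedContent; }
        void setIncludeDeletedContent(boolean value) { includeDeletedContent = value; }
    }

    /** Hypothetical intermediate layer: every setter forwards to one Config. */
    abstract static class AbstractOfficeLikeParser {
        private final Config defaultConfig = new Config();

        public void setExtractMacros(boolean value) {
            defaultConfig.setExtractMacros(value);
        }

        public boolean getExtractMacros() {
            return defaultConfig.getExtractMacros();
        }

        public void setIncludeDeletedContent(boolean value) {
            defaultConfig.setIncludeDeletedContent(value);
        }

        public boolean getIncludeDeletedContent() {
            return defaultConfig.getIncludeDeletedContent();
        }
    }

    /** Subclasses inherit the configuration surface without redefining it. */
    static final class DocLikeParser extends AbstractOfficeLikeParser { }

    static final class SlidesLikeParser extends AbstractOfficeLikeParser { }

    public static void main(String[] args) {
        AbstractOfficeLikeParser doc = new DocLikeParser();
        AbstractOfficeLikeParser slides = new SlidesLikeParser();

        // The same setter is available on both subclasses.
        doc.setExtractMacros(true);
        slides.setIncludeDeletedContent(true);

        System.out.println("doc extracts macros: " + doc.getExtractMacros());
        System.out.println("slides include deleted: " + slides.getIncludeDeletedContent());
    }
}
```

In this sketch each parser instance holds its own Config, so changing one instance does not affect another.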
-
-
Constructor Summary
AbstractOfficeParser()
-
Method Summary
void configure(ParseContext parseContext)
Checks to see if the user has specified an OfficeParserConfig.
boolean getExtractAllAlternativesFromMSG()
boolean getExtractMacros()
boolean getIncludeDeletedContent()
boolean getIncludeMoveFromContent()
boolean getUseSAXDocxExtractor()
void setByteArrayMaxOverride(int maxOverride)
WARNING: this sets a static variable in POI.
void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
void setDateFormatOverride(java.lang.String format)
void setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)
Some .msg files can contain body content in html, rtf and/or text.
void setExtractMacros(boolean extractMacros)
void setIncludeDeletedContent(boolean includeDeletedConent)
void setIncludeMoveFromContent(boolean includeMoveFromContent)
void setIncludeShapeBasedContent(boolean includeShapeBasedContent)
void setUseSAXDocxExtractor(boolean useSAXDocxExtractor)
void setUseSAXPptxExtractor(boolean useSAXPptxExtractor)
-
Methods inherited from class org.apache.tika.parser.AbstractParser
parse
-
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.tika.parser.Parser
getSupportedTypes, parse
-
Method Detail
-
configure
public void configure(ParseContext parseContext)
Checks to see if the user has specified an OfficeParserConfig. If so, no changes are made; if not, one is added to the context.
- Parameters:
parseContext
-
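The check-then-add behavior described for configure can be sketched as follows. This is an illustration under stated assumptions, not the Tika implementation: Context and Config are hypothetical stand-ins for a class-keyed parse context and a configuration object, and the fallback parameter stands for whatever default the parser would supply.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigureSketch {

    /** Hypothetical stand-in for a per-parse context keyed by class. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast returns null when the entry is absent.
            return key.cast(entries.get(key));
        }
    }

    /** Hypothetical stand-in for a configuration object. */
    static final class Config {
        final String label;

        Config(String label) {
            this.label = label;
        }
    }

    /** Adds the fallback only when the caller has not supplied a Config. */
    static void configure(Context context, Config fallback) {
        if (context.get(Config.class) == null) {
            context.set(Config.class, fallback);
        }
    }

    public static void main(String[] args) {
        Config parserDefault = new Config("parser default");

        Context empty = new Context();
        configure(empty, parserDefault);
        System.out.println(empty.get(Config.class).label);   // prints "parser default"

        Context preset = new Context();
        preset.set(Config.class, new Config("user supplied"));
        configure(preset, parserDefault);
        System.out.println(preset.get(Config.class).label);  // prints "user supplied"
    }
}
```

The point of the pattern is that a configuration placed in the context by the caller always wins over the parser's own default.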
-
getIncludeDeletedContent
public boolean getIncludeDeletedContent()
- Returns:
- See Also:
OfficeParserConfig.getIncludeDeletedContent()
-
getIncludeMoveFromContent
public boolean getIncludeMoveFromContent()
- Returns:
- See Also:
OfficeParserConfig.getIncludeMoveFromContent()
-
getUseSAXDocxExtractor
public boolean getUseSAXDocxExtractor()
- Returns:
- See Also:
OfficeParserConfig.getUseSAXDocxExtractor()
-
getExtractMacros
public boolean getExtractMacros()
- Returns:
- whether or not to extract macros
- See Also:
OfficeParserConfig.getExtractMacros()
-
setIncludeDeletedContent
@Field public void setIncludeDeletedContent(boolean includeDeletedConent)
-
setIncludeMoveFromContent
@Field public void setIncludeMoveFromContent(boolean includeMoveFromContent)
-
setIncludeShapeBasedContent
@Field public void setIncludeShapeBasedContent(boolean includeShapeBasedContent)
-
setUseSAXDocxExtractor
@Field public void setUseSAXDocxExtractor(boolean useSAXDocxExtractor)
-
setUseSAXPptxExtractor
@Field public void setUseSAXPptxExtractor(boolean useSAXPptxExtractor)
-
setExtractMacros
@Field public void setExtractMacros(boolean extractMacros)
-
setConcatenatePhoneticRuns
@Field public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
-
setExtractAllAlternativesFromMSG
@Field public void setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)
Some .msg files can contain body content in HTML, RTF, and/or plain text. The default behavior is to pick the first non-null value and include only that. To extract all non-null body content, which is likely duplicative, set this value to true.
- Parameters:
extractAllAlternativesFromMSG - whether or not to extract all alternative parts from msg files
- Since:
- 1.17
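The difference between the default and the "all alternatives" behavior can be sketched with a small selection routine. This is a hedged illustration, not the Tika code: MsgAlternativesSketch and selectBodies are hypothetical names, and the order in which candidate bodies are passed (HTML, then RTF, then text) is an assumption, since the description above does not state which alternative is checked first.

```java
import java.util.ArrayList;
import java.util.List;

final class MsgAlternativesSketch {

    /**
     * Returns the first non-null candidate, or every non-null candidate
     * when extractAll is true. Candidates are examined in the order given.
     */
    static List<String> selectBodies(boolean extractAll, String... candidates) {
        List<String> selected = new ArrayList<>();
        for (String body : candidates) {
            if (body == null) {
                continue;               // this alternative is absent
            }
            selected.add(body);
            if (!extractAll) {
                break;                  // default: stop at the first non-null body
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        String html = null;             // no HTML body in this hypothetical message
        String rtf = "{\\rtf1 Hello}";
        String text = "Hello";

        System.out.println(selectBodies(false, html, rtf, text).size()); // prints 1
        System.out.println(selectBodies(true, html, rtf, text).size());  // prints 2
    }
}
```

With extractAll set to true the same message content typically appears more than once in the output, which is the duplication the description warns about.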
-
getExtractAllAlternativesFromMSG
public boolean getExtractAllAlternativesFromMSG()
-
setByteArrayMaxOverride
@Field public void setByteArrayMaxOverride(int maxOverride)
WARNING: this sets a static variable in POI. This allows users to override POI's protection against the allocation of overly large byte arrays. Use carefully, and please open issues on POI's Bugzilla to raise the limits for specific records.
- Parameters:
maxOverride
-
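Why a static variable deserves a warning can be shown with a minimal sketch: a limit stored in a static field is shared by every caller in the JVM, so setting it through one parser instance changes the behavior seen by all others. ByteArrayGuard and ParserLike are hypothetical stand-ins, not POI or Tika classes, and the default limit of 1,000 bytes is invented for the illustration.

```java
final class StaticOverrideSketch {

    /** Hypothetical stand-in for a library utility with a JVM-wide limit. */
    static final class ByteArrayGuard {
        private static final int DEFAULT_MAX = 1_000;   // made-up value for this sketch
        private static volatile int maxOverride = -1;   // -1 means "no override" here

        static void setMaxOverride(int max) {
            maxOverride = max;
        }

        static int effectiveMax() {
            int override = maxOverride;
            return override > 0 ? override : DEFAULT_MAX;
        }

        static byte[] allocate(int length) {
            if (length > effectiveMax()) {
                throw new IllegalStateException(
                        "Refusing to allocate " + length + " bytes; limit is " + effectiveMax());
            }
            return new byte[length];
        }
    }

    /** Hypothetical parser whose instance setter writes the static field. */
    static final class ParserLike {
        void setByteArrayMaxOverride(int max) {
            ByteArrayGuard.setMaxOverride(max);
        }

        byte[] read(int length) {
            return ByteArrayGuard.allocate(length);
        }
    }

    public static void main(String[] args) {
        ParserLike a = new ParserLike();
        ParserLike b = new ParserLike();

        try {
            b.read(5_000);
        } catch (IllegalStateException e) {
            System.out.println("before override: " + e.getMessage());
        }

        // Configuring instance a also lifts the limit for instance b.
        a.setByteArrayMaxOverride(10_000);
        System.out.println("after override: " + b.read(5_000).length + " bytes");
    }
}
```

In a long-running or multi-tenant process, a setting like this is therefore best applied once at startup rather than per request.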
-
setDateFormatOverride
@Field public void setDateFormatOverride(java.lang.String format)
-
-