Package org.apache.tika.language
Class LanguageProfilerBuilder
- java.lang.Object
-
- org.apache.tika.language.LanguageProfilerBuilder
-
@Deprecated public class LanguageProfilerBuilder extends java.lang.Object
Deprecated. This class runs an ngram analysis over submitted text; the results can be used for automatic language identification. The similarity calculation is experimental. You have been warned. Methods are provided to build new NGramProfiles.
-
-
Constructor Summary
Constructor Description
LanguageProfilerBuilder(java.lang.String name)
Deprecated. Constructs a new ngram profile where minlen=3, maxlen=3
LanguageProfilerBuilder(java.lang.String name, int minlen, int maxlen)
Deprecated. Constructs a new ngram profile
-
Method Summary
Modifier and Type, Method, Description
void add(java.lang.StringBuffer word)
Deprecated. Adds ngrams from a single word to this profile
void analyze(java.lang.StringBuilder text)
Deprecated. Analyzes a piece of text
static LanguageProfilerBuilder create(java.lang.String name, java.io.InputStream is, java.lang.String encoding)
Deprecated. Creates a new language profile from a text file (preferably quite large, 5-10k lines)
java.lang.String getName()
Deprecated.
float getSimilarity(LanguageProfilerBuilder another)
Deprecated. Calculates a score for how well two NGramProfiles match each other
java.util.List<org.apache.tika.language.LanguageProfilerBuilder.NGramEntry> getSorted()
Deprecated. Returns a sorted list of ngrams (sort done by 1. frequency 2. sequence)
void load(java.io.InputStream is)
Deprecated. Loads an ngram profile from an InputStream (assumes UTF-8 encoded content)
static void main(java.lang.String[] args)
Deprecated. main method used for testing only
void save(java.io.OutputStream os)
Deprecated. Writes the NGramProfile content to an OutputStream, encoded as UTF-8
java.lang.String toString()
Deprecated.
-
-
-
Constructor Detail
-
LanguageProfilerBuilder
public LanguageProfilerBuilder(java.lang.String name, int minlen, int maxlen)
Deprecated. Constructs a new ngram profile
Parameters:
name - is the name of the profile
minlen - is the min length of ngram sequences
maxlen - is the max length of ngram sequences
-
LanguageProfilerBuilder
public LanguageProfilerBuilder(java.lang.String name)
Deprecated. Constructs a new ngram profile where minlen=3, maxlen=3
Parameters:
name - is the name of the profile, usually a two-character string
Since:
Tika 1.0
-
-
Method Detail
-
getName
public java.lang.String getName()
Deprecated.
Returns:
the name
-
add
public void add(java.lang.StringBuffer word)
Deprecated. Adds ngrams from a single word to this profile
Parameters:
word - is the word to add
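To make "ngrams from a single word" concrete, here is a minimal standalone sketch. It does not call Tika and is not Tika's implementation: the class WordNGrams and its ngrams method are hypothetical names, and the sketch assumes an ngram is simply every substring whose length lies between minlen and maxlen, with no padding or boundary markers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: lists the ngrams of one word.
class WordNGrams {

    // Returns every substring of `word` whose length is between minlen and maxlen.
    // Assumption: no word-boundary padding is applied.
    static List<String> ngrams(CharSequence word, int minlen, int maxlen) {
        List<String> out = new ArrayList<>();
        for (int n = minlen; n <= maxlen; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                out.add(word.subSequence(i, i + n).toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With minlen=3 and maxlen=3 (the defaults of the one-argument constructor).
        System.out.println(ngrams(new StringBuffer("tika"), 3, 3)); // prints [tik, ika]
    }
}
```

A word shorter than minlen yields no ngrams in this sketch.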
-
analyze
public void analyze(java.lang.StringBuilder text)
Deprecated. Analyzes a piece of text
Parameters:
text - the text to be analyzed
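The sketch below illustrates what analyzing a piece of text for an ngram profile can look like: split the text into words, then count each ngram. It is standalone and hypothetical (TextNGramCounter and count are not Tika names). It assumes that words are runs of letters, that text is lower-cased first, and that a single ngram length n is used; the actual tokenization rules of this class are not documented here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Tika: counts ngrams over a whole text.
class TextNGramCounter {

    // Returns a map from ngram to the number of times it occurs in `text`.
    static Map<String, Integer> count(CharSequence text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: a word is a run of letters; anything else separates words.
        String[] words = text.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        for (String word : words) {
            for (int i = 0; i + n <= word.length(); i++) {
                counts.merge(word.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For the text "The theme" with n=3, this yields the=2, hem=1, eme=1.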
-
getSorted
public java.util.List<org.apache.tika.language.LanguageProfilerBuilder.NGramEntry> getSorted()
Deprecated. Returns a sorted list of ngrams (sort done by 1. frequency 2. sequence)
Returns:
sorted list of ngrams
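The two-level ordering described above (frequency first, then sequence) can be sketched as follows. The code is standalone and hypothetical: SortedNGrams is not a Tika class, and the directions are assumptions, since the documentation does not state them. Here higher frequency comes first and ties are broken by ascending character sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: orders ngram counts.
class SortedNGrams {

    // Sorts by 1. frequency (assumed descending) 2. sequence (assumed ascending).
    static List<Map.Entry<String, Integer>> sorted(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byFrequency = Integer.compare(b.getValue(), a.getValue());
            return byFrequency != 0 ? byFrequency : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }
}
```

The tie-break on the sequence makes the order deterministic even when many ngrams share the same count.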
-
toString
public java.lang.String toString()
Deprecated.
Overrides:
toString in class java.lang.Object
-
getSimilarity
public float getSimilarity(LanguageProfilerBuilder another) throws TikaException
Deprecated. Calculates a score for how well two NGramProfiles match each other
Parameters:
another - ngram profile to compare against
Returns:
similarity score, where 0 = exact match
Throws:
TikaException - if a score could not be calculated
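Because 0 means an exact match, the returned value behaves like a distance rather than a similarity. The sketch below shows one plausible distance with that property; it is an assumption for illustration, not the formula this class uses. ProfileDistance is a hypothetical name, profiles are plain ngram-to-count maps, and an empty profile raises IllegalArgumentException where the real method would signal failure with TikaException.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Tika: a distance between two ngram profiles.
class ProfileDistance {

    // Half the sum of absolute differences between relative frequencies.
    // 0 means identical distributions; 1 means no ngram in common.
    static float distance(Map<String, Integer> a, Map<String, Integer> b) {
        float totalA = total(a);
        float totalB = total(b);
        if (totalA == 0f || totalB == 0f) {
            throw new IllegalArgumentException("cannot score an empty profile");
        }
        Set<String> ngrams = new HashSet<>(a.keySet());
        ngrams.addAll(b.keySet());
        float sum = 0f;
        for (String ngram : ngrams) {
            float fa = a.getOrDefault(ngram, 0) / totalA;
            float fb = b.getOrDefault(ngram, 0) / totalB;
            sum += Math.abs(fa - fb);
        }
        return sum / 2f;
    }

    // Total number of ngram occurrences in a profile.
    private static float total(Map<String, Integer> profile) {
        float total = 0f;
        for (int count : profile.values()) {
            total += count;
        }
        return total;
    }
}
```

Normalizing by the totals keeps the score comparable between a short text and a large reference profile.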
-
load
public void load(java.io.InputStream is) throws java.io.IOException
Deprecated. Loads an ngram profile from an InputStream (assumes UTF-8 encoded content)
Parameters:
is - the InputStream to read
Throws:
java.io.IOException
-
create
public static LanguageProfilerBuilder create(java.lang.String name, java.io.InputStream is, java.lang.String encoding) throws TikaException
Deprecated. Creates a new language profile from a text file (preferably quite large, 5-10k lines)
Parameters:
name - the name to give the profile
is - the stream to read
encoding - the encoding of the stream
Throws:
TikaException - if a language profile could not be created
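The encoding parameter matters because the stream carries bytes, not characters. The standalone sketch below shows how a stream in a named encoding can be decoded into a StringBuilder, which is the argument type analyze accepts. StreamText and readAll are hypothetical names; this is not how create is implemented, only an illustration of the decoding step using standard java.io classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helper, not part of Tika: decodes a text stream.
class StreamText {

    // Reads the whole stream using the named encoding, normalizing line ends to '\n'.
    // An unknown encoding name surfaces as UnsupportedEncodingException (an IOException).
    static StringBuilder readAll(InputStream is, String encoding) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, encoding))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text;
    }
}
```

Decoding with the wrong encoding silently produces wrong characters and therefore wrong ngrams, so the caller should pass the encoding the training file was actually written in.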
-
save
public void save(java.io.OutputStream os) throws java.io.IOException
Deprecated. Writes the NGramProfile content to an OutputStream, encoded as UTF-8
Parameters:
os - the stream to write to
Throws:
java.io.IOException
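Since save writes UTF-8 and load assumes UTF-8, a profile should survive a round trip unchanged. The standalone sketch below demonstrates that idea with a made-up file format of one "ngram count" pair per line. The format, the class ProfileIo, and both method bodies are assumptions for illustration; the on-disk format used by this class is not described in this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: UTF-8 round trip of ngram counts.
class ProfileIo {

    // Writes one "ngram count" line per entry, always encoded as UTF-8.
    static void save(Map<String, Integer> counts, OutputStream os) throws IOException {
        Writer writer = new OutputStreamWriter(os, StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            writer.write(entry.getKey() + " " + entry.getValue() + "\n");
        }
        writer.flush(); // flush, but leave the caller's stream open
    }

    // Reads lines written by save, assuming UTF-8 encoded content.
    static Map<String, Integer> load(InputStream is) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            int space = line.lastIndexOf(' ');
            if (space <= 0) {
                continue; // skip blank lines and lines without a count
            }
            counts.put(line.substring(0, space), Integer.parseInt(line.substring(space + 1)));
        }
        return counts;
    }
}
```

Fixing the charset on both sides, instead of relying on the platform default, is what keeps non-ASCII ngrams intact across machines.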
-
main
public static void main(java.lang.String[] args)
Deprecated. main method used for testing only
Parameters:
args
-
-