@Deprecated
public class LanguageProfilerBuilder
extends java.lang.Object
Constructor and Description |
---|
LanguageProfilerBuilder(java.lang.String name): Deprecated. Constructs a new ngram profile where minlen=3, maxlen=3. |
LanguageProfilerBuilder(java.lang.String name, int minlen, int maxlen): Deprecated. Constructs a new ngram profile. |
Modifier and Type | Method and Description |
---|---|
void | add(java.lang.StringBuffer word): Deprecated. Adds ngrams from a single word to this profile. |
void | analyze(java.lang.StringBuilder text): Deprecated. Analyzes a piece of text. |
static LanguageProfilerBuilder | create(java.lang.String name, java.io.InputStream is, java.lang.String encoding): Deprecated. Creates a new language profile from a text file (preferably a large one, 5-10k lines). |
java.lang.String | getName(): Deprecated. |
float | getSimilarity(LanguageProfilerBuilder another): Deprecated. Calculates a score for how well two NGramProfiles match each other. |
java.util.List<org.apache.tika.language.LanguageProfilerBuilder.NGramEntry> | getSorted(): Deprecated. Returns a sorted list of ngrams (sort done by 1. |
void | load(java.io.InputStream is): Deprecated. Loads an ngram profile from an InputStream (assumes UTF-8 encoded content). |
static void | main(java.lang.String[] args): Deprecated. Main method, used for testing only. |
void | save(java.io.OutputStream os): Deprecated. Writes the NGramProfile content to an OutputStream, using UTF-8 encoding. |
java.lang.String | toString(): Deprecated. |
public LanguageProfilerBuilder(java.lang.String name, int minlen, int maxlen)
name - the name of the profile
minlen - the minimum length of ngram sequences
maxlen - the maximum length of ngram sequences
public LanguageProfilerBuilder(java.lang.String name)
name - the name of the profile, usually a two-character string
public java.lang.String getName()
public void add(java.lang.StringBuffer word)
word - the word to add
public void analyze(java.lang.StringBuilder text)
text - the text to be analyzed
public java.util.List<org.apache.tika.language.LanguageProfilerBuilder.NGramEntry> getSorted()
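
To make the minlen and maxlen parameters and the add and analyze entries above more concrete, here is a minimal, self-contained sketch of character ngram extraction and counting. It does not use or reproduce this class: the names NGramSketch, ngrams and count are hypothetical, and details such as lowercasing, whitespace splitting and the absence of word-boundary padding are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the LanguageProfilerBuilder implementation.
class NGramSketch {

    /** Returns every character ngram of word whose length is in [minLen, maxLen]. */
    static List<String> ngrams(String word, int minLen, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int n = minLen; n <= maxLen; n++) {
            // Slide a window of width n across the word.
            for (int i = 0; i + n <= word.length(); i++) {
                out.add(word.substring(i, i + n));
            }
        }
        return out;
    }

    /** Counts ngrams over the whitespace-separated words of text (assumed tokenization). */
    static Map<String, Integer> count(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys in sorted order
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            for (String gram : ngrams(word, minLen, maxLen)) {
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("tika", 3, 3));     // prints [tik, ika]
        System.out.println(count("the then", 3, 3));  // prints {hen=1, the=2}
    }
}
```

With minlen=3 and maxlen=3, as in the single-argument constructor, only trigrams are produced; widening the range adds shorter or longer sequences to the same profile.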
public java.lang.String toString()
toString in class java.lang.Object
public float getSimilarity(LanguageProfilerBuilder another) throws TikaException
another - the ngram profile to compare against
TikaException - if a score could not be calculated
public void load(java.io.InputStream is) throws java.io.IOException
is - the InputStream to read
java.io.IOException
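
The getSimilarity entry above does not say how its score is computed. As a rough illustration of what comparing two ngram profiles can look like, the sketch below sums the absolute differences of relative ngram frequencies. This is one common choice and an assumption, not the formula this class uses; ProfileDistanceSketch, normalize and distance are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; the real getSimilarity formula is not documented here.
class ProfileDistanceSketch {

    /** Converts raw ngram counts to relative frequencies that sum to 1. */
    static Map<String, Double> normalize(Map<String, Integer> counts) {
        double total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> freq = new HashMap<>();
        if (total == 0) {
            return freq; // empty profile: no frequencies
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freq.put(e.getKey(), e.getValue() / total);
        }
        return freq;
    }

    /** Sum of absolute frequency differences: 0.0 for identical profiles, 2.0 for disjoint ones. */
    static double distance(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Double> fa = normalize(a);
        Map<String, Double> fb = normalize(b);
        Set<String> keys = new HashSet<>(fa.keySet());
        keys.addAll(fb.keySet());
        double sum = 0;
        for (String k : keys) {
            // An ngram missing from one profile contributes its full frequency.
            sum += Math.abs(fa.getOrDefault(k, 0.0) - fb.getOrDefault(k, 0.0));
        }
        return sum;
    }
}
```

In this sketch a lower value means a closer match. Whether the float returned by getSimilarity grows or shrinks with similarity is not stated in this page, so check it before relying on an ordering.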
public static LanguageProfilerBuilder create(java.lang.String name, java.io.InputStream is, java.lang.String encoding) throws TikaException
name - the name to give the profile
is - the stream to read
encoding - the encoding of the stream
TikaException - if a language profile could not be created
public void save(java.io.OutputStream os) throws java.io.IOException
os - the stream to write to
java.io.IOException
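
The load and save entries above both fix the encoding to UTF-8 rather than relying on the platform default. The sketch below shows that round trip with the standard library only. The on-disk layout (one ngram and its count per line, separated by a space, with '#' lines skipped) is an assumption for illustration and may not match the file format this class reads and writes; ProfileIoSketch and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the file layout is assumed, not taken from the class.
class ProfileIoSketch {

    /** Writes "ngram count" lines as UTF-8; the caller keeps ownership of the stream. */
    static void save(Map<String, Integer> counts, OutputStream os) throws IOException {
        Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            w.write(e.getKey() + " " + e.getValue() + "\n");
        }
        w.flush(); // flush the encoder, but do not close the caller's stream
    }

    /** Reads "ngram count" lines as UTF-8, skipping blank lines and '#' comments. */
    static Map<String, Integer> load(InputStream is) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        BufferedReader r = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line;
        while ((line = r.readLine()) != null) {
            int sep = line.lastIndexOf(' ');
            if (line.isEmpty() || line.startsWith("#") || sep < 0) {
                continue;
            }
            counts.put(line.substring(0, sep), Integer.parseInt(line.substring(sep + 1)));
        }
        return counts;
    }

    /** Saves to memory and loads back, wrapping IOException for convenience. */
    static Map<String, Integer> roundTrip(Map<String, Integer> counts) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            save(counts, buf);
            return load(new ByteArrayInputStream(buf.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Naming the charset explicitly on both sides is what keeps non-ASCII ngrams intact when a profile is written on one machine and read on another.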
public static void main(java.lang.String[] args)
args

Copyright © 2010 - 2020 Adobe Systems Incorporated. All Rights Reserved