Qizx/open 4.1 API
java.lang.Object
  com.qizx.api.util.fulltext.DefaultTextTokenizer
public class DefaultTextTokenizer
Generic Text Tokenizer, suitable for most Western languages.
A word is either (1) a sequence of letters or digits that begins with a letter, or (2) a number without an exponent. Words never contain a dash or an apostrophe.
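
The word rule above can be approximated with a small hand-written scanner. This is only a sketch for illustration, not the Qizx implementation: the class `WordRuleSketch` and its `words` method are hypothetical names, and accepting a fractional part such as `3.14` as a "number" is an assumption, since the description only says that exponents are excluded.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative approximation of the documented word rule; not Qizx code.
class WordRuleSketch {

    // Splits text into (1) words that start with a letter and continue with
    // letters or digits, and (2) numbers without an exponent.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            int start = i;
            if (Character.isLetter(c)) {
                while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
                out.add(text.substring(start, i));
            } else if (Character.isDigit(c)) {
                while (i < n && Character.isDigit(text.charAt(i))) i++;
                // Assumption: one fractional part is accepted, an exponent is not.
                if (i + 1 < n && text.charAt(i) == '.' && Character.isDigit(text.charAt(i + 1))) {
                    i++;
                    while (i < n && Character.isDigit(text.charAt(i))) i++;
                }
                out.add(text.substring(start, i));
            } else {
                i++; // dash, apostrophe, whitespace, punctuation act as separators
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Don't re-use x86 at 3.14e10"));
    }
}
```

Under these assumptions the apostrophe and the dash split their words, and `3.14e10` is read as the number `3.14` followed by the word `e10`. The real tokenizer may treat such edge cases differently.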
| Field Summary |
|---|
| Fields inherited from interface com.qizx.api.fulltext.TextTokenizer |
|---|
| END, PARAGRAPH, SENTENCE, WORD |
| Constructor Summary | |
|---|---|
| DefaultTextTokenizer() | |
| Method Summary | |
|---|---|
| void | copyTokenTo(char[] array, int start): Copies the current token into a character array. |
| void | defineSpecialChar(char ch): Defines a character to recognize when parsing of special characters is enabled. |
| int | getDigitMax(): Returns the maximum number of digits a word can contain. |
| char[] | getTokenChars(): Returns the current token as a new character array. |
| int | getTokenLength(): Returns the original length of the last token found by nextToken. |
| int | getTokenOffset(): Returns the offset (in the source text chunk) of the last token found by nextToken. |
| boolean | gotWildcard(): Returns true if wildcard characters have been recognized in the current token. |
| boolean | isAcceptingWildcards(): Returns true if wildcard characters are recognized. |
| boolean | isParsingSpecialChars(): Returns true if special characters are recognized. |
| int | nextToken(): Returns the type of the next token, or END if no more tokens can be found. |
| void | setAcceptingWildcards(boolean acceptingWildcards): If set to true, wildcard characters are recognized. |
| void | setDigitMax(int max): Sets the maximum number of digits a word can contain. |
| void | setParsingSpecialChars(boolean parsingSpecialChars): If set to true, special characters are recognized. |
| void | start(char[] text, int length): Starts the analysis of a new text chunk. |
| void | start(CharSequence text): Starts the analysis of a new text chunk. |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
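
The method summary suggests a pull-style calling sequence: start a text chunk, call nextToken until it returns END, and read the token through the accessor methods in between. The sketch below shows that sequence against a hypothetical stand-in class, so that it runs without the Qizx library. `StandInTokenizer`, its whitespace-splitting rule, and the numeric values of `END` and `WORD` are assumptions made for illustration; only the method names come from the summary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring a few documented method names; not Qizx code.
class TokenLoopSketch {

    static final int END = 0;   // assumed value, for illustration only
    static final int WORD = 1;  // assumed value, for illustration only

    // Minimal whitespace-splitting tokenizer exposing the same call sequence.
    static class StandInTokenizer {
        private CharSequence text = "";
        private int pos;          // scan position in the current chunk
        private int tokenOffset;  // offset of the last token found
        private int tokenLength;  // length of the last token found

        void start(CharSequence chunk) {
            text = chunk;
            pos = 0;
            tokenOffset = 0;
            tokenLength = 0;
        }

        int nextToken() {
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
            if (pos >= text.length()) return END;
            tokenOffset = pos;
            while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
            tokenLength = pos - tokenOffset;
            return WORD;
        }

        int getTokenOffset() { return tokenOffset; }

        int getTokenLength() { return tokenLength; }

        void copyTokenTo(char[] array, int start) {
            for (int k = 0; k < tokenLength; k++) {
                array[start + k] = text.charAt(tokenOffset + k);
            }
        }
    }

    // Drives the loop: start a chunk, then pull tokens until END.
    static List<String> collect(CharSequence chunk) {
        StandInTokenizer tok = new StandInTokenizer();
        tok.start(chunk);
        List<String> out = new ArrayList<>();
        while (tok.nextToken() != END) {
            char[] buf = new char[tok.getTokenLength()]; // destination must fit the token
            tok.copyTokenTo(buf, 0);
            out.add(tok.getTokenOffset() + ":" + new String(buf));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("full text  search"));
    }
}
```

With the real class, the same loop would presumably be written against DefaultTextTokenizer and the END constant inherited from TextTokenizer; the token boundaries would then follow the word rule described at the top of this page rather than whitespace.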
| Constructor Detail |
|---|
public DefaultTextTokenizer()
| Method Detail |
|---|
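public void start(char[] text, int length)
  Description copied from interface: TextTokenizer
  Starts the analysis of a new text chunk.
  Specified by: start in interface TextTokenizer
  Parameters:
    text - characters to tokenize
    length - number of characters in the text array

public void start(CharSequence text)
  Description copied from interface: TextTokenizer
  Starts the analysis of a new text chunk.
  Specified by: start in interface TextTokenizer
  Parameters:
    text - fragment to tokenize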
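
public void copyTokenTo(char[] array, int start)
  Description copied from interface: TextTokenizer
  Copies the current token into a character array.
  Specified by: copyTokenTo in interface TextTokenizer
  Parameters:
    array - destination array. Must fit the size of the token.
    start - offset in the destination array.

public char[] getTokenChars()
  Description copied from interface: TextTokenizer
  Returns the current token as a new character array.
  Specified by: getTokenChars in interface TextTokenizer

public int getTokenOffset()
  Description copied from interface: TextTokenizer
  Returns the offset (in the source text chunk) of the last token found by nextToken.
  Specified by: getTokenOffset in interface TextTokenizer

public int getTokenLength()
  Description copied from interface: TextTokenizer
  Returns the original length of the last token found by nextToken.
  Specified by: getTokenLength in interface TextTokenizer

public boolean isAcceptingWildcards()
  Description copied from interface: TextTokenizer
  Returns true if wildcard characters are recognized.
  Wildcard character sequences are ".", ".?", ".*", ".+", and ".{n,m}".
  Specified by: isAcceptingWildcards in interface TextTokenizer

public void setAcceptingWildcards(boolean acceptingWildcards)
  Description copied from interface: TextTokenizer
  If set to true, wildcard characters are recognized.
  Specified by: setAcceptingWildcards in interface TextTokenizer

public boolean isParsingSpecialChars()
  Description copied from interface: TextTokenizer
  Returns true if special characters are recognized.
  Specified by: isParsingSpecialChars in interface TextTokenizer
  See Also: TextTokenizer.defineSpecialChar(char)

public void setParsingSpecialChars(boolean parsingSpecialChars)
  Description copied from interface: TextTokenizer
  If set to true, special characters are recognized.
  Specified by: setParsingSpecialChars in interface TextTokenizer
  See Also: TextTokenizer.defineSpecialChar(char)

public void defineSpecialChar(char ch)
  Description copied from interface: TextTokenizer
  Defines a character to recognize when parsing of special characters is enabled.
  Specified by: defineSpecialChar in interface TextTokenizer

public boolean gotWildcard()
  Description copied from interface: TextTokenizer
  Returns true if wildcard characters have been recognized in the current token.
  Specified by: gotWildcard in interface TextTokenizer

public int nextToken()
  Description copied from interface: TextTokenizer
  Returns the type of the next token, or END if no more tokens can be found.
  Specified by: nextToken in interface TextTokenizer

public int getDigitMax()
  Description copied from interface: TextTokenizer
  Returns the maximum number of digits a word can contain.
  Specified by: getDigitMax in interface TextTokenizer

public void setDigitMax(int max)
  Description copied from interface: TextTokenizer
  Sets the maximum number of digits a word can contain.
  Specified by: setDigitMax in interface TextTokenizer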
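
The wildcard sequences listed under isAcceptingWildcards (".", ".?", ".*", ".+", and ".{n,m}") can be spotted with a regular expression. The following is a rough sketch, not the recognition logic used by Qizx: the class `WildcardSketch` and the method `firstWildcard` are hypothetical, and it assumes that n and m are plain decimal integers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative detection of the listed wildcard sequences; not Qizx code.
class WildcardSketch {

    // A dot, optionally followed by ?, *, + or a {n,m} quantifier.
    private static final Pattern WILDCARD =
            Pattern.compile("\\.(?:[?*+]|\\{\\d+,\\d+\\})?");

    // Returns the first wildcard sequence found in the token, or null if none.
    static String firstWildcard(String token) {
        Matcher m = WILDCARD.matcher(token);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstWildcard("wor.*"));
        System.out.println(firstWildcard("colo.{0,1}r"));
        System.out.println(firstWildcard("word"));
    }
}
```

A token for which such a check succeeds corresponds to the case where gotWildcard would be expected to return true, provided wildcard recognition has been enabled with setAcceptingWildcards(true).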
© 2010 Axyana Software