org.apache.lucene.analysis
public class: LetterTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.CharTokenizer
org.apache.lucene.analysis.LetterTokenizer
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
LowerCaseTokenizer, ArabicLetterTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters. That is
to say, it defines tokens as maximal strings of adjacent letters, as defined
by the java.lang.Character.isLetter() predicate.
Note: this works reasonably well for most European languages, but poorly
for some Asian languages, where words are not separated by spaces.
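The behavior described above can be illustrated without Lucene on the classpath. The following is a minimal sketch in plain Java; the class `LetterRunsSketch` and its `tokenize` helper are hypothetical names introduced only for illustration and are not part of the Lucene API. It collects maximal runs of characters accepted by `Character.isLetter(char)`, and shows why unspaced text in some Asian languages ends up as a single token.

```java
import java.util.ArrayList;
import java.util.List;

final class LetterRunsSketch {
    // Hypothetical helper, not part of Lucene: collects maximal runs of
    // characters for which Character.isLetter(char) returns true.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                run.append(c);
            } else if (run.length() > 0) {
                // A non-letter ends the current run.
                tokens.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, World! 42abc")); // [Hello, World, abc]
        System.out.println(tokenize("don't stop"));          // [don, t, stop]
        // Four CJK ideographs with no spaces: every one is a letter,
        // so the whole string comes back as a single token.
        System.out.println(tokenize("\u6211\u7231\u5317\u4eac").size()); // 1
    }
}
```

Digits and punctuation both act as separators here, which is why `42abc` yields only `abc` and the apostrophe splits `don't` in two.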
| Method from org.apache.lucene.analysis.LetterTokenizer Summary: |
|---|
| isTokenChar |
| Methods from org.apache.lucene.util.AttributeSource: |
|---|
| addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString |
| Methods from java.lang.Object: |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method from org.apache.lucene.analysis.LetterTokenizer Detail: |
// Collects only characters that satisfy Character.isLetter(char).
protected boolean isTokenChar(char c) {
    return Character.isLetter(c);
}
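Because `isTokenChar` is a protected hook, the character test can be overridden or reused by subclasses while the tokenizing loop stays in the parent. The sketch below shows that shape in plain Java. Every class name in it (`SketchCharTokenizer`, `SketchLetterTokenizer`, `SketchLowerCaseTokenizer`) is hypothetical, and the `normalize` hook is an assumption made for this illustration rather than a statement about Lucene's own classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a character-based tokenizer; not Lucene's CharTokenizer.
abstract class SketchCharTokenizer {
    // Hook: subclasses decide which characters belong to a token.
    protected abstract boolean isTokenChar(char c);

    // Hook assumed for this sketch: subclasses may rewrite each kept character.
    protected char normalize(char c) {
        return c;
    }

    // Shared loop: emits maximal runs of characters accepted by isTokenChar.
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                run.append(normalize(c));
            } else if (run.length() > 0) {
                tokens.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }
}

// Same predicate as the method detail above.
class SketchLetterTokenizer extends SketchCharTokenizer {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}

// Illustrative subclass: inherits the letter test and lower-cases each character.
class SketchLowerCaseTokenizer extends SketchLetterTokenizer {
    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}

class SketchDemo {
    public static void main(String[] args) {
        System.out.println(new SketchLetterTokenizer().tokenize("Hello, World"));    // [Hello, World]
        System.out.println(new SketchLowerCaseTokenizer().tokenize("Hello, World")); // [hello, world]
    }
}
```

Keeping the loop in the base class means a subclass only has to state what counts as a token character, which is all the one-line `isTokenChar` override above does.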