org.apache.lucene.analysis
public abstract class TokenStream
java.lang.Object
   org.apache.lucene.analysis.TokenStream

Direct Known Subclasses:
    CJKTokenizer, PorterStemFilter, WikipediaTokenizer, RussianStemFilter, TokenFilter, CharTokenizer, EdgeNGramTokenizer, ChineseTokenizer, NGramTokenFilter, FrenchStemFilter, FastStringTokenizer, SingleTokenTokenStream, Tokenizer, RussianLowerCaseFilter, LetterTokenizer, ElisionFilter, GermanStemFilter, RussianLetterTokenizer, TokenRangeSinkTokenizer, SnowballFilter, KeywordTokenizer, EmptyTokenStream, DateRecognizerSinkTokenizer, PatternTokenizer, DutchStemFilter, CachingTokenFilter, BrazilianStemFilter, CompoundWordTokenFilterBase, ShingleMatrixFilter, ChineseFilter, EdgeNGramTokenFilter, StandardFilter, LowerCaseTokenizer, TeeTokenFilter, PrefixAndSuffixAwareTokenFilter, ShingleFilter, PrefixAwareTokenFilter, TokenTypeSinkTokenizer, SynonymTokenFilter, SinkTokenizer, StandardTokenizer, ThaiWordFilter, WhitespaceTokenizer, NumericPayloadTokenFilter, TokenOffsetPayloadTokenFilter, StopFilter, TypeAsPayloadTokenFilter, NGramTokenizer, ISOLatin1AccentFilter, QPTestFilter, LengthFilter, GreekLowerCaseFilter, DictionaryCompoundWordTokenFilter, HyphenationCompoundWordTokenFilter, LowerCaseFilter

A TokenStream enumerates the sequence of tokens, either from fields of a document or from query text.

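This is an abstract class. Concrete subclasses are:

    • Tokenizer, a TokenStream whose input is a Reader; and
    • TokenFilter, a TokenStream whose input is another TokenStream.
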
NOTE: subclasses must override #next(Token). Overriding #next() instead is also acceptable, but that method is deprecated in favor of #next(Token).
Methods from org.apache.lucene.analysis.TokenStream Summary:
close(),   next(),   next(Token),   reset()
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.lucene.analysis.TokenStream Detail:
 public  void close() throws IOException 
    Releases resources associated with this stream.
 public Token next() throws IOException 
Deprecated. The returned Token is a "full private copy" (not re-used across calls to next()), but this method is slower than calling #next(Token).

    Returns the next token in the stream, or null at EOS.
 public Token next(Token reusableToken) throws IOException 
    Returns the next token in the stream, or null at EOS. When possible, the input Token should be used as the returned Token (this gives fastest tokenization performance), but this is not required and a new Token may be returned. Callers may re-use a single Token instance for successive calls to this method.

    This implicitly defines a "contract" between consumers (callers of this method) and producers (implementations of this method that are the source for tokens):

    • A consumer must fully consume the previously returned Token before calling this method again.
    • A producer must call Token#clear() before setting the fields in it and returning it.
    Also, the producer must make no assumptions about a Token after it has been returned: the caller may arbitrarily change it. If the producer needs to hold onto the token for subsequent calls, it must clone() it before storing it. Note that a TokenFilter is considered a consumer.
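
    The contract above can be illustrated with a minimal, self-contained sketch. Note the assumptions: SketchToken, SketchStream and SpaceSplitStream are hypothetical stand-ins written for this example, not the Lucene classes, and they omit "throws IOException" for brevity. Only the shape of next(Token), clear() and clone() is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseContractSketch {

    // Hypothetical stand-in for Lucene's Token: term text plus offsets only.
    static class SketchToken implements Cloneable {
        String term = "";
        int startOffset;
        int endOffset;

        void clear() {
            term = "";
            startOffset = 0;
            endOffset = 0;
        }

        @Override
        public SketchToken clone() {
            try {
                return (SketchToken) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Hypothetical stand-in for TokenStream, reduced to next(Token).
    static abstract class SketchStream {
        abstract SketchToken next(SketchToken reusableToken);
    }

    // Producer: splits text on spaces and fills in the caller's token.
    static class SpaceSplitStream extends SketchStream {
        private final String text;
        private int pos = 0;

        SpaceSplitStream(String text) {
            this.text = text;
        }

        @Override
        SketchToken next(SketchToken reusableToken) {
            while (pos < text.length() && text.charAt(pos) == ' ') pos++;
            if (pos == text.length()) return null; // end of stream
            int start = pos;
            while (pos < text.length() && text.charAt(pos) != ' ') pos++;
            reusableToken.clear(); // producer clears before setting fields
            reusableToken.term = text.substring(start, pos);
            reusableToken.startOffset = start;
            reusableToken.endOffset = pos;
            return reusableToken;
        }
    }

    // Consumer: one token instance for the whole loop; each token is fully
    // consumed (its term is copied out) before next() is called again.
    static List<String> terms(SketchStream stream) {
        List<String> out = new ArrayList<>();
        SketchToken reusable = new SketchToken();
        for (SketchToken t = stream.next(reusable); t != null; t = stream.next(reusable)) {
            out.add(t.term);
        }
        return out;
    }

    // Consumer that keeps tokens beyond the next call: it must clone them,
    // because the producer may overwrite the shared instance.
    static List<SketchToken> keepTokens(SketchStream stream) {
        List<SketchToken> out = new ArrayList<>();
        SketchToken reusable = new SketchToken();
        for (SketchToken t = stream.next(reusable); t != null; t = stream.next(reusable)) {
            out.add(t.clone());
        }
        return out;
    }
}
```

    In this sketch, storing t directly in keepTokens (without clone()) would leave every list entry pointing at the same instance, holding only the last token's values.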
 public  void reset() throws IOException 
    Resets this stream to the beginning. This is an optional operation, so subclasses may or may not implement it. reset() is not needed for the standard indexing process. However, if the Tokens of a TokenStream are intended to be consumed more than once, reset() must be implemented. Note that if your TokenStream caches tokens and feeds them back again after a reset, it is imperative that you clone the tokens both when you store them away (on the first pass) and when you return them (on later passes after reset()).
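
    The double-clone rule for caching streams can be sketched as follows. This is an illustration under stated assumptions, not Lucene's own implementation: SketchToken, SketchStream, ArrayStream and CachingStream are hypothetical stand-ins defined here so the example runs without the Lucene jar, and "throws IOException" is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ResetCacheSketch {

    // Hypothetical stand-in for Lucene's Token (term text only).
    static class SketchToken implements Cloneable {
        String term = "";

        void clear() {
            term = "";
        }

        @Override
        public SketchToken clone() {
            try {
                return (SketchToken) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Hypothetical stand-in for TokenStream; reset() is optional, so the
    // default does nothing.
    static abstract class SketchStream {
        abstract SketchToken next(SketchToken reusableToken);

        void reset() {
        }
    }

    // One-shot source that reuses the caller's token.
    static class ArrayStream extends SketchStream {
        private final String[] terms;
        private int i = 0;

        ArrayStream(String... terms) {
            this.terms = terms;
        }

        @Override
        SketchToken next(SketchToken reusableToken) {
            if (i >= terms.length) return null;
            reusableToken.clear();
            reusableToken.term = terms[i++];
            return reusableToken;
        }
    }

    // Caches the wrapped stream on the first call and replays it after reset().
    static class CachingStream extends SketchStream {
        private final SketchStream input;
        private List<SketchToken> cache; // filled lazily on the first next()
        private Iterator<SketchToken> it;

        CachingStream(SketchStream input) {
            this.input = input;
        }

        @Override
        SketchToken next(SketchToken reusableToken) {
            if (cache == null) {
                cache = new ArrayList<>();
                for (SketchToken t = input.next(reusableToken); t != null; t = input.next(reusableToken)) {
                    cache.add(t.clone()); // clone when storing: the source reuses t
                }
                it = cache.iterator();
            }
            // Clone when returning: the caller may arbitrarily change the token.
            return it.hasNext() ? it.next().clone() : null;
        }

        @Override
        void reset() {
            if (cache != null) it = cache.iterator();
        }
    }

    // Consumer that deliberately modifies each token after reading it.
    static List<String> drain(SketchStream stream) {
        List<String> out = new ArrayList<>();
        SketchToken reusable = new SketchToken();
        for (SketchToken t = stream.next(reusable); t != null; t = stream.next(reusable)) {
            out.add(t.term);
            t.term = "clobbered";
        }
        return out;
    }
}
```

    Because both clones are in place, draining the CachingStream, calling reset(), and draining it again yields the same terms on both passes, even though the consumer overwrites every token it receives.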