net.java.sen
Class StringTagger

java.lang.Object
  extended by net.java.sen.StringTagger

public class StringTagger
extends java.lang.Object

This class generates morpheme tags from a String. Sample code:

 StringTagger tagger = StringTagger.getInstance("/usr/local/sen/conf/sen.xml");
 Token[] token = tagger.analyze(s);
 for (int i = 0; i < token.length; i++) {
     Token t = token[i];
     String pos = t.getPos(); // part of speech
     String basic = t.getBasic(); // basic (unconjugated) form
     String reading = t.getReading(); // reading
 }
 


Field Summary
protected  java.lang.String unknownPos
           
 
Method Summary
 void addPostProcessor(PostProcessor processor)
          Add PostProcessor.
 void addPreProcessor(PreProcessor processor)
          Add PreProcessor.
 Token[] analyze(java.lang.String input)
          Analyze string.
protected  Token[] doPostProcess(Token[] tokens, java.util.Map postProcessInfo)
          Execute all registered postprocessors.
protected  java.lang.String doPreProcess(java.lang.String input, java.util.Map postProcessInfo)
          Execute all registered preprocessors.
static StringTagger getInstance()
           
static StringTagger getInstance(java.util.Locale locale)
          Deprecated. Use StringTagger#getInstance(String senConfig) instead.
static StringTagger getInstance(java.lang.String senConfig)
          Obtain the StringTagger instance for the specified configuration.
 boolean hasNext()
          Check whether StringTagger has more morphemes.
 Token next()
          Get next morpheme.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

unknownPos

protected java.lang.String unknownPos
Method Detail

getInstance

public static StringTagger getInstance()
                                throws java.lang.IllegalArgumentException,
                                       java.io.IOException
Throws:
java.lang.IllegalArgumentException
java.io.IOException

getInstance

public static StringTagger getInstance(java.util.Locale locale)
                                throws java.io.IOException,
                                       java.lang.IllegalArgumentException
Deprecated. Use StringTagger#getInstance(String senConfig) instead.

Obtain the StringTagger instance for the specified locale.

Parameters:
locale - Locale for which to generate the morphological analyzer.
Throws:
java.io.IOException
java.lang.IllegalArgumentException

getInstance

public static StringTagger getInstance(java.lang.String senConfig)
                                throws java.io.IOException,
                                       java.lang.IllegalArgumentException
Obtain the StringTagger instance for the specified configuration.

Parameters:
senConfig - path to the configuration file for Sen (e.g. "SEN_HOME/conf/sen.xml").
Returns:
StringTagger instance. One StringTagger is created per configuration file; if the configuration file is the same, the same instance is returned.
Throws:
java.io.IOException
java.lang.IllegalArgumentException
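
The Returns note above describes one instance per configuration file. A minimal sketch of that kind of per-key caching follows. TaggerRegistry and FakeTagger are hypothetical stand-ins, not part of Sen, and Sen's actual implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a tagger; NOT the real net.java.sen.StringTagger.
class FakeTagger {
    final String configPath;

    FakeTagger(String configPath) {
        this.configPath = configPath;
    }
}

// Hypothetical registry illustrating "same configuration file, same instance".
class TaggerRegistry {
    private static final Map<String, FakeTagger> CACHE =
            new HashMap<String, FakeTagger>();

    // Returns the cached instance for configPath, creating it on first use.
    static synchronized FakeTagger getInstance(String configPath) {
        if (configPath == null) {
            throw new IllegalArgumentException("configPath must not be null");
        }
        FakeTagger tagger = CACHE.get(configPath);
        if (tagger == null) {
            tagger = new FakeTagger(configPath);
            CACHE.put(configPath, tagger);
        }
        return tagger;
    }
}
```

In this sketch, two calls with the same path return the identical object, so any state held by that object is shared between callers.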

analyze

public Token[] analyze(java.lang.String input)
                throws java.io.IOException
Analyze string.

Parameters:
input - string to analyze.
Returns:
array of tokens representing the morphemes.
Throws:
java.io.IOException

next

public Token next()
Get next morpheme.

Returns:
the next token, or null when no next token exists.

hasNext

public boolean hasNext()
Check whether StringTagger has more morphemes.

Returns:
true if StringTagger has more morphemes.
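
The next() and hasNext() methods suggest an iterator-style loop. The sketch below uses a hypothetical MorphemeCursor stand-in with the same contract (next() returns null at the end) so that it runs without Sen. How the real StringTagger is given its input for this style of iteration is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the documented hasNext()/next() contract.
class MorphemeCursor {
    private final String[] items;
    private int index = 0;

    MorphemeCursor(String[] items) {
        this.items = items;
    }

    boolean hasNext() {
        return index < items.length;
    }

    // Returns the next item, or null when none remains.
    String next() {
        return hasNext() ? items[index++] : null;
    }
}

class CursorDemo {
    // Idiom 1: test hasNext() before each call to next().
    static List<String> drainWithHasNext(MorphemeCursor cursor) {
        List<String> out = new ArrayList<String>();
        while (cursor.hasNext()) {
            out.add(cursor.next());
        }
        return out;
    }

    // Idiom 2: rely on the null return value as the end marker.
    static List<String> drainUntilNull(MorphemeCursor cursor) {
        List<String> out = new ArrayList<String>();
        String item;
        while ((item = cursor.next()) != null) {
            out.add(item);
        }
        return out;
    }
}
```

Both idioms visit the same items; the second saves one call per iteration but depends on null never being a legitimate element.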

addPostProcessor

public void addPostProcessor(PostProcessor processor)
Add PostProcessor.

Parameters:
processor - PostProcessor

addPreProcessor

public void addPreProcessor(PreProcessor processor)
Add PreProcessor.

Parameters:
processor - PreProcessor

doPreProcess

protected java.lang.String doPreProcess(java.lang.String input,
                                        java.util.Map postProcessInfo)
Execute all registered preprocessors.

Parameters:
input - input string
postProcessInfo - information passed to postProcess
Returns:
preprocessed string

doPostProcess

protected Token[] doPostProcess(Token[] tokens,
                                java.util.Map postProcessInfo)
Execute all registered postprocessors.

Parameters:
tokens - tokens
postProcessInfo - information passed from preprocess
Returns:
postprocessed tokens
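
Taken together, doPreProcess and doPostProcess describe a pipeline: preprocessors rewrite the input string and record information in a shared Map, and postprocessors later read that Map while adjusting the tokens. The sketch below illustrates this flow with hypothetical Pre and Post interfaces. The real PreProcessor and PostProcessor signatures are not shown on this page and may differ; the whitespace tokenizer, the "!" handling, and the "hadBang" map key are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interfaces; NOT the real net.java.sen processor types.
interface Pre {
    String process(String input, Map<String, Object> info);
}

interface Post {
    String[] process(String[] tokens, Map<String, Object> info);
}

class PipelineSketch {
    private final List<Pre> pres = new ArrayList<Pre>();
    private final List<Post> posts = new ArrayList<Post>();

    void addPreProcessor(Pre p) { pres.add(p); }
    void addPostProcessor(Post p) { posts.add(p); }

    // Runs preprocessors in registration order, then a stand-in analysis
    // step (whitespace split), then postprocessors with the same info map.
    String[] analyze(String input) {
        Map<String, Object> info = new HashMap<String, Object>();
        String text = input;
        for (Pre p : pres) {
            text = p.process(text, info);
        }
        String trimmed = text.trim();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        for (Post p : posts) {
            tokens = p.process(tokens, info);
        }
        return tokens;
    }

    // Example wiring: the preprocessor strips a trailing "!" and records it;
    // the postprocessor re-attaches it as a separate token.
    static String[] demo(String input) {
        PipelineSketch pipeline = new PipelineSketch();
        pipeline.addPreProcessor(new Pre() {
            public String process(String in, Map<String, Object> info) {
                if (in.endsWith("!")) {
                    info.put("hadBang", Boolean.TRUE);
                    return in.substring(0, in.length() - 1);
                }
                return in;
            }
        });
        pipeline.addPostProcessor(new Post() {
            public String[] process(String[] tokens, Map<String, Object> info) {
                if (!Boolean.TRUE.equals(info.get("hadBang"))) {
                    return tokens;
                }
                String[] out = new String[tokens.length + 1];
                System.arraycopy(tokens, 0, out, 0, tokens.length);
                out[tokens.length] = "!";
                return out;
            }
        });
        return pipeline.analyze(input);
    }
}
```

The shared Map is what lets a postprocessor undo or compensate for a change that a preprocessor made before analysis.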