Illinois Part of Speech Tagger



If you wish to cite this work, please use the following.

D. Roth and D. Zelenko, Part of Speech Tagging Using a Network of Linear Separators. COLING-ACL, The 17th International Conference on Computational Linguistics (1998), pp. 1136-1142.

This POS tagger is substantially the same as our SNoW-based POS tagger, except that this one performs better, outputs a more standardized tag set, and accepts raw, natural language text as input (i.e., text that has not been sentence-split or word-split). The output format is the same.

Another difference between this version and the SNoW-based POS tagger is that LBJava makes this tagger much easier to incorporate into other Java applications. Simply import the tagger and call it on a LBJ2.nlp.seg.Token object.

See the online Javadoc documentation.

Using the Part of Speech Tagger


The tagger's performance can be tested on labeled test data with the following command. See its Javadoc documentation for a description of the input format.

java edu.illinois.cs.cogcomp.lbj.pos.TestPOS <test data>


A stand-alone program that tags plain, unannotated text is also provided. Its input is a file containing raw, natural language text that has not been sentence-split or word-split. Run it with the following command line.

java edu.illinois.cs.cogcomp.lbj.pos.POSTagPlain <input file>


The LBJava part of speech tagger expects that words are represented internally using the LBJava library's LBJ2.nlp.seg.Token class. If your LBJ source code defines a learning classifier that also takes a Token as input, you can import the POS tagger and use it like so:

// Begin Foo.lbj

import edu.illinois.cs.cogcomp.lbj.pos.POSTagger;
import LBJ2.nlp.seg.Token;

discrete FooClassifier(Token w) <-
learn FooLabeler
  using Feature1, Feature2, POSTagger
end

// End Foo.lbj

If your Java application uses the Token class as well, you can import the POS tagger and use it like so:

// Begin Bar.java

import edu.illinois.cs.cogcomp.lbj.pos.POSTagger;
import LBJ2.nlp.seg.Token;
import LBJ2.nlp.SentenceSplitter;
import LBJ2.nlp.WordSplitter;
import LBJ2.nlp.seg.PlainToTokenParser;
import LBJ2.parse.ChildrenFromVectors;

public class Bar
{
  void myMethod(String plainTextFile)
  {
    POSTagger tagger = new POSTagger();
    // Split the raw text into sentences, then words, then Token objects.
    PlainToTokenParser parser =
      new PlainToTokenParser(
        new WordSplitter(
          new SentenceSplitter(plainTextFile)));
    // Read the first token from the parser and tag it.
    Token w = (Token) parser.next();
    String tag = tagger.discreteValue(w);
  }
}

// End Bar.java

The list of tags returned by the discreteValue(Object) method in the context shown above can be found in the online Javadoc.
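
The Java example above tags a single token. To tag a whole file, the same two calls can be placed in a loop. The following is a minimal sketch of that loop, not the tagger's own code. It assumes that the parser exposes a next() method that returns null once the input is exhausted; check the LBJava Javadoc before relying on this. So that the sketch compiles without LBJava on the classpath, it uses two hypothetical stand-ins: a Supplier<Object> where an application would pass parser::next, and a Function<Object, String> where it would pass tagger::discreteValue. The class name TagAll, the sample words, and the toy tagging rule are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class TagAll {
  /**
   * Pulls objects from nextToken until it returns null and
   * collects the tag that tagOf assigns to each one, in order.
   */
  static List<String> tagAll(Supplier<Object> nextToken,
                             Function<Object, String> tagOf) {
    List<String> tags = new ArrayList<>();
    for (Object w = nextToken.get(); w != null; w = nextToken.get()) {
      tags.add(tagOf.apply(w));
    }
    return tags;
  }

  public static void main(String[] args) {
    // Stand-in token source: yields three words, then null.
    Iterator<String> words = Arrays.asList("The", "dog", "barks").iterator();
    Supplier<Object> nextToken = () -> words.hasNext() ? words.next() : null;

    // Toy rule in place of a real tagger; these are not POS tags.
    Function<Object, String> toyTagger =
        w -> w.toString().length() > 3 ? "LONG" : "SHORT";

    System.out.println(tagAll(nextToken, toyTagger));
  }
}
```

Under the assumptions above, an application that has constructed parser and tagger as in Bar would call tagAll(parser::next, tagger::discreteValue) instead of passing the stand-ins.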