This POS tagger is substantially the same as our SNoW-based POS tagger, except that this one performs better, outputs a more standardized tag set, and accepts raw natural language text as input (i.e., the input should not be sentence-split or word-split). The output format is the same.
Another difference between this version and the SNoW-based POS tagger is that
LBJ makes this tagger much easier to incorporate into other Java applications.
Simply import the tagger and call it on a
See the online Javadoc documentation.
The tagger's performance can be tested on labeled test data with the following command. See its Javadoc documentation for a description of the input format.
A stand-alone program that tags plain, unannotated text is also provided. It takes as input a file containing raw natural language text that has not been sentence-split or word-split. Run it with the following command line.
The LBJ part-of-speech tagger expects words to be represented internally
using the LBJ library's
LBJ2.nlp.seg.Token class. If your LBJ
source code defines a learning classifier that also takes a
as input, you can import the POS tagger and use it like so:
If your Java application uses the
Token class as well, you can
import the POS tagger and use it like so:
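As a rough illustration of that call pattern, the following self-contained sketch does not use the LBJ library. Its `Token`, `DiscreteClassifier`, and `ToyTagger` types are hypothetical stand-ins written for this example. Only the method name `discreteValue(Object)` comes from the text. The tagging rules and the tag strings they return are placeholders, not the real tagger's model or its confirmed tag set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaggerCallSketch {
    /** Hypothetical stand-in for LBJ2.nlp.seg.Token: holds only the surface form. */
    static class Token {
        final String form;
        Token(String form) { this.form = form; }
    }

    /** Hypothetical stand-in for a discrete classifier; only the method name is from the text. */
    interface DiscreteClassifier {
        String discreteValue(Object example);
    }

    /** Toy tagger with placeholder rules (assumption: tag strings are illustrative only). */
    static class ToyTagger implements DiscreteClassifier {
        @Override
        public String discreteValue(Object example) {
            Token token = (Token) example;
            if (token.form.matches("[0-9]+")) return "CD";      // all digits
            if (token.form.equalsIgnoreCase("the")) return "DT"; // definite article
            return "NN";                                         // fallback for everything else
        }
    }

    /** Calls the classifier once per token and collects the predicted tags in order. */
    static List<String> tagAll(DiscreteClassifier tagger, List<Token> tokens) {
        List<String> tags = new ArrayList<>();
        for (Token token : tokens) {
            tags.add(tagger.discreteValue(token));
        }
        return tags;
    }

    public static void main(String[] args) {
        DiscreteClassifier tagger = new ToyTagger();
        List<Token> sentence = Arrays.asList(
            new Token("The"), new Token("7"), new Token("dogs"));
        List<String> tags = tagAll(tagger, sentence);
        for (int i = 0; i < sentence.size(); i++) {
            System.out.println(sentence.get(i).form + "/" + tags.get(i));
        }
    }
}
```

In an application that uses the real library, the stand-in types would presumably be replaced by the library's own Token class and the imported tagger, leaving the per-token `discreteValue` call as the only point of contact.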
The list of tags returned by the
discreteValue(Object) method in
the context shown above can be found in the online Javadoc at