
This POS tagger is substantially the same as our SNoW-based POS tagger, except that it performs better, outputs a more standardized tag set, and accepts raw natural language text as input (i.e., the input should not already be sentence-split or word-split). The output format is the same.
Another difference between this version and the SNoW-based POS tagger is that
LBJ makes this tagger much easier to incorporate into other Java applications.
Simply import the tagger and call it on an LBJ2.nlp.seg.Token
object.
See the online Javadoc documentation for details.
The tagger's performance can be tested on labeled test data with the following command. See its Javadoc documentation for a description of the input format.
|
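To make concrete what testing on labeled data measures, here is a minimal sketch of token-level accuracy: the fraction of tokens whose predicted tag matches the gold tag. The class name AccuracySketch and the method accuracy are hypothetical and not part of LBJ, and this page does not state which metrics the actual test program reports, so treat the choice of plain accuracy as an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class AccuracySketch {
    /**
     * Returns the fraction of positions at which the predicted tag equals the
     * gold tag. Hypothetical helper, not part of the LBJ distribution.
     */
    static double accuracy(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        if (gold.isEmpty()) {
            return 0.0; // nothing to score
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }

    public static void main(String[] args) {
        // Illustrative tags only; three of the four predictions match.
        List<String> gold = Arrays.asList("DT", "NN", "VBZ", "NN");
        List<String> predicted = Arrays.asList("DT", "NN", "VBZ", "JJ");
        System.out.println(accuracy(gold, predicted)); // prints 0.75
    }
}
```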
A stand-alone program is also provided. It takes as input a file containing raw natural language text that has not been sentence-split or word-split. Run it with the following command line.
|
The LBJ part of speech tagger expects that words are represented internally
using the LBJ library's LBJ2.nlp.seg.Token class. If your LBJ
source code defines a learning classifier that also takes a Token
as input, you can import the POS tagger and use it like so:
|
If your Java application uses the Token class as well, you can
import the POS tagger and use it like so:
|
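The calling pattern described above (construct a tagger, then call discreteValue(Object) on each token) can be illustrated without the LBJ library on the classpath. In the sketch below, Token, Tagger, and ToyTagger are hypothetical stand-ins written only for this example: the real Token class is LBJ2.nlp.seg.Token, and the real tagger class must be imported from the LBJ distribution. Only the method name discreteValue(Object) is taken from this page, and the toy tagging rules are not the tagger's actual behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaggerUsageSketch {
    /** Hypothetical stand-in for LBJ2.nlp.seg.Token; holds only the surface form. */
    static final class Token {
        final String form;

        Token(String form) {
            this.form = form;
        }
    }

    /** Hypothetical stand-in for a classifier exposing discreteValue(Object). */
    interface Tagger {
        String discreteValue(Object example);
    }

    /** Toy rule-based tagger, present only so the sketch runs without LBJ. */
    static final class ToyTagger implements Tagger {
        @Override
        public String discreteValue(Object example) {
            String form = ((Token) example).form;
            if (form.equalsIgnoreCase("the")) {
                return "DT";
            }
            if (form.endsWith("s")) {
                return "NNS";
            }
            return "NN";
        }
    }

    /** Tags each token in order by calling discreteValue on it. */
    static List<String> tagAll(Tagger tagger, List<Token> sentence) {
        List<String> tags = new ArrayList<>();
        for (Token token : sentence) {
            tags.add(tagger.discreteValue(token));
        }
        return tags;
    }

    public static void main(String[] args) {
        List<Token> sentence =
            Arrays.asList(new Token("The"), new Token("dogs"), new Token("bark"));
        System.out.println(tagAll(new ToyTagger(), sentence)); // prints [DT, NNS, NN]
    }
}
```

In a real application, the stand-in classes would be replaced by the LBJ Token class and the imported tagger; the loop in tagAll would stay the same.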
The list of tags returned by the discreteValue(Object) method in
the context shown above can be found in the online Javadoc at
http://l2r.cs.uiuc.edu/~cogcomp/software/LBJ2/library/LBJ2/nlp/POS.html#tokens