NLP Tools

The tools on this page are useful for a variety of text processing tasks, such as converting raw text to a form suitable for FEX, calling CogComp servers to tag text, etc. They are provided here as a convenience for developers and as a courtesy to users of our tools.

These tools are under development and are provided as is; they work on the systems on which they were created, but we make no guarantees that they will work on others. We also will not accept responsibility for any problems that may arise from using these tools.

Bibfile Sorter  

This tool sorts bibitems in a bibfile.


CAT_DATA and its Pre-Processing Tools  

This package consists of the data set and the pre-processing software used in TJRH10 (Coling-2010).


cogcomp c++ library  

A collection of useful general-purpose C++ functions.


Collins Parser / FEX Translators  

These tools convert column-format data to the format required by the Collins Parser, and convert the Collins Parser's output to a column format similar to that used by FEX.


Error annotation tool  

Use this program to facilitate error-tagging of ESL text. We provide an error annotation scheme that focuses on the most common types of ESL errors. The scheme can be adjusted to your own annotation needs.


F1 calculator (Shallow Parser)  

These scripts calculate precision, recall and F1 values for bracketed data (Shallow Parser format).
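
To illustrate the metrics only (this is not the scripts' actual code), here is a minimal Python sketch. It assumes each chunk is represented as a hypothetical (start, end, label) tuple and that a predicted chunk counts as correct only if it matches a gold chunk exactly.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over two collections of chunks.

    Each chunk is a hypothetical (start, end, label) tuple; a predicted
    chunk is counted as correct only if it matches a gold chunk exactly.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: the predicted output splits the last gold NP into two chunks.
gold = [(0, 1, "NP"), (1, 2, "VP"), (2, 5, "NP")]
predicted = [(0, 1, "NP"), (1, 2, "VP"), (2, 4, "NP"), (4, 5, "NP")]
p, r, f = precision_recall_f1(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In this toy case two of the four predicted chunks are correct and two of the three gold chunks are found, so precision is 1/2, recall is 2/3 and F1 is 4/7.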


FEX input preprocessor: chunks to columns  

This tool takes text output from the shallow parser (chunker) and converts it to column format.
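
The general idea of such a conversion can be sketched in Python as follows. Both formats here are assumptions made for illustration: the input is taken to be space-separated bracketed text such as `[NP The cat ] [VP sat ]`, and the output is one word per line with a B-/I-/O chunk tag. The tool's actual input and column layouts may differ.

```python
def chunks_to_columns(line):
    """Turn an assumed bracketed chunk line into (word, tag) rows.

    Assumed input: "[LABEL word word ] word [LABEL word ]", with every
    bracket separated from the words by whitespace.
    """
    rows = []
    label = None   # label of the chunk we are currently inside, if any
    first = False  # True until the first word of the current chunk is seen
    for token in line.split():
        if token.startswith("[") and len(token) > 1:
            label, first = token[1:], True      # chunk opens, e.g. "[NP"
        elif token == "]":
            label = None                        # chunk closes
        elif label is None:
            rows.append((token, "O"))           # word outside any chunk
        else:
            rows.append((token, ("B-" if first else "I-") + label))
            first = False
    return rows


line = "[NP The cat ] [VP sat ] on [NP the mat ] ."
for word, tag in chunks_to_columns(line):
    print(f"{word}\t{tag}")
```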


FEX lexicon pruner  

The lexicon pruner removes redundant entries in FEX's lexicon file.


HTML Tag Stripper  

This tool retrieves a page in HTML format and extracts its text content by stripping the HTML tags.
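
The tag-stripping step can be sketched with Python's standard `html.parser` module. This is an illustration under stated assumptions, not the tool's own implementation: the class and function names are hypothetical, and the choice to discard the contents of `<script>` and `<style>` elements is ours.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Text nodes are joined with single spaces.
        return " ".join(self._chunks)


def strip_tags(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = ("<html><head><style>p {color: red}</style></head>"
        "<body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>")
print(strip_tags(page))  # prints: Title Some bold text.
```

The sketch works on a string that is already in memory; for a live page, the HTML could first be fetched with `urllib.request.urlopen`.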


Preprocessor  

This tool takes plain text and adds POS, Shallow Parse and Named Entity tags.


Sentence Segmentation tool  

This sentence segmentation tool reads plain text and rewrites it with one sentence per line.
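
The input/output behavior can be illustrated with a deliberately naive Python sketch. It assumes that a sentence ends at `.`, `!` or `?` followed by whitespace and an uppercase letter; it is not the tool's algorithm, and it will split wrongly after abbreviations such as "Dr.".

```python
import re

# Naive boundary: sentence-final punctuation, whitespace, then a capital.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Return the sentences of `text`, one string per sentence."""
    flat = " ".join(text.split())  # collapse line breaks and extra spaces
    return [s for s in _BOUNDARY.split(flat) if s]


text = "The parser failed. Did you check the input?\nIt was empty!"
for sentence in split_sentences(text):
    print(sentence)  # one sentence per line
```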


SNoW Statistics Summarizer  

This tool summarizes SNoW output statistics for each label in a given task.


SNoW Tuner  

This script tries every combination of the parameter settings you give it, training a SNoW network for each combination and evaluating it on a test set. It then reports the parameters that gave the best performance.
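
The exhaustive search described above is a grid search, which can be sketched generically in Python. Everything SNoW-specific is left out: `train_and_evaluate` is a hypothetical callback standing in for training and testing a network, and the parameter names `alpha` and `threshold` are placeholders, not actual SNoW options.

```python
from itertools import product


def grid_search(param_grid, train_and_evaluate):
    """Try every combination in `param_grid` and return the best one.

    `param_grid` maps a parameter name to a list of candidate values;
    `train_and_evaluate` is a hypothetical callback that takes a dict of
    settings and returns a score where higher is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(params)
        if best_score is None or score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    # Stand-in for "train a network, then score it on the test set".
    return 1.0 - abs(params["alpha"] - 1.5) - abs(params["threshold"] - 2.0)


grid = {"alpha": [1.1, 1.5, 2.0], "threshold": [1.0, 2.0, 4.0]}
best_params, best_score = grid_search(grid, toy_evaluate)
print(best_params, best_score)
```

The number of training runs is the product of the list lengths (nine here), so the cost grows quickly as parameters or candidate values are added.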


stringsim library: string similarity functions  

A C++ library of string similarity functions.


TrecWN library: a WordNet interface  

A library of C++ functions that allow you to interact with WordNet.


Verb Tense Changer  

This tool changes the tense or inflected form of a verb, e.g. from third-person singular to present participle.


Word Splitter  

The word splitter is a segmentation script that reads plain text (one sentence per line) and outputs it with a space between every word and punctuation mark. This format is needed by tools such as the POS tagger.
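
The output format can be illustrated with a short Python sketch. It assumes a simple rule, not necessarily the script's own: every run of word characters is one token and every other non-space character is a token by itself. Under this rule a contraction such as "don't" becomes three tokens.

```python
import re

# A token is a run of word characters or one non-space, non-word character.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def split_words(sentence):
    """Return `sentence` with single spaces between words and punctuation."""
    return " ".join(_TOKEN.findall(sentence))


print(split_words("Hello, world! (Really.)"))
# prints: Hello , world ! ( Really . )
```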