Illinois NLP Curator



If you wish to cite this work, please use the following.

J. Clarke, V. Srikumar, M. Sammons, and D. Roth. An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines). LREC (2012).

The Curator is an NLP component management system designed to simplify the use and aggregation of NLP components, such as part-of-speech taggers, named entity taggers, semantic role labelers, and syntactic parsers, for use by other applications. It also satisfies the dependencies that these same NLP components have on one another. Curator was developed in a Linux environment; using it on a Windows system currently requires installing it on a virtual machine. Group members and other people at the University of Illinois with an Active Directory login can try out the web demo.

The Curator's installation process installs a number of NLP components, which are then available through the Curator's main service:

  • Illinois sentence splitter/tokenizer
  • Illinois Part of Speech tagger
  • Illinois Chunker
  • Illinois Named Entity Recognizer (extended and original tag sets)
  • Charniak Syntactic Parser
  • Stanford Parser
  • Illinois Semantic Role Labeler
  • Illinois Wikifier
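The dependency handling mentioned earlier can be illustrated with a minimal, hypothetical Java sketch. Given a requested annotation, it collects every prerequisite annotator and orders them so that each one runs after the components it depends on (a depth-first topological sort). The annotator identifiers and dependency edges below are invented for illustration. They are loosely named after the components listed above, but they are not taken from the Curator's actual configuration or API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: resolve which annotators must run, and in what order,
 * before a requested annotation can be produced. Names and edges are
 * illustrative only; they are NOT read from the real Curator configuration.
 */
public class DependencyOrder {

    /** Illustrative graph: annotator -> annotators it needs first (assumed). */
    static Map<String, List<String>> demoGraph() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("tokenizer", List.of());
        deps.put("pos", List.of("tokenizer"));
        deps.put("chunker", List.of("pos"));
        deps.put("ner", List.of("tokenizer"));
        deps.put("parser", List.of("pos"));
        deps.put("srl", List.of("pos", "chunker", "parser", "ner"));
        return deps;
    }

    /** Returns the target plus all transitive prerequisites, prerequisites first. */
    static List<String> resolve(String target, Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        visit(target, deps, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    // Depth-first post-order traversal; 'inProgress' detects dependency cycles.
    private static void visit(String name, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(name)) {
            return; // already scheduled via another path
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Dependency cycle at " + name);
        }
        List<String> needs = deps.get(name);
        if (needs == null) {
            throw new IllegalArgumentException("Unknown annotator: " + name);
        }
        for (String dep : needs) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(name);
        done.add(name);
        order.add(name); // appended only after all of its prerequisites
    }

    public static void main(String[] args) {
        // prints [tokenizer, pos, chunker, parser, ner, srl]
        System.out.println(resolve("srl", demoGraph()));
    }
}
```

Because shared prerequisites (here, the hypothetical `pos` and `tokenizer`) are recorded in `done`, each one is scheduled only once, even when several downstream components need it.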

Some of these components require significant amounts of memory; one motivation for creating the Curator was the need to distribute such components across multiple machines. However, a server with 32 GB of RAM should be able to run all components together.

More information about the Curator and its use can be found in the tutorials, and in this presentation: http://cogcomp.cs.illinois.edu/trac/curator.php.

If you program in Java, you can use the Illinois Edison library (http://cogcomp.cs.illinois.edu/page/software_view/Edison), which provides a simple interface to the Curator, together with a range of NLP-related data structures and functionalities, including feature extraction.
