The Curator

The Curator is a system that acts as a central server for text annotation. It requests annotations from multiple natural language processing servers, caches and stores previous annotations, and refreshes stale ones, so that clients have a single, centralized resource for annotating natural language text. The attached PowerPoint presentation gives an overview.
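To make the Curator's role more concrete, here is a minimal Java sketch of the broker pattern described above: each request is routed to the annotator registered for its annotation type, results are cached per input text, and entries older than a maximum age are treated as stale and re-requested. Everything in it (`AnnotationBroker`, `Annotator`, `request`, the age-based staleness rule) is hypothetical and does not reflect the real Curator API or its actual refresh policy. In the real system the annotators are remote servers reached through Thrift, not in-process lambdas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

public class CuratorCacheSketch {

    /** Hypothetical stand-in for a remote annotation server (e.g. a POS tagger). */
    interface Annotator {
        String annotate(String text);
    }

    /** A cached annotation together with the time it was produced. */
    static final class CachedAnnotation {
        final String value;
        final long createdAtMillis;

        CachedAnnotation(String value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    /**
     * Hypothetical broker: routes each request to the annotator registered for
     * its annotation type, caches results per input text, and re-requests
     * entries older than maxAgeMillis.
     */
    static final class AnnotationBroker {
        private final Map<String, Annotator> annotators = new HashMap<>();
        // annotation type -> (input text -> cached annotation)
        private final Map<String, Map<String, CachedAnnotation>> cache = new HashMap<>();
        private final long maxAgeMillis;
        private final LongSupplier clock;

        AnnotationBroker(long maxAgeMillis, LongSupplier clock) {
            this.maxAgeMillis = maxAgeMillis;
            this.clock = clock;
        }

        void register(String type, Annotator annotator) {
            annotators.put(type, annotator);
        }

        String request(String type, String text) {
            Annotator annotator = annotators.get(type);
            if (annotator == null) {
                throw new IllegalArgumentException("No annotator registered for " + type);
            }
            Map<String, CachedAnnotation> byText =
                    cache.computeIfAbsent(type, t -> new HashMap<>());
            long now = clock.getAsLong();
            CachedAnnotation hit = byText.get(text);
            if (hit != null && now - hit.createdAtMillis <= maxAgeMillis) {
                return hit.value; // fresh cached result, no server call
            }
            // Cache miss or stale entry: ask the annotator and store the result.
            String value = annotator.annotate(text);
            byText.put(text, new CachedAnnotation(value, now));
            return value;
        }
    }

    public static void main(String[] args) {
        long[] now = {0};   // fake clock, in milliseconds
        int[] calls = {0};  // counts how often the "server" is contacted
        AnnotationBroker broker = new AnnotationBroker(1000, () -> now[0]);
        broker.register("upper", text -> {
            calls[0]++;
            return text.toUpperCase();
        });

        broker.request("upper", "hello"); // miss: server call 1
        broker.request("upper", "hello"); // fresh hit: served from cache
        now[0] = 5000;
        broker.request("upper", "hello"); // stale: server call 2
        System.out.println("server calls: " + calls[0]);
    }
}
```

The injectable clock (`LongSupplier`) is only there so that staleness can be demonstrated deterministically; a real deployment would more likely use wall-clock time and persistent storage for the cache.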

Confused? Check out the web demo (accessible only from the UIUC campus) or keep reading.

Interfaces

The Curator architecture defines multiple data types and service interfaces for creating new annotation servers and communicating with the Curator. The interfaces are defined using Apache Thrift, which provides a software stack and code generation for cross-language deployment. This allows annotation servers and Curator clients to be written in multiple languages. At the time of writing, Thrift supports C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.

The Curator interface documentation describes the interfaces and data structures used within the Curator architecture, and the CuratorDataStructures page on this wiki has visualizations of the data structures. The Javadoc for the Curator data structures can be found at http://cogcomp.org/curator/javadoc.

Annotators

The Curator package comes bundled with annotators capable of performing the following annotations:

  • Tokenization and Sentence Splitting (via Illinois NLP tools)

  • Part-of-speech tags (via Illinois POS Tagger)

  • Chunk (shallow parse) analysis (via Illinois Chunker)

  • Named Entities (via Illinois Named Entity Recognizer)

  • Coreference (via Illinois Coreference package)

  • Parse trees (via Stanford Parser and Charniak Parser)

  • Dependency trees (via Stanford Parser)

  • Semantic Role Labels for verbs and nouns (via Illinois SRL)

  • Reference Entities (via Illinois Wikifier)

The Curator web demo also demonstrates other annotators that have been wrapped in Curator interfaces:

  • Dependency trees (via Easy-First parser)

CuratorAnnotator describes the process of creating a new annotation service.
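As a rough illustration of what an annotation service produces, the following Java sketch defines a hypothetical `AnnotationService` interface and a toy whitespace tokenizer that returns labeled character-offset spans. The `Span` and `AnnotationService` types are assumptions made for this example only; they are not the Curator's Thrift-generated data structures or service interfaces, which CuratorAnnotator and the interface documentation describe.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceTokenizerSketch {

    /** Hypothetical span type: a labeled [start, end) character range. */
    static final class Span {
        final int start;
        final int end; // exclusive
        final String label;

        Span(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }

        @Override
        public String toString() {
            return label + "[" + start + "," + end + ")";
        }
    }

    /** Hypothetical contract that every annotation service implements. */
    interface AnnotationService {
        String name();

        List<Span> annotate(String text);
    }

    /** Toy service: one TOKEN span per maximal run of non-whitespace characters. */
    static final class WhitespaceTokenizer implements AnnotationService {
        @Override
        public String name() {
            return "tokens";
        }

        @Override
        public List<Span> annotate(String text) {
            List<Span> spans = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                // Skip whitespace between tokens.
                while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
                int start = i;
                // Consume the token itself.
                while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    spans.add(new Span(start, i, "TOKEN"));
                }
            }
            return spans;
        }
    }

    public static void main(String[] args) {
        AnnotationService service = new WhitespaceTokenizer();
        String text = "The Curator caches  annotations.";
        for (Span span : service.annotate(text)) {
            // Print each span next to the text it covers.
            System.out.println(span + " " + text.substring(span.start, span.end));
        }
    }
}
```

Returning offsets into the original text, rather than copies of the tokens, lets later annotations refer back to the same character positions.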

Monitoring and Managing Curator Processes (CCG use only)

See the relevant CCG wiki page for details.

Download

Download the Curator.

Examples

Consult the following examples to get a feel for how to interact with a Curator server and to see the kinds of annotations already available within the Curator architecture:

  • CuratorDemo.java provides a documented walkthrough of creating a Java client for the Curator.

  • The web demo acts as a reference client implemented purely in PHP.

Documentation

See the INSTALL files included in the tarball for instructions on getting the Curator running. Also check out the documentation for each annotation server in the docs directory.

This file has instructions for installing Thrift v0.8.0 for Linux.

CuratorDataStructures provides useful visualizations of the data structures used within the Curator architecture, including examples of part-of-speech labeling and a parse tree.

CuratorDemo.java is a walkthrough guide to calling and interacting with the Curator server.

CuratorAnnotator describes the process of writing a new annotation service within the Curator architecture.

A presentation explaining the pressures that led to the creation of the Curator, along with a high-level overview of the Curator itself, can be found here: UIUC_NLP_Curator.zip.

Mailing List

We have an illinois-ml-nlp-users mailing list for questions and discussions regarding all Illinois natural language processing and machine learning tools produced by the Cognitive Computation Group:

  • Join via the website

  • Join by emailing illinois-ml-nlp-users-join@lists.uiuc.edu