The Curator
The Curator is a system that acts as a central server for providing annotations of natural language text. It requests annotations from multiple natural language processing servers, caches and stores previous annotations, and refreshes stale annotations. The attached PowerPoint presentation (http://cogcomp.cs.illinois.edu/curator/doc/curator.zip) gives an overview.
Confused? Check out the web demo or keep reading.
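To make the caching and refreshing role concrete, here is a minimal Python sketch of a central server that forwards requests to annotators and reuses earlier results. It is an illustration only, not the Curator's implementation or API: the class `CachingCurator`, the method `get_annotation`, the `force_update` flag, and the rule that an annotation becomes stale after a fixed age are all assumptions made for this example.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical annotator signature: raw text in, annotation (any object) out.
Annotator = Callable[[str], object]


class CachingCurator:
    """Toy central server: routes requests to annotators and caches results."""

    def __init__(
        self,
        annotators: Dict[str, Annotator],
        max_age_seconds: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._annotators = annotators
        # Assumption: an annotation is "stale" once it is older than this.
        self._max_age = max_age_seconds
        self._clock = clock
        # (view name, text) -> (annotation, time it was produced)
        self._cache: Dict[Tuple[str, str], Tuple[object, float]] = {}

    def get_annotation(self, view: str, text: str, force_update: bool = False) -> object:
        key = (view, text)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and not force_update:
            annotation, produced_at = cached
            if now - produced_at <= self._max_age:
                return annotation  # fresh cache hit, no annotator call
        # Cache miss, stale entry, or forced refresh: ask the annotator again.
        annotation = self._annotators[view](text)
        self._cache[key] = (annotation, now)
        return annotation


def whitespace_tokenizer(text: str) -> list:
    # Stand-in for a real annotation server.
    return text.split()


curator = CachingCurator({"tokens": whitespace_tokenizer}, max_age_seconds=60.0)
first = curator.get_annotation("tokens", "The Curator caches annotations.")
second = curator.get_annotation("tokens", "The Curator caches annotations.")
```

In this sketch the second call is served from the cache, so the tokenizer runs only once. A real deployment would replace the in-memory dictionary with persistent storage and the local functions with remote annotation servers.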
Interfaces
The Curator architecture defines multiple data types and service interfaces for creating new annotation servers and communicating with the Curator. The interfaces are defined using Apache Thrift, which provides a software stack and code generation for cross-language deployment. This allows annotation servers and Curator clients to be written in multiple languages. Currently Thrift supports C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
The Curator interface documentation describes the interfaces and data structures used within the Curator architecture, and the CuratorDataStructures page on this wiki has visualizations of the data structures.
Annotators
The Curator package comes bundled with annotators capable of performing the following annotations:
- Tokenization and Sentence Splitting (via Illinois NLP tools)
- Part-of-speech tags (via Illinois POS Tagger)
- Chunk (shallow parse) analysis (via Illinois Chunker)
- Named Entities (via Illinois Named Entity Recognizer)
- Coreference (via Illinois Coreference package)
- Parse trees (via Stanford Parser and Charniak Parser)
- Dependency trees (via Stanford Parser)
- Semantic Role Labels for verbs and nouns (via Illinois SRL)
- Reference Entities (via Illinois Wikifier)
The Curator web demo also demonstrates other annotators that have been wrapped in Curator interfaces:
- Dependency trees (via Easy-First parser)
CuratorAnnotator describes the process of creating a new annotation service.
Monitoring and Managing Curator Processes
See the relevant CCG wiki page for details: https://agora.cs.illinois.edu/display/ccg/Curator
Download
Download the Curator:
Browse the Curator source:
Examples
Consult the following examples to get a feeling for how to interact with a Curator server and see the kind of annotations already available within the Curator architecture:
- CuratorDemo.java provides a documented walkthrough example of creating a client to the Curator in Java.
- The web demo acts as a reference client implemented purely in PHP.
Documentation
See the INSTALL files included in the tarball for documentation on getting the Curator running. Also check out the documentation for each annotation server in the docs directory.
CuratorDataStructures provides useful visualizations of the data structures used within the Curator architecture, including examples of a Part-of-Speech Labeling and a Parse Tree.
CuratorDemo.java is a walkthrough guide to calling and interacting with the Curator server.
CuratorAnnotator describes the process of writing a new annotation service within the Curator architecture.
Mailing List
We have an illinois-ml-nlp-users mailing list for questions and discussions regarding all Illinois natural language processing and machine learning tools produced by the Cognitive Computation Group:
- Join via the illinois-ml-nlp-users website
- Join by emailing illinois-ml-nlp-users-join@lists.uiuc.edu
