Hover over a software package to view its description. Click the title to view more details.
This project demonstrates how to perform dataless hierarchical text classification.
Learning Based Java (LBJava) is a modeling language that expedites the development of systems with one or more learning components. In an LBJ model, simple learned components are modeled conditionally, and their initial predictions are then combined via constrained optimization, yielding an expressive, globally coherent set of final predictions.
SNoW is a learning architecture that is tailored for learning in the presence of a very large number of information sources (features). SNoW learns a network of linear functions.
FEX is a feature extraction package used to provide input to machine learning algorithms. FEX can be used to generate features from structured text or other relational data.
Edison is a Java library for representing different NLP annotations (views) over text in the form of graphs over constituents. It provides easy-to-use accessors for different types of views and facilitates feature extraction.
JLIS (pronounced as "jealous") is a multi-purpose structural learning library. JLIS-multiclass package supports performing cost-sensitive multiclass classification. JLIS-reranking package supports using the performance measure (e.g. F1) to do "weighted" reranking.
This software is aimed at learning a linear classifier when the data cannot fit in memory on a single machine.
The Curator is a system that acts as a central server providing annotations for natural language text. It is responsible for requesting annotations from multiple natural language processing servers, caching and storing previous annotations, and refreshing stale annotations.
This is a state-of-the-art NE tagger that tags plain text with named entities. The newest version can tag with the "classic" label set (people / organizations / locations / miscellaneous) or a larger (18-label) set defined by the OntoNotes corpus. It uses gazetteers extracted from Wikipedia, word class models derived from unlabeled text, and expressive non-local features. The best performance is 90.8 F1 on the CoNLL03 shared task data.
This system identifies "important expressions" in the input text and cross-links them to Wikipedia.
A Quantity Detector and Standardizer
The Illinois NLP Pipeline is a stand-alone package that integrates tokenization, POS tagging, chunking, and NER tagging. It provides a programmatic interface via Curator data structures.
The Illinois Lemmatizer combines WordNet-based lemmatizers with some additional heuristics, and can populate Views of Edison's TextAnnotation and Curator's Record data structures.
IllinoisCloudNLP is a software framework that allows users to run a set of the Cognitive Computation Group's NLP tools on Amazon's cloud computing framework. The software makes it straightforward for experts and non-experts to process large corpora with state-of-the-art NLP tools quickly, on demand, at a reasonable cost, and with minimal local hardware requirements.
The Semantic Role Labeler identifies the verb-argument structure in a sentence. Specifically, it labels the sentence with Propbank-style labels. This tool is a machine-learning based system that uses SNoW and FEX for local classification decisions, and Integer Linear Programming to make global inferences about sets of these local decisions.
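The interaction between local classification decisions and global inference can be conveyed with a small brute-force sketch. The scores, labels, and constraint below are hypothetical illustrations; the actual system trains classifiers with SNoW and solves the global problem with an Integer Linear Programming solver rather than by enumeration:

```python
from itertools import product

# Hypothetical local classifier scores: for each of three argument
# candidates, a score per label (illustrative numbers only).
LABELS = ["A0", "A1", "NONE"]
local_scores = [
    {"A0": 2.1, "A1": 0.3, "NONE": 0.1},
    {"A0": 1.9, "A1": 1.8, "NONE": 0.2},
    {"A0": 0.2, "A1": 0.4, "NONE": 1.5},
]

def valid(assignment):
    # Example global constraint: no core label (A0, A1) may repeat.
    core = [label for label in assignment if label != "NONE"]
    return len(core) == len(set(core))

def global_inference(scores):
    # Enumerate joint assignments (an ILP solver does this efficiently)
    # and keep the highest-scoring one that satisfies the constraints.
    best, best_score = None, float("-inf")
    for assignment in product(LABELS, repeat=len(scores)):
        if not valid(assignment):
            continue
        total = sum(s[label] for s, label in zip(scores, assignment))
        if total > best_score:
            best, best_score = assignment, total
    return best

print(global_inference(local_scores))
```

Note that taking each candidate's locally best label would assign A0 twice here; the global step trades a little local score for a coherent joint labeling.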
This is an implementation of our SNoW-based POS tagger for use with LBJ.
A classifier that partitions plain text into sequences of semantically related words, indicating a shallow (i.e., non-hierarchical) phrase structure.
A Coreference Resolver, based on LBJ, trained on the ACE 2004 corpus.
The Illinois Temporal Extractor processes documents and extracts temporal expressions, relating them to each other and optionally to a reference date.
This software provides an API for performing dataless classification.
Given a research area as a query, this package returns names of experts in this area.
A set of utilities to simplify interactions with the Curator (or the stand-alone NLP pipeline).
Extracts plain text, interwiki links, and categories from a compressed Wikipedia XML dump.
This software was used in the research described in the paper
This package implements a Lifted First-Order Probabilistic Inference algorithm.
An implementation of CoRanker, an algorithm for Named Entity discovery from multilingual comparable corpora.
pySNoW is a minimal Python interface to SNoW (the Sparse Network of Winnows learning architecture). It is meant to be faithful to the original command-line interface and provides access to the train, test, evaluate, interactive, and server modes directly from Python. pySNoW requires SNoW version 3.2.0.
An implementation of an unsupervised learning algorithm for rank aggregation with distance-based models.
An implementation of Maximum Subsequence Segmentation for extracting article text (or other blocks of content) from HTML documents.
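The core idea, finding the maximum-sum contiguous subsequence over per-token scores, can be sketched as follows. The fixed +1/-1 scoring of text versus markup tokens is purely illustrative (the actual system learns token scores); the maximum-subsequence step itself is Kadane's algorithm:

```python
def extract_block(tokens):
    # Score each token: text tokens positive, markup tokens negative.
    # (Illustrative fixed scores; a real system learns these weights.)
    scores = [-1.0 if t.startswith("<") else 1.0 for t in tokens]

    # Kadane's algorithm: maximum-sum contiguous subsequence.
    best_sum, best_span = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_span = cur_sum, (cur_start, i + 1)
    return tokens[best_span[0]:best_span[1]]

tokens = ["<div>", "<a>", "nav", "</a>", "The", "article", "text",
          "goes", "here", "<img>", "</div>"]
print(extract_block(tokens))
```

The dense run of text tokens outweighs the surrounding markup, so the navigation fragment and trailing tags fall outside the selected span.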
LLM is an asymmetric similarity measure between spans of text.
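Asymmetric here means that sim(a, b) need not equal sim(b, a). A minimal directional token-coverage sketch (not LLM's actual scoring, which matches tokens using lexical similarity resources) conveys the idea:

```python
def coverage(src, tgt):
    # Fraction of tokens in src that also appear in tgt. Directional:
    # coverage(a, b) generally differs from coverage(b, a).
    # (Exact word overlap is illustrative only.)
    src_tokens = src.lower().split()
    tgt_tokens = set(tgt.lower().split())
    if not src_tokens:
        return 0.0
    return sum(t in tgt_tokens for t in src_tokens) / len(src_tokens)

short_span = "the senate voted"
long_span = "the senate voted on the new budget bill yesterday"
print(coverage(short_span, long_span))  # short span fully covered by long
print(coverage(long_span, short_span))  # long span only partly covered
```

A short span can be fully covered by a longer one while the reverse score stays low, which is exactly the asymmetry a directional measure is meant to capture.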
WNSim provides a WordNet-based similarity metric that computes a symmetric similarity score between a pair of words or phrases. It is written in C++ but runs as an XML-RPC service, so it can be used by applications written in other languages.
This is a Java version of the C++ WNSim tool, which computes a similarity score based on relative positions of the compared terms within the WordNet hierarchy.
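Scoring by relative position in a hierarchy can be illustrated with a toy taxonomy and a Wu-Palmer-style depth ratio. Both the tiny hierarchy and the formula are illustrative assumptions; WNSim operates over the real WordNet graph with its own scoring:

```python
# Toy hypernym hierarchy: child -> parent (illustrative, not WordNet).
PARENT = {
    "dog": "canine", "canine": "mammal", "cat": "feline",
    "feline": "mammal", "mammal": "animal", "animal": None,
}

def path_to_root(term):
    # Walk parent links up to the root, e.g. dog -> canine -> mammal -> animal.
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path

def similarity(a, b):
    # Wu-Palmer-style score: depth of the lowest common ancestor
    # relative to the depths of the two compared terms.
    path_a, ancestors_b = path_to_root(a), set(path_to_root(b))
    lca = next(t for t in path_a if t in ancestors_b)
    depth = lambda t: len(path_to_root(t))
    return 2.0 * depth(lca) / (depth(a) + depth(b))

print(similarity("dog", "cat"))  # related via the "mammal" ancestor
print(similarity("dog", "dog"))  # identical terms score 1.0
```

Terms whose lowest common ancestor sits deep in the hierarchy score higher than terms that meet only near the root, which is the intuition behind position-based WordNet similarity.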
NESim provides a similarity metric that computes a similarity score between a pair of Named Entities (People, Organizations, Locations, and Misc). It is written in Java but runs as an XML-RPC service, so it can be used by applications written in other languages.