Context Sensitive Verb Paraphrasing
[Run Demo]
Lexical paraphrasing (replacing one word with another) is an inherently context-sensitive problem because a word's meaning depends on its context. Most paraphrasing work finds patterns and templates that can replace other patterns or templates in some context, whereas we attempt to make a decision for a specific context. We have developed a global classifier that takes a verb v and its context (the sentence that v appears in), along with a candidate verb u, and determines whether u can replace v in the given sentence while maintaining the original meaning. The classifier makes its decision by finding other contexts in which both v and u appear, and measuring how similar these are to the given context of v. We train the classifier without supervision by using a large set of local classifiers, each trained to locate paraphrases of a single word. These local classifiers then generate labeled data for the global classifier.
Context-Sensitive Spelling Correction
[Run Demo]
Spelling errors that result in valid words cannot be caught by a standard dictionary spell checker, and account for some 25% of all spelling errors.
Examples include: "please feel this form"; "I'd like a peace of cake" etc. Context-sensitive spelling correction has been shown to be extremely effective in learning to correct these errors, performing with an accuracy level greater than 95%. This demo allows users to input text as if they were using their own editor. The program will then suggest corrections for any errors it finds.
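To make the idea concrete, here is a minimal sketch of context-sensitive correction over confusion sets. It is not the demo's implementation: the confusion sets, the eight-sentence training corpus, and the simple co-occurrence scoring are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical confusion sets: valid words that are often typed for one another.
CONFUSION_SETS = [{"peace", "piece"}, {"feel", "fill"}]

# Tiny invented training corpus (illustration only).
TRAINING = [
    "please fill out this form",
    "fill in the form below",
    "i feel happy today",
    "i feel tired",
    "a piece of cake",
    "a piece of paper",
    "world peace and quiet",
    "peace between nations",
]


def context(tokens, i, window=2):
    """Return the words within `window` positions of index i, excluding tokens[i]."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def train(sentences):
    """Count which context words co-occur with each confusable word."""
    counts = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            if any(word in conf for conf in CONFUSION_SETS):
                counts.setdefault(word, Counter()).update(context(tokens, i))
    return counts


def correct(sentence, counts):
    """Replace a confusable word when a rival fits the surrounding words better."""
    tokens = sentence.lower().split()
    for i, word in enumerate(tokens):
        for conf in CONFUSION_SETS:
            if word not in conf:
                continue
            ctx = context(tokens, i)

            def score(candidate):
                seen = counts.get(candidate, Counter())
                return sum(seen[w] for w in ctx)

            best = max(sorted(conf), key=score)
            # Keep the original word unless a rival scores strictly higher.
            if score(best) > score(word):
                tokens[i] = best
    return " ".join(tokens)


counts = train(TRAINING)
print(correct("please feel this form", counts))
print(correct("I'd like a peace of cake", counts))
```

A real system would learn from a large corpus and use richer features (for example, part-of-speech patterns around the target word) rather than raw co-occurrence counts.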
Coreference Resolution
[Run Demo]
A given entity - representing a person, a location, or an organization - may be mentioned in text in multiple, ambiguous ways. Understanding natural language and supporting intelligent access to textual information requires identifying whether different entity mentions are actually referencing the same entity. The Coreference Resolution Demo processes unannotated text, detecting mentions of entities and showing which mentions are coreferential.
Dataless Classification
[Run Demo]
Dataless Classification is a learning protocol that uses world knowledge to induce classifiers without the need for any labeled data. Like humans, a dataless classifier interprets a string of words as a set of semantic concepts.
This demo shows this idea in action, allowing the user to enter arbitrary text and class labels. Without any training, the text is classified into the labels.
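The following sketch illustrates the general shape of the idea: both the input text and the label names are mapped into a shared concept space, and the text is assigned to the nearest label. The `CONCEPTS` table is an invented stand-in for a real knowledge-derived representation, and all names and weights are assumptions, not the demo's actual resources.

```python
import math

# Hypothetical word-to-concept weights; a real system would derive these
# from a large knowledge source rather than a hand-written table.
CONCEPTS = {
    "goal": {"Sport": 1.0},
    "team": {"Sport": 0.8, "Business": 0.2},
    "football": {"Sport": 1.0},
    "sports": {"Sport": 1.0},
    "election": {"Government": 1.0},
    "senate": {"Government": 1.0},
    "politics": {"Government": 1.0},
    "vote": {"Government": 0.9},
}


def embed(text):
    """Sum the concept vectors of the known words in `text`."""
    vec = {}
    for word in text.lower().split():
        for concept, weight in CONCEPTS.get(word, {}).items():
            vec[concept] = vec.get(concept, 0.0) + weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, labels):
    """Pick the label whose name is closest to the text in concept space."""
    text_vec = embed(text)
    return max(labels, key=lambda label: cosine(text_vec, embed(label)))


print(classify("the team scored a late goal", ["sports", "politics"]))
```

Note that no labeled examples are used anywhere: the only inputs are the text, the label names, and the background concept table.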
Event Annotation Tool Plus
[Run Demo]
The Event Annotation Tool Plus is a flexible annotation interface for tagging natural language text with complex, structured annotations.
LLM (Lexical Level Matching)
[Run Demo]
LLM computes a similarity measure over pairs of text spans.
Multilingual Named Entity Discovery
[Run Demo]
A basic sub-task of many natural language processing problems is the identification of words or phrases of specific types (e.g. locations, people, and organizations) in text, and is commonly called Named Entity Recognition (NER). Most successful approaches to NER require large amounts of text with Named Entities tagged by a human annotator. However, in many (especially less common) languages such resources do not exist. We demonstrate a method to automatically generate such resources from multilingual corpora (such as multilingual news streams).
Named Entity Recognition
[Run Demo]
Named entity recognition refers to the task of identifying which phrases in text represent names of people, which represent names of locations, organizations, and so on. This is a fundamental task in information extraction since it allows some level of abstraction that is required to support the level of interaction people are comfortable with. This is a context-sensitive task, as is shown in: Jakob Washington left for Denver to meet with John Denver who works for Washington Mutual.
Named Entity Recognizer (extended entity type set)
[Run Demo]
The Illinois Extended Named Entity Recognizer labels eighteen predefined types of entities in plain text.
Named Entity Similarity
[Run Demo]
In textual inference, it is often necessary to determine when two proper nouns refer to the same entity; for example, "Bill Gates" could refer to the same entity as "Mr. Gates", but not "Mrs. Gates". Our Named Entity Similarity Metric applies sets of rules to two strings to determine whether they are likely to refer to the same underlying entity. It handles different types of entity -- people, locations, and organizations -- using appropriate resources (e.g., acronyms for companies).
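As a rough illustration of rule-based name matching, the sketch below checks honorific and given-name compatibility for person names and initial-letter matching for organization acronyms. The rules, the two small lookup tables, and the function names are assumptions made for this example; they are not the rules used by the actual metric.

```python
# Hypothetical lookup tables (illustration only).
TITLE_GENDER = {"mr": "m", "mrs": "f", "ms": "f"}
NAME_GENDER = {"bill": "m", "melinda": "f"}


def parse_person(name):
    """Split a name into an inferred gender (or None) and its non-title tokens."""
    gender = None
    tokens = []
    for raw in name.split():
        token = raw.strip(".").lower()
        if token in TITLE_GENDER:
            gender = TITLE_GENDER[token]
        else:
            tokens.append(token)
    # Fall back on the given name when no title is present.
    if gender is None and tokens and tokens[0] in NAME_GENDER:
        gender = NAME_GENDER[tokens[0]]
    return gender, tokens


def may_corefer_person(a, b):
    """Return True if two person-name strings could denote the same person."""
    gender_a, tokens_a = parse_person(a)
    gender_b, tokens_b = parse_person(b)
    if not tokens_a or not tokens_b:
        return False
    if gender_a and gender_b and gender_a != gender_b:
        return False  # e.g. "Bill Gates" vs. "Mrs. Gates"
    if tokens_a[-1] != tokens_b[-1]:
        return False  # surnames must agree
    if len(tokens_a) > 1 and len(tokens_b) > 1 and tokens_a[0] != tokens_b[0]:
        return False  # given names, when both present, must agree
    return True


def acronym_match(short, full):
    """Return True if `short` spells the capitalized initials of `full`."""
    initials = "".join(word[0] for word in full.split() if word[0].isupper())
    return short.replace(".", "").upper() == initials


print(may_corefer_person("Bill Gates", "Mr. Gates"))
print(may_corefer_person("Bill Gates", "Mrs. Gates"))
print(acronym_match("IBM", "International Business Machines"))
```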
Number Quantization
[Run Demo]
Number Quantization refers to the task of recognizing the values of numbers written in text. This tool recognizes numerical entities whether they are written as words or numerals, and can support comparison of commensurate numerical types (e.g. dates).
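A minimal sketch of the first half of this task, normalizing numbers written as words or numerals to a single integer value so that they can be compared, might look as follows. The vocabulary covered and the function name `words_to_number` are assumptions of this example, which handles only simple English cardinal numbers and is not the tool's implementation.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text):
    """Convert a simple English cardinal (words or digits) to an int."""
    tokens = text.lower().replace("-", " ").replace(",", "").split()
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group being read
    for token in tokens:
        if token == "and":
            continue
        if token.isdigit():
            current += int(token)
        elif token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        elif token == "hundred":
            current = max(current, 1) * 100
        elif token in SCALES:
            total += max(current, 1) * SCALES[token]
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {token!r}")
    return total + current


# Values written in different surface forms become directly comparable.
print(words_to_number("twelve hundred") == words_to_number("1,200"))
```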
Part of Speech Tagging
[Run Demo]
Assigning each word in a sentence the part of speech (POS) that it assumes in that sentence is important because POS identification is one of the early stages in many natural language processes, such as speech recognition, translation, and information retrieval and extraction. See how it's done!
Relation Identification
[Run Demo]
We demonstrate a novel and robust approach for the problem of identifying taxonomic relations between pairs of concepts. We focus on identifying relations that are essential to supporting textual inference: determining whether two concepts hold an ancestor-child relation or whether they are siblings. Our method makes use of Wikipedia as a main source for background knowledge.
Semantic Role Labeling
[Run Demo]
Beyond the syntactic analysis of natural language sentences lies the extraction of their semantic information. Semantic role labeling is one such task: it identifies the verb and argument structure in natural language sentences, and is an important step toward natural language understanding.
Shallow Parsing
[Run Demo]
Enabling a machine to respond to natural language input demands that the machine be equipped with the capacity to identify syntactic phrases in sentences. It is virtually impossible to manually write a comprehensive set of rules that accurately defines the appropriate solution to every task of this nature. However, the availability of annotated corpora (collections of text) and robust machine learning techniques makes it possible to employ machines to learn this task from training examples.
Temporal Extraction and Comparison
[Run Demo]
The Illinois TimeSim software processes document text, extracting and canonizing strings representing temporal expressions. It generates an interval-based representation of each individual expression, and determines their order relative to a user-specified reference time.
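The interval-based comparison can be sketched as below, assuming the temporal expressions have already been extracted and normalized to ISO-like strings. The supported forms ('YYYY', 'YYYY-MM', 'YYYY-MM-DD'), the three relation names, and the function names are assumptions of this example rather than the software's actual interface.

```python
import calendar
from datetime import date


def to_interval(expr):
    """Map a normalized expression to a closed (start, end) date interval."""
    parts = [int(p) for p in expr.split("-")]
    if len(parts) == 1:  # a whole year
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:  # a whole month
        last_day = calendar.monthrange(parts[0], parts[1])[1]
        return date(parts[0], parts[1], 1), date(parts[0], parts[1], last_day)
    day = date(*parts)   # a single day
    return day, day


def relation(expr, reference):
    """Order an expression's interval relative to a reference date."""
    start, end = to_interval(expr)
    if end < reference:
        return "before"
    if start > reference:
        return "after"
    return "overlaps"


reference = date(2011, 3, 15)
for expr in ["2010", "2011-03", "2012-02-29"]:
    print(expr, relation(expr, reference))
```

Representing every expression as an interval, rather than a single point, lets expressions of different granularity (a year, a month, a day) be compared with the same logic.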
Text Analysis
[Run Demo]
This analysis tool annotates different syntactic and semantic information, including syntactic parse trees, named entities, semantic roles and nominal relations on raw text.
Transliteration
[Run Demo]
The Transliteration demo can generate transliterations between Chinese, English, Russian, and Hebrew.
Web Page Text Extractor
[Run Demo]
The Web Page Text Extractor extracts plain text from web documents.
Wiki-based Transliteration
[Run Demo]
This software transliterates Russian words (single words only, not phrases) into English words, using Wikipedia data.
Wikifier
[Run Demo]
The Wikifier identifies important entities and concepts in text, disambiguates them and links them to Wikipedia. Wikification is an important step in facilitating Information Access, in knowledge acquisition from text and in injecting background knowledge into NLP applications. The main decisions the Wikifier must make are: (1) Which expressions to link to Wikipedia. (2) How to disambiguate the ambiguous expressions and entities. This Wikification demo uses four types of features: (a) String matching and prevalence of entities in Wikipedia. (b) Lexical similarity between the input document and the Wikipedia pages. (c) "Semantic Similarity" between the ESA summary of the input document and Wikipedia pages. (d) How likely a set of Wikipedia pages is to be linked from a single document (we get this statistic by looking at the linkage patterns in Wikipedia).
Wikifier Database
[Run Demo]
A Wikifier Database interface that allows users to query the database by keywords, Wikipedia URL, corpus ID and document ID. The results include special mentions, Wikipedia URL, corpus ID, document ID and the sentence- or document-level context of the mentions (verbs and other mentions).
Word Similarity
[Run Demo]
A word similarity metric using WordNet and other resources.