
Presenters
Mark Sammons and Vivek Srikumar
Many of the technologies we rely on in our everyday lives depend on the ability to automatically handle natural language. Search engines determine the relevance of documents with respect to keywords. Spam detectors filter email messages based on their content. Automatic machine translators translate from one natural language to another. Systems such as these all use machine learning to leverage the information in large datasets and improve their performance with experience. In this tutorial, we introduce our suite of state-of-the-art NLP tools, focusing on:
As motivation, consider the following application, which you will be able to implement by the end of the tutorial. Suppose you have a news feed through which you receive articles from the full spectrum of news sections: world news, politics, health, finance, sports, etc. Perhaps you'd like to filter these articles based on the people who appear in them and what those people are famous for. For example, they may be politicians, athletes, or corporate moguls. While a given type of famous person tends to appear most often in a single news section, you'd like to see all news involving those types of people, no matter the section. How can your news feed software automatically determine what a given person in the news is famous for?
Part I: Learning Classifiers from Data with LBJ
[slides: ppt, slides: pdf]
A classifier is simply a function that takes some object as input
and produces a discrete output, classifying the object into one of a set of
categories. In a traditional programming language, such a function
must be hard-coded entirely in the syntax of the language. LBJ, on the other
hand, allows the partial specification of a classifier whose definition can
only be completed via interaction with data. When paired with different data,
the same LBJ code results in a different classifier. Specifically, we'll look
at three tasks: classifying the well-known "20 Newsgroups" dataset, language
identification, and spam detection.
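To make the idea of a classifier "completed by data" concrete, here is a minimal sketch in plain Python. It is not LBJ syntax and does not reflect how LBJ is implemented. The names `features` and `learn`, the word-counting learner, and the toy training sets are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def features(text):
    """Hand-written part of the classifier: a bag-of-words feature extractor."""
    return text.lower().split()


def learn(labeled_examples):
    """Complete the classifier from data by counting word/label co-occurrences.

    This toy learner is an assumption made for illustration; it stands in for
    whatever learning algorithm a real system would use.
    """
    counts = defaultdict(Counter)
    for text, label in labeled_examples:
        counts[label].update(features(text))

    def classify(text):
        # Score each label by how often it co-occurred with the input's words.
        words = features(text)
        scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
        # Sorting the labels first makes ties resolve deterministically.
        return max(sorted(scores), key=scores.get)

    return classify


# Toy, made-up training sets: same code, different data.
spam_data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday", "ham"),
]
language_data = [
    ("the cat sat on the mat", "english"),
    ("le chat est sur le tapis", "french"),
]

spam_detector = learn(spam_data)
language_identifier = learn(language_data)
```

With these toy sets, `spam_detector("claim your free prize")` returns `"spam"` and `language_identifier("le chat")` returns `"french"`. The function `learn` never changes; only the data paired with it does, which is the point the paragraph above makes about LBJ.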
Part II: Illinois NLP Tools
[slides: ppt, slides: pdf]
The Cognitive Computations Group has developed a
suite of state-of-the-art NLP tools, many of which have online
demos so you can try them out even before downloading them. We manage the
application of these tools in experiments and NLP software using a service
called the Curator. Time allowing, we'll begin discussing these tools
during this first lecture.
Part III: Incorporating NLP tools and Machine Learning into Applications
[slides: ppt, slides: pdf]