Cognitive Computation Group

Tutorial: Machine Learning Tools in Natural Language Processing

Introduction

This tutorial explores the use of SNoW and FEX, two of our core machine learning tools, to solve text processing problems. Specifically, we apply SNoW and FEX to context-sensitive spelling correction and named entity tagging. The tutorial also uses several of our other standard tools (such as our Part-of-Speech Tagger) and custom scripts to preprocess and postprocess the data for each task. Finally, it mentions ways to streamline the process using Perl and shell scripts and the server modes of SNoW and FEX.

Resources

For each tutorial session below, you will find links to the software and data you need, script files that detail command line usage, and (where appropriate) helper scripts.

NOTE: The text files next to the tool links assume you have downloaded and unzipped/untarred the relevant package. They walk you through installation on the computers in the SC lab we are using for the tutorial and show a sample run. They are NOT executable; they are intended as examples for you to follow. (In case you are interested, they were generated with the Unix script command, hence the suffix .script.)

Session 1

Context Sensitive Spelling

sample text
sentence segmentation tool (script: using the sentence splitter)
word splitter (script: using the word splitter)
Part-of-Speech Tagger (script: using the POS-tagger)
FEX (script: using FEX (basic))
preprocessing test materials
SNoW
Context Sensitive Spelling materials
explanation of POS tags

Session 2

Named Entity Tagging with FEX and SNoW

Session 3

Misc. Resources

More details about SNoW, including its many useful tuning parameters and support for inference, can be found in the comprehensive SNoW user manual.

FEX's user manual is included in its distribution tarball. In addition to explaining FEX's scripting language and input formats, this manual gives details of other specialized FEX modes.

The following resources demonstrate FEX's document mode, which is not covered in this tutorial.

If you have questions about these materials, particularly if you are attending the current tutorial sessions, please contact me at mssammon@uiuc.edu. Please also write if you find incorrectly labeled resources, errors, or broken links.

Tutorial Slides

Under each session heading are links to slides in PowerPoint and PDF formats. NOTE: some slides will not display correctly in PDF format.

The resources to accompany each session are listed in the resources section of this page.

Session 1: Text Processing and Feature Extraction
Introduction; Preprocessing; Feature Extraction ppt
Multi-class Classification with SNoW ppt pdf
Session 2: Applying FEX and SNoW to Named Entity Tagging
Candidate Selection and Feature Extraction: Named Entity Tagging ppt
Session 3: Learning Based Java
The Basics: ppt pdf
In the resources section of this page, see the User's Manual and the README in the "toy context sensitive spelling corrector" for more information.