Tutorial:
Machine Learning Tools in Natural Language Processing

Introduction

This tutorial explores the use of SNoW and FEX, two of our core machine learning tools, to solve text processing problems. Specifically, we apply SNoW and FEX to context-sensitive spelling correction and named entity tagging. The tutorial also uses a number of our other standard tools (such as our Part-of-Speech Tagger) and custom scripts to preprocess and postprocess data for each task. Finally, it mentions ways to streamline the process using Perl and shell scripts and the server modes of SNoW and FEX.



Resources

For each tutorial session below, you will find links to the software and data you need, script files that detail command line usage, and (where appropriate) helper scripts.

NOTE: The text files next to the tool links assume you have downloaded and unzipped/untarred the relevant package. They walk you through installation on the computers in the SC lab we are using for the tutorial and show a sample run. They are NOT executable; they are intended as examples for you to follow. (If you're interested, they were generated with the Unix 'script' command, hence the suffix '.script'.)


If you have questions about these materials, particularly if you are attending the current tutorial sessions, please contact me at mssammon@uiuc.edu -- likewise if you find incorrectly labeled resources, errors, broken links, etc.



Tutorial Slides

Under each session heading are links to slides in PowerPoint and PDF formats.
NOTE: Some slides will not display correctly in PDF format.

The resources to accompany each session are provided further down the page.

  • Session 1: Text Processing and Feature Extraction

    • Introduction; Preprocessing; Feature Extraction ppt
    • Multi-class Classification with SNoW ppt pdf

  • Session 2: Applying FEX and SNoW to Named Entity Tagging

    • Candidate Selection and Feature Extraction: Named Entity Tagging ppt

  • Session 3: Learning Based Java

    • The Basics: ppt pdf
    • For more information, see the User's Manual and the README in the "toy context sensitive spelling corrector" in the resources section below.