Machine Learning Tools in Natural Language Processing
This tutorial explores the use of SNoW and FEX, two of our core machine learning tools, to solve text processing problems. Specifically, we apply SNoW and FEX to context-sensitive spelling correction and named entity tagging. The tutorial also uses several of our other standard tools (such as our Part-of-Speech Tagger) and custom scripts to preprocess and postprocess data for each task. Finally, it mentions ways to streamline the process using Perl and shell scripts and the server modes of SNoW and FEX.
For each tutorial session below, you will find links to the software and data you need, script files that detail command line usage, and (where appropriate) helper scripts.
NOTE: The text files next to the tool links assume you have downloaded and unzipped/untarred the relevant package. They walk you through installation on the computers in the SC lab we are using for the tutorial and show a sample run. They are NOT executable; they are examples for you to follow. (If you're interested, they were generated with the Unix 'script' command, hence the '.script' suffix.)
If you have questions about these materials, particularly if you are attending the current tutorial sessions, or if you find incorrectly labeled resources, errors, or broken links, please contact me at firstname.lastname@example.org.
Session 1: Context Sensitive Spelling
Session 2: Named Entity Tagging with FEX and SNoW
More details about SNoW, including its many useful tuning parameters and support for inference, can be found in the comprehensive SNoW user manual.
FEX's user manual is included in its distribution tarball. In addition to explaining FEX's scripting language and input formats, the manual describes other specialized FEX modes.
The following resources demonstrate FEX's document mode, which is not covered in this tutorial.
Under each session heading are links to slides in PowerPoint and PDF formats.
NOTE: Some slides will not display correctly in PDF format.
The resources to accompany each session are provided further down the page.
Session 1: Text Processing and Feature Extraction
- Introduction; Preprocessing; Feature Extraction ppt
- Multi-class Classification with SNoW ppt pdf
Session 2: Applying FEX and SNoW to Named Entity Tagging
- Candidate Selection and Feature Extraction: Named Entity Tagging ppt
Session 3: Learning Based Java
- The Basics: ppt pdf
- For more information, see the User's Manual and the README in the "toy context sensitive spelling corrector" in the resources section below.