The files have one of three extensions:
Each line in a .feat file is an example: a list of active features, separated by commas (,) and terminated by a colon (:). The class label is treated as one of the features. It takes values from 0 to k-1, where k is the number of possible labels (the size of the confusion set). The remaining features are indexed with numbers above k. The feature indices are sorted, so the class label always appears first in the list.
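The line format described above can be read with a few lines of Python. This is only a minimal sketch, not part of the corpus tools: the function name parse_feat_line and the sample line are hypothetical, and it assumes the indices are plain decimal integers and that the caller knows k.

```python
def parse_feat_line(line, k):
    """Split one example line into (label, feature_indices).

    Assumes the format described above: comma-separated integer
    indices, terminated by ':', with the class label (0..k-1) first.
    """
    body = line.strip()
    if not body.endswith(":"):
        raise ValueError("example line must end with ':'")
    # Drop the terminating ':' and split on ','; int() tolerates spaces.
    indices = [int(tok) for tok in body[:-1].split(",") if tok.strip()]
    if not indices:
        raise ValueError("example line contains no features")
    label, features = indices[0], indices[1:]
    # Indices are sorted, so the first one must be the class label.
    if not 0 <= label < k:
        raise ValueError(f"first index {label} is not a label in 0..{k - 1}")
    return label, features


# Hypothetical line for a confusion set of size k = 2.
label, features = parse_feat_line("1,12,57,203:", k=2)
```

Here label is the class of the example and features holds the remaining active feature indices in sorted order.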
Files whose names carry the suffix 20 are test files; those with the suffix 80 are training files (20% and 80% of the data, respectively).
Download the entire corpus here.