Training Paradigms for Correcting Errors in Grammar and Usage
Authors:
Alla Rozovskaya and Dan Roth
Abstract:
This paper proposes a novel approach to the
problem of training classifiers to detect and
correct grammar and usage errors in text by
selectively introducing mistakes into the training
data. When training a classifier, we would
like the distribution of examples seen in training
to be as similar as possible to the one seen
in testing. In error correction problems, such
as correcting mistakes made by second language
learners, a system is generally trained
on correct data, since annotating data for training
is expensive. Error generation methods
avoid expensive data annotation and create
training data that resemble non-native data
with errors.
We apply error generation methods and train
classifiers for detecting and correcting article
errors in essays written by non-native English
speakers; we show that training on data
that contain errors produces higher accuracy
when compared to a system that is trained on
clean native data. We propose several training
paradigms with error generation and show
that each such paradigm is superior to training
a classifier on native data. We also show that
the most successful error generation methods
are those that use knowledge about the article
distribution and error patterns observed in
non-native text.
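
The error generation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the confusion probabilities, the `CONFUSION` table, and the `inject_article_errors` helper are hypothetical, chosen only to show how article errors might be introduced into clean native examples at rates that mimic those seen in non-native text.

```python
import random

# "" stands for the zero article (no article before the noun phrase).
ARTICLES = ("a", "the", "")

# Hypothetical table: P(article the writer produces | correct article).
# The numbers are illustrative assumptions, not figures from the paper.
CONFUSION = {
    "a":   {"a": 0.90, "the": 0.04, "": 0.06},
    "the": {"a": 0.02, "the": 0.92, "": 0.06},
    "":    {"a": 0.01, "the": 0.03, "": 0.96},
}


def inject_article_errors(examples, confusion, rng):
    """Corrupt clean (context, correct_article) pairs.

    Returns a list of dicts holding the unchanged context, the possibly
    erroneous "source" article, and the correct article as the label.
    """
    noisy = []
    for context, correct in examples:
        options = confusion[correct]
        # Sample the writer's article according to the confusion row.
        source = rng.choices(
            list(options), weights=list(options.values()), k=1
        )[0]
        noisy.append({"context": context, "source": source, "label": correct})
    return noisy


clean = [
    (("bought", "car"), "a"),
    (("opened", "door"), "the"),
    (("likes", "music"), ""),
]
training_data = inject_article_errors(clean, CONFUSION, random.Random(0))
```

In a setup like this, the sampled `source` article could be given to the classifier as a feature alongside the context, while `label` remains the correct article, so the training examples resemble what the classifier sees when applied to learner essays.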
Citation:
A. Rozovskaya and D. Roth,
Training Paradigms for Correcting Errors in Grammar and Usage. NAACL (2010)
Bibitem:
@inproceedings{RozovskayaRo10,
author = {A. Rozovskaya and D. Roth},
title = {Training Paradigms for Correcting Errors in Grammar and Usage},
booktitle = {NAACL},
month = {6},
year = {2010},
url = {http://cogcomp.cs.illinois.edu/papers/RozovskayaRo10.pdf},
funding = {EDU},
projects = {CSTC},
comment = {Text correction; Error Detection and Correction; ESL Error Detection; ESL Proofing Tools; Errors in Article Usage; training classifiers for detecting and correcting mistakes by selectively introducing mistakes into the training data},
}