TY - GEN
T1 - Grammatical error correction: Machine translation and classifiers
T2 - 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
AU - Rozovskaya, Alla
AU - Roth, Dan
N1 - Publisher Copyright:
© 2016 Association for Computational Linguistics.
PY - 2016
Y1 - 2016
AB - We focus on two leading state-of-the-art approaches to grammatical error correction: machine learning classification and machine translation. Based on a comparative study of the two learning frameworks and an error analysis of the output of state-of-the-art systems, we identify key strengths and weaknesses of each approach and demonstrate their complementarity. In particular, the machine translation method learns from parallel data without requiring further linguistic input and is better at correcting complex mistakes. The classification approach possesses other desirable characteristics, such as the ability to generalize easily beyond what was seen in training, the ability to train without human-annotated data, and the flexibility to adjust knowledge sources for individual error types. Based on this analysis, we develop an algorithmic approach that combines the strengths of both methods. We present several systems, built on resources used in previous work, that achieve a relative improvement of over 20% (7.4 F-score points) over the previous state of the art.
UR - http://www.scopus.com/inward/record.url?scp=85011967176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85011967176&partnerID=8YFLogxK
DO - 10.18653/v1/p16-1208
M3 - Conference contribution
AN - SCOPUS:85011967176
T3 - 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
SP - 2205
EP - 2215
BT - 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
PB - Association for Computational Linguistics (ACL)
Y2 - 7 August 2016 through 12 August 2016
ER -