TY - GEN
T1 - CogCompNLP
T2 - 11th International Conference on Language Resources and Evaluation, LREC 2018
AU - Khashabi, Daniel
AU - Sammons, Mark
AU - Zhou, Ben
AU - Redman, Tom
AU - Christodoulopoulos, Christos
AU - Srikumar, Vivek
AU - Rizzolo, Nicholas
AU - Ratinov, Lev
AU - Luo, Guanheng
AU - Do, Quang
AU - Tsai, Chen Tse
AU - Roy, Subhro
AU - Mayhew, Stephen
AU - Feng, Zhili
AU - Wieting, John
AU - Yu, Xiaodong
AU - Song, Yangqiu
AU - Gupta, Shashank
AU - Upadhyay, Shyam
AU - Arivazhagan, Naveen
AU - Ning, Qiang
AU - Ling, Shaoshi
AU - Roth, Dan
N1 - Funding Information:
The authors would like to thank all the other contributors to the project. This material is partly based on research sponsored by DARPA under agreement number FA8750-13-2-0008. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. This work was also supported by Contract HR0011-15-2-0025 with the US Defense Advanced Research Projects Agency (DARPA). Approved for Public Release, Distribution Unlimited. This work was partly funded by a grant from the Allen Institute for Artificial Intelligence (allenai.org); by Google; by NSF grant BCS-1348522; and by NIH grant R01-HD054448. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of any research sponsors listed above.
Publisher Copyright:
© LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Implementing a Natural Language Processing (NLP) system requires considerable engineering effort: creating data structures to represent language constructs; reading corpus annotations into these data structures; applying off-the-shelf NLP tools to augment the text representation; extracting features and training machine learning components; conducting experiments and computing performance statistics; and creating the end-user application that integrates the implemented components. While there are several widely used NLP libraries, each provides only partial coverage of these tasks. We present our library, CogCompNLP, which simplifies the design and development of NLP applications by providing modules that address different challenges: a corpus-reader module that supports popular corpora in the NLP community, a module for various low-level data structures and operations (such as search over text), a module for feature extraction, and an extensive suite of annotation modules for a wide range of semantic and syntactic tasks. These annotation modules are all integrated in a single system, Pipeline, which allows users to call the annotators directly from any JVM-based language, or over a network. The sister project CogCompNLPy enables users to access the annotators through a Python interface. We give a detailed account of our system's structure and usage and, where possible, compare it with other established NLP frameworks. We report on the performance, including time and memory statistics, of each component on a selection of well-established datasets. Our system is publicly available for research use and external contributions at http://github.com/CogComp/cogcomp-nlp.
UR - http://www.scopus.com/inward/record.url?scp=85059881208&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059881208&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85059881208
T3 - LREC 2018 - 11th International Conference on Language Resources and Evaluation
SP - 541
EP - 549
BT - LREC 2018 - 11th International Conference on Language Resources and Evaluation
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Piperidis, Stelios
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Hasida, Koiti
A2 - Mazo, Helene
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Mariani, Joseph
A2 - Moreno, Asuncion
A2 - Calzolari, Nicoletta
A2 - Odijk, Jan
A2 - Tokunaga, Takenobu
PB - European Language Resources Association (ELRA)
Y2 - 7 May 2018 through 12 May 2018
ER -