National Laboratory of Pattern Recognition (NLPR)

TITLE: Applying NLP Technologies to the Collection and Analysis of Language Data to Aid Linguistic Research
SPEAKER: Dr. Fei Xia, University of Washington (UW)
CHAIR: Prof. Chengqing Zong
TIME: 10:00 AM, Thursday, July 5, 2012
VENUE: 1115 Meeting Room

ABSTRACT:

As a vast amount of language data has become available electronically, linguistics is gradually transforming itself into a discipline where science is often conducted using corpora. In this talk, we review the process of building ODIN, the Online Database of Interlinear Text, a multilingual repository of linguistically analyzed language data. ODIN is built from interlinear text that has been harvested from scholarly linguistic documents posted to the Web, and it currently holds more than 200,000 instances of interlinear text, covering annotated language data for more than 1,000 languages (more than 10% of the world's languages). ODIN's charter has been to make these data available to linguists and other language researchers via search, providing the facility to find instances of language data and related resources (i.e., the documents from which the data were extracted) by language name, language family, and even linguistic construction. Further, we have sought to enrich the collected data and extract "knowledge" from the enriched content. This work demonstrates the benefits of using natural language processing technology to create resources and tools for linguistic research, giving linguists easy access not only to language data embedded in existing linguistic papers, but also to automatically generated language profiles for hundreds of languages.

BIOGRAPHY:

Fei Xia is an Associate Professor in the Linguistics Department at the University of Washington (UW) and an adjunct faculty member in the Department of Biomedical Informatics and Medical Education at the UW Medical School. Her research covers a wide range of NLP tasks, including morphological analysis, part-of-speech tagging, grammar extraction and grammar generation, treebank development, machine translation, information extraction, and bio-NLP. Her current research focuses on building NLP systems that combine linguistic knowledge and machine learning techniques. She is also interested in collecting data and building tools to assist linguistic study. Her work is supported by several grants from NSF, NIH, IARPA, Microsoft, and UW, including the prestigious NSF CAREER Award.

Fei Xia received her Bachelor's degree from Peking University and her Ph.D. from the University of Pennsylvania (UPenn). At UPenn, she led the effort to build the Chinese Penn Treebank, which currently has 1.2 million words and is one of the most commonly used corpora for Chinese NLP. After graduation, she worked at the IBM T. J. Watson Research Center in Yorktown Heights, New York, before joining UW.

Organizer: National Laboratory of Pattern Recognition
