Lecture Series in Pattern Recognition
TITLE: Applying NLP Technologies to the Collection and Analysis of Language Data to Aid Linguistic Research
SPEAKER: Dr. Fei Xia, University of Washington (UW)
CHAIR: Prof. Chengqing Zong
TIME: 10:00 AM, Thursday, July 5, 2012
VENUE: 1115 Meeting Room
As a vast amount of language data has become available electronically, linguistics is gradually transforming itself into a discipline where science is often conducted using corpora. In this talk, we review the process of building ODIN, the Online Database of Interlinear Text, a multilingual repository of linguistically analyzed language data. ODIN is built from interlinear text harvested from scholarly linguistic documents posted to the Web. It currently holds more than 200,000 instances of interlinear text, providing annotated language data for more than 1,000 languages (more than 10% of the world's languages). ODIN's charter has been to make these data available to linguists and other language researchers via search, so that they can find instances of language data and related resources (i.e., the documents from which the data were extracted) by language name, language family, and even linguistic construction. Further, we have sought to enrich the collected data and extract "knowledge" from the enriched content. This work demonstrates the benefits of using natural language processing technology to create resources and tools for linguistic research, giving linguists easy access not only to language data embedded in existing linguistic papers, but also to automatically generated language profiles for hundreds of languages.
Fei Xia is an Associate Professor in the Linguistics Department at the University of Washington (UW) and an adjunct faculty member in the Department of Biomedical Informatics and Medical Education at the UW Medical School. Her research covers a wide range of NLP tasks, including morphological analysis, part-of-speech tagging, grammar extraction and grammar generation, treebank development, machine translation, information extraction, and bio-NLP. Her current research focuses on building NLP systems that combine linguistic knowledge and machine learning techniques. She is also interested in collecting data and building tools to assist linguistic study. Her work is supported by several grants from NSF, NIH, IARPA, Microsoft, and UW, including the prestigious NSF CAREER Award.
Fei Xia received her Bachelor's degree from Peking University and her Ph.D. from the University of Pennsylvania (UPenn). At UPenn, she led the effort in building the Chinese Penn Treebank, which currently has 1.2 million words and is one of the most commonly used corpora for Chinese NLP. After graduation, she worked at the IBM T. J. Watson Research Center at Yorktown Heights, New York, before joining UW.