Speech and Language Information Processing Group

Speech and language technology is one of the research directions of the National Laboratory of Pattern Recognition (NLPR). The research group focuses on fundamental problems and key techniques in human language technology, and aims to contribute to theoretical modeling, application system development, and large-scale corpus construction. Its interests include natural language processing, such as (1) machine translation and (2) information extraction and text mining, as well as speech and affective computing, such as (1) speech recognition, (2) speech synthesis, and (3) computational auditory scene analysis.

On Natural Language Processing (visit Natural Language Processing Group)
This sub-direction focuses on Chinese word segmentation, natural language parsing, discourse parsing, machine translation, information extraction, question answering, and related techniques. In recent years we have undertaken many projects funded by the Chinese government, including projects supported by the National Natural Science Foundation of China, the National High Technology Research and Development Program of China (“863” Program), and the National Basic Research Program of China (“973” Program). We also have close ties with industry partners such as Baidu, Google, Tencent, and Huawei. We have published many high-quality papers in top journals such as Computational Linguistics, IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE Transactions on Knowledge and Data Engineering, and Transactions of the ACL, and at top conferences such as AAAI, IJCAI, ACL, SIGIR, and WWW. We have won best paper awards at COLING 2014, PACLIC 2009, and NLPCC 2012. Our multilingual machine translation system translates between Chinese and more than 15 languages, including the Chinese minority languages Tibetan, Uyghur, and Mongolian, and has been applied in several specific domains. Our recommendation system ranked second in the KDD Cup 2011 competition.

Figure 1 Languages supported by our multilingual machine translation system


Figure 2 Knowledge Mining and Deep Question Answering


On Human-Machine Speech Interaction (visit Human Machine Speech Interaction Group)

The Multimodal Human-Computer Dialog


The main research topics of the Human Machine Speech Interaction Group include speech synthesis and recognition, natural spoken language analysis and understanding, human-machine interaction, affective computing, and audio and video content understanding. The research group comprises 12 people, more than 10 doctoral and master's students, 1 academic adviser, and several interns.
Group members have published over 190 papers in IEEE Transactions on ASLP, Speech Communication, ACM Multimedia, ICASSP, Interspeech, ICIP, ICCV, and other important international journals and conferences. The group has applied for 28 invention patents, 15 of which have been granted, including one international patent. The "Chinese speech synthesis system" achieved the top score in the 2007 TC-STAR speech synthesis evaluation. The team has also received excellence award nominations at the 2009 and 2013 national signal processing academic conferences, the best paper award at HMMME 2013, the Beijing Science and Technology Progress Award (Second Class, 2014), a best paper nomination at HMMME 2015, and the best paper award at the 2015 national signal processing academic conference.
The group currently hosts several key national projects, including the 863 project "Mobile Terminal Oriented Multimodal Natural Interaction Technology"; the National Science Fund for Distinguished Young Scholars project "Multimodal Fusion of Speech Analysis and Speech Production Theory and Method"; and two key projects of the National Natural Science Foundation of China, "Neural Physiological Modeling and Control of Speech Production Process" and "Application of New Interactive Calculation Theory, Method and Key Technology from Motion-Sensing". The team has built a high-quality multilingual speech recognition and synthesis system that uses a prosody prediction model combining statistics and rules, and that covers Chinese, English, Cantonese, and other languages or dialects (Shanghai, Tianjin, Sichuan). The system can also automatically optimize and control its speech repository, runs at high speed on embedded platforms (PDAs or mobile phones), and produces highly natural synthesized speech. The group also built a history-oriented, hierarchical multimodal fusion model of dialog information and responses, which allows the computer to give both precise and imprecise replies on multiple topics based on the dialog history. The related techniques have been transferred to well-known companies such as Baidu, Samsung, Tencent, Lenovo, and BMW (China), with good economic and social benefits.
The group is a core member of the “National Voice Interactive Technology Group”, the “European Center of Excellent Speech Synthesis Group”, and the “W3C Speech Synthesis Markup Language Development Group”. In cooperation with relevant organizations, we completed the national technical standard “Chinese speech synthesis system General technical specifications” (GB/T 21024-2007), the W3C standard “Speech Synthesis Markup Language (SSML) Version 1.1”, the W3C standard “Emotion Markup Language (EMOXG)”, and other important national and international standards.



