Invited Talks
Knowledge-rich Speech Processing: beyond Current Deep Learning
Chin-Hui Lee, Professor, School of ECE, Georgia Institute of Technology. Time: 9:45 - 10:45, December 6, 2018
Biography
Chin-Hui Lee is a professor in the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, most recently at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 500 papers and 30 patents, with more than 42,000 citations and an h-index of 80 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award in 1998 and the IEEE Signal Processing Society's 2006 Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition". In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition, and in the same year he was awarded the ISCA Medal for Scientific Achievement for "pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition".
Abstract
Deep neural networks (DNNs) are becoming ubiquitous in the design of speech processing algorithms. However, the robustness issues that have hindered widespread deployment of speech technologies for decades have still not been fully resolved. In this talk, we first discuss the capabilities and limitations of deep learning technologies. Next, we illustrate three knowledge-rich techniques: (1) automatic speech attribute transcription (ASAT), which integrates acoustic-phonetic knowledge into speech processing and computer-assisted pronunciation training (CAPT); (2) Bayesian DNNs, which leverage speaker information for adaptation and system combination; and (3) DNN-based speech pre-processing, which demonstrates that better acoustics lead to more accurate speech recognition. Finally, we argue that domain knowledge in speech, language, and acoustics is much needed beyond current black-box deep learning in order to formulate sustainable white-box solutions that further advance speech technologies.
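As a rough illustration of the second technique, the sketch below shows one common way Bayesian ideas enter DNN speaker adaptation: fine-tuning on a small amount of speaker data under a Gaussian prior centered on the speaker-independent weights. This is a minimal, assumed example; the model sizes and data are stand-ins, not the speaker's actual systems.

```python
# Minimal sketch (assumed illustration): MAP-style adaptation of a
# speaker-independent DNN. A Gaussian prior centered on the original weights
# regularizes fine-tuning on a small amount of speaker-specific data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 100))
prior = [p.detach().clone() for p in model.parameters()]  # speaker-independent weights
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
tau = 1e-2  # prior precision; larger values keep the model closer to the prior

def adapt_step(feats, labels):
    """One adaptation step on a batch of speaker-specific frames."""
    loss = nn.functional.cross_entropy(model(feats), labels)
    loss = loss + tau * sum(((p - q) ** 2).sum()
                            for p, q in zip(model.parameters(), prior))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example: adapt on random stand-in data (40-dim features, 100 output classes).
print(adapt_step(torch.randn(32, 40), torch.randint(0, 100, (32,))))
```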
Knowledge-Guided Natural Language Processing
Zhiyuan Liu, Associate Professor, Tsinghua University. Time: 10:45 - 11:15, December 6, 2018
Biography
Zhiyuan Liu is an associate professor in the Department of Computer Science and Technology, Tsinghua University. He received his Ph.D. in Computer Science from Tsinghua University in 2011. His research interests include representation learning, knowledge graphs, and social computing. He has published more than 60 papers in top-tier AI and NLP conferences and journals, including ACL, IJCAI, and AAAI, and his work has been cited more than 3,500 times according to Google Scholar.
Abstract
Recent years have witnessed the advances of deep learning techniques in various areas of NLP. However, as a typical data-driven approach, deep learning suffers from poor interpretability. A potential solution is to incorporate large-scale symbolic knowledge graphs into deep learning. In this talk, I will present recent work on knowledge-guided deep learning methods for NLP.
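As a concrete illustration of feeding a symbolic knowledge graph into a neural model, below is a minimal sketch of TransE-style knowledge graph embedding, a widely used approach in this line of research (an assumed example, not necessarily the speaker's specific model); the graph sizes and triples are random stand-ins.

```python
# Minimal sketch of TransE: embed entities and relations so that each relation
# acts as a translation, i.e. h + r ≈ t holds for true triples (h, r, t).
import torch
import torch.nn as nn

n_ent, n_rel, dim = 1000, 50, 64
E = nn.Embedding(n_ent, dim)
R = nn.Embedding(n_rel, dim)
opt = torch.optim.Adam(list(E.parameters()) + list(R.parameters()), lr=1e-3)

def score(h, r, t):
    """Lower is better: distance between translated head and tail."""
    return (E(h) + R(r) - E(t)).norm(p=2, dim=-1)

def train_step(h, r, t, margin=1.0):
    """Margin loss pushing true triples below corrupted (random-tail) ones."""
    t_neg = torch.randint(0, n_ent, t.shape)  # negative sampling
    loss = torch.relu(margin + score(h, r, t) - score(h, r, t_neg)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example with random stand-in triples.
h = torch.randint(0, n_ent, (128,)); r = torch.randint(0, n_rel, (128,))
t = torch.randint(0, n_ent, (128,))
print(train_step(h, r, t))
```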
Edge AI for Data-Intensive Internet of Things
Guoliang Xing, Professor, The Chinese University of Hong Kong. Time: 11:15 - 11:45, December 6, 2018
Biography
Guoliang Xing is currently a Professor in the Department of Information Engineering at The Chinese University of Hong Kong. Previously, he was a faculty member at Michigan State University, U.S. His research interests include embedded AI, edge/fog computing, cyber-physical systems, the Internet of Things (IoT), security, and wireless networking. He received B.S. and M.S. degrees from Xi'an Jiaotong University, China, in 1998 and 2001, and a D.Sc. degree from Washington University in St. Louis in 2006. He received an NSF CAREER Award in 2010, as well as two Best Paper Awards and five Best Paper Nominations at first-tier conferences including ICNP and IPSN. Several mobile health technologies developed in his lab won Best App Awards at the MobiCom conference and were successfully transferred to industry. He received the Withrow Distinguished Faculty Award from Michigan State University in 2014. He served as General Chair for IPSN 2016 and TPC Co-Chair for IPSN 2017.
Abstract
The Internet of Things (IoT) represents a broad class of systems that interact with the physical world by tightly integrating sensing, communication, and computation with physical objects. Many IoT applications are data-intensive and mission-critical in nature: they generate significant amounts of data that must be processed within stringent time constraints. For example, an autonomous vehicle is estimated to produce about 0.75 GB of data per second. The existing cloud computing paradigm is inadequate for such applications due to significant or unpredictable delays and concerns about data privacy.
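A quick back-of-the-envelope check, using only the figure cited above, shows why shipping such data to a remote cloud is impractical and why edge processing is attractive:

```python
# Back-of-the-envelope arithmetic on the data rate cited in the abstract.
rate_gb_per_s = 0.75                       # per autonomous vehicle, as cited
per_hour_tb = rate_gb_per_s * 3600 / 1000  # ≈ 2.7 TB of sensor data per hour
print(f"{per_hour_tb:.1f} TB per vehicle per hour")
```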
Mental Health Computing via Harvesting Social Media Data
Jia Jia, Associate Professor, Tsinghua University. Time: 11:45 - 12:15, December 6, 2018
Biography
Dr. Jia Jia is a tenured associate professor in the Department of Computer Science and Technology, Tsinghua University. Her main research interests are affective computing and human-computer speech interaction. She has received the ACM Multimedia Grand Challenge Prize (2012), a Scientific Progress Prize from the Ministry of Education as first person-in-charge (2016), an IJCAI Early Career Spotlight (2018), the ACM Multimedia Best Demo Award (2018), and recognition as one of the ACM SIGMM Emerging Leaders (2018). She has authored about 70 papers in leading conferences and journals, including IEEE T-KDE, T-MM, T-MC, T-ASLP, T-AC, ACM Multimedia, AAAI, IJCAI, and WWW. She also collaborates widely with Tencent, Sogou, Huawei, Siemens, MSRA, Bosch, and others.
Abstract
Psychological stress and depression threaten people's health, and detecting them in a timely manner for proactive care is non-trivial. With the popularity of social media, people are used to sharing their daily activities and interacting with friends on social media platforms, making it feasible to leverage online social media data for stress and depression detection. In this talk, we systematically introduce our work on stress and depression detection using large-scale benchmark datasets from real-world social media platforms, covering: (1) stress-related and depression-related textual, visual, and social attributes from various aspects; (2) novel hybrid models for binary stress detection, stress event and subject detection, and cross-domain depression detection; and (3) several intriguing phenomena that reveal the distinctive online behaviors of stressed and depressed people. We will also demonstrate the mental health care applications we have developed at the end of this talk.
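To make the idea of fusing heterogeneous attributes concrete, here is a minimal, generic sketch of combining textual and social features for binary stress detection. This is an assumed illustration with invented example posts and attribute names, not the authors' actual models.

```python
# Minimal sketch: fuse TF-IDF text features with simple social attributes
# (posting time, interaction count) in one classifier.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

posts = pd.DataFrame({
    "text": ["so much pressure from exams", "great day at the beach"],
    "post_hour": [2, 15],        # posting time as a simple social attribute
    "n_interactions": [0, 12],   # replies/likes from friends
    "stressed": [1, 0],          # invented labels for illustration
})
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "text"),
    ("social", "passthrough", ["post_hour", "n_interactions"]),
])
clf = Pipeline([("features", features), ("model", LogisticRegression())])
clf.fit(posts.drop(columns="stressed"), posts["stressed"])
print(clf.predict(posts.drop(columns="stressed")))
```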
Meeting the New Challenges in Speech Processing: Some NPU-ASLP Approaches
Lei Xie, Professor, Northwestern Polytechnical University. Time: 8:30 - 9:00, December 7, 2018
Biography
Lei Xie is currently a Professor in the School of Computer Science, Northwestern Polytechnical University, Xi'an, China. From 2001 to 2002, he was a Visiting Scientist in the Department of Electronics and Information Processing, Vrije Universiteit Brussel (VUB), Brussels, Belgium. From 2004 to 2006, he worked in the Center for Media Technology (RCMT), City University of Hong Kong, and from 2006 to 2007 in the Human-Computer Communications Laboratory (HCCL), The Chinese University of Hong Kong. His current research interests include audio, speech, and language processing, multimedia, and human-computer interaction. He is currently an associate editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing. He has published more than 140 papers in major journals and proceedings, such as IEEE TASLP, IEEE TMM, Signal Processing, Pattern Recognition, ACM Multimedia, ACL, INTERSPEECH, and ICASSP.
Abstract
Speech has become a popular human-machine interface thanks to the rapid development of deep learning, big data, and supercomputing, with applications in smartphones, TVs, robots, and smart speakers. However, for wider deployment of speech interfaces, many challenges remain, such as noise interference, inter- and intra-speaker variation, diverse speaking styles, and low-resource scenarios. In this talk, I will introduce several approaches recently developed by the Audio, Speech and Language Processing group at Northwestern Polytechnical University (NPU-ASLP) to meet these challenges in speech recognition, speech enhancement, and speech synthesis.
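As an illustration of the speech enhancement direction, below is a minimal, generic sketch of mask-based enhancement, a common DNN formulation (assumed here for illustration, not an NPU-ASLP system): a small network predicts a time-frequency mask that is applied to the noisy magnitude spectrogram. The network is untrained in the sketch, and the input is random stand-in audio.

```python
# Minimal sketch: DNN-predicted time-frequency mask for speech enhancement.
import torch
import torch.nn as nn

n_fft = 512
mask_net = nn.Sequential(nn.Linear(n_fft // 2 + 1, 256), nn.ReLU(),
                         nn.Linear(256, n_fft // 2 + 1), nn.Sigmoid())

noisy = torch.randn(16000)                       # 1 s of "noisy speech" at 16 kHz
spec = torch.stft(noisy, n_fft, hop_length=256,
                  window=torch.hann_window(n_fft), return_complex=True)
mag, phase = spec.abs(), spec.angle()            # (freq, time)
mask = mask_net(mag.T).T                         # per-frame mask in [0, 1]
enhanced = torch.istft(mask * mag * torch.exp(1j * phase), n_fft,
                       hop_length=256, window=torch.hann_window(n_fft))
print(enhanced.shape)                            # waveform after masking
```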
Deep Visual Scene Understanding
Bolei Zhou, Assistant Professor, The Chinese University of Hong Kong. Time: 9:00 - 9:30, December 7, 2018
Biography
Bolei Zhou is an Assistant Professor in the Information Engineering Department at The Chinese University of Hong Kong. He received his Ph.D. in computer science from the Massachusetts Institute of Technology (MIT). His research is in computer vision and machine learning, focusing on visual scene understanding and interpretable deep learning. He has received the Facebook Fellowship, the Microsoft Research Fellowship, and the MIT Greater China Fellowship, and his research has been featured in media outlets such as TechCrunch, Quartz, and MIT News.
Abstract
Deep learning has made great progress in computer vision, achieving human-level object recognition. However, visual scene understanding, which aims to interpret objects and their spatial relations in complex scene contexts, remains challenging. In this talk I will first introduce recent progress in deep learning for visual scene understanding. From the 10-million-image Places dataset to the pixel-level annotated ADE20K dataset, I will show the power of data and its synergy with interpretable deep neural networks for better scene recognition and parsing. I will then discuss the trend in visual recognition from supervised learning towards more active learning scenarios. Applications including city-scale perception and spatial navigation will be discussed.
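For a sense of how the Places data is typically consumed, here is a minimal sketch of scene recognition with a ResNet-18 configured for the 365 Places categories. The weight and image file names are assumed placeholders; the publicly released Places365 checkpoint may need its keys remapped before it loads this way.

```python
# Minimal sketch: classify the scene category of an image with a Places-style CNN.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(num_classes=365)            # 365 Places scene classes
state = torch.load("resnet18_places365.pth",        # assumed local weight file
                   map_location="cpu")
model.load_state_dict(state)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = preprocess(Image.open("bedroom.jpg")).unsqueeze(0)  # assumed input image
with torch.no_grad():
    probs = model(img).softmax(dim=1)
top5 = probs.topk(5)
print(top5.indices, top5.values)  # map indices to the Places category list
```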
Event Level Video Captioning based on Attentional RNN
Chun Yuan, Associate Professor, Tsinghua University. Time: 9:30 - 10:00, December 7, 2018
Biography
Chun Yuan is currently an Associate Professor in the Division of Information Science and Technology at the Graduate School at Shenzhen, Tsinghua University. He received his M.S. and Ph.D. degrees from the Department of Computer Science and Technology, Tsinghua University, Beijing, China, in 1999 and 2002, respectively. He worked at INRIA Rocquencourt, France, as a postdoctoral research fellow from 2003 to 2004, and at Microsoft Research Asia, Beijing, China, as an intern in 2002. His research interests include computer vision, machine learning, and multimedia technologies. He is the executive vice director of the Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems.
Abstract
Video understanding is a challenging research topic that draws jointly on natural language processing (NLP) and computer vision, and a growing number of commercial applications of online multimedia content require better automatic understanding of video events. Unlike image captioning, video captioning faces additional obstacles. First, video is a more complex data form than images for feature extraction and use: the temporal dimension carries rich information, and existing methods each have their own shortcomings in mining it. Second, captioning requires the sentence generator to extract dynamic information from the video; while some methods handle short, monotone actions well, mining longer and more complex actions is the next goal. Third, new tasks such as captioning multiple events call for algorithms that operate at the event level; correctly generating words like "continue" or "another" is one sign that context information is being exploited well.
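To make the attention mechanism in the talk title concrete, here is a minimal, generic sketch (assumed for illustration, not the speaker's exact model) of one decoding step with temporal attention: the RNN state scores each frame's CNN feature, and the resulting weighted summary of the video drives the next word.

```python
# Minimal sketch: temporal attention over per-frame features in an RNN captioner.
import torch
import torch.nn as nn

feat_dim, hid, vocab = 512, 256, 10000
attn = nn.Linear(feat_dim + hid, 1)   # scores a (frame feature, state) pair
rnn = nn.GRUCell(feat_dim, hid)
out = nn.Linear(hid, vocab)

def decode_step(frame_feats, h):
    """frame_feats: (T, feat_dim) per-frame CNN features; h: (hid,) decoder state."""
    scores = attn(torch.cat([frame_feats,
                             h.expand(frame_feats.size(0), -1)], dim=1))
    weights = scores.softmax(dim=0)               # attention over the T frames
    context = (weights * frame_feats).sum(dim=0)  # weighted video summary
    h = rnn(context.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
    return out(h), h                              # word logits, new state

feats = torch.randn(40, feat_dim)                 # e.g. 40 sampled frames
h = torch.zeros(hid)
logits, h = decode_step(feats, h)
print(logits.argmax().item())                     # index of the predicted word
```

A full captioner would also feed the previous word's embedding into each step; the sketch omits that to keep the attention computation in focus.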
Developing a Personalized Emotional Conversational Agent for Learning Spoken English
Pengfei Liu, CTO, SpeechX. Time: 10:30 - 12:00, December 7, 2018
Biography
Dr. Pengfei Liu received his B.E. and M.E. degrees from East China Normal University and his Ph.D. from The Chinese University of Hong Kong. His research areas are natural language processing and deep learning, particularly sentiment analysis and dialog systems. He developed the SEEMGO system, which ranked 5th in the aspect-based sentiment analysis task at SemEval-2014, and received the Technology Progress Award in the JD Dialog Challenge in 2018. Dr. Liu previously worked at SAP Labs China in Shanghai, The Chinese University of Hong Kong, and the Wisers AI Lab in Hong Kong, where he led a team conducting research on deep learning-based sentiment analysis. He is currently the CTO of SpeechX.
Abstract
Spoken English is a critical but challenging skill for non-native learners in China, largely due to a lack of practice opportunities, and improving it is in great demand among learners of all ages. This talk presents our ongoing project at SpeechX on developing a personalized emotional conversational agent that provides a virtual partner with whom language learners can practice their spoken English. The agent is personalized to each learner's English level and interests, and gives appropriate responses according to the learner's emotions. Developing the agent involves many research challenges, such as consistency and personalization in dialog systems, multimodal emotion recognition, and expressive speech synthesis. In this talk, we will briefly introduce our work on these challenges, present a preliminary proof-of-concept prototype, and discuss future research directions.
About SpeechX
SpeechX was founded in Hong Kong and Shenzhen in 2016. The founders are mainly from the Human-Computer Communications Laboratory (HCCL) at The Chinese University of Hong Kong. The mission of SpeechX is to use AI to make language learning more efficient, productive, and enjoyable.
Applications and Challenges of Intelligent Speech for Professional Business
Wenming Chen, Founder, Shenzhen Yimi Technology Co., Ltd. (深圳壹秘科技有限公司). Time: 10:30 - 12:00, December 7, 2018
Biography
Wenming Chen is the founder of Shenzhen Yimi Technology Co., Ltd. and holds an EMBA from China Europe International Business School (CEIBS). He has worked for 18 years in audio/video, intelligent speech, smart home, and IoT. He spent more than ten years at TCL, serving successively as general manager of R&D, general manager of products, general manager of the electroacoustics business unit, and general manager of the innovation business unit. He founded Shenzhen Yimi Technology Co., Ltd. in August 2016.
Abstract
Taking the application scenarios and market potential of Yimi's products and services as its starting point, this talk shares Shenzhen Yimi Technology's journey in striving to become a champion in a single intelligent-speech application area, and then discusses the applications and challenges of intelligent speech for professional business from two angles: the technical bottlenecks of front-end speech processing, and the challenges and opportunities in back-end language processing.
About Shenzhen Yimi Technology Co., Ltd.
Founded in 2016, Shenzhen Yimi Technology Co., Ltd. focuses on product innovation and intelligent services for mobile office work. Its AI meeting service system, built on intelligent speech front-end array algorithms, natural language processing, and network communication technologies, serves the global mobile office and smart meeting market. The company currently has 25 R&D staff, develops speech array algorithms, voice communication, and AI service products with independent intellectual property rights, and holds a number of software copyrights and invention patents. It is highly competitive in global distribution channels, innovative algorithm R&D, and the development of new products and intelligent services.
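For concreteness, the sketch below shows delay-and-sum beamforming, the textbook starting point for the kind of front-end microphone-array processing described above. This is a toy illustration with a simulated source, not the company's proprietary algorithm.

```python
# Minimal sketch: delay-and-sum beamforming for a linear microphone array,
# steering toward an assumed talker direction in a meeting room.
import numpy as np

c, fs = 343.0, 16000                          # speed of sound (m/s), sample rate
mic_x = np.array([0.0, 0.05, 0.10, 0.15])     # 4 mics on a line, 5 cm apart
theta = np.deg2rad(60)                        # assumed direction of arrival

t = np.arange(fs) / fs
delays = mic_x * np.cos(theta) / c            # per-mic arrival delays (s)
signals = [np.sin(2 * np.pi * 440 * (t - d)) for d in delays]  # toy source

def delay_and_sum(signals, delays, fs):
    """Align each channel by its steering delay (done in the frequency domain,
    which allows fractional-sample shifts), then average the channels."""
    n = len(signals[0])
    freqs = np.fft.rfftfreq(n, 1 / fs)
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        out += np.fft.irfft(np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * d), n)
    return out / len(signals)

enhanced = delay_and_sum(signals, delays, fs)
print(np.corrcoef(enhanced, signals[0])[0, 1])  # ≈ 1 when steered correctly
```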
Exploring the Deployment of AI in Education
Song Yang, Head of Speech Technology, TAL AI Lab. Time: 10:30 - 12:00, December 7, 2018
Biography
Song Yang has served successively as a senior speech engineer at AISpeech, R&D director at Chivox (Suzhou), and head of speech technology at the TAL AI Lab. His research focuses on speech recognition and speech assessment. He has long worked on machine scoring of spoken English for China's high school and college entrance examinations and on automated assessment of teaching quality in online classes, and holds several patents in these areas. In 2014 he received the Wu Wenjun AI Science and Technology Progress Award from the Chinese Association for Artificial Intelligence.
Abstract
TAL Education Group takes "advancing education through science and technology" as its mission and explores deeply where AI technology meets educational scenarios. To address the uneven distribution of teaching resources and the shortage of high-quality teachers, TAL is building "AI assistants" that support teaching across a variety of scenarios; to address imbalances in students' development, it promotes personalized teaching. It also introduces AI assessment technologies into each stage of education. For offline teaching, TAL provides smart-classroom solutions that give a classroom eyes (cameras), ears (microphones), a brain (the cloud), and other organs (answering devices, iPads), quantifying the teaching process with audio and video and evaluating classroom teaching quality. For online classes, it recognizes and analyzes lesson content, evaluates teacher-student interaction, and extracts features to match teachers and students, improving teaching efficiency. With AI technology as its engine, TAL continues to explore new models for the education of the future.
About TAL Education Group (Beijing Century TAL Education Technology Co., Ltd.)
Beijing Century TAL Education Technology Co., Ltd. (NYSE: TAL) is a science-and-technology education company that takes smart education and open platforms as its core, and well-rounded education and after-school tutoring as its carriers, serving public education, supporting private education, and exploring new models of future education worldwide. TAL spans the education industry through five business groups: smart education; education cloud; content and future education; K12 and comprehensive abilities; and international and lifelong education. It operates more than ten brands, including Xueersi and Xueersi Online School, and has been named one of the "100 Most Valuable Chinese Brands" for three consecutive years.