Zhang Changshui (张长水)

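Dr. Zhang Changshui graduated from the Department of Automation at Tsinghua University, where he is now a professor and doctoral supervisor. He is a member of the academic committee and deputy director of the State Key Laboratory of Intelligent Technology and Systems, and head of the Department of Automation. His research covers image processing, signal processing, pattern recognition and artificial intelligence, and evolutionary computation, and he collaborates closely with industry. He has published more than 100 papers in international journals and conferences. He currently serves on the editorial boards of the international journal "Pattern Recognition" and of the Chinese Journal of Computers (计算机学报), and is an executive council member of the Chinese Association for Artificial Intelligence.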

http://www.tnlist.org.cn/pages/rengongzhineng_zhangchangshui.jsp

Talk title: Aligning where to see and what to tell: image caption with region-based attention and scene factorization

Abstract: Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this paper, we propose an image captioning system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience, in which attention shifting among the visual regions imposes a thread of visual ordering. This alignment characterizes the flow of "abstract meaning", encoding what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. The contexts adapt language models for word generation to specific scene types. We benchmark our system and compare it with published results on several popular datasets. We show that using either region-based attention or scene-specific contexts improves over systems without those components. Furthermore, combining these two modeling ingredients attains state-of-the-art performance.