Towards Machines that Perceive and Communicate
ABSTRACT: I will summarize some recent work related to visual scene understanding and “grounded” language understanding. In particular, I will discuss a connected set of results from my group at Google:
– Our DeepLab system for semantic segmentation (PAMI’17) [1].
– Our object detection system (CVPR’17; 1st place in COCO’16) [2].
– Our instance segmentation system (2nd place in COCO’16).
– Our person detection and pose estimation system (CVPR’17; 2nd place in COCO’16) [3].
– Visually grounded referring expressions (CVPR’16) [4].
– Discriminative image captioning (CVPR’17) [5].
– Optimizing semantic metrics for image captioning using reinforcement learning (ICCV’17) [6].
– Generative models of visual imagination (submitted to NIPS’17).
I will explain how each of these pieces can be combined to develop systems that can better understand images and words.
BIO: Kevin Murphy is a research scientist at Google in Mountain View, California, where he works on AI, machine learning, and computer vision. Before joining Google in 2011, he was an associate professor (with tenure) of computer science and statistics at the University of British Columbia in Vancouver, Canada. Before starting at UBC in 2004, he was a postdoc at MIT. Kevin received his BA from the University of Cambridge, his MEng from the University of Pennsylvania, and his PhD from UC Berkeley. He has published over 80 papers in refereed conferences and journals, as well as an 1100-page textbook, “Machine Learning: A Probabilistic Perspective” (MIT Press, 2012), which was awarded the 2013 DeGroot Prize for best book in the field of Statistical Science. Kevin is also co-Editor-in-Chief of JMLR (the Journal of Machine Learning Research).