Compositionality in Computer Vision

Held in conjunction with CVPR 2020 in Seattle, US


Overview

People understand the world as a sum of its parts. Events are composed of other actions, objects can be broken down into pieces, and this sentence is composed of a series of words. When presented with new concepts, people can decompose the novelty into familiar parts. Our knowledge representation is naturally compositional. Unfortunately, many of the architectures underlying vision tasks generate representations that are not compositional.

In our workshop, we will discuss compositionality in computer vision: the notion that the representation of the whole should be composed of the representations of its parts. As humans, our perception is deeply intertwined with reasoning through composition: we understand a scene by its components, a 3D shape by its parts, an activity by its events, etc. We hypothesize that intelligent agents also need to develop compositional understanding that is robust, generalizable, and powerful. In computer vision, there is a long-standing line of work based on semantic compositionality, such as part-based object recognition. Pioneering statistical modeling approaches have built hierarchical feature representations for numerous vision tasks. More recently, work has demonstrated that concepts can be learned from only a few examples using a compositional representation. As we move towards higher-level reasoning tasks, our workshop aims to revisit the idea and reflect on the future directions of compositionality.

At the workshop, we would like to discuss the following questions. How should we represent composition in scenes, videos, 3D spaces and robotics? How can human perception shed light on compositional understanding algorithms? What are the benefits of exploring compositionality? What structures, architectures and learning algorithms help models learn compositionality? How do we find the balance between compositional and black-box-based understanding? What problems are there in current compositional understanding methods, and how can we remedy them? What efforts should our community make in the future? What inductive biases can be built into our architectures to improve few-shot learning, meta-learning and compositional decomposition?

To receive notifications about updates related to the workshop, sign up using this form.


Program Schedule - TBA

Important Dates and Details

  • Sign up to receive updates: using this form
  • Apply to be part of Program Committee by: Feb 15, 2020 using this link
  • Paper submission deadline: Mar 1, 2020 at 11:59pm. CMT portal TBA.
  • Notification of acceptance: Mar 15, 2020
  • Camera ready due: Apr 15, 2020


Invited Speakers

Jitendra Malik is the Arthur J. Chick Professor in the Department of Electrical Engineering and Computer Science at the University of California at Berkeley, where he also holds appointments in vision science, cognitive science and bioengineering. He received his PhD degree in Computer Science from Stanford University in 1985, following which he joined UC Berkeley as a faculty member. He served as Chair of the Computer Science Division during 2002-2006, and of the Department of EECS during 2004-2006. Jitendra's group has worked on computer vision, computational modeling of biological vision, computer graphics and machine learning. Several well-known concepts and algorithms arose in this work, such as anisotropic diffusion, normalized cuts, high dynamic range imaging and shape contexts. He was awarded the Longuet-Higgins Award for “A Contribution that has Stood the Test of Time” twice, in 2007 and 2008, and received the PAMI Distinguished Researcher Award in computer vision in 2013, the K.S. Fu prize in 2014, and the IEEE PAMI Helmholtz prize for two different papers in 2015. Jitendra Malik is a Fellow of the IEEE, ACM, and the American Academy of Arts and Sciences, and a member of the National Academy of Sciences and the National Academy of Engineering.

Aude Oliva has a dual French baccalaureate in Physics and Mathematics and a B.Sc. in Psychology (minor in Philosophy). She received two M.Sc. degrees, in Experimental Psychology and in Cognitive Science, and a Ph.D. from the Institut National Polytechnique of Grenoble, France. She joined the MIT faculty in the Department of Brain and Cognitive Sciences in 2004, the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in 2012, the MIT-IBM Watson AI Lab in 2017, and the leadership of the Quest for Intelligence in 2018. She is also affiliated with the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT, and the MIT CSAIL Initiative "Systems That Learn". She is the MIT Executive Director of the MIT-IBM Watson AI Lab, and the Executive Director of the MIT Quest for Intelligence, a new MIT-wide initiative which seeks to discover the foundations of human and machine intelligence and deliver transformative new technology for humankind. She is currently on the Scientific Advisory Board of the Allen Institute for Artificial Intelligence.

Abhinav Gupta is an Associate Professor at the Robotics Institute, Carnegie Mellon University, and a Research Manager at Facebook AI Research (FAIR). Abhinav's research focuses on scaling up learning by building self-supervised, lifelong and interactive learning systems. Specifically, he is interested in how self-supervised systems can effectively use data to learn visual representations, common sense and representations for actions in robots. Abhinav is a recipient of several awards, including the ONR Young Investigator Award, PAMI Young Researcher Award, Sloan Research Fellowship, Okawa Foundation Grant, Bosch Young Faculty Fellowship, YPO Fellowship, IJCAI Early Career Spotlight, ICRA Best Student Paper Award, and the ECCV Best Paper Runner-up Award. His research has also been featured in Newsweek, BBC, Wall Street Journal, Wired and Slashdot.

Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research interests lie in the ability to enable robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, Finn has developed deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement methods for scalable acquisition of nonlinear reward functions, and meta-learning algorithms that can enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley. Her research has been recognized through the ACM doctoral dissertation award, an NSF graduate fellowship, a Facebook fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 under 35 Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. With Sergey Levine and John Schulman, Finn also designed and taught a course on deep reinforcement learning, with thousands of followers online. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged high school students, creating a mentoring program for underrepresented undergraduates across three universities, and leading efforts within the WiML and Berkeley WiCSE communities of women researchers.

Animesh Garg is an Assistant Professor of Computer Science at the University of Toronto and a Faculty Member at the Vector Institute. He leads the Toronto People, AI and Robotics (PAIR) research group. He is affiliated with Mechanical and Industrial Engineering (courtesy) and the Toronto Robotics Institute. He also shares time as a senior research scientist at Nvidia in ML and Robotics. Prior to this, he was a postdoc at the Stanford AI Lab working with Fei-Fei Li and Silvio Savarese. He received an M.S. in Computer Science and a Ph.D. in Operations Research from UC Berkeley in 2016. He was advised by Ken Goldberg in the Automation Lab as a part of the Berkeley AI Research Lab (BAIR). He also worked closely with Pieter Abbeel, Alper Atamturk and UCSF Radiation Oncology.

Angjoo Kanazawa will start as an Assistant Professor at UC Berkeley in Fall 2020. She is a research scientist at Google NYC. Previously, she was a BAIR postdoc at UC Berkeley advised by Jitendra Malik, Alexei A. Efros and Trevor Darrell. She completed her PhD in CS at the University of Maryland, College Park with her advisor David Jacobs. Prior to UMD, she spent four years at NYU where she worked with Rob Fergus and completed her BA in Mathematics and Computer Science.


Call for Papers

This workshop aims to bring together researchers from both academia and industry interested in addressing various aspects of compositional understanding in computer vision. The domains include but are not limited to scene understanding, video analysis, 3D vision and robotics. For each of these domains, we will discuss the following topics:

  • Algorithmic approaches: How should we develop and improve representations of compositionality for learning, such as graph embedding, message-passing neural networks, probabilistic models, etc.?
  • Evaluation methods: What are convincing metrics to measure the robustness, generalizability, and accuracy of compositional understanding algorithms?
  • Cognitive aspects: How can cognitive science research inspire computational models that capture compositionality as humans do?
  • Optimization and scalability challenges: How should we handle the inherent representations of different components and the curse of dimensionality of graph-based data? How should we effectively collect large-scale databases for training multi-task models?
  • Domain-specific applications: How should we improve scene graph generation, spatio-temporal-graph-based action recognition, structural 3D recognition and reconstruction, meta-learning, reinforcement learning, etc.?
  • Any other topic of interest for compositionality in computer vision.

We invite researchers and practitioners to submit their work to a CMT portal (TBA).


Please contact Jingwei Ji or Ranjay Krishna with any questions: jingweij / ranjaykrishna [at] cs [dot] stanford [dot] edu.


Program Committee

  • Shyamal Buch - Stanford University
  • Chien-Yi Chang - Stanford University
  • Apoorva Dornadula - Stanford University
  • Yong-Lu Li - Shanghai Jiao Tong University
  • Bingbin Liu - Carnegie Mellon University
  • Karttikeya Mangalam - University of California, Berkeley
  • Kaichun Mo - Stanford University
  • Samsom Saju - Mindtree
  • Gunnar Sigurdsson - Carnegie Mellon University
  • Paroma Varma - Stanford University
  • More to come...

If you are interested in taking a more active part in the workshop, apply to join the program committee using this link.