Compositionality in Computer Vision
June 15th, held in conjunction with CVPR 2020 Virtual
People understand the world as a sum of its parts. Events are composed of other actions, objects can be
broken down into pieces, and this sentence is composed of a series of words. When presented with new concepts,
people can decompose the novelty into familiar parts. Our knowledge representation is naturally compositional.
Unfortunately, many of the underlying architectures that power vision tasks generate representations that
are not compositional.
In our workshop, we will discuss compositionality in computer vision: the notion that the representation
of the whole should be composed of the representations of its parts. As humans, our perception is deeply
intertwined with reasoning through composition: we understand a scene by components, a 3D shape by parts, an
activity by events, etc. We hypothesize that intelligent agents also need to develop compositional
understanding that is robust, generalizable, and powerful. In computer vision, there has been a long-standing line
of work based on semantic compositionality, such as part-based object recognition. Pioneering statistical
modeling approaches have built hierarchical feature representations for numerous vision tasks. More
recently, work has demonstrated that concepts can be learned from only a few examples using a
compositional representation. As we move towards higher-level reasoning tasks, our workshop aims to revisit
this idea and reflect on the future directions of compositionality.
At the workshop, we would like to discuss the following questions. How should we represent composition in
scenes, videos, 3D spaces and robotics? How can human perception shed light on compositional understanding
algorithms? What are the benefits of exploring compositionality? What structures, architectures and learning
algorithms help models learn compositionality? How do we find the balance between compositional and
black-box-based understanding? What problems are there in the current compositional understanding methods and
how can we remedy them? What efforts should our community make in the future? What inductive biases can be
built into our architectures to improve few-shot learning, meta-learning and compositional decomposition?
Time (Pacific Time, UTC-7)
08:30 - 08:45
08:45 - 10:15
Composition in Concept, Space and Time
Carnegie Mellon University
Meta-Learning Symmetries and Distributions
10:15 - 11:00
A Roadmap for Activity and Event Recognition Models
Massachusetts Institute of Technology
11:00 - 11:45
What next in Computer Vision
University of California, Berkeley
11:45 - 12:30
12:30 - 13:00
Poster session #1
Training Neural Networks to Produce Compatible Features
Michael Gygli, Jasper Uijlings, Vittorio Ferrari
Exploring Latent Class Structures in Classification-By-Components Networks
Decomposing Image Generation into Layout Prediction and Conditional Synthesis
Anna Volokitin, Ender Konukoglu, Luc Van Gool
Semantic Bottleneck Layers: Quantifying and Improving Inspectability of Deep Representations
Max Losch, Mario Fritz, Bernt Schiele
13:00 - 13:45
Unsupervised Representations towards Counterfactual Predictions
University of Toronto
13:45 - 14:30
Composing Humans and Objects in the 3D World
University of California, Berkeley
14:30 - 15:15
Live panel discussion
- Jitendra Malik
- Aude Oliva
- Chelsea Finn
- Animesh Garg
- Angjoo Kanazawa
Moderated by Ranjay Krishna
15:15 - 15:45
15:45 - 16:05
Oral talk #1
Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion
Adam Kortylewski, Ju He, Qing Liu, Alan Yuille
16:05 - 16:25
Oral talk #2
PaStaNet: Toward Human Activity Knowledge Engine
Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu
16:25 - 16:45
Oral talk #3
Searching for Actions on the Hyperbole
Teng Long, Pascal Mettes, Heng Tao Shen, Cees Snoek
16:45 - 17:15
Poster session #2
Inferring Temporal Compositions of Actions Using Probabilistic Automata
Rodrigo Santa Cruz, Anoop Cherian, Basura Fernando, Dylan Campbell, Stephen Gould
Understanding Action Recognition in Still Images
Deeptha Girish, Vineeta Singh, Anca Ralescu
17:15 - 17:30
Jitendra Malik is the Arthur J. Chick Professor in the Department of Electrical Engineering and Computer Science at the University
of California at Berkeley, where he also holds appointments in vision science, cognitive science and
bioengineering. He received the PhD degree in Computer Science from Stanford University in 1985, following which
he joined UC Berkeley as a faculty member. He served as Chair of the Computer Science Division during 2002-2006,
and of the Department of EECS during 2004-2006. Jitendra's group has worked on computer vision, computational
modeling of biological vision, computer graphics and machine learning. Several well-known concepts and
algorithms arose in this work, such as anisotropic diffusion, normalized cuts, high dynamic range imaging and
shape contexts. He was awarded the Longuet-Higgins Award for “A Contribution that has Stood the Test of Time”
twice, in 2007 and 2008, received the PAMI Distinguished Researcher Award in computer vision in 2013, the K.S. Fu
prize in 2014, and the IEEE PAMI Helmholtz prize for two different papers in 2015. Jitendra Malik is a Fellow of
the IEEE, ACM, and the American Academy of Arts and Sciences, and a member of the National Academy of Sciences
and the National Academy of Engineering.
Aude Oliva has a dual French baccalaureate in Physics and Mathematics and a B.Sc. in Psychology (minor in Philosophy). She
received two M.Sc. degrees, in Experimental Psychology and in Cognitive Science, and a Ph.D. from the Institut
National Polytechnique of Grenoble, France. She joined the MIT faculty in the Department of Brain and Cognitive
Sciences in 2004, the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in 2012, the MIT-IBM
Watson AI Lab in 2017, and the leadership of the Quest for Intelligence in 2018. She is also affiliated with the
Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT, and the MIT CSAIL
Initiative "Systems That Learn". She is the MIT Executive Director of the MIT-IBM Watson AI Lab, and the
Executive Director of the MIT Quest for Intelligence, a new MIT-wide initiative which seeks to discover the
foundations of human and machine intelligence and deliver transformative new technology for humankind. She is
currently on the Scientific Advisory Board of the Allen Institute for Artificial Intelligence.
Abhinav Gupta is an Associate Professor at the Robotics Institute, Carnegie Mellon University, and Research Manager at
Facebook AI Research (FAIR). Abhinav's research focuses on scaling up learning by building self-supervised,
lifelong and interactive learning systems. Specifically, he is interested in how self-supervised systems can
effectively use data to learn visual representations, common sense and representations for actions in robots.
Abhinav is a recipient of several awards including the ONR Young Investigator Award, PAMI Young Researcher Award,
Sloan Research Fellowship, Okawa Foundation Grant, Bosch Young Faculty Fellowship, YPO Fellowship, IJCAI Early
Career Spotlight, ICRA Best Student Paper award, and the ECCV Best Paper Runner-up Award. His research has also
been featured in Newsweek, BBC, Wall Street Journal, Wired and Slashdot.
Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research
interests lie in the ability to enable robots and other agents to develop broadly intelligent behavior through
learning and interaction. To this end, Finn has developed deep learning algorithms for concurrently learning
visual perception and control in robotic manipulation skills, inverse reinforcement methods for scalable
acquisition of nonlinear reward functions, and meta-learning algorithms that can enable fast, few-shot
adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in
Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley. Her research
has been recognized through the ACM doctoral dissertation award, an NSF graduate fellowship, a Facebook
fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 under 35 Award,
and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. With
Sergey Levine and John Schulman, Finn also designed and taught a course on deep reinforcement learning, with
thousands of followers online. Throughout her career, she has sought to increase the representation of
underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged
high school students and a mentoring program for underrepresented undergraduates across three universities, and
by leading efforts within the WiML and Berkeley WiCSE communities of women researchers.
Animesh Garg is an Assistant Professor of Computer Science at the University of Toronto and a Faculty Member at the Vector
Institute. He leads the Toronto People, AI and Robotics (PAIR) research group. He is affiliated with Mechanical
and Industrial Engineering (courtesy) and the Toronto Robotics Institute. He also shares time as a senior research
scientist at Nvidia in ML and Robotics. Prior to this, he was a postdoc at Stanford AI Lab working with Fei-Fei
Li and Silvio Savarese. He received an MS in Computer Science and a Ph.D. in Operations Research from UC
Berkeley in 2016. He was advised by Ken Goldberg in the Automation Lab as a part of the Berkeley AI Research Lab
(BAIR). He also worked closely with Pieter Abbeel, Alper Atamturk and UCSF Radiation Oncology.
Angjoo Kanazawa will be starting as an Assistant Professor at UC Berkeley in Fall 2020. She is a research scientist at Google
NYC. Previously, she was a BAIR postdoc at UC Berkeley advised by Jitendra Malik, Alexei A. Efros and Trevor
Darrell. She completed her PhD in CS at the University of Maryland, College Park with her advisor David Jacobs.
Prior to UMD, she spent four years at NYU where she worked with Rob Fergus and completed her BA in Mathematics
and Computer Science.
Call for Papers
This workshop aims to bring together researchers from both academia and industry interested in addressing
various aspects of compositional understanding in computer vision. The domains include but are not limited to
scene understanding, video analysis, 3D vision and robotics. For each of these domains, we will discuss the following topics:
- Algorithmic approaches: How should we develop and improve representations of compositionality for
learning, such as graph embedding, message-passing neural networks, probabilistic models, etc.?
- Evaluation methods: What are the convincing metrics to measure the robustness, generalizability,
and accuracy of compositional understanding algorithms?
- Cognitive aspects: How can cognitive science research inspire computational models that capture
compositionality as humans do?
- Optimization and scalability challenges: How should we handle the inherent representations of
different components and the curse of dimensionality of graph-based data? How should we effectively collect
large-scale databases for training multi-tasking models?
- Domain-specific applications: How should we improve scene graph generation,
spatio-temporal-graph-based action recognition, structural 3D recognition and reconstruction,
meta-learning, reinforcement learning, etc.?
- Any other topic of interest for compositionality in computer vision.
Submit in this CMT portal: cmt3.research.microsoft.com/CICV2020
We provide three submission tracks; please submit to the one that fits your paper:
Archival full paper track. The length limit is 4-8 pages excluding references. The format is the same as
the CVPR'20 main conference submission format.
Accepted papers in this track will be published in CVPR workshop proceedings and IEEE Xplore. These papers
will also be in the CVF open access archive.
Non-archival short paper track. The length limit is 4 pages including references. The format is the same
as the CVPR'20 main conference submission format, only
shorter in length. Accepted papers in this track will NOT be published in CVPR workshop proceedings but
will be made public on this workshop website. Note that accepted papers in this non-archival short paper track will not
conflict with the dual submission policy of ECCV'20.
Non-archival long paper track. This track is only for previously published papers or papers
to appear at the CVPR'20 main conference. There is no page limit. Accepted papers in this track will NOT be
published in CVPR workshop proceedings.
The submission deadline for all tracks has been extended to April 3rd, 2020 at 11:59 pm PST due to the COVID-19 situation.
Author notifications will be sent out on April 10th, 2020. Camera-ready versions are due April 18th, 2020.
All accepted papers are required to be presented as posters. Oral presentations will be selected from the
Please contact Jingwei Ji or Ranjay Krishna with any questions: jingweij / ranjaykrishna [at] cs [dot] stanford [dot] edu.
Important Dates and Details
- Sign up to receive updates:
using this form
- Apply to be part of Program Committee by:
Feb 15, 2020
- Paper submission deadline:
Apr 3, 2020 at 11:59pm PST (extended from Mar 27). CMT portal: cmt3.research.microsoft.com/CICV2020
- Notification of acceptance:
Apr 10, 2020
- Camera ready due:
April 18, 2020
- Workshop date: June 15, 2020
- Shyamal Buch - Stanford University
- Chien-Yi Chang - Stanford University
- Apoorva Dornadula - Stanford University
- Yong-Lu Li - Shanghai Jiao Tong University
- Bingbin Liu - Carnegie Mellon University
- Karttikeya Mangalam - University of California, Berkeley
- Kaichun Mo - Stanford University
- Samsom Saju - Mindtree
- Gunnar Sigurdsson - Carnegie Mellon University
- Paroma Varma - Stanford University
- Alec Hodgkinson - Panasonic Beta
- Boxiao Pan - Stanford University
- Mingzhe Wang - Princeton University
- Kaidi Cao - Stanford University