Publications

  • International peer-reviewed conferences:

    • Efficient Image and Video Co-localization with Frank-Wolfe Algorithm
      Armand Joulin*, Kevin Tang* and Li Fei-Fei. [* indicates equal contribution]
      Proceedings of the European Conference on Computer Vision (ECCV), 2014.

      Abstract

      In this paper, we tackle the problem of performing efficient co-localization in images and videos. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images or videos. Building upon recent state-of-the-art methods, we show how to naturally incorporate temporal terms and constraints for video co-localization into a quadratic programming framework. Furthermore, by leveraging the Frank-Wolfe algorithm (or conditional gradient), we show how our optimization formulations for both images and videos can be reduced to solving a succession of simple integer programs, leading to increased efficiency in both memory and speed. To validate our method, we present experimental results on the PASCAL VOC 2007 dataset for images and the YouTube-Objects dataset for videos, as well as a joint combination of the two.

      BibTeX

      @inproceedings{TangECCV14,
        title     = {Efficient Image and Video Co-localization with Frank-Wolfe Algorithm},
        author    = {A. Joulin and K. Tang and L. Fei-Fei},
        year      = {2014},
        booktitle = {European Conference on Computer Vision (ECCV)},
      }
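      The efficiency argument above rests on one property of the Frank-Wolfe algorithm: each iteration only minimizes a linear function over the feasible set, and when every image must select one candidate box, that linear problem splits into an independent "best candidate per image" choice. The pure-Python sketch below illustrates this on a generic convex quadratic 0.5 z^T A z + b^T z over a product of simplices. The matrix A, the vector b and the `groups` of candidate indices are hypothetical placeholders, not the cost terms used in the paper, and the exact line search is one common step-size choice rather than necessarily the one the paper uses.

```python
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _matvec(A: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [_dot(row, x) for row in A]


def frank_wolfe_simplices(
    A: Sequence[Sequence[float]],
    b: Sequence[float],
    groups: Sequence[Sequence[int]],
    max_iter: int = 100,
    tol: float = 1e-9,
) -> List[float]:
    """Minimize f(z) = 0.5 z^T A z + b^T z (A symmetric PSD) over a product of
    simplices: for each index group g, z[g] >= 0 and sum(z[g]) == 1.

    Illustrative sketch only; `groups` plays the role of "candidate boxes of
    one image" (hypothetical setup, not the paper's exact formulation).
    """
    n = len(b)
    z = [0.0] * n
    for g in groups:  # feasible start: uniform weight on each group's candidates
        for i in g:
            z[i] = 1.0 / len(g)
    for _ in range(max_iter):
        grad = [ai + bi for ai, bi in zip(_matvec(A, z), b)]
        # Linear minimization oracle: a linear function over a product of
        # simplices is minimized at an integral vertex, i.e. by picking the
        # candidate with the smallest gradient entry in each group.
        s = [0.0] * n
        for g in groups:
            s[min(g, key=lambda i: grad[i])] = 1.0
        d = [si - zi for si, zi in zip(s, z)]
        gap = -_dot(grad, d)  # Frank-Wolfe duality gap, >= 0 for convex f
        if gap <= tol:
            break
        curv = _dot(d, _matvec(A, d))
        # Exact line search for a quadratic, clipped to the segment [z, s].
        step = 1.0 if curv <= 0.0 else min(1.0, gap / curv)
        z = [zi + step * di for zi, di in zip(z, d)]
    return z
```

      For instance, with A set to the 4 x 4 identity, b = [-1, 0, 0, -1] and groups [[0, 1], [2, 3]], the sketch selects candidate 0 in the first group and candidate 3 in the second after a single step. Every iterate stays a convex combination of such integral selections, which is why only simple per-group choices are ever solved.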
         
    • Linking people in videos with "their" names using coreference resolution
      Vignesh Ramanathan, Armand Joulin, Percy Liang and Li Fei-Fei.
      Proceedings of the European Conference on Computer Vision (ECCV), 2014.

      Abstract

      Natural language descriptions of videos provide a potentially rich and vast source of supervision. However, the highly varied nature of language presents a major barrier to its effective use. What is needed are models that can reason about uncertainty over both videos and text. In this paper, we tackle the core task of person naming: assigning names of people in the cast to human tracks in TV videos. Screenplay scripts accompanying the video provide some crude supervision about who is in the video. However, even the basic problem of knowing who is mentioned in the script is often difficult, since language often refers to people using pronouns (e.g., "he") and nominals (e.g., "man") rather than actual names (e.g., "Susan"). Resolving the identity of these mentions is the task of coreference resolution, which is an active area of research in natural language processing. We develop a joint model for person naming and coreference resolution, and in the process, infer a latent alignment between tracks and mentions. We evaluate our model on both vision and NLP tasks on a new dataset of 19 TV episodes. On both tasks, we significantly outperform the independent baselines.

      BibTeX

      @inproceedings{ramanathan2014linking,
        author = {V. Ramanathan and A. Joulin and P. Liang and L. Fei-Fei},
        booktitle = {European Conference on Computer Vision (ECCV)},
        title = {Linking people in videos with ``their'' names using coreference resolution},
        year = {2014},
      }
         
    • Co-localization in Real-World Images
      Kevin Tang, Armand Joulin, Li-Jia Li and Li Fei-Fei.
      Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

      Abstract

      In this paper, we tackle the problem of co-localization in real-world images. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images. Although similar problems such as co-segmentation and weakly supervised localization have been previously studied, we focus on being able to perform co-localization in real-world settings, which are typically characterized by large amounts of intra-class variation, inter-class diversity, and annotation noise. To address these issues, we present a joint image-box formulation for solving the co-localization problem, and show how it can be relaxed to a convex quadratic program which can be efficiently solved. We perform an extensive evaluation of our method compared to previous state-of-the-art approaches on the challenging PASCAL VOC 2007 and Object Discovery datasets. In addition, we present a large-scale study of co-localization on ImageNet, involving ground-truth annotations for 3,624 classes and approximately 1 million images.

      BibTeX

      @InProceedings{TangJouFeiFeiCVPR14,
      title = "Co-localization in Real-World Images",
      booktitle = "Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR)",
      author = "K. Tang and A. Joulin and L.-J. Li and L. Fei-Fei",
      year = "2014"
      }
         
    • Recovering Stereo Pairs from Anaglyphs
      Armand Joulin and Sing Bing Kang.
      Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

      Abstract

      An anaglyph is a single image created by selecting complementary colors from a stereo color pair; the user can perceive depth by viewing it through color-filtered glasses. We propose a technique to reconstruct the original color stereo pair given such an anaglyph. We modify SIFT-Flow and use it to initially match the different color channels across the two views. Our technique then iteratively refines the matches, selects the good matches (which define the "anchor" colors), and propagates the anchor colors. We use a diffusion-based technique for the color propagation, and add a step to suppress unwanted colors. Results on a variety of inputs demonstrate the robustness of our technique. We also extend our method to anaglyph videos by using optic flow between time frames.

      BibTeX

      @InProceedings{JouKanCVPR13b,
      title = "Recovering Stereo Pairs from Anaglyphs",
      booktitle = "Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR)",
      author = "A. Joulin and S. B. Kang",
      year = "2013"
      }
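      For context, the forward operation that the method above inverts fits in a few lines. This sketch assumes the common red-cyan scheme (red channel taken from the left view, green and blue channels from the right view) and represents images as nested lists of (R, G, B) tuples; the function names are illustrative only. It makes explicit which channels survive in the anaglyph and which ones must be reconstructed.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255
Image = List[List[Pixel]]


def make_red_cyan_anaglyph(left: Image, right: Image) -> Image:
    """Combine a stereo pair (assumed red-cyan scheme): red from the left view,
    green and blue from the right view."""
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("left and right views must have the same dimensions")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


def split_anaglyph(anaglyph: Image) -> Tuple[List[List[int]], List[List[Tuple[int, int]]]]:
    """Return what the anaglyph retains of each view. The left view's green/blue
    and the right view's red are not stored and would have to be recovered."""
    left_red = [[p[0] for p in row] for row in anaglyph]
    right_green_blue = [[(p[1], p[2]) for p in row] for row in anaglyph]
    return left_red, right_green_blue
```

      Because each view keeps only part of its color channels, recovering the full pair requires matching channels across views and propagating colors, which is the problem the abstract describes.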
         
    • Unsupervised Joint Object Discovery and Segmentation in Internet Images
      Michael Rubinstein, Armand Joulin, Johannes Kopf and Ce Liu.
      Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

      Abstract

      We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as is typical for datasets collected from Internet search. The key insight of our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on established co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.

      BibTeX

      @InProceedings{Rubinstein13Unsupervised,
      author = {M. Rubinstein and A. Joulin and J. Kopf and C. Liu},
      title = {Unsupervised Joint Object Discovery and Segmentation in Internet Images},
      booktitle = "Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR)",
      year = {2013}
      }
         
    • A convex relaxation for weakly supervised classifiers
      Armand Joulin and Francis Bach.
      Proceedings of the International Conference on Machine Learning (ICML), 2012.

      Abstract

      This paper introduces a general multi-class approach to weakly supervised classification. Inferring the labels and learning the parameters of the model are usually done jointly through a block-coordinate descent algorithm such as expectation-maximization (EM), which may lead to local minima. To avoid this problem, we propose a cost function based on a convex relaxation of the soft-max loss. We then propose an algorithm specifically designed to efficiently solve the corresponding semidefinite program (SDP). Empirically, our method compares favorably to standard ones on different datasets for multiple instance learning and semi-supervised learning, as well as on clustering tasks.

      BibTeX

      @InProceedings{JouBacICML12,
         title = "A convex relaxation for weakly supervised classifiers",
         booktitle = "Proceedings of the International Conference on Machine Learning (ICML)",
         author = "A. Joulin and F. Bach",
         year = "2012"
      }
         
    • Multi-Class Cosegmentation
      Armand Joulin, Francis Bach and Jean Ponce.
      Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

      Abstract

      Bottom-up, fully unsupervised segmentation remains a daunting challenge for computer vision. In the cosegmentation context, on the other hand, the availability of multiple images assumed to contain instances of the same object classes provides a weak form of supervision that can be exploited by discriminative approaches. Unfortunately, most existing algorithms are limited to a very small number of images and/or object classes (typically two of each). This paper proposes a novel energy-minimization approach to cosegmentation that can handle multiple classes and a significantly larger number of images. The proposed cost function combines spectral- and discriminative-clustering terms, and it admits a probabilistic interpretation. It is optimized using an efficient EM method, initialized using a convex quadratic approximation of the energy. Comparative experiments show that the proposed approach matches or improves the state of the art on several standard datasets.

      BibTeX

      @InProceedings{JouBacPon12,
         title = "Multi-Class Cosegmentation",
         booktitle = "Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR)",
         author = "A. Joulin and F. Bach and J. Ponce",
         year = "2012"
      }
         
    • A Graph-matching Kernel for Object Categorization
      Olivier Duchenne, Armand Joulin and Jean Ponce.
      Proceedings of the International Conference on Computer Vision (ICCV), 2011.

      Abstract

      This paper addresses the problem of category-level image classification. The underlying image model is a graph whose nodes correspond to a dense set of regions, and edges reflect the underlying grid structure of the image and act as springs to guarantee the geometric consistency of nearby regions during matching. A fast approximate algorithm for matching the graphs associated with two images is presented. This algorithm is used to construct a kernel appropriate for SVM-based image classification, and experiments with the Caltech 101, Caltech 256, and Scenes datasets demonstrate performance that matches or exceeds the state of the art for methods using a single type of features.

      BibTeX

         @InProceedings{DucJouPon11,
         title = "A Graph-Matching Kernel for Object Categorization",
         booktitle = "Proceedings of the International Conference on Computer Vision (ICCV)",
         author = "O. Duchenne and A. Joulin and J. Ponce",
         year = "2011"
      }
         
    • Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties
      Toby Dylan Hocking, Armand Joulin, Francis Bach and Jean-Philippe Vert.
      Proceedings of the International Conference on Machine Learning (ICML), 2011.

      Abstract

      We present a new clustering algorithm by proposing a convex relaxation of hierarchical clustering, which results in a family of objective functions with a natural geometric interpretation. We give efficient algorithms for calculating the continuous regularization path of solutions, and discuss relative advantages of the parameters. Our method experimentally gives state-of-the-art results similar to spectral clustering for non-convex clusters, and has the added benefit of learning a tree structure from the data.

      BibTeX

      @InProceedings{hocking2011clusterpath,
         title = "Clusterpath: An Algorithm for Clustering using Convex Fusion Penalties",
         booktitle = "Proceedings of the International Conference on Machine Learning (ICML)",
         author = "T. D. Hocking and A. Joulin and F. Bach and J.-P. Vert",
         year = "2011"
      }
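      The clusterpath objective is usually written as 0.5 Σ_i ||α_i − x_i||² + λ Σ_{i<j} w_ij ||α_i − α_j||, where x_i are the data points, α_i their cluster centers, and the fusion penalty pulls centers together as λ grows. The sketch below evaluates this objective with the ℓ2 norm (an assumption here; ℓ1 and ℓ∞ variants are also used) and gives the closed-form solution for two one-dimensional points, which shows the fusion behavior behind the learned tree: both points move λw toward each other until they merge at their mean. It is not the path algorithm from the paper, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple


def clusterpath_objective(
    x: Sequence[Sequence[float]],
    alpha: Sequence[Sequence[float]],
    lam: float,
    weights: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """0.5 * sum_i ||alpha_i - x_i||^2 + lam * sum_{i<j} w_ij ||alpha_i - alpha_j||_2.
    Uniform weights w_ij = 1 are assumed when `weights` is None."""
    fit = 0.5 * sum(
        sum((a - xi) ** 2 for a, xi in zip(alpha_i, x_i))
        for alpha_i, x_i in zip(alpha, x)
    )
    fusion = 0.0
    for i, j in combinations(range(len(x)), 2):
        w = 1.0 if weights is None else weights[i][j]
        fusion += w * math.dist(alpha[i], alpha[j])  # Euclidean (l2) distance
    return fit + lam * fusion


def two_point_solution(x1: float, x2: float, lam: float, w: float = 1.0) -> Tuple[float, float]:
    """Exact minimizer for two 1-D points: each center moves lam*w toward the
    other until they fuse at the mean, which happens once lam*w >= |x1 - x2| / 2."""
    if abs(x1 - x2) <= 2.0 * lam * w:
        m = 0.5 * (x1 + x2)
        return m, m
    shift = lam * w * math.copysign(1.0, x1 - x2)
    return x1 - shift, x2 + shift
```

      For example, for the points 3.0 and 1.0 with unit weight, λ = 0.5 gives centers 2.5 and 1.5, while any λ ≥ 1 fuses both at 2.0; tracing this merge point for every pair as λ increases is what yields a hierarchy.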
       
    • Efficient Optimization for Discriminative Latent Class Models
      Armand Joulin, Francis Bach and Jean Ponce.
      Advances in Neural Information Processing Systems (NIPS), 2010.

      Abstract

      Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression, and show that this model provides a probabilistic interpretation of discriminative clustering methods with added benefits in terms of number of hyperparameters and optimization. While the expectation-maximization (EM) algorithm is commonly used to learn these probabilistic models, it usually leads to local maxima because it relies on a non-convex cost function. To avoid this problem, we introduce a local approximation of this cost function, which in turn leads to a quadratic non-convex optimization problem over a product of simplices. In order to maximize quadratic functions, we propose an efficient algorithm based on convex relaxations and low-rank representations of the data, capable of handling large-scale problems. Experiments on text document classification show that the new model outperforms other supervised dimensionality reduction methods, while simulations on unsupervised clustering show that our probabilistic formulation has better properties than existing discriminative clustering methods.

      BibTeX

      @InProceedings{JouBacPonc10_nips,
         title = "Efficient Optimization for Discriminative Latent Class Models",
         booktitle = "Advances in Neural Information Processing Systems (NIPS)",
         author = "A. Joulin and F. Bach and J. Ponce",
         year = "2010"
      } 
      
    • Discriminative Clustering for Image Co-segmentation
      Armand Joulin, Francis Bach and Jean Ponce.
      Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

      Abstract

      Purely bottom-up, unsupervised segmentation of a single image into foreground and background regions remains a challenging task for computer vision. Co-segmentation is the problem of simultaneously dividing multiple images into regions (segments) corresponding to different object classes. In this paper, we combine existing tools for bottom-up image segmentation such as normalized cuts, with kernel methods commonly used in object recognition. These two sets of techniques are used within a discriminative clustering framework: the goal is to assign foreground/background labels jointly to all images, so that a supervised classifier trained with these labels leads to maximal separation of the two classes. In practice, we obtain a combinatorial optimization problem which is relaxed to a continuous convex optimization problem, which can itself be solved efficiently for up to dozens of images. We illustrate the proposed method on images with very similar foreground objects, as well as on more challenging problems with objects with higher intra-class variations.

      BibTeX

      @InProceedings{JouBacPonc10_cvpr,
         title = "Discriminative Clustering for Image Co-segmentation",
         booktitle = "Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR)",
         author = "A. Joulin and F. Bach and J. Ponce",
         year = "2010"
      }
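      The discriminative clustering criterion described above can be illustrated on a toy problem: for every candidate foreground/background labeling, fit a classifier to those labels and keep the labeling that the classifier fits best. This sketch assumes a square-loss ridge regression on a single scalar feature per sample (whose optimal cost has a closed form) and enumerates all labelings by brute force, which is only feasible for a handful of samples; the paper instead relaxes the combinatorial problem to a convex one. The `min_size` parameter is a hypothetical stand-in for the constraints that rule out the trivial solution where every sample receives the same label.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def ridge_cost(x: Sequence[float], y: Sequence[int], lam: float) -> float:
    """Optimal value of min_{w,b} (1/n) sum_i (y_i - w x_i - b)^2 + lam * w^2.
    With centered xc, yc the minimizer is w = <xc, yc> / (||xc||^2 + n*lam),
    giving the closed form below. Requires lam > 0 or non-constant x."""
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    xc = [xi - xm for xi in x]
    yc = [yi - ym for yi in y]
    xy = sum(a * b for a, b in zip(xc, yc))
    xx = sum(a * a for a in xc)
    yy = sum(b * b for b in yc)
    return (yy - xy * xy / (xx + n * lam)) / n


def discriminative_clustering(
    x: Sequence[float], lam: float = 1e-2, min_size: int = 1
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Brute-force search for the binary labeling best explained by a ridge fit."""
    n = len(x)
    best, best_cost = None, float("inf")
    for rest in product((0, 1), repeat=n - 1):
        y = (0,) + rest  # fix the first label: swapping 0/1 leaves the cost unchanged
        ones = sum(y)
        if ones < min_size or n - ones < min_size:
            continue  # constant labelings would otherwise reach cost 0 trivially
        c = ridge_cost(x, y, lam)
        if c < best_cost:
            best, best_cost = y, c
    return best, best_cost
```

      On the points 0.0, 0.1, 0.2, 5.0, 5.1, 5.2 with lam = 0.01 and min_size = 2, the search returns the labeling (0, 0, 0, 1, 1, 1), splitting the data at its large gap; the cost is invariant to swapping the two labels, which is why the first label is fixed.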
      

  • Technical reports:

    • Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
      Andrej Karpathy, Armand Joulin and Li Fei-Fei.
      Technical report (2014).

      BibTeX

      @techreport{karpathy2014fragment,
        author = {Karpathy, Andrej and Joulin, Armand and Fei-Fei, Li},
        title = {Deep Fragment Embeddings for Bidirectional Image Sentence Mapping},
        year = {2014},
        arxiv = {http://arxiv.org/abs/1406.5679},
        institution = {Stanford University}
      }
      
    • Stock price jumps: news and volume play a minor role
      Armand Joulin, Augustin Lefevre, Daniel Grunberg and Jean-Philippe Bouchaud.
      Technical report (2008).
  • Thesis:

    • Convex optimization for cosegmentation
      Armand Joulin
      PhD thesis, 2012.
  • Notes: