MICS Seminar: Stéphane Lathuilière
Stéphane Lathuilière (LTCI, Telecom-Paris): Self-Supervised Representation Learning for Video Generation
Thursday, 21 April 2022, 2:00 PM, Tremenet
Generating realistic images and videos has countless applications in different areas, ranging from photography technologies to e-commerce. Recently, deep generative approaches have emerged as effective techniques for generation tasks. In this talk, we will illustrate how self-supervised representation learning can be employed to design video generation methods.

First, we will present our recent framework for image animation. More precisely, our approach learns a motion representation that is employed to generate videos in which an object in a source image is animated according to the motion of a driving video. For this task, we employ a motion representation based on keypoints that are learned in a self-supervised fashion. Therefore, our approach can animate any arbitrary object without using annotations or prior information about the specific object to animate.

Then, we will introduce the unsupervised learning problem of playable video generation (PVG). In PVG, we aim to allow a user to control the generated video by selecting a discrete action at every time step, as in a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We introduce a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as a bottleneck. The network is constrained to learn a rich action space using, as its main driving loss, a reconstruction loss on the generated video. Finally, we will see how this approach can be extended to model 3D environments and unlock manipulation in space and time.
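To make the PVG idea concrete, the following is a minimal NumPy sketch of an encoder-decoder with a discrete action bottleneck: an encoder maps a pair of consecutive frames to one of a small number of action labels, and a decoder reconstructs the next frame from the current frame and that action, with a reconstruction loss as the training signal. All names, dimensions, and the linear "networks" here are illustrative assumptions for exposition, not the authors' implementation (which uses deep convolutional networks on real frames).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the real model operates on video frames).
FRAME_DIM = 16   # flattened frame features
NUM_ACTIONS = 4  # size of the discrete action space (the bottleneck)

# Random matrices standing in for trained encoder/decoder parameters.
W_enc = rng.normal(size=(2 * FRAME_DIM, NUM_ACTIONS))
W_dec = rng.normal(size=(FRAME_DIM + NUM_ACTIONS, FRAME_DIM))

def encode_action(frame_t, frame_t1):
    """Predict a discrete action label from two consecutive frames."""
    logits = np.concatenate([frame_t, frame_t1]) @ W_enc
    return int(np.argmax(logits))  # discrete bottleneck: a single label

def decode_next_frame(frame_t, action):
    """Reconstruct the next frame from the current frame and the action."""
    one_hot = np.zeros(NUM_ACTIONS)
    one_hot[action] = 1.0
    return np.concatenate([frame_t, one_hot]) @ W_dec

# Self-supervised signal: reconstruct frame_t1 from (frame_t, action) only.
frame_t = rng.normal(size=FRAME_DIM)
frame_t1 = rng.normal(size=FRAME_DIM)
action = encode_action(frame_t, frame_t1)
recon = decode_next_frame(frame_t, action)
loss = float(np.mean((recon - frame_t1) ** 2))  # reconstruction loss
```

Because the decoder sees only the action label (not the full next frame), minimizing the reconstruction loss pushes the encoder to assign semantically consistent labels to similar motions; at inference time a user can supply the action directly, as when playing a game.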
References:
Animating Arbitrary Objects via Deep Motion Transfer, A. Siarohin, S. Lathuilière, S. Tulyakov, E. Ricci, N. Sebe, CVPR 2019
First Order Motion Model for Image Animation, A. Siarohin, S. Lathuilière, S. Tulyakov, E. Ricci, N. Sebe, NeurIPS 2019
Playable Video Generation, W. Menapace, S. Lathuilière, S. Tulyakov, A. Siarohin, E. Ricci, CVPR 2021
Click to Move: Controlling Video Generation with Sparse Motion, P. Ardino, M. De Nadai, B. Lepri, E. Ricci, S. Lathuilière, ICCV 2021
Playable Environments: Video Manipulation in Space and Time, W. Menapace, S. Lathuilière, A. Siarohin, C. Theobalt, S. Tulyakov, V. Golyanik, E. Ricci, CVPR 2022