Michal Valko : Research

Michal Valko, machine learning scientist at DeepMind, Inria, and a lecturer at MVA/ENS PS.

  deep reinforcement learning, representation learning, Monte-Carlo tree search, active learning, graphs, bandit theory


News: BYOL-Explore, which performs representation learning and exploration together, is out!
News: Our paper From Dirichlet to Rubin got a long oral at ICML 2022! (< 2%)
News: Three papers accepted to ICML 2022!
News: Congrats to Omar Darwiche Domingues for defending his thesis on March 18th, 2022!
News: BGRL accepted to ICLR 2022!
News: Two RL papers accepted to AISTATS 2022!
News: Apply for DeepMind Paris internships for Summer 2022.
News: Five papers accepted to NeurIPS 2021, including 1 oral and 2 spotlights!
News: Congrats to Xuedong Shang for defending his thesis on Sep 29th, 2021!
News: BraVe, self-supervised learning framework for video, accepted to ICCV 2021!
News: IXOMD invited to be presented as an RL Theory seminar!
News: Our minimax SSP work invited to be presented as an RL Theory seminar!
News: Six papers accepted to ICML 2021, including the long talk on UCBMQ!



Michal is a machine learning scientist at DeepMind Paris, a tenured researcher at Inria, and the lecturer of the master's course Graphs in Machine Learning at l'ENS Paris-Saclay. Michal is primarily interested in designing algorithms that require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimizing the data that humans need to inspect and classify, and the effort spent “tuning” the algorithms. That is why he works on methods and settings that can deal with minimal feedback, such as deep reinforcement learning, bandit algorithms, or self-supervised learning. Michal is actively working on representation learning and building world models. He is also working on deep (reinforcement) learning algorithms that have some theoretical underpinning. He has also worked on sequential algorithms with structured decisions, where exploiting the structure leads to provably faster learning. He received his Ph.D. in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and was afterwards a postdoc of Rémi Munos before taking a permanent position at Inria in 2012.

Collaborative Projects

  • CompLACS (EU FP7) - COMposing Learning for Artificial Cognitive Systems, 2011 - 2015 (with J. Shawe-Taylor)
  • DELTA (EU CHIST-ERA) - PC - Dynamically Evolving Long-Term Autonomy, 2018 - 2021 (with A. Jonsson)
  • PGMO-IRMO grant of Fondation Mathématique Jacques Hadamard: Theoretically grounded efficient algorithms for high-dimensional and continuous reinforcement learning, 2018 - 2020 (with M. Pirotta)
  • BoB (ANR) - Bayesian statistics for expensive models and tall data, 2016 - 2020 (with R. Bardenet)
  • LeLivreScolaire.fr - Sequential Learning for Educational Systems, 2017-2020 (PI)
  • BOLD (ANR) - PI - Beyond Online Learning for better Decision making, 2019 - 2023 (with V. Perchet)
  • Allocate - PI - Adaptive allocation of resources for recommender systems with U. Potsdam, 2017 - 2019 (with A. Carpentier)
  • INTEL/Inria - PI - Algorithmic Determination of IoT Edge Analytics Requirements, 2013 - 2014
  • Extra-Learn (ANR) - PI - EXtraction and TRAnsfer of knowledge in reinforcement LEARNing, 2014 - 2018 (with A. Lazaric)
  • Lampada (ANR) - Learning Algorithms, Models and sPArse representations for structured DAta
  • NIH/NIGMS - R01 - Detecting deviations in clinical care in ICU data streams
  • NIH/NLM - R01 - Using medical records repositories to improve the alert system design
  • Inria/CWI – Sequential prediction & Understanding Deep RL, postdoc funding (PC, 2016-2018)
  • EduBand - coPI - Educational Bandits project with Carnegie Mellon, 2015 - 2018 (with A. Lazaric and E. Brunskill)

Students and postdocs

  • David Cheikhi, 2020 - 2021, Columbia University, NYC/École Polytechnique, Paris, with Pierre Ménard
  • Robert Müller, 2020, Technical University of Munich, M2 student, with Pierre Ménard
  • Ahmed Choukarah, 2020, ENS Ulm, L3 student, with Pierre Ménard
  • Côme Fiegel, 2019, ENS Ulm, L3 student, with Victor Gabillon
  • Axel Elaldi, 2017-2018, master student, École Centrale de Lille ↝ ENS Paris-Saclay/MVA
  • Xuedong Shang, 2017, master student, ENS Rennes, with Emilie Kaufmann ↝ Inria
  • Guillaume Gautier, 2016, master student, École Normale Supérieure, Paris-Saclay, with Rémi Bardenet ↝ Inria/CNRS
  • Andrea Locatelli, 2015-2016, ENSAM/ENS Paris-Saclay, with Alexandra Carpentier ↝ Universität Potsdam
  • Souhail Toumdi, 2015 - 2016, master student, École Centrale de Lille, with Rémi Bardenet ↝ ENS Paris-Saclay/MVA
  • Akram Erraqabi, 2015, master student, École Polytechnique, Paris ↝ Université de Montréal
  • Mastane Achab, 2015, master student, École Polytechnique, Paris, with G. Neu ↝ l'ENS PS ↝ TPT ↝ UPF Barcelona
  • Jean-Bastien Grill, 2014, master student, École Normale Supérieure, Paris, with Rémi Munos ↝ Inria
  • Alexandre Dubus, 2012-2013, master student, Université Lille1 - Sciences et Technologies ↝ Inria
  • Karim Jedda, 2012-2013, master student, École Centrale de Lille ↝ ProSiebenSat.1
  • Alexis Wehrli, 2012-2013, master student, École Centrale de Lille ↝ ERDF


  • DeepMind Paris (office: FR-PAR-RDL-5-507E)
  • 8 Rue de Londres
  • 75009 Paris
  • Inria Lille - Nord Europe, SequeL team (office: A05)
  • Parc Scientifique de la Haute Borne
  • 40 avenue Halley
  • 59650 Villeneuve d'Ascq, France
  • office phone: +33 3 59 57 7801
  • Centre Borelli, ENS Paris-Saclay (office: adjunct lecturers)
  • 4, avenue des Sciences
  • 91190 Gif-sur-Yvette