2023-12-07: The workshop is taking place online via Zoom. If you registered for the event on Eventbrite, you should have received the link and login credentials by email. If you have not received this information, contact us via


Recent breakthroughs in machine learning (ML) for molecules have demonstrated impressive successes, ranging from highly accurate protein structure prediction to assisting the discovery of novel drug candidates and chemical synthesis planning. These achievements position molecular machine learning as a key tool for addressing pressing challenges in drug and materials discovery. However, current machine learning methods, especially deep learning (DL), still face significant challenges and limitations. DL models (I) are data hungry, while data in the molecular sciences are often sparse, (II) struggle to adapt to changing tasks or distributions, (III) fail to properly incorporate domain knowledge, and (IV) lack explainability and an inherent ability to distinguish causation from correlation. Additionally, given the flood of newly published methods, thorough benchmarks as well as regulations for deployment and sustainability are needed.

This workshop aims to address the current limitations and capabilities of machine learning methods for molecules by (i) critically assessing them both theoretically and in applied and industrial settings, and (ii) showcasing novel and promising approaches to accelerate molecule discovery. Moreover, we will explore how recent advancements in Large Language Models (LLMs) may have the potential to revolutionize the field.

We encourage contributions that focus on robust architectures capable of handling domain shifts, novel chemical series, and diverse types of molecules. We also welcome methods that enable quick adaptation to newly acquired data, leveraging few- and zero-shot learning approaches. Furthermore, we aim to explore novel strategies for abstracting molecule representations, empowering broader generalization capabilities. Promising directions involve developing machine learning methods for creating relevant physical abstractions to enhance molecular dynamics simulations or force fields, as well as strategies to tackle automated design-make-test-analyze cycles.

Join us at this workshop, where experts from diverse fields, including ML, molecular sciences, and LLMs, will collaborate to overcome current limitations, explore new possibilities, and chart the future of molecular machine learning. Together, we aim to accelerate the discovery of functional molecules, revolutionize drug development, secure our food supply, and drive sustainable energy conversion and storage, ultimately shaping a better future for humanity.


The workshop will be open to everyone without a registration fee. You can register here!


Fri, Dec. 8th 2023, 09:00 am - 6:00 pm CET, online via Zoom.

| CET | Event | Speakers | Title |
| | 1. Session | Session Chair: Francesca Grisoni | |
| 09:00 | Opening remarks | ML4Mol Chair | |
| 09:00 | Invited talk | Jan H. Jensen | Using machine learning and quantum chemistry in drug discovery |
| 09:30 | Invited talk | Renana Gershoni-Poranne | Exploring the Chemical Space of Polycyclic Aromatic Systems |
| 10:00 | Contributed talk | Rıza Özçelik | Structured State-Space Sequence Models for De Novo Drug Design |
| 10:15 | Contributed talk | Elizaveta Kozlova | Protein Inpainting Co-Design with ProtFill |
| 10:30 | Contributed talk | Junwu Chen | Molecular Hypergraph Neural Network |
| 10:45 | Contributed talk | Florian Sestak | VN-EGNN: Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification |
| 11:00 | Poster Session 1 (PS 1) | Poster discussion at Gathertown | |
| 12:00 | Break | | |
| | 2. Session | Session Chair: Andrea Volkamer | |
| 13:00 | Invited talk | Rianne van der Berg | Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics |
| 13:30 | Invited talk | Bruno Correia | Leveraging learned surface fingerprints and generative AI for small-molecule design |
| 14:00 | Invited talk | Eva Nittinger | Generative Drug Design with REINVENT - Possibilities and Open Challenges |
| 14:30 | Contributed talk | Ilia Igashov | RetroBridge: Modeling Retrosynthesis with Markov Bridges |
| 14:45 | Contributed talk | Roman Joeres | DataSAIL: Data Splitting Against Information Leakage |
| 15:00 | Break | | |
| | 3. Session | Session Chair: Philippe Schwaller | |
| 15:30 | Invited talk | Raquel Rodríguez-Pérez | Advancing Drug Design with Machine Learning: Predicting Compound Properties in the Pharmaceutical Industry |
| 16:00 | Invited talk | Pat Walters | Benchmarking Machine Learning Models in Drug Discovery - You’re Probably Doing It Wrong |
| 16:30 | Invited talk | Andrew D. White | Agents for Scientific Research Over Scientific Domains |
| 17:00 | Closing remarks | ML4Mol Chair | |
| 17:05 | Poster Session 2 (PS 2) | Poster discussion at Gathertown | |
| 18:00 | End | | |

Keynote Speakers

Bruno Correia, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
Raquel Rodríguez-Pérez, Novartis Institutes for Biomedical Research, Switzerland.
Renana Gershoni-Poranne, Technion, Israel.
Jan H. Jensen, University of Copenhagen, Denmark.
Rianne van der Berg, Microsoft Research, Netherlands.
Andrew D. White, Future House, San Francisco, United States.
Eva Nittinger, AstraZeneca, Sweden.
Pat Walters, Relay Therapeutics, Cambridge, United States.

Keynote Abstracts

Renana Gershoni-Poranne - Exploring the Chemical Space of Polycyclic Aromatic Systems
Polycyclic aromatic systems (PASs) are among the most prevalent and impactful classes of compounds in the natural and man-made world. Though aromatic systems have captured the fascination of chemists for almost two centuries, a general conceptual framework for understanding and predicting the structure-property relationships of polycyclic systems remains elusive. Yet the structure-property relationships of polybenzenoid hydrocarbons (PBHs) have both conceptual and practical implications, and understanding them can enable the design of new functional compounds. We address this gap using a combination of computational chemistry and data science tools. We first interrogated PBHs using traditional computational techniques, including characterization of their aromatic character in the S0 and T1 states (described with the NICS metric), their spin density in the T1 state, and their S0-T1 energy gaps. Regularities were revealed that allowed simple and intuitive design guidelines to be defined. To verify these guidelines in a data-driven manner, we generated a new database, the COMPAS Project, and developed two types of molecular representation to enable machine- and deep-learning models to train on the new data: a) a text-based representation and b) a graph-based representation. In addition to their predictive ability, we demonstrate the interpretability of the models that is achieved when using these representations. The extracted insight in some cases confirms well-known “rules of thumb” and in other cases disproves common wisdom and sheds new light on this classical family of compounds. Finally, we implemented a generative model that designs novel PASs with targeted properties in an effective and efficient manner, demonstrating the first inverse design of PASs.

Raquel Rodríguez-Pérez - Advancing Drug Design with Machine Learning: Predicting Compound Properties in the Pharmaceutical Industry
Machine learning (ML) and deep learning models have become indispensable tools for predicting compound properties, including activity but also pharmacokinetics and toxicity endpoints. These predictions play a vital role in decision-making and assist in drug design. Recently, our investigations have focused on benchmarking different training set compositions for model generation (global vs local models). We have also explored approaches to address the challenge of changing distributions in specific projects or new modalities (domain adaptation), as well as the interpretation of predictions from ML models, with a particular emphasis on explainability and uncertainty. This talk will highlight relevant applications of ML-based molecular property predictions in the pharmaceutical industry, shedding light on their significance and addressing the challenges that require further research.

Rianne van der Berg - Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics
In this talk I will first briefly discuss some of the research areas that we are currently exploring in AI4Science at Microsoft Research, covering topics such as drug discovery, material generation and neural PDE solvers. Then I will dive a little deeper into recent work on the use of score-based generative modeling for coarse-graining (CG) molecular dynamics simulations. By training a diffusion model on protein structures from molecular dynamics simulations we show that its score function approximates a force field that can directly be used to simulate CG molecular dynamics. While having a vastly simplified training setup compared to previous work, we demonstrate that our approach leads to improved performance across several small- to medium-sized protein simulations, reproducing the CG equilibrium distribution, and preserving dynamics of all-atom simulations such as protein folding events.
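
As a reading aid for the abstract above, the link between a score function and a force field can be sketched with a standard statistical-mechanics identity. This is a minimal sketch, not necessarily the formulation used in the talk: the symbols U (the CG potential of mean force), k_B T (thermal energy), F (force) and s_\theta (the learned score at low noise) are assumptions introduced here for illustration.

```latex
% Assumed: x are CG coordinates sampled from the equilibrium (Boltzmann) density
p(x) \propto \exp\!\left(-\frac{U(x)}{k_B T}\right)
\quad\Longrightarrow\quad
\nabla_x \log p(x) = -\frac{\nabla_x U(x)}{k_B T} = \frac{F(x)}{k_B T}
% Hence a learned score s_\theta(x) \approx \nabla_x \log p(x)
% yields an approximate force field F(x) \approx k_B T \, s_\theta(x)
```

Under these assumptions, multiplying the learned score by k_B T gives forces that could drive a CG simulation, which is consistent with the claim that the score "approximates a force field".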

Eva Nittinger - Generative Drug Design with REINVENT - Possibilities and Open Challenges
De novo drug design has gained increasing interest in the computer-aided drug design community over the last few years. REINVENT, AstraZeneca’s molecular generative method, was developed in-house, has been continuously improved, and has been open sourced. Merely generating thousands of novel molecules is no longer a difficult task, as shown by recent discussions around relevant benchmarks for molecular generative models. The scoring and selection process, however, still is. This talk will show the range of capabilities of REINVENT and discuss the open challenges in the field that still need to be tackled.

Pat Walters - Benchmarking Machine Learning Models in Drug Discovery - You’re Probably Doing It Wrong
While machine learning (ML) models have been applied to quantitative structure-activity relationships (QSAR) for more than 20 years, the field has yet to arrive at standards for benchmark evaluations. Published benchmark studies have employed a wide range of datasets, cross-validation methodologies, and evaluation metrics. While variety is important, it is essential that benchmarks provide an accurate reflection of model performance. Unfortunately, many papers that compare ML methods and/or molecular representations use highly flawed datasets and fail to employ appropriate statistical methods. Datasets considered “standards” in the field contain numerous errors which may not be apparent to non-experts. These errors compromise and may invalidate method comparisons. In addition, many papers either ignore or inappropriately apply statistical tests for comparing distributions. Reported differences between methods often evaporate when exposed to statistical scrutiny. For the field to progress, we must establish standards and develop an evaluation framework that authors, reviewers, and journal editors can use. This will require a concerted, collaborative effort between domain experts, machine learning practitioners, and statisticians. This presentation will highlight prevalent issues with published benchmarking studies and suggest a path forward.
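
To make the point about statistical scrutiny concrete, here is a minimal sketch of one possible check: an exact paired sign-flip permutation test on per-fold scores from two models evaluated on the same cross-validation splits. It is an illustration only, not the procedure recommended in the talk; the function name `paired_sign_flip_test` and the example scores are hypothetical.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    scores_a and scores_b are per-fold scores of two models measured on the
    same folds. Returns (mean difference, p-value).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equally long, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis the sign of each paired difference is
    # arbitrary, so enumerate all 2^n sign assignments (fine for small n).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return mean(diffs), extreme / total


if __name__ == "__main__":
    # Hypothetical per-fold scores, invented purely for illustration.
    model_a = [0.81, 0.79, 0.84, 0.80, 0.83]
    model_b = [0.80, 0.80, 0.82, 0.81, 0.82]
    diff, p_value = paired_sign_flip_test(model_a, model_b)
    print(f"mean difference = {diff:.3f}, p-value = {p_value:.3f}")
```

With only a handful of folds the smallest achievable p-value is 2 / 2^n, so a small average gain on five folds cannot by itself support a strong claim of superiority.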

Accepted contributions (poster)

| No. | Title | Authors | Poster session |
| 1 | Assessing the Extrapolation Capability of Template-Free Retrosynthesis Models | Shuan Chen, Yousung Jung | PS 1 |
| 2 | TS-DiffuGen: An equivariant diffusion model for reaction transition state conformation generation | Sacha Raffaud, Jeff Guo, Philippe Schwaller | PS 2 |
| 3 | Activity Cliffs Go Smooth: Graph Siamese Neural Networks for Molecular Activity Prediction | Ghaith Mqawass, Steffen Hirte, Johannes Kirchmair, Nils Morten Kriege | PS 1 |
| 4 | Bayesian Optimization of Catalysts With In-context Learning | Mayk Caldas Ramos, Shane Michtavy, Marc Porosoff, Andrew White | PS 2 |
| 5 | Inverse-design of organometallic catalysts with guided equivariant diffusion | François R J Cornet, Bardi Benediktsson, Bjarke Hastrup, Arghya Bhowmik, Mikkel N. Schmidt | PS 1 |
| 6 | Molecule-Edit Templates for Efficient and Accurate Retrosynthesis Prediction | Mikołaj Sacha, Michał Sadowski, Piotr Kozakowski, Ruad van Workum, Stanislaw Kamil Jastrzebski | PS 2 |
| 7 | Machine learning-guided high throughput nanoparticle design | Derek van Tilborg, Ana Ortiz-Perez, Roy van der Meel, Francesca Grisoni, Lorenzo Albertazzi | PS 1 |
| 8 | Retro-fallback: retrosynthetic planning in an uncertain world | Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato | PS 2 |
| 9 | Exploring Organic Syntheses through Natural Language | Andres M Bran, Philippe Schwaller | PS 2 |
| 10 | Harmonic Prior Self-conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design | Hannes Stark, Bowen Jing, Regina Barzilay, Tommi S. Jaakkola | PS 2 |
| 11 | Coherent Energy and Force Uncertainty in Deep Learning Force Fields | Peter Bjørn Jørgensen, Jonas Busk, Ole Winther, Mikkel N. Schmidt | PS 1 |
| 12 | Transition Path Sampling with Boltzmann Generator-based MCMC Moves | Michael Plainer, Hannes Stark, Charlotte Bunne, Stephan Günnemann | PS 2 |
| 13 | Guided docking as a data generation approach facilitates structure-based machine learning on kinases | Joschka Groß, Michael Backenköhler, Verena Wolf, Andrea Volkamer | PS 1 |
| 14 | Automatic Generation of Mechanistic Pathways of Organic Reactions with Dual Templates | Shuan Chen, Ramil Babazade, Yousung Jung | PS 1 |
| 15 | Unveiling the Secrets of $^1$H-NMR Spectroscopy: A Novel Approach Utilizing Attention Mechanisms | Oliver Schilter, Marvin Alberts, Alain C. Vaucher, Philippe Schwaller, Teodoro Laino | PS 2 |
| 16 | Improved Chirality Encodings Boost Transformer-Based Stereochemical Reaction Prediction | Rémi Schlama, Philippe Schwaller | PS 2 |
| 17 | Autoregressive Reinforcing Framework for Fragment-based Generative Model | Gunwook Nam, Yousung Jung | PS 1 |
| 18 | Discriminator-Driven Diffusion Mechanisms for Molecular Graph Generation | Gerrit Großmann | PS 2 |
| 19 | Genetic algorithms are strong baselines for molecule generation | Austin Tripp, José Miguel Hernández-Lobato | PS 1 |
| 20 | Retrieval of synthesis parameters of polymer nanocomposites using LLMs | Defne Circi, Ghazal Khalighinejad, Shruti Badhwar, Bhuwan Dhingra, L. Brinson | PS 2 |
| 21 | Graph-to-String Variational Autoencoder for Synthetic Polymer Design | Gabriel Vogel, Paolo Sortino, Jana Marie Weber | PS 1 |
| 22 | MolSiam: Simple Siamese Self-supervised Representation Learning for Small Molecules | Joshua Yao-Yu Lin, Michael Maser, Nathan C. Frey, Gabriele Scalia, Omar Mahmood, Pedro O. Pinheiro, Ji Won Park, Stephen Ra, Andrew Martin Watkins, Kyunghyun Cho | PS 2 |
| 23 | BoChemian: Large Language Model Embeddings for Bayesian Optimization of Chemical Reactions | Bojana Ranković, Philippe Schwaller | PS 1 |

Important dates

Call for papers

We are calling for papers advancing or critically assessing molecular machine learning. Topics include (but are not limited to):

Please submit your contributions on OpenReview by October 23, 2023, 11:59 PM UTC. Submissions should be in PDF format and follow the NeurIPS template, with a maximum of 4 pages (not including references and appendices). Please anonymize your paper, since the review process is dual-anonymous.

Organizing Committee and Contact

Chairs: Michele Ceriotti, Francesca Grisoni, Philippe Schwaller, Andrea Volkamer

Organizing committee: Michael Backenköhler, Helena Brinkmann, Michele Ceriotti, Francesca Grisoni, Rıza Özçelik, Philippe Schwaller, Andrea Volkamer, Geemi Wellawatte