[!IMPORTANT]
2023-12-07: The workshop is taking place online via Zoom. If you registered for the event on Eventbrite, you should have received the Zoom link and login credentials by email. If you have not received this information, contact us at ml4molecules@ml.jku.at.
About
Recent breakthroughs in machine learning (ML) for molecules have demonstrated impressive successes, ranging from highly accurate protein structure prediction to assisting the discovery of novel drug candidates and planning chemical syntheses. These achievements position molecular machine learning as a key tool for addressing pressing challenges in drug and materials discovery. However, current machine learning methods, especially deep learning (DL), still face significant challenges and limitations. DL models (I) are data hungry, while data in the molecular sciences are often sparse, (II) struggle to adapt to changing tasks or distributions, (III) often fail to properly incorporate domain knowledge, and (IV) lack explainability and the ability to distinguish causation from correlation. Additionally, given the flood of newly published methods, thorough benchmarks as well as regulations for deployment and sustainability are needed.
This workshop aims to examine the current capabilities and limitations of machine learning methods for molecules by (i) critically assessing them, both theoretically and in applied and industrial settings, and (ii) showcasing novel and promising approaches to accelerate molecular discovery. Moreover, we will explore how recent advances in Large Language Models (LLMs) could revolutionize the field.
We encourage contributions that focus on robust architectures capable of handling domain shifts, novel chemical series, and diverse types of molecules. We also welcome methods that enable quick adaptation to newly acquired data, leveraging few- and zero-shot learning approaches. Furthermore, we aim to explore novel strategies for abstracting molecular representations to enable broader generalization. Promising directions include developing machine learning methods that create relevant physical abstractions to enhance molecular dynamics simulations or force fields, as well as strategies for automating design-make-test-analyze (DMTA) cycles.
Join us at this workshop, where experts from diverse fields, including ML, molecular sciences, and LLMs, will collaborate to overcome current limitations, explore new possibilities, and chart the future of molecular machine learning. Together, we aim to accelerate the discovery of functional molecules, revolutionize drug development, secure our food supply, and drive sustainable energy conversion and storage, ultimately shaping a better future for humanity.
Registration
The workshop is open to everyone, with no registration fee. You can register on Eventbrite.
Schedule
Fri, Dec. 8th 2023, 09:00 am - 6:00 pm CET, online via Zoom.
Time (CET) | Event | Speaker(s) | Title |
---|---|---|---|
1. Session | Session Chair: Francesca Grisoni | ||
09:00 | Opening remarks | ML4Mol Chair | |
09:00 | Invited talk | Jan H. Jensen | Using machine learning and quantum chemistry in drug discovery |
09:30 | Invited talk | Renana Gershoni-Poranne | Exploring the Chemical Space of Polycyclic Aromatic Systems |
10:00 | Contributed talk | Rıza Özçelik | Structured State-Space Sequence Models for De Novo Drug Design |
10:15 | Contributed talk | Elizaveta Kozlova | Protein Inpainting Co-Design with ProtFill |
10:30 | Contributed talk | Junwu Chen | Molecular Hypergraph Neural Network |
10:45 | Contributed talk | Florian Sestak | VN-EGNN: Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification |
11:00 | Poster Session 1 (PS 1) | Poster discussion at Gathertown | |
12:00 | Break | ||
2. Session | Session Chair: Andrea Volkamer | ||
13:00 | Invited talk | Rianne van den Berg | Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics |
13:30 | Invited talk | Bruno Correia | Leveraging learned surface fingerprints and generative AI for small-molecule design |
14:00 | Invited talk | Eva Nittinger | Generative Drug Design with REINVENT - Possibilities and Open Challenges |
14:30 | Contributed talk | Ilia Igashov | RetroBridge: Modeling Retrosynthesis with Markov Bridges |
14:45 | Contributed talk | Roman Joeres | DataSAIL: Data Splitting Against Information Leakage |
15:00 | Break | ||
3. Session | Session Chair: Philippe Schwaller | ||
15:30 | Invited talk | Raquel Rodríguez-Pérez | Advancing Drug Design with Machine Learning: Predicting Compound Properties in the Pharmaceutical Industry |
16:00 | Invited talk | Pat Walters | Benchmarking Machine Learning Models in Drug Discovery - You’re Probably Doing It Wrong |
16:30 | Invited talk | Andrew D. White | Agents for Scientific Research Over Scientific Domains |
17:00 | Closing remarks | ML4Mol Chair | |
17:05 | Poster Session 2 (PS 2) | Poster discussion at Gathertown | |
18:00 | End | | |
Keynote Speakers
Bruno Correia, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
Raquel Rodríguez-Pérez, Novartis Institutes for Biomedical Research, Switzerland.
Renana Gershoni-Poranne, Technion, Israel.
Jan H. Jensen, University of Copenhagen, Denmark.
Rianne van den Berg, Microsoft Research, Netherlands.
Andrew D. White, Future House, San Francisco, United States.
Eva Nittinger, AstraZeneca, Sweden.
Pat Walters, Relay Therapeutics, Cambridge, United States.
Keynote Abstracts
Renana Gershoni-Poranne - Exploring the Chemical Space of Polycyclic Aromatic Systems
Polycyclic aromatic systems (PASs) are among the most prevalent and impactful classes of compounds in the natural and man-made world. Although aromatic systems have fascinated chemists for almost two centuries, a general conceptual framework for understanding and predicting the structure-property relationships of polycyclic systems remains elusive. Yet these structure-property relationships have both conceptual and practical implications, and understanding them can enable the design of new functional compounds. We address this gap using a combination of computational chemistry and data science tools. We first interrogated polybenzenoid hydrocarbons (PBHs) using traditional computational techniques, characterizing their aromaticity in the S0 and T1 states (quantified with the nucleus-independent chemical shift, NICS), their spin density in the T1 state, and their S0-T1 energy gaps. This revealed regularities from which simple and intuitive design guidelines could be defined. To verify these guidelines in a data-driven manner, we generated a new database, the COMPAS Project, and developed two types of molecular representation that enable machine- and deep-learning models to train on the new data: a) a text-based representation and b) a graph-based representation. In addition to their predictive ability, we demonstrate the interpretability that these representations give the models. The extracted insight in some cases confirms well-known “rules of thumb” and in other cases disproves common wisdom, shedding new light on this classical family of compounds. Finally, we implemented a generative model that designs novel PASs with targeted properties effectively and efficiently, demonstrating the first inverse design of PASs.
Raquel Rodríguez-Pérez - Advancing Drug Design with Machine Learning: Predicting Compound Properties in the Pharmaceutical Industry
Machine learning (ML) and deep learning models have become indispensable tools for predicting compound properties, including not only activity but also pharmacokinetic and toxicity endpoints. These predictions play a vital role in decision-making and assist in drug design. Recently, our investigations have focused on benchmarking different training set compositions for model generation (global vs. local models). We have also explored approaches to address changing distributions in specific projects or new modalities (domain adaptation), as well as the interpretation of ML model predictions, with a particular emphasis on explainability and uncertainty. This talk will highlight relevant applications of ML-based molecular property prediction in the pharmaceutical industry, shedding light on their significance and on the challenges that require further research.
Rianne van den Berg - Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics
In this talk, I will first briefly discuss some of the research areas that we are currently exploring in AI4Science at Microsoft Research, covering topics such as drug discovery, materials generation, and neural PDE solvers. Then I will dive deeper into recent work on using score-based generative modeling for coarse-grained (CG) molecular dynamics simulations. By training a diffusion model on protein structures from molecular dynamics simulations, we show that its score function approximates a force field that can be used directly to simulate CG molecular dynamics. Despite a vastly simplified training setup compared to previous work, our approach improves performance across several small- to medium-sized protein simulations, reproducing the CG equilibrium distribution and preserving the dynamics of all-atom simulations, such as protein folding events.
Eva Nittinger - Generative Drug Design with REINVENT - Possibilities and Open Challenges
De novo drug design has attracted increasing interest in the computer-aided drug design community over the last few years. REINVENT, AstraZeneca’s in-house molecular generative method, has been continuously developed and open-sourced. Merely generating thousands of novel molecules is no longer a difficult task, as recent discussions around relevant benchmarks for molecular generative models have shown. Scoring and selecting among them, however, is. This talk will present the range of REINVENT’s capabilities and discuss the open challenges in the field that still need to be tackled.
Pat Walters - Benchmarking Machine Learning Models in Drug Discovery - You’re Probably Doing It Wrong
While machine learning (ML) models have been applied to quantitative structure-activity relationships (QSAR) for more than 20 years, the field has yet to arrive at standards for benchmark evaluations. Published benchmark studies have employed a wide range of datasets, cross-validation methodologies, and evaluation metrics. While variety is important, it is essential that benchmarks provide an accurate reflection of model performance. Unfortunately, many papers that compare ML methods and/or molecular representations use highly flawed datasets and fail to employ appropriate statistical methods. Datasets considered “standards” in the field contain numerous errors which may not be apparent to non-experts. These errors compromise and may invalidate method comparisons. In addition, many papers either ignore or inappropriately apply statistical tests for comparing distributions. Reported differences between methods often evaporate when exposed to statistical scrutiny. For the field to progress, we must establish standards and develop an evaluation framework that authors, reviewers, and journal editors can use. This will require a concerted, collaborative effort between domain experts, machine learning practitioners, and statisticians. This presentation will highlight prevalent issues with published benchmarking studies and suggest a path forward.
Accepted contributions (poster)
# | Title | Authors | Poster session |
---|---|---|---|
1 | Assessing the Extrapolation Capability of Template-Free Retrosynthesis Models | Shuan Chen, Yousung Jung | PS 1 |
2 | TS-DiffuGen: An equivariant diffusion model for reaction transition state conformation generation | Sacha Raffaud, Jeff Guo, Philippe Schwaller | PS 2 |
3 | Activity Cliffs Go Smooth: Graph Siamese Neural Networks for Molecular Activity Prediction | Ghaith Mqawass, Steffen Hirte, Johannes Kirchmair, Nils Morten Kriege | PS 1 |
4 | Bayesian Optimization of Catalysts With In-context Learning | Mayk Caldas Ramos, Shane Michtavy, Marc Porosoff, Andrew White | PS 2 |
5 | Inverse-design of organometallic catalysts with guided equivariant diffusion | François R J Cornet, Bardi Benediktsson, Bjarke Hastrup, Arghya Bhowmik, Mikkel N. Schmidt | PS 1 |
6 | Molecule-Edit Templates for Efficient and Accurate Retrosynthesis Prediction | Mikołaj Sacha, Michał Sadowski, Piotr Kozakowski, Ruad van Workum, Stanislaw Kamil Jastrzebski | PS 2 |
7 | Machine learning-guided high throughput nanoparticle design | Derek van Tilborg, Ana Ortiz-Perez, Roy van der Meel, Francesca Grisoni, Lorenzo Albertazzi | PS 1 |
8 | Retro-fallback: retrosynthetic planning in an uncertain world | Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato | PS 2 |
9 | Exploring Organic Syntheses through Natural Language | Andres M Bran, Philippe Schwaller | PS 2 |
10 | Harmonic Prior Self-conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design | Hannes Stark, Bowen Jing, Regina Barzilay, Tommi S. Jaakkola | PS 2 |
11 | Coherent Energy and Force Uncertainty in Deep Learning Force Fields | Peter Bjørn Jørgensen, Jonas Busk, Ole Winther, Mikkel N. Schmidt | PS 1 |
12 | Transition Path Sampling with Boltzmann Generator-based MCMC Moves | Michael Plainer, Hannes Stark, Charlotte Bunne, Stephan Günnemann | PS 2 |
13 | Guided docking as a data generation approach facilitates structure-based machine learning on kinases | Joschka Groß, Michael Backenköhler, Verena Wolf, Andrea Volkamer | PS 1 |
14 | Automatic Generation of Mechanistic Pathways of Organic Reactions with Dual Templates | Shuan Chen, Ramil Babazade, Yousung Jung | PS 1 |
15 | Unveiling the Secrets of ¹H-NMR Spectroscopy: A Novel Approach Utilizing Attention Mechanisms | Oliver Schilter, Marvin Alberts, Alain C. Vaucher, Philippe Schwaller, Teodoro Laino | PS 2 |
16 | Improved Chirality Encodings Boost Transformer-Based Stereochemical Reaction Prediction | Rémi Schlama, Philippe Schwaller | PS 2 |
17 | Autoregressive Reinforcing Framework for Fragment-based Generative Model | Gunwook Nam, Yousung Jung | PS 1 |
18 | Discriminator-Driven Diffusion Mechanisms for Molecular Graph Generation | Gerrit Großmann | PS 2 |
19 | Genetic algorithms are strong baselines for molecule generation | Austin Tripp, José Miguel Hernández-Lobato | PS 1 |
20 | Retrieval of synthesis parameters of polymer nanocomposites using LLMs | Defne Circi, Ghazal Khalighinejad, Shruti Badhwar, Bhuwan Dhingra, L. Brinson | PS 2 |
21 | Graph-to-String Variational Autoencoder for Synthetic Polymer Design | Gabriel Vogel, Paolo Sortino, Jana Marie Weber | PS 1 |
22 | MolSiam: Simple Siamese Self-supervised Representation Learning for Small Molecules | Joshua Yao-Yu Lin, Michael Maser, Nathan C. Frey, Gabriele Scalia, Omar Mahmood, Pedro O. Pinheiro, Ji Won Park, Stephen Ra, Andrew Martin Watkins, Kyunghyun Cho | PS 2 |
23 | BoChemian: Large Language Model Embeddings for Bayesian Optimization of Chemical Reactions | Bojana Ranković, Philippe Schwaller | PS 1 |
Important dates
- October 23, 2023: Deadline for submission
- Mid/late November 2023: Author notification
- December 8, 2023: Workshop
Call for papers
We are calling for papers advancing or critically assessing molecular machine learning. Topics include (but are not limited to):
- Benchmarking molecular machine learning methods
- Data-efficient learning
- Large language models in chemistry
- Model interpretability and explainability
- Interatomic potentials for molecules and materials
- Generative modeling
- Machine learning for protein engineering
- Automation of the design-make-test-analyze (DMTA) cycle
- Chemical reactions
Please submit your contributions on OpenReview by October 23, 2023, 11:59 PM UTC. Submissions should be in PDF format and follow the NeurIPS template, with a maximum of 4 pages (excluding references and appendices). Please anonymize your paper, as the review process is dual-anonymous.
Organizing Committee and Contact
Chairs: Michele Ceriotti, Francesca Grisoni, Philippe Schwaller, Andrea Volkamer
Organizing committee: Michael Backenköhler, Helena Brinkmann, Michele Ceriotti, Francesca Grisoni, Rıza Özçelik, Philippe Schwaller, Andrea Volkamer, Geemi Wellawatte
Contact: ml4molecules@ml.jku.at