Reinforcement learning (RL), a branch of machine learning that trains agents through rewards and penalties, has long demonstrated prowess in complex domains such as chess and StarCraft II. More recently, its reach has extended into computational chemistry, where it is proving effective in the demanding task of conformer generation. Conformer generation seeks to identify diverse, low-energy three-dimensional arrangements of a molecule—critical for applications ranging from drug discovery to quantitative structure–activity relationship modeling.

The complexity of this task stems from the exponential growth in possible conformations as the number of rotatable bonds increases. Traditional methods such as molecular dynamics (MD) or self-guided molecular dynamics (SGMD) can require days to sample conformers for molecules with more than 20 rotatable bonds. In contrast, work by Gogineni et al. showed that framing conformer generation as an RL problem yields models capable of producing diverse, low-energy conformers far more efficiently. Using a metric that combines conformer energy with diversity—implemented in Conformer-RL as the Boltzmann Factor Reward—the RL approach outperformed SGMD and MD in tests on an eight-monomer lignin molecule, sampling ten times fewer conformers while consuming less than one percent of the CPU runtime.
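To make the exponential growth concrete, here is a minimal arithmetic sketch. It assumes every rotatable bond independently takes one of six torsion values (a 60° grid) and ignores symmetry and steric clashes, so the counts are rough upper bounds rather than true conformer counts. The function name `count_conformations` is illustrative only.

```python
def count_conformations(n_rotatable_bonds: int, buckets_per_torsion: int = 6) -> int:
    """Count torsion-angle combinations on a discrete grid.

    Assumption: each rotatable bond takes one of `buckets_per_torsion`
    values independently of the others.
    """
    if n_rotatable_bonds < 0 or buckets_per_torsion < 1:
        raise ValueError("expected a non-negative bond count and at least one bucket")
    return buckets_per_torsion ** n_rotatable_bonds


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"{n} rotatable bonds: {count_conformations(n):,} combinations")
```

At 20 rotatable bonds the grid already holds more than 10^15 combinations, which is why exhaustive enumeration is not practical and guided sampling is needed.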
Conformer-RL is a modular Python library designed to make deep RL accessible for conformer generation without requiring extensive RL or programming expertise. Built on PyTorch for deep learning and RDKit for cheminformatics, it allows users to train agents by providing only a molecule file and adjustable hyperparameters. The library supports training on a single molecule or on a sequence of molecules for curriculum learning, which helps models generalize across a class of molecules. Outputs include trained models and .mol files for the generated conformers, ready for downstream analysis.
The framework’s modularity invites customization: researchers can implement new agents, training algorithms, neural networks, and RL environments. This adaptability matters because RL techniques evolve rapidly. Conformer-RL’s environment design starts from assumptions common in conformer generation: fixed bond lengths and angles, rigid ring torsions, and discrete torsion angle buckets (60° increments by default). The RL environment represents a molecule’s conformation as its state, and each action specifies torsion angles. Each action produces a new conformer, which is optimized with a force field and then scored by a reward function.
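The state, action, and reward loop described above can be sketched in plain Python. This is a toy illustration and not Conformer-RL's API: the class `ToyConformerEnv` and the function `toy_energy` are hypothetical, the energy is a simple threefold cosine term standing in for a real force field, and the force-field optimization step is omitted.

```python
import math

BUCKET_DEG = 60
# Discrete torsion values on a 60-degree grid: (-180, -120, -60, 0, 60, 120)
ANGLES = tuple(range(-180, 180, BUCKET_DEG))


def toy_energy(torsions):
    """Stand-in for a force-field energy (assumption): 1 + cos(3*theta) per torsion.

    Eclipsed angles (0, +/-120) cost 2 units each; staggered angles
    (+/-60, 180) cost 0.
    """
    return sum(1.0 + math.cos(math.radians(3 * t)) for t in torsions)


class ToyConformerEnv:
    """Hypothetical environment: the state is the tuple of torsion angles."""

    def __init__(self, n_torsions, max_steps=10):
        self.n_torsions = n_torsions
        self.max_steps = max_steps
        self.torsions = [0] * n_torsions
        self.steps = 0

    def reset(self):
        self.torsions = [0] * self.n_torsions
        self.steps = 0
        return tuple(self.torsions)

    def step(self, action):
        """`action` holds one bucket index per torsion; returns (state, reward, done)."""
        if len(action) != self.n_torsions:
            raise ValueError("expected one bucket index per torsion")
        self.torsions = [ANGLES[a] for a in action]
        self.steps += 1
        # A real environment would relax the geometry with a force field here.
        reward = -toy_energy(self.torsions)
        done = self.steps >= self.max_steps
        return tuple(self.torsions), reward, done
```

Calling `env.reset()` followed by `env.step([4, 4, 4])` on a three-torsion environment sets every torsion to 60° and returns the negated toy energy as the reward.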
Key components include the Action Handler, which applies torsion changes; the Reward Handler, offering schemes from basic energy-based rewards to the Boltzmann Factor Reward with torsional fingerprint distance pruning; and the Observation Handler, which converts conformers into graph representations suitable for PyTorch Geometric models. Node and edge feature extractors capture atomic and bond information, while graph normalizers ensure consistent spatial representation. Multiple environments can run in parallel to exploit multi-core systems.
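As a rough illustration of how an energy term and a diversity check can be combined in one reward, the sketch below pairs a Boltzmann factor with distance-based pruning. It is a simplified assumption-laden sketch, not the library's Reward Handler: the class name, the normalizers `e0` and `z0`, the temperature, the threshold, and the plain angular distance (used here in place of the torsional fingerprint distance) are all hypothetical choices.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def angular_distance(a, b):
    """Mean circular difference between two torsion tuples, scaled to [0, 1].

    A simplified stand-in for a torsional fingerprint distance (assumption).
    """
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360
        total += min(d, 360 - d) / 180.0
    return total / len(a)


class BoltzmannRewardSketch:
    """Reward = exp(-(E - e0) / kT) / z0, or 0 if the conformer is a near-duplicate."""

    def __init__(self, e0, z0=1.0, temperature=298.15, threshold=0.05):
        self.e0 = e0                  # reference energy (assumed normalizer)
        self.z0 = z0                  # scale factor (assumed normalizer)
        self.kt = K_B * temperature
        self.threshold = threshold    # minimum distance to count as a new conformer
        self.kept = []                # torsion tuples of conformers already rewarded

    def __call__(self, torsions, energy):
        torsions = tuple(torsions)
        if any(angular_distance(torsions, k) < self.threshold for k in self.kept):
            return 0.0                # pruned: too close to a known conformer
        self.kept.append(torsions)
        return math.exp(-(energy - self.e0) / self.kt) / self.z0
```

Under this scheme a low-energy conformer earns a reward only the first time it is found, so an agent is pushed toward conformers that are both low in energy and distinct from one another.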
Conformer-RL implements advanced RL algorithms such as advantage actor–critic (A2C) and proximal policy optimization (PPO), both policy gradient methods effective for conformer generation. Agents leverage graph neural networks tailored for molecular inputs, enabling learning across molecules of varying size and complexity.
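For readers unfamiliar with PPO, the sketch below computes its standard clipped surrogate objective in plain Python. It illustrates the general technique only and is not taken from Conformer-RL, whose agents are built on PyTorch; the function name and the default clip value of 0.2 are illustrative choices.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    r is the probability ratio pi_new(a|s) / pi_old(a|s), computed from
    log-probabilities; A is the advantage estimate for each sample.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

The clipping limits how much a single update can gain from moving the policy away from the one that collected the data. A training loop would maximize this quantity, typically by minimizing its negative.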
Curriculum learning is integrated to improve training efficiency. By starting with simpler molecules and progressively increasing complexity, agents can achieve high performance faster. For example, training on lignin polymers from two to six monomers before tackling an eight-monomer target reduced training time to under a day while maintaining superior performance compared to SGMD.
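One simple way to schedule such a curriculum is to advance to the next molecule once recent episode rewards clear a threshold. The sketch below assumes that rule; the class `CurriculumSketch`, the threshold, the window size, and the stage labels are hypothetical and do not describe how Conformer-RL implements its curriculum.

```python
from collections import deque


class CurriculumSketch:
    """Advance through stages when the moving-average reward reaches a threshold."""

    def __init__(self, stages, threshold, window=5):
        self.stages = list(stages)
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # most recent episode rewards
        self.index = 0

    @property
    def current(self):
        return self.stages[self.index]

    def record(self, episode_reward):
        """Log one episode reward and return the stage to train on next."""
        self.recent.append(episode_reward)
        window_full = len(self.recent) == self.recent.maxlen
        on_last_stage = self.index == len(self.stages) - 1
        if window_full and not on_last_stage:
            if sum(self.recent) / len(self.recent) >= self.threshold:
                self.index += 1
                self.recent.clear()  # start a fresh average for the new stage
        return self.current


# Hypothetical stage labels, ordered from simplest to hardest.
stages = ["lignin-2mer", "lignin-4mer", "lignin-6mer", "lignin-8mer"]
curriculum = CurriculumSketch(stages, threshold=1.0, window=2)
```

Each call to `curriculum.record(reward)` returns the molecule to train on next, so the training loop needs no stage-specific logic.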
Model monitoring and evaluation are supported through the TrainLogger module, which records metrics such as episode rewards and training loss, with TensorBoard integration for real-time visualization. Generalization can be assessed by periodically testing the model on a separate RL environment. The EnvLogger module captures detailed environment data, saving generated conformers as .mol files and enabling visual analysis through charts and interactive 3D renderings.
In practice, these capabilities allow Conformer-RL to serve both as a ready-to-use tool for conformer generation and as a research platform for advancing RL methods in chemistry. The library is open source and maintained by the University of Michigan’s Tewari and Zimmerman groups, so computational chemists and engineers alike can experiment with it and build on its foundations.
