Alchemical Free Energy in Computer-Aided Drug Design¶
Alchemical Free Energy (AFE) calculations have become an important technique in computer-aided drug design. If you are relatively new to this field, you may have come across the acronym FEP in the context of protein-ligand binding affinity estimations. FEP stands for Free Energy Perturbation, which is just one way to implement an AFE calculation. For more methodological background reading, the best-practices LiveCoMs article from Mey et al. is recommended, whereas the review JCIM article fromCournia et al. et al. gives a good modern perspective on the status of AFE calculations in drug discovery.
AFE calculations have a reputation for having a steep learning curve. There’s a lot that can go wrong when setting up or analyzing AFE calculations by putting together different open-source software as components of a pipeline. BioSimSpace has made it easier for non-experts to integrate AFE calculations in python-based drug design workflows.
This post will give an overview of the AFE functionality available in BioSimSpace 2023.3, focusing on relative free energy calculations.
A simple example: alchemical transformation of ethane into methanol¶
We start with a calculation inspired by the seminal 1985 work from Jorgensen and Ravimohan on the relative hydration free energy of ethane to methanol. Here we are interested in calculating the change in free energy upon alchemically perturbing a single molecule of ethane into a molecule of methanol in the gas phase.
Let’s load first input files
import BioSimSpace as BSS
# We assume the molecules to perturb are the first molecules in each system. (Each file contains a single molecule.)
# we use [0] to select this first molecule.
ethane = BSS.IO.readMolecules("./ethane.pdb")[0]
methanol = BSS.IO.readMolecules("./methanol.pdb")[0]
It’s simple to check what the molecules look like using the View class from BSS.Notebook
BSS.Notebook.View(ethane).system()
BSS.Notebook.View(methanol).system()
Now let’s generate forcefield parameters using BSS.Parameters . Here we use the GAFF forcefield.
ethane = BSS.Parameters.gaff(ethane).getMolecule()
methanol = BSS.Parameters.gaff(methanol).getMolecule()
At this point we create a representation of a ‘morph’ molecule that can describe both ethane or methanol using a single topology representation. In this case, two of the ethane hydrogens will turn into dummy atoms and the second carbon and the 3rd hydrogen will turn into the -OH group of the methanol. In order to automatically figure out which atoms are common between ethane and methanol we can use the matchAtoms() function from BSS.Align. This will compute a maxmimum common substructure (MCS) match via RDKit and return a dictionary that maps the indices of atoms in the ethane molecule to the indices of the atoms in the methanol to which they match.
mapping = BSS.Align.matchAtoms(ethane, methanol)
# Mapping is a dictionnary mapping atom indices in ethane to those of methanol
print (mapping)
{0: 0, 1: 2, 2: 1, 3: 4, 4: 3, 6: 5}
Once we have the mapping we can align the molecules to each other using a root mean squared displacement (RMSD) metric and from the alignment we can then create a merged molecule which contains all of the single topology
information needed for the alchemical perturbation.
To visualise the mapping we can use:
BSS.Align.viewMapping(ethane, methanol, mapping)
This shows ethane, with the atoms that map to those in methanol highlighted in green. The numbers next to the atoms are their indices within the molecule (and mapping dictionary).
In order to perform an alchemical simulation we need to create a merged molecule that combines that properties of the two molecules. To do so we first need to align one molecule to the other, based on the mapping. This can be achieved using the rsmdAlign function.
As the mapping matches the atoms for ligand 0 (ethane) to ligand 1 (methanol), and we want to align ligand 1 to ligand 0 (so align the methanol to the ethane), we need to use the inverse mapping for this:
inv_mapping = {v: k for k, v in mapping.items()}
methanol_aligned = BSS.Align.rmsdAlign(methanol, ethane, inv_mapping)
We can now merge the two molecules. This will create a composite molecule containing all of the molecular properties at both end states. If the molecules are a different size, then the smaller will contain dummy atoms to represent the atoms that will appear during the perturbation. In this case, the merged methanol end state will contain two dummy atoms corresponding to the extra hydrogen atoms in the ethane molecule.
merged = BSS.Align.merge(ethane, methanol_aligned, mapping)
Next we can setup a free energy calculation Protocol to compute the free energy change for morphing ethane into methanol in vacuum using BSS.Protocol.FreeEnergy. Here we use a small number of lambda windows and a short run time to complete the calculation rapidly. We use default settings for most parameters, but tweak the frequency at which simulation data is saved to reduce the footprint of the calculation.
protocol = BSS.Protocol.FreeEnergy(runtime=0.1*BSS.Units.Time.nanosecond,
report_interval=10000,
restart_interval=100, num_lam=3)
We can now create a process to run a free energy calculation by combining the protocol with a simulation engine using BSS.FreeEnergy.Relative. Note that the interface is agnostic to the choice of the simulation engine. The same protocol and merged molecule can be used to execute the free energy calculation using different engines. Here we use the engine SOMD but GROMACS could be used by updating just one keyword argument passed to BSS.FreeEnergy.Relative. Behind the scenes BSS will translate the specified BSS.Protocol into engine specific inputs.
# if you want to use gromacs instead replace "somd" by "gromacs"
engine ="somd"
fep_vac = BSS.FreeEnergy.Relative(merged.toSystem(),
protocol,
engine=engine,
work_dir="ethane_methanol/vacuum")
# Let's run this calculation. This may take a few minutes
fep_vac.run()
fep_vac.wait()
Now that the simulation has completed we can analyse the generated trajectories to obtain
the free energy change by using Pymbar. Note again that the analysis call is agnostic to the chosen simulation engine.
pmf_vac, discard = BSS.FreeEnergy.Relative.analyse(f'ethane_methanol/vacuum')
print ("The free energy change is DG_vac = %s +/- %s"% (pmf_vac[-1][1], pmf_vac[-1][2]))
The free energy change is DG_vac = 2.5593 kcal/mol +/- 0.1151 kcal/mol
Here we have obtained a free energy change of approximately 2.5 kcal/mol for the vacuum transformation. In order to compute a relative hydration free energy it is necessary to perform the same transformation for a merged molecule solvated in a box of water molecules, and to subtract the vacuum free energy change to the solvated free energy change. If you want to check how to write code to complete a thermodynamic cycle for a relative hydration free energy calculation consult this BSS tutorial.
Something more complicated: protein-ligand relative binding free energy calculations
A common use case for AFE calculations is to compute the relative binding free energies (RBFE) of a congeneric series (usually 10-50 compounds) of ligands for a protein. This can be done with BioSimSpace by implementing code similar to that used for the ethane to methanol case. Since this involves repeated operations on a large number of ligands, it is convenient to automate the process and create higher-level front-ends for the setup and analysis operations. Here, we highlight one possible implementation that is based on the use of two separate interactive BioSimSpace Jupyter notebook scripts (blue boxes below). The first is used to set up all the necessary calculations, and the second to analyze all the completed simulations. The intermediate tasks involving running all the required MD simulations and analyzing every trajectory to generate a set of free energy changes are implemented via a set of automated scripts that deploy compute tasks across a SLURM scheduler (orange boxes).
The setup script can be run interactively to select desired simulation settings from a dropdown menu.
After the selection of simulation settings, a network of relative binding free energy calculations that spans the entire uploaded dataset is generated using BSS.Align.generateNetwork , which provides an interface to the LOMAP software. Here, this step is illustrated with a dataset of 16 TYK2 ligands from the FEP+ JACS 2015 benchmark set.
The setup completes with production of input files for processing the entire dataset on a cluster configured with a SLURM scheduler.
The interactive analysis script is used to generate scatter plots of computed and measured relative binding free energies for each edge of the above network. It is called once the simulations have completed. The figure below was obtained by processing triplicate runs performed with the SOMD engine on the above network
This data can be processed by different network analysers to produce binding free energies for the dataset with respect to an arbitrary reference point. For instance here we have used the cinnabar tool maintained by OpenFE to perform this analysis.
The above plot shows that despite a few large outliers in the scatter plot of relative binding free energies ΔΔG, the back-transformed model that predicts ΔG values is relatively robust. For applications where the experimental data is unknown, it is sufficient to generate a rank-ordered list of compounds by calculated binding free energy.
If you want to try this out for yourself, check out our detailed tutorials that provide an introduction to alchemical free energy calculations and protein-ligand relative binding free energy calculations by logging in to our JupyterHub server at try.openbiosim.org (GitHub account required) and following the instructions in the file TUTORIALS.txt.
Written by Julien Michel.