BioSimSpace diaries: ABFE – Alchemical Absolute Binding Free Energy Calculations

The most popular use-case of alchemical free energy (AFE) calculations is currently relative binding free energy calculations (RBFE), which give the difference in binding free energy between a pair of ligands (typically drug-like small molecules) for a a given receptor (typically a protein structure). RBFE is most typically applied to process datasets of structurally related ligands. RBFE calculations in BioSimSpace were discussed in a previous post .

While relative binding free energy (RBFE) calculations can be very useful in drug discovery, several important problems lie outside the scope of standard RBFE calculations:

  • Calculating the relative binding free energies of structurally dissimilar ligands to a common target.
  • Calculating relative binding free energies of the same ligand to the same protein with different binding poses. This could be used to rigorously “score” different poses.
  • Calculating the relative binding free energies of the same ligand to different targets. This could be useful to optimise selectivity or promiscuity.

ABFE calculations

Alchemical absolute binding free energy (ABFE) calculations escape these limitations by following a more general thermodynamic cycle in which the ligand’s intermolecular interactions are completely turned off:

abfe thermodynamic cycle

 

We will refer to the left side of the cycle (where the ligand is in solution) as the “free leg”, and the right side of the cycle (where the ligand is bound) as the “bound leg”. The simulations represented by individual arrows will be called “stages”.

The absolute binding free energy can be obtained by adding up the terms round our free energy cycle, not forgetting any symmetry corrections (for example if there is more than 1 symmetrical binding site per protein):

ABFE equation

ABFE calculations require receptor-ligand restraints (shown by the red dotted lines above) to prevent sampling issues. To obtain converged free energies of binding without restraints, we would have to be sure that the ligand was sampling outside the binding site as soon as the unbound state became comparable in free energy to the bound state, and that we were correctly estimating the ratios of the sizes of the simulation box to the binding site; in practice, this is not generally feasible.

Implementing ABFE calculations with BioSimSpace

BioSimSpace currently supports ABFE calculations with the engines GROMACS and SOMD. The popular 6-degrees of freedom “Boresch” restraints are supported, as well as the multiple distance restraints methodology of Clark et al.

We will showcase this with the complex of human macrophage inhibition factor (MIF) and the ligand MIF-180.

Loading the System

Marking the Ligand to be Decoupled

We have to tell BioSimSpace the molecule for which we want to remove the intermolecular interactions, or to “decouple”. The function for doing this is stored in the same place as the tools needed to align and merge molecules for RBFE calulations, in BSS.Align.

It’s important to save the updated status of the molecule in the system object using the updateMolecule method.

We are using “decouple” in the sense of Gilson et al. to mean removal of at least the intermolecular interactions (and optionally intramolecular non-bonded interactions) of the ligand in the presence of restraints, but be aware that it can have a different meaning (see reference 61 of Mobley, Chodera, and Dill).

Selecting the Restraints

The performance of ABFE calculations can be highly dependent on the restraints used. Considering the simulations connecting states 6 and 7 (see cycle above), we would like the restraints to be weak, so that the unrestrained complex is minimally perturbed and convergence is quickly achieved. However, the fastest convergence of the simulations connecting stages 4, 5, and 6 will likely be achieved by using relatively strong restraints to restrict the configurational volume which must be sampled as much as possible. We can strike a compromise between these opposing requirements by selecting the restraints to mimic native protein-ligand interactions as closely as possible, which allows us to find the strongest possible restraints which minimally perturb the unrestrained fully interacting state.

In addition, Boresch restraints have instabilities which are most commonly encountered when any of the contiguous anchor points are close to collinear. This results in large fluctuations in the dihedral angles and the application of large forces through the dihedral restraints – this can “blow-up” the molecular dynamics integrator, causing the simulation to crash. So, we must be careful to avoid such poor selections when choosing our restraint.

BioSimSpace allows you to select Boresch restraints in an automated way, by running an unrestrained simulation of the fully interacting complex to allow the dynamics of the ligand with respect to the protein to be analysed. This is done using the BSS.FreeEnergy.RestraintSearch class, which is used to run and analyse the simulation to generate the restraints.

In a production scenario we would run an MD simulation of the complex for a few nanoseconds. For the purpose of this tutorial we provide a short (0.5 ns) precomputed trajectory that we load in BioSimSpace.

In this algorithm candidate sets of Boresch restraints are generated and the fluctuations of the associated Boresch degrees of freedom are tracked over the unrestrained simulation. The optimum Boresch degrees of freedom are then selected based on minimum configurational volume accessible to the restrained non-interacting ligand, and the force constants are calculated to mimic the native protein-ligand interactions. Likely stability of the restraints is assessed by ensuring that the restraints impose an energy penalty of at least 10 kT for unstable anchor point geometries.

We can use BSS.NoteBook.View to visualise the chosen anchor atoms for the Boresch restraints. There will be 3 atoms in the receptor, and 3 in the ligand. We can see that the 3 protein atoms are backbone atoms in residue Methionine 3, and the 3 ligand atoms are carbon atoms in the phenol ring of the ligand.

We can check what the restraint parameters are by writing them as strings compatible for the available backends.

 

Alternatively we could select the multiple distance restraints implementation by changing the keyword argument restraint_type

We can use again BSS.NoteBook.View to visualise all the receptor and ligand atoms involved in a distance restraint. Comparison with the previous visualisation makes it clear that the Boresch restraints only restrict motions of the phenolic moiety of the ligand with respect to one protein residue, whereas the multiple distance restraints apply to every heavy atom of the ligand and involve residues distributed around the ligand.

Inspecting the restraint object shows that a different set of parameters has been generated.

Setting Up The Simulations

Now that we have defined restraints we can generate inputs for our ABFE calculation. This functionality is managed by BSS.FreeEnergy.AlchemicalFreeEnergy . The cell below shows how inputs for the engine SOMD can be prepared for the restraint leg of the ABFE cycle. Here we use a lambda schedule with 4 windows only as the restraining free energy converges rapidly if the restraint parameters have been suitably chosen.

A full ABFE setup for SOMD requires setting up a discharge step (where electrostatic interactions of the ligand are decoupled), and a vanish step (where Lennard-Jones interactions of the discharged ligand are decoupled) for both the bound leg (the ligand bound to the protein), and the free leg (the ligand in solution). Refer to the thermodynamic cycle at the top of this post. Each step requires a different number of lambda windows for optimal convergence. Typically more windows are used for the vanish step than for the discharge step.

Input files can be prepared for GROMACS by changing the keyword argument engine and by specifying a ”full” perturbation type.(restraining, discharging and vanishing carried out in a single simulation). The lambda values should be supplied as a pandas data frame. For instance the bound leg can be setup with the following code:

 

 

If you want to try this out for yourself, check out our detailed ABFE tutorials by logging in to our JupyterHub server at try.openbiosim.org (GitHub account required) and following the instructions in the file TUTORIALS.txt.

This post was written  by Julien Michel by adapting materials produced by Finlay Clark.