This sire tutorial was developed to address a 10-minute challenge given by Julien. What could I learn from a random molecular input file using sire in just 10 minutes? And so here I am, with the URLs for two files, SYSTEM.top
and SYSTEM.crd
.
Step one was to get access to sire. That’s pretty easy now that we’ve made it available through our Jupyter notebook service at https://try.openbiosim.org. I went to this service, logged in using my GitHub username, and then started a blank Jupyter notebook. In the top cell I imported sire, following the instructions we’ve put in the quickstart guide.
import sire as sr
I’d been told that the files were in a GitHub repo, and that I could download them via the URL https://github.com/OpenBioSim/posts/raw/main/sire/001_ten_minute
. This was worth putting into a variable…
url = "https://github.com/OpenBioSim/posts/raw/main/sire/001_ten_minute"
Rather than downloading the files myself, I used the new sire.load function to download and load the files directly from a URL. I used the sire.expand function to specify multiple files at once, prefixing their names with the above url
.
mols = sr.load(sr.expand(url, "SYSTEM.top", "SYSTEM.crd"))
This has returned the loaded molecules, which I’ve put into mols
. Whenever I load something, I like to just print it to the screen to quickly see what I have got.
mols
System( name=BioSimSpace Syst num_molecules=18575 num_residues=18914 num_atoms=60695 )
Aha – so we can already see that this file contained 18,575 molecules, across 18,914 residues and 60,695 atoms. That’s interesting, but we can do so much more!
We can use search functionality to search for different molecules (and parts of molecules). A useful search term is water
. This returns all of the water molecules that have been loaded.
mols["water"]
SelectorMol( size=18459 0: Molecule( WAT:6 num_atoms=3 num_residues=1 ) 1: Molecule( WAT:7 num_atoms=3 num_residues=1 ) 2: Molecule( WAT:8 num_atoms=3 num_residues=1 ) 3: Molecule( WAT:9 num_atoms=3 num_residues=1 ) 4: Molecule( WAT:10 num_atoms=3 num_residues=1 ) ... 18454: Molecule( WAT:18460 num_atoms=3 num_residues=1 ) 18455: Molecule( WAT:18461 num_atoms=3 num_residues=1 ) 18456: Molecule( WAT:18462 num_atoms=3 num_residues=1 ) 18457: Molecule( WAT:18463 num_atoms=3 num_residues=1 ) 18458: Molecule( WAT:18464 num_atoms=3 num_residues=1 ) )
Not bad – there are quite a few! 18,459 of the 18,575 molecules are waters. So what about the rest? Another useful search term is protein
…
mols["protein"]
SelectorMol( size=3 0: Molecule( BioSimSpace Syst:3 num_atoms=1724 num_residues=114 ) 1: Molecule( BioSimSpace Syst:4 num_atoms=1724 num_residues=114 ) 2: Molecule( BioSimSpace Syst:5 num_atoms=1724 num_residues=114 ) )
Ok – three protein molecules, each of which has 1724 atoms over 114 residues. I suspect this may be a homotrimer? We can check by looking at the amino acid sequence of each protein.
proteins = mols["protein"]
seq1 = proteins[0].residues().names()
print("Protein 2 has the same sequence as 1?", seq1 == proteins[1].residues().names())
print("Protein 3 has the same sequence as 1?", seq1 == proteins[2].residues().names())
Protein 2 has the same sequence as 1? True Protein 3 has the same sequence as 1? False
Not a homotrimer? How are they different?
seq3 = proteins[2].residues().names()
for res1, res3 in zip(proteins[0].residues(), proteins[2].residues()):
if res1.name() != res3.name():
print(f"{res1.name().value()}:{res1.number().value()} is different to "
f"{res3.name().value()}:{res3.number().value()}\n")
print(":".join([x.value() for x in seq1]))
print(":".join([x.value() for x in seq3]))
HID:63 is different to HIE:291 PRT:MET:PHE:ILE:VAL:ASN:THR:ASN:VAL:PRO:ARG:ALA:SER:VAL:PRO:ASP:GLY:PHE:LEU:SER:GLU:LEU:THR: GLN:GLN:LEU:ALA:GLN:ALA:THR:GLY:LYS:PRO:PRO:GLN:TYR:ILE:ALA:VAL:HID:VAL:VAL:PRO:ASP:GLN:LEU: MET:ALA:PHE:GLY:GLY:SER:SER:GLU:PRO:CYS:ALA:LEU:CYS:SER:LEU:HID:SER:ILE:GLY:LYS:ILE:GLY:GLY: ALA:GLN:ASN:ARG:SER:TYR:SER:LYS:LEU:LEU:CYS:GLY:LEU:LEU:ALA:GLU:ARG:LEU:ARG:ILE:SER:PRO:ASP: ARG:VAL:TYR:ILE:ASN:TYR:TYR:ASP:MET:ASN:ALA:ALA:ASN:VAL:GLY:TRP:ASN:ASN:SER:THR:PHE:ALA PRT:MET:PHE:ILE:VAL:ASN:THR:ASN:VAL:PRO:ARG:ALA:SER:VAL:PRO:ASP:GLY:PHE:LEU:SER:GLU:LEU:THR: GLN:GLN:LEU:ALA:GLN:ALA:THR:GLY:LYS:PRO:PRO:GLN:TYR:ILE:ALA:VAL:HID:VAL:VAL:PRO:ASP:GLN:LEU: MET:ALA:PHE:GLY:GLY:SER:SER:GLU:PRO:CYS:ALA:LEU:CYS:SER:LEU:HIE:SER:ILE:GLY:LYS:ILE:GLY:GLY: ALA:GLN:ASN:ARG:SER:TYR:SER:LYS:LEU:LEU:CYS:GLY:LEU:LEU:ALA:GLU:ARG:LEU:ARG:ILE:SER:PRO:ASP: ARG:VAL:TYR:ILE:ASN:TYR:TYR:ASP:MET:ASN:ALA:ALA:ASN:VAL:GLY:TRP:ASN:ASN:SER:THR:PHE:ALA
Ok – it is just a different in titration state for the histidine residues (HID:63 in the first protein versus HIE:291 in the third). Yes, I think we can call this a homotrimer.
What else can we find?
Unfortunately, there isn’t an easy way to define a “ligand”. So, instead, lets look for everything that is neither a protein or water…
mols["not (protein or water)"]
SelectorMol( size=113 0: Molecule( LIG:2 num_atoms=34 num_residues=1 ) 1: Molecule( NA:18465 num_atoms=1 num_residues=1 ) 2: Molecule( NA:18466 num_atoms=1 num_residues=1 ) 3: Molecule( NA:18467 num_atoms=1 num_residues=1 ) 4: Molecule( NA:18468 num_atoms=1 num_residues=1 ) ... 108: Molecule( CL:18572 num_atoms=1 num_residues=1 ) 109: Molecule( CL:18573 num_atoms=1 num_residues=1 ) 110: Molecule( CL:18574 num_atoms=1 num_residues=1 ) 111: Molecule( CL:18575 num_atoms=1 num_residues=1 ) 112: Molecule( CL:18576 num_atoms=1 num_residues=1 ) )
It looks like we have 113 molecules. The first is called “LIG”, so it probably is the ligand. The others look like sodium and chloride ions. Just to be sure that there is only one ligand, lets look for everything that has more than one atom, and is also not protein or water…
ligand = mols["count(atoms) > 1 and not (protein or water)"]
ligand
Selector<SireMol::Atom>( size=34 0: Atom( CAB:1 [ 46.67, 46.32, 39.01] ) 1: Atom( CAC:2 [ 45.40, 46.10, 38.47] ) 2: Atom( CAD:3 [ 38.93, 47.09, 48.24] ) 3: Atom( CAE:4 [ 40.85, 45.67, 48.76] ) 4: Atom( CAF:5 [ 39.20, 47.00, 46.87] ) ... 29: Atom( H08:30 [ 47.83, 46.54, 40.80] ) 30: Atom( H09:31 [ 43.34, 46.25, 38.78] ) 31: Atom( H10:32 [ 46.89, 46.55, 43.06] ) 32: Atom( H11:33 [ 42.90, 47.23, 45.54] ) 33: Atom( H12:34 [ 39.01, 47.34, 50.61] ) )
Cool – we have a single ligand. But what does it look like? Let’s use the view function and take a look.
ligand.view()

This is a nice 3D view of the ligand, that is built using sire’s integration with nglview.
We can do more than just look at the molecule. The input files are in Amber format, so include molecular mechanics parameters for the molecules. This means we can use sire’s built-in molecular mechanics engine to calculate the energy.
ligand.energy()
37.4462 kcal mol-1
This has return the total energy of the ligand. But this energy is made up of components, such as the bond, angle and dihedral terms. We can access those too!
ligand.energy().components()
{'bond': 3.61085e-08 kcal mol-1, '1-4_LJ': 11.9829 kcal mol-1, 'dihedral': 13.224 kcal mol-1, 'intra_LJ': -2.15026 kcal mol-1, 'improper': 1.43778 kcal mol-1, 'angle': 19.6622 kcal mol-1, 'intra_coulomb': -29.7164 kcal mol-1, '1-4_coulomb': 23.0059 kcal mol-1}
How does the ligand’s energy relate to its interaction with the proteins?
To find out, let’s look to see what is around the ligand. It would be really convenient to have a ligand
search term… Fortunately, we have the power to create our own search terms via sire.search.set_token.
sr.search.set_token("ligand", "count(atoms) > 1 and not (protein or water)")
This has created the token ligand
, meaning we can now use this directly to search for ligand molecules…
ligand = mols["ligand"]
ligand
Selector<SireMol::Atom>( size=34 0: Atom( CAB:1 [ 46.67, 46.32, 39.01] ) 1: Atom( CAC:2 [ 45.40, 46.10, 38.47] ) 2: Atom( CAD:3 [ 38.93, 47.09, 48.24] ) 3: Atom( CAE:4 [ 40.85, 45.67, 48.76] ) 4: Atom( CAF:5 [ 39.20, 47.00, 46.87] ) ... 29: Atom( H08:30 [ 47.83, 46.54, 40.80] ) 30: Atom( H09:31 [ 43.34, 46.25, 38.78] ) 31: Atom( H10:32 [ 46.89, 46.55, 43.06] ) 32: Atom( H11:33 [ 42.90, 47.23, 45.54] ) 33: Atom( H12:34 [ 39.01, 47.34, 50.61] ) )
Let’s now find all the protein residues that are within 3 Å of the ligand…
residues = mols["(residues within 3 of ligand) and protein"]
residues
SireMol::SelectorM<SireMol::Residue>( size=12 0: MolNum(3) Residue( PRT:2 num_atoms=15 ) 1: MolNum(3) Residue( MET:3 num_atoms=17 ) 2: MolNum(3) Residue( LYS:33 num_atoms=22 ) 3: MolNum(3) Residue( PRO:34 num_atoms=14 ) 4: MolNum(3) Residue( TYR:37 num_atoms=21 ) ... 7: MolNum(3) Residue( MET:102 num_atoms=17 ) 8: MolNum(3) Residue( VAL:107 num_atoms=16 ) 9: MolNum(3) Residue( PHE:114 num_atoms=20 ) 10: MolNum(5) Residue( TYR:324 num_atoms=21 ) 11: MolNum(5) Residue( ASN:326 num_atoms=14 ) )
There are 12 residues. These appear to be from only two of the protein molecules. We can confirm by asking for the molecules that contain these residues…
residues.molecules()
SelectorMol( size=2 0: Molecule( BioSimSpace Syst:3 num_atoms=1724 num_residues=114 ) 1: Molecule( BioSimSpace Syst:5 num_atoms=1724 num_residues=114 ) )
Yep – just two of the protein chains have residues that are within 3 Å of the ligand (at least for this conformation). Let’s take a look at these residues…
residues.view()

They form a nice little pocket which I think the ligand would fit nicely into…
mols["((residues within 3 of ligand) and protein) or ligand"].view()

Yes, that does look pretty snug. But how snug?
Let’s loop over all the residues and calculate their energy with the ligand. We’ll put this into a python dictionary, in a format that will make it easy to analyse via pandas later…
data = {"residue": [], "component": [], "energy": []}
for residue in residues:
# get the name and number of each residue as an ID
resid = f"{residue.name().value()}:{residue.number().value()}"
# calculate the energy between this residue and the ligand
energy = residue.energy(ligand)
# now save the components of this residue into the dictionary above...
for component in energy.components():
data["residue"].append(resid)
data["component"].append(component)
data["energy"].append(energy[component].to(sr.units.kcal_per_mol))
# also save the total energy into the dictionary
data["residue"].append(resid)
data["component"].append("total")
data["energy"].append(energy.to(sr.units.kcal_per_mol))
I chose the dictionary format above as it makes it really easy to import this data into a pandas DataFrame.
import pandas as pd
df = pd.DataFrame.from_dict(data)
df
residue | component | energy | |
---|---|---|---|
0 | PRT:2 | coulomb | -1.871054 |
1 | PRT:2 | LJ | -3.929312 |
2 | PRT:2 | total | -5.800366 |
3 | MET:3 | coulomb | -0.177925 |
4 | MET:3 | LJ | -2.531279 |
5 | MET:3 | total | -2.709204 |
6 | LYS:33 | coulomb | -24.507839 |
7 | LYS:33 | LJ | -3.652179 |
8 | LYS:33 | total | -28.160018 |
9 | PRO:34 | coulomb | 0.388650 |
10 | PRO:34 | LJ | -2.216911 |
11 | PRO:34 | total | -1.828261 |
12 | TYR:37 | coulomb | -2.063485 |
13 | TYR:37 | LJ | -3.317952 |
14 | TYR:37 | total | -5.381437 |
15 | SER:64 | coulomb | -0.420798 |
16 | SER:64 | LJ | -2.295974 |
17 | SER:64 | total | -2.716772 |
18 | ILE:65 | coulomb | -1.177465 |
19 | ILE:65 | LJ | -3.477019 |
20 | ILE:65 | total | -4.654484 |
21 | MET:102 | coulomb | 1.280213 |
22 | MET:102 | LJ | -0.950707 |
23 | MET:102 | total | 0.329506 |
24 | VAL:107 | coulomb | 0.973719 |
25 | VAL:107 | LJ | -2.228941 |
26 | VAL:107 | total | -1.255223 |
27 | PHE:114 | coulomb | 0.466897 |
28 | PHE:114 | LJ | -2.915460 |
29 | PHE:114 | total | -2.448563 |
30 | TYR:324 | coulomb | -0.880204 |
31 | TYR:324 | LJ | -4.774365 |
32 | TYR:324 | total | -5.654569 |
33 | ASN:326 | coulomb | -11.036958 |
34 | ASN:326 | LJ | 0.832073 |
35 | ASN:326 | total | -10.204885 |
Now this is in a DataFrame, we can use the in-built plotting tools to create a bar chart of the total energies…
df[ df["component"] == "total" ].plot.bar(x="residue")
<AxesSubplot: xlabel='residue'>
It is clear that the interaction between the ligand and LYS:33 is the strongest and most favourable. This interaction will have both coulomb and Lennard Jones components…
df[ df["component"] == "LJ" ].plot.bar(x="residue")
<AxesSubplot: xlabel='residue'>
The Lennard Jones components are relatively small, but all (except for ASN:326) pretty favourable…
df[ df["component"] == "coulomb" ].plot.bar(x="residue")
<AxesSubplot: xlabel='residue'>
The coulomb components are driven by LYS:33 and ASN:326. Indeed, most of the favourable interaction energy appears to be coming from the coulomb interaction between LYS:33 and the ligand. Let’s take a look to see if we can understand why?
mols["ligand or (resname LYS and resnum 33)"].view()

Maybe there is a specific atom-atom interaction that is responsible? To check, we can loop over all pairs of atoms between the ligand and this residue to find the closest pair…
# Start by setting the closest value to a large distance...
closest = (1000 * sr.units.angstrom, None, None)
# loop over all atoms in LYS:33
for atom0 in mols["resname LYS and resnum 33"].atoms():
# and then loop over all atoms in the ligand
for atom1 in ligand:
# calculate their distance using the sr.measure function...
dist = sr.measure(atom0, atom1)
# if the distance is less than `closest`, then save this
# distance and the pair of atoms
if dist < closest[0]:
closest = (dist, atom0, atom1)
The above code uses the sire.measure function. This can be used to measure lengths, angles and torsions between pretty much anything in sire. In this case, it calculated the distance between each pair of atoms. The closest pair were saved into the variable closest
.
print(closest)
(2.40007 Å, Atom( HZ1:514 [ 40.32, 45.62, 40.87] ), Atom( NAM:18 [ 40.97, 46.16, 43.12] ))
The two closest atoms were HZ1 of LYS:33 and NAM of the ligand. We can calculate their interaction energies using the energy function again…
atom0 = closest[1]
atom1 = closest[2]
print(atom0.energy(atom1))
print(atom0.energy(atom1).components())
-12.7634 kcal mol-1 {'coulomb': -12.7279 kcal mol-1, 'LJ': -0.0355264 kcal mol-1}
This energy is a significant chunk of the residue-ligand energy.
With this, my 10 minutes are up. So what did I find?
- This is a model of a ligand bound to a large trimeric protein (a homotrimer, but one where the titratio state of a histindine is different between the first and third proteins). This is solvated in a bath of water molecules and counter ions.
- The closest residues are from two of the proteins, which form a pocket within which the ligand has bound.
- The ligand is bound most tightly to LYS:33, through a strong electrostatic interaction
- This interaction is mostly between the HZ1 atom of lysine and NAM atom of the ligand. These two atoms are separated by 2.4 Å.
Given another 10 minutes, and perhaps even a dynamics trajectory, I wonder what else I can find? But, for now, Julien, how did I do?
If you want to try this yourself, please feel free connect to try.openbiosim.org and starting a notebook. You can download the notebook used to generate this post onto the server by running this command in one of the notebook code cells.
! wget https://github.com/OpenBioSim/posts/raw/main/sire/001_ten_minute/ten_minute.ipynb
The notebook will appear in the file browser in the left pane. Click on it to open, and then execute away. Feel free to use this as a starting point for your own notebooks for 10-minute challenges on your own biomolecular systems.