Accelerating open-source innovation in computer-aided drug design

OpenBioSim (www.openbiosim.org) is a not-for-profit scientific software company that was founded in 2022 to promote and sustain the use of open-source molecular simulation software in academia and industry.

Computer-aided design of chemicals for life sciences and materials applications is an established methodology that supports R&D processes in the pharmaceutical and chemical industries. The core computational chemistry algorithms that underpin processes widely adopted by industry often have their roots in academic research conducted decades before their industrial utility was demonstrated.

However, academic software rarely transitions directly into routine adoption by industry professionals. One of the main reasons for this is that academia primarily encourages proof-of-concept studies, while industry demands robust and thoroughly validated processes to integrate new computational technologies into complex R&D pipelines. Historically, this gap has been bridged by scientific software vendors specializing in business-to-business services. These vendors license commercial-grade software that reimplements algorithms of proven value in a robust and user-friendly form. Software offerings available to support chemical R&D cover only a fraction of the methodologies prototyped in academia, as the commercial scientific software vendor sector is smaller than the pharmaceuticals and chemicals sectors. Therefore, vendors must carefully choose where to focus their internal product development efforts.

Open-source communities 

This knowledge transfer pathway is increasingly disrupted by the growing popularity and availability of open-source software libraries. The GitHub platform, which is only 15 years old, now has over a hundred million registered user accounts. Academic researchers are increasingly releasing their code on open-source public repositories. Open-source package management systems such as conda, along with the growing prevalence of the Python programming language, have lowered barriers to combining software from different sources to build prototypes and test new research ideas.

Over the past decade, computation has adopted a more central role in drug discovery, as evidenced by the rapid rise in the number of AI-driven drug discovery companies operating at the preclinical and clinical stages. Computation-driven biotech companies compete based on differentiated platforms validated by the generation of preclinical assets. This creates a demand for the integration of customized computational technologies as components of proprietary drug discovery pipelines. Adopting open-source solutions accelerates the development roadmap of these organizations.

OpenBioSim’s mission is to lower barriers for adoption of open-source research software

OpenBioSim is the result of decades of collaborative research between academia and industry by its founding members. The company operates at the interface of academia and industry, with a focus on lowering barriers for the adoption and distribution of academic research software. OpenBioSim establishes collaborative agreements that connect academics working on cutting-edge computational chemistry algorithms with developers in industry who are working on commercial computational chemistry software or computational drug discovery platforms. As a not-for-profit organization, OpenBioSim’s goal is to deliver benefits to multiple stakeholders, forming a community of scientists working with open-source research software.

OpenBioSim provides maintenance and support for open-source software that has demonstrated its utility beyond academia. This enables industrialists to rely on open-source components for building software pipelines for use in production environments. Additionally, OpenBioSim promotes the sharing of knowledge and benchmarks between academia and industry, thereby accelerating the scientific validation of new computational methodologies.

OpenBioSim’s projects demonstrate the potential of collaborative research software development by industry and academia 

OpenBioSim currently supports two major open-source software projects. Sire (sire.openbiosim.org) is a modular molecular modelling framework that was originally started by Christopher Woods in 2005 as a self-funded project during his transition between jobs. Sire’s functionality has evolved over time and is now primarily used as a library to manipulate representations of biomolecular systems. The main features of Sire include a rich library of readers and writers for common molecular topologies (such as Amber prm7, CHARMM psf, GROMACS top, Mol2, SDF, PDB) and trajectory file formats (such as DCD, XTC, TRR); a search engine for selecting atoms and editing molecular systems; support for molecular mechanics energy calculations; and converters to/from other open-source toolkits such as RDKit or OpenMM.

Over the past decade, Sire’s building blocks have been utilized to develop various academic research software prototypes. Examples include the Waterswap method (based on Gibbs Ensemble Absolute Binding Free Energy Calculation theory) developed at the University of Bristol, the Nautilus software for water thermodynamics analyses (based on Grid Cell Theory) developed at the University of Edinburgh, and the SOMD package (a single-topology alchemical free energy calculation engine) also developed at the University of Edinburgh. These software prototypes are typically operated from a command-line interface by experienced computational chemists.

SOMD was integrated in 2019 as a component of the Flare™ FEP software marketed by the software vendor Cresset, allowing non-experts to run FEP calculations to routinely support drug discovery projects. The significant product development efforts by the Cresset staff led to numerous bug fixes and protocol improvements in SOMD, which, in turn, benefited the research activities of scientists working directly with the open-source code.

OpenBioSim also provides support and maintenance for the BioSimSpace library (biosimspace.openbiosim.org). BioSimSpace was initiated in 2017 as a flagship software project by the academic consortium CCPBioSim (www.ccpbiosim.ac.uk), with funding from EPSRC. The goal of BioSimSpace was to facilitate the development of pipelines that combine simulation software and protocols independently created by different research groups.

Essentially, BioSimSpace aims to programmatically capture all the knowledge required to successfully deploy computational chemistry software. It achieves this by providing access to libraries of interoperable workflow components and protocols that abstract common biomolecular simulation tasks. These workflow components are assembled by creating wrappers around third-party open-source tools. For instance, users can request the execution of a molecular dynamics simulation on a given molecular input without having to worry about detailed simulation configuration settings or conversion between molecular topologies. BioSimSpace is distributed in a single Python environment that is compatible with all third-party dependencies, relieving users of the burden of installing and maintaining a fragmented ecosystem of academic research software packages.
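
The wrapper idea described above can be illustrated with a minimal, self-contained Python sketch. It is not BioSimSpace code and does not use the BioSimSpace API: the class MinimisationProtocol, the two writer functions, and write_config are hypothetical names introduced purely for illustration. The sketch assumes a single engine-agnostic setting (the number of minimisation steps) and translates it into the corresponding GROMACS and AMBER input keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimisationProtocol:
    """Engine-agnostic description of an energy minimisation (hypothetical)."""
    steps: int = 1000


def to_gromacs_mdp(protocol: MinimisationProtocol) -> str:
    """Render the protocol as GROMACS .mdp settings (steepest descent)."""
    return f"integrator = steep\nnsteps = {protocol.steps}\n"


def to_amber_mdin(protocol: MinimisationProtocol) -> str:
    """Render the protocol as an AMBER mdin &cntrl namelist."""
    return f"Minimisation\n &cntrl\n  imin=1, maxcyc={protocol.steps},\n /\n"


# Registry mapping an engine name to its configuration writer (illustrative).
WRITERS = {"gromacs": to_gromacs_mdp, "amber": to_amber_mdin}


def write_config(protocol: MinimisationProtocol, engine: str) -> str:
    """Translate one abstract protocol into the chosen engine's input text."""
    try:
        writer = WRITERS[engine.lower()]
    except KeyError:
        raise ValueError(f"Unsupported engine: {engine!r}") from None
    return writer(protocol)


if __name__ == "__main__":
    protocol = MinimisationProtocol(steps=500)
    for engine in WRITERS:
        print(f"--- {engine} ---")
        print(write_config(protocol, engine))
```

In this sketch the user states what should be done once, and the engine-specific details live in small interchangeable writers, so supporting another engine would only require registering one more function.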

A popular application of BioSimSpace is its use as a software development environment for free energy calculations. BioSimSpace supports automated setup and analysis of alchemical free energy calculations for multiple simulation engines (currently supporting AMBER, SOMD, and GROMACS). The availability of modular and interoperable free energy calculation pipelines facilitates the work of academic developers, who often focus their efforts on optimizing specific steps of a pipeline. Recent collaborative work on absolute binding free energy calculation methodologies between the Universities of Edinburgh and Newcastle and AstraZeneca serves as an illustration of this.

Industry developers are free to write new BioSimSpace code or reuse existing BioSimSpace code alongside proprietary code to create pipelines that meet the R&D needs of their organizations. Software engineers and domain experts employed by OpenBioSim provide technical and scientific support, software maintenance efforts, and documentation to assist in these endeavours. OpenBioSim prioritizes addressing the needs of third parties that commit to financially supporting the organization.

BioSimSpace is being used by an increasing number of biotech and pharmaceutical companies across a diverse set of use cases. A notable example was recently presented by the AI-driven precision medicine company Exscientia at the MGMS MD in Pharma meeting (mdinpharma.wordpress.com) held in London in March 2023, where they described their approach to creating a platform for binding free energy calculations with the assistance of BioSimSpace. Extensive benchmarking efforts conducted by Exscientia’s staff led to the optimization of free energy calculation protocols distributed in BioSimSpace, benefiting anyone working with the toolkit.

Open-source research software has already transformed the way computational chemists in academia and industry share ideas. Looking ahead, OpenBioSim is determined to continue working on making the best emerging computational chemistry research methodologies accessible to innovators dedicated to accelerating the development of medical treatments that will bring broad societal benefits.

References

  1. Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ. "A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies." J Chem Phys. 2011;134(5):054114. doi:10.1063/1.3519057
  2. Gerogiokas G, Calabro G, Henchman RH, Southey MW, Law RJ, Michel J. "Prediction of Small Molecule Hydration Thermodynamics with Grid Cell Theory." J Chem Theory Comput. 2014;10(1):35-48. doi:10.1021/ct400783h
  3. Calabrò G, Woods CJ, Powlesland F, Mey AS, Mulholland AJ, Michel J. "Elucidation of Nonadditive Effects in Protein-Ligand Binding Energies: Thrombin as a Case Study." J Phys Chem B. 2016;120(24):5340-50. doi:10.1021/acs.jpcb.6b03296
  4. Kuhn M, Firth-Clark S, Tosco P, Mey ASJS, Mackey M, Michel J. "Assessment of Binding Affinity via Alchemical Free-Energy Calculations." J Chem Inf Model. 2020;60(6):3120-3130. doi:10.1021/acs.jcim.0c00165
  5. Hedges LO, Mey ASJS, Laughton CA, Gervasio FL, Mulholland AJ, Woods CJ, Michel J. "BioSimSpace: an interoperable python framework for biomolecular simulation." J Open Source Softw. 2019;4(43):1831. doi:10.21105/joss.01831
  6. Clark F, Robb G, Cole D, Michel J. "Comparison of Receptor-Ligand Restraint Schemes for Alchemical Absolute Binding Free Energy Calculations." J Chem Theory Comput. 2023, in press. doi:10.1021/acs.jctc.3c00139

Written by Julien Michel.

This article appeared previously in the RSC Interest Group Chemical Information and Computer Applications Group Summer 2023 Newsletter.