SURPASS

C++ | Low-resolution coarse-grained protein model and knowledge-based force field

For an in-depth description of the model representation and the knowledge-based force field, please refer to the SURPASS publications:

  1. A. E. Dawid, D. Gront, and A. Kolinski, SURPASS Low-Resolution Coarse-Grained Protein Modeling, J. Chem. Theory Comput. 2017, 13 (11), 5766-5779.

  2. A. E. Dawid, D. Gront, and A. Kolinski, Coarse-grained modeling of the interplay between secondary structure propensities and protein fold assembly, J. Chem. Theory Comput. 2018, 14 (4), 2277-2287.

  3. A. E. Badaczewska-Dawid, A. Kolinski, and S. Kmiecik, Protocols for fast simulations of protein structure flexibility using CABS-flex and SURPASS, in Protein Structure Prediction, Methods in Molecular Biology, vol. 2165, 2020, 337-353.


NOTE: Parts of the source code in src/core were adapted from the BioShell structural biology library, where the SURPASS model was originally developed. This repository (aedawid/surpass) contains only the source code essential for SURPASS (compilation takes about 1.5 minutes). It preserves the model in its original form, as described in the publications above, and reproduces the performance reported there (2017-2018).

WARNING: The SURPASS code in the BioShell library has since diverged in a different direction and should now be considered a separate tool; its purpose, results and performance may differ from the original.

Installation

Pre-requisites

Ensure that you have git, a C++ compiler like g++, cmake and make installed on your system.

  • You can install these on a Debian-based system (like Ubuntu) using:
sudo apt update
sudo apt install git g++ cmake make
  • For Red Hat-based systems (like Fedora), you can use:
sudo dnf install git gcc-c++ cmake make

Download the repository and build the surpass application

  1. Navigate to the desired location

Open a terminal and navigate to the directory where you want to download this project.
Use the cd command to change directories. For example:

cd /path/to/your/desired/directory

Replace /path/to/your/desired/directory with the actual path where you want to install the software.

  2. Clone the repository

Clone the GitHub repository using git:

git clone https://github.com/aedawid/surpass.git
  3. Navigate to the project directory

Once the repository is cloned, navigate into the project directory:

cd surpass
  4. Check for installation instructions

Look for a README or INSTALL file in the directory. These files contain up-to-date instructions for building and installing the SURPASS project. You can view a file in the terminal using:

cat README.md
  5. Prepare the build environment

This project is built with cmake, which supports out-of-source builds: the build files are kept separate from the source files, which is cleaner and avoids cluttering the source directory.
Start by creating a build directory and then enter it:

mkdir build
cd build
  6. Generate build configuration

Once inside the build directory, generate the build configuration using cmake.
This will use the CMakeLists.txt file in the parent directory to configure the project.

cmake ..
  7. Compile the project

Compile the project using make. This will use the Makefile generated by cmake to compile all the necessary files and link them to create executables in the bin directory.

make
  8. Navigate to the binaries directory

After compilation, the executables (binaries) are located in an automatically created bin directory at the top level of the project, next to the build directory. Navigate to this directory to access the compiled executables:

cd ../bin

You are now in the directory where the executables are located. You can list the contents of the directory to see the compiled binaries:

ls
  9. Verify the installation

You can now run the executables directly from the bin directory. Verify the installation with the command:

./surpass -h
  10. Cleanup (optional)

After installation, you may want to remove the intermediate build files. From the build directory, run:

make clean

Note that with CMake-generated Makefiles, make clean also deletes the compiled executables, not only the object files. To remove all files generated by cmake and start fresh, delete and recreate the build directory.


Set up SURPASS simulation

From the bin directory, where the surpass executable is located, you can display the available options and the help message with the command:

./surpass -h

You should see a list of available options, usage instructions and a help message:

                            -help :print help message
                         -verbose :set the verbosity level
                     -in:database :path to parameters directory
                     -sample:seed :sets random generator seed for MC sampling
          -sample:mc_outer_cycles :the number of large MC cycles (outer MC loop) to perform
          -sample:mc_inner_cycles :the number of small MC cycles (inner MC loop) to perform
          -sample:mc_cycle_factor :make each MC cycle N times longer
          -sample::perturb::range :sets the maximum move range for a Cartesian perturbation mover
        -sample::n_perturb::range :sets the maximum move range for a Cartesian N-residues
                                   perturbation mover
            -sample::n_perturb::n :sets the number of residues (N) for a Cartesian N-residues
                                   perturbation mover
                          -in:pdb :provide an input protein structure(s) in PDB format
                   -in:pdb:native :provide the native (or reference) protein structure in PDB
                                   format
                          -in:ss2 :provide an input secondary structure in PsiPred's SS2 format
                         -out:pdb :provide an output file to write structure in PDB format
                  -out:pdb:min_en :provide an output file to write low-energy structures in PDB
                                   format
        -out:pdb:min_en::fraction :say 0.15 to record structures worse by 15% of energy than the
                                   currently lowest
           -out:pdb:min_en::value :the highest energy value for a structure to be recorded with
                                   -out:pdb:min_en option
                  -sample:t_start :initial temperature of the simulation
                    -sample:t_end :final temperature of the simulation
                  -sample:t_steps :the number of isothermal steps to make
                 -sample:replicas :temperatures for replicas in REMC simulation (the number of
                                   temperature values defines the number of replicas)
-sample:replicas:observation_mode :observation mode: ISOTHERMAL - same temperature (default);
                                   ISOTEMPORAL - contiguous time trajectory
                -sample:exchanges :the number of my_sampler exchanges
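The -out:pdb:min_en::fraction description can be read as an energy cutoff relative to the lowest energy found so far. Assuming that reading (cutoff = E_min + fraction * |E_min|, which is our interpretation and is not confirmed by the help text), a fraction of 0.15 with a current minimum of -200 gives a cutoff of -170, so a structure at -175 would be recorded and one at -160 would not. The hypothetical helpers below only reproduce that arithmetic; they are not part of SURPASS.

```shell
#!/usr/bin/env bash
# min_en_threshold E_MIN FRACTION
# Energy cutoff under the ASSUMED reading of -out:pdb:min_en::fraction:
# record structures whose energy is at most FRACTION * |E_MIN| above E_MIN.
min_en_threshold() {
  awk -v emin="$1" -v f="$2" 'BEGIN {
    a = (emin < 0) ? -emin : emin   # |E_MIN|
    printf "%.2f\n", emin + f * a
  }'
}

# would_record E E_MIN FRACTION
# Prints "yes" if energy E falls within the assumed cutoff, otherwise "no".
would_record() {
  awk -v e="$1" -v emin="$2" -v f="$3" 'BEGIN {
    a = (emin < 0) ? -emin : emin
    if (e <= emin + f * a) print "yes"; else print "no"
  }'
}
```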

SURPASS supports several Monte Carlo (MC) sampling schemes, so the simulation setup can be tailored to your needs. You can use SURPASS for:

  • isothermal MC simulations, where the temperature remains constant,
  • simulated annealing, where the temperature is gradually decreased to explore energy landscapes, and
  • replica exchange (RE) MC simulations, which involve multiple replicas at different temperatures or parameters to enhance sampling efficiency.

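Which scheme is run depends on the sampling options listed above: for example, -sample:t_start, -sample:t_end and -sample:t_steps define the temperature schedule, while -sample:replicas sets the temperatures of the replicas in REMC. Make sure these options match the sampling method you have chosen.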
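The sampling schemes above rely on standard Monte Carlo acceptance tests. As a minimal sketch, assuming the textbook Metropolis criterion and the usual replica-exchange swap rule in reduced units (k_B = 1), they can be written as two small shell functions. The function names are hypothetical and the code is not taken from the SURPASS sources; the caller supplies the uniform random number U so that each decision is reproducible.

```shell
#!/usr/bin/env bash
# Textbook Metropolis and replica-exchange acceptance tests in reduced units
# (k_B = 1). Hypothetical helpers for illustration, not from SURPASS.

# metropolis_accept E_OLD E_NEW T U
# Accept if the energy does not increase; otherwise accept with probability
# exp(-(E_NEW - E_OLD) / T). U is a uniform random number in [0, 1).
metropolis_accept() {
  awk -v eo="$1" -v en="$2" -v t="$3" -v u="$4" 'BEGIN {
    d = en - eo
    if (d <= 0 || u < exp(-d / t)) print "accept"; else print "reject"
  }'
}

# exchange_accept E_I T_I E_J T_J U
# Swap the conformations of replicas i and j with probability
# min(1, exp((1/T_I - 1/T_J) * (E_I - E_J))).
exchange_accept() {
  awk -v ei="$1" -v ti="$2" -v ej="$3" -v tj="$4" -v u="$5" 'BEGIN {
    x = (1 / ti - 1 / tj) * (ei - ej)
    if (x >= 0 || u < exp(x)) print "accept"; else print "reject"
  }'
}
```

For example, exchange_accept -100 1.3 -120 1.4 0.5 prints accept: moving the lower-energy conformation to the colder replica is always accepted.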

Minimal inputs

For any kind of simulation with SURPASS, two minimal inputs are required:

  • the starting conformation in PDB format, in SURPASS representation (-in:pdb), and
  • the corresponding secondary structure assignment in PsiPred .ss2 format (-in:ss2).

Optionally, the -in:pdb:native option provides a third input:

  • a reference protein structure in PDB format, already converted to SURPASS representation, which enables comparative analyses (e.g., calculating the RMSD to the reference along the entire pseudo-trajectory).

Prepare inputs

After compilation, the bin directory also contains several additional executables that help prepare the inputs for your simulation, provided you have an all-atom structure of your protein in PDB format.
WARNING: If no such structure is available, for instance if you only have the protein sequence, you must first generate a random chain that contains at least all heavy atoms of the amino acids in your protein.

executable                  | generates input          | application
pdb_to_fasta.cc             | none                     | Reads an all-atom structure from a PDB file and produces a FASTA sequence file.
surpass_representation.cc   | -in:pdb, -in:pdb:native  | Reads an (all-atom) structure from a PDB file and produces a structure in SURPASS representation.
dssp_to_ss2.cc              | -in:ss2                  | Reads the output of DSSP and produces the secondary structure assignment (in PsiPred SS2 format).
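The DSSP-to-SS2 conversion step can be illustrated with a short sketch. It assumes one common reduction of the eight DSSP codes to three states (H, G, I to H; E, B to E; everything else to C) and the usual PsiPred SS2 column layout (index, residue, state, then coil, helix and strand probabilities, set here to 1 or 0 because the state is known). Whether dssp_to_ss2 uses exactly this mapping is not stated in this README, and the function names reduce_dssp and write_ss2 are hypothetical.

```shell
#!/usr/bin/env bash
# reduce_dssp DSSP_STRING
# Map 8-state DSSP codes to 3 states with one common convention:
# H, G, I -> H;  E, B -> E;  everything else (T, S, P, blank, '-') -> C.
reduce_dssp() {
  printf '%s\n' "$1" | sed 's/[GI]/H/g; s/B/E/g; s/[^HE]/C/g'
}

# write_ss2 SEQUENCE SS3
# Print a PsiPred-style SS2 file: index, residue, state and the probabilities
# of coil, helix and strand (1 for the assigned state, 0 otherwise).
write_ss2() {
  awk -v seq="$1" -v ss="$2" 'BEGIN {
    if (length(seq) != length(ss)) { print "length mismatch" > "/dev/stderr"; exit 1 }
    print "# PSIPRED VFORMAT (PSIPRED V4.0)"
    print ""
    for (i = 1; i <= length(seq); i++) {
      s = substr(ss, i, 1)
      pc = (s == "C"); ph = (s == "H"); pe = (s == "E")
      printf "%4d %s %s  %6.3f %6.3f %6.3f\n", i, substr(seq, i, 1), s, pc, ph, pe
    }
  }'
}
```

For example, write_ss2 MKV "$(reduce_dssp 'GHE')" > input.ss2 writes a three-residue file (the file name is illustrative).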

Input conformation

The choice of starting conformation for a SURPASS simulation depends on the goal of the simulation:

  • If the goal is to study the dynamics near a specific conformational state, it's advisable to use that particular state as the input.
  • Conversely, for studies focused on protein folding, it's more appropriate to commence with a denatured or unfolded conformation.
    • If you only have the native (folded) structure, you can first unfold it in a SURPASS simulated annealing run with increasing temperature (from low to high), which yields a suitable starting point for subsequent folding studies.

Input secondary structure assignment

The input secondary structure plays a pivotal role in determining the interactions during a SURPASS simulation, because it is the only sequence-dependent information used in the SURPASS force field. In this model, the amino acid sequence is reduced to just three bead types: H (helix), E (strand) and C (coil); no other properties of the 20 standard amino acids are used explicitly.
Consequently, the real amino acid sequence is not recognized during the simulation, which makes careful preparation of the secondary structure assignment critical. Although the model has been shown to be robust to variations in the secondary structure assignment, crude errors must be avoided, such as labelling a helix as a beta strand or merging two shorter elements into one unphysically long element, to keep the simulation results accurate and reliable.
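Crude errors of this kind are easiest to spot by listing the elements of an assignment together with their lengths. The sketch below is a hypothetical helper (not part of SURPASS) that prints each helix or strand as TYPE START END LENGTH (1-based positions) and can flag elements longer than limits you choose; no particular limits are recommended here.

```shell
#!/usr/bin/env bash
# ss_segments SS3
# List helices and strands of an H/E/C string as "TYPE START END LENGTH".
ss_segments() {
  awk -v ss="$1" 'BEGIN {
    n = length(ss); start = 1
    for (i = 1; i <= n; i++) {
      c = substr(ss, i, 1)
      # A run ends at the last character or where the next character differs.
      if (i == n || substr(ss, i + 1, 1) != c) {
        if (c != "C") printf "%s %d %d %d\n", c, start, i, i - start + 1
        start = i + 1
      }
    }
  }'
}

# flag_long_elements SS3 MAX_HELIX MAX_STRAND
# Print only the elements longer than the user-chosen limits.
flag_long_elements() {
  ss_segments "$1" | awk -v mh="$2" -v me="$3" \
    '($1 == "H" && $4 > mh) || ($1 == "E" && $4 > me)'
}
```

For example, ss_segments CCHHHHCCEEEECEEC prints H 3 6 4, E 9 12 4 and E 14 15 2 on separate lines.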

The secondary structure assignment typically comes from one of three common sources, depending on the data available for the protein:

  • Header of the PDB file:
    If a reference PDB structure is known, the secondary structure can be extracted directly from the header of the PDB file (HELIX and SHEET records). This information is provided by the researchers who solved the structure and often gives a reliable assignment based on experimental data.

  • Assignment Using the DSSP Algorithm:
    For any protein conformation with known atom coordinates, the DSSP (Define Secondary Structure of Proteins) algorithm can be employed. This computational method analyzes the hydrogen bonds in the protein to assign secondary structures, making it a versatile option for a wide range of conformations.

  • Prediction from PsiPred or Similar Tools:
    When only the protein sequence is known, without any structural data, secondary structure prediction tools like PsiPred can be utilized. These tools use machine learning models trained on known protein structures to predict the likelihood of each amino acid being part of a helix, strand, or coil, providing valuable insights even in the absence of experimental structure data.
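For the first source, the HELIX and SHEET records of a PDB file can be turned into an H/E/C string. The sketch below is a hypothetical helper (not part of SURPASS) that relies on the fixed column layout of these records in the PDB format (initial chain ID and initial/final residue numbers), ignores insertion codes and assumes residues are numbered consecutively from FIRST_RES to LAST_RES; residues not covered by any record become C.

```shell
#!/usr/bin/env bash
# pdb_header_ss PDB_FILE CHAIN FIRST_RES LAST_RES
# Build an H/E/C string for residues FIRST_RES..LAST_RES of CHAIN.
# HELIX: chain in column 20, residues in columns 22-25 and 34-37.
# SHEET: chain in column 22, residues in columns 23-26 and 34-37.
pdb_header_ss() {
  awk -v chain="$2" -v first="$3" -v last="$4" '
    /^HELIX / && substr($0, 20, 1) == chain {
      b = substr($0, 22, 4) + 0; e = substr($0, 34, 4) + 0
      for (r = b; r <= e; r++) ss[r] = "H"
    }
    /^SHEET / && substr($0, 22, 1) == chain {
      b = substr($0, 23, 4) + 0; e = substr($0, 34, 4) + 0
      for (r = b; r <= e; r++) ss[r] = "E"
    }
    END {
      for (r = first + 0; r <= last + 0; r++) {
        c = (r in ss) ? ss[r] : "C"   # residues without a record are coil
        printf "%s", c
      }
      printf "\n"
    }' "$1"
}
```

For example, pdb_header_ss reference.pdb A 1 120 prints a 120-character string for chain A (the file name is illustrative).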

Example setup for Replica Exchange Monte Carlo simulation

./surpass -verbose=FINE \
    -in:database=./ \
    -in:pdb=start_surpass.pdb -in:ss2=secondary_structure_dssp.ss2 -in:pdb:native=reference_surpass.pdb \
    -out:pdb=tra.pdb \
    -sample:mc_outer_cycles=5 \
    -sample:mc_inner_cycles=100 \
    -sample:exchanges=100000 \
    -sample:replicas:observation_mode=1 \
    -sample:replicas=1.3,1.35,1.4,1.45,1.5,1.6,1.7,1.8,2.0,2.2,2.4,2.6

PRO TIP: When setting up a Replica Exchange Monte Carlo (REMC) simulation, choose the replica temperatures (reduced temperatures, set with -sample:replicas) carefully: neighbouring temperatures must be close enough for exchanges to be accepted efficiently, while the ladder as a whole should span the temperature range needed for adequate sampling.
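Geometric spacing, with a constant ratio between neighbouring temperatures, is a widely used starting point for a replica ladder. The sketch below generates such a ladder in the comma-separated form expected by -sample:replicas; the function name is hypothetical, and the ladder in the example command above was instead tuned by hand, with denser spacing at lower temperatures.

```shell
#!/usr/bin/env bash
# geometric_ladder T_MIN T_MAX N
# Print N temperatures from T_MIN to T_MAX with a constant ratio between
# neighbours, comma-separated for use with -sample:replicas.
geometric_ladder() {
  awk -v tmin="$1" -v tmax="$2" -v n="$3" 'BEGIN {
    if (n < 2) { print "need at least 2 replicas" > "/dev/stderr"; exit 1 }
    r = (tmax / tmin) ^ (1 / (n - 1))   # constant ratio between neighbours
    for (i = 0; i < n; i++) {
      sep = (i > 0) ? "," : ""
      printf "%s%.2f", sep, tmin * r ^ i
    }
    printf "\n"
  }'
}
```

For example, -sample:replicas=$(geometric_ladder 1.3 2.6 12) spans the same range as the example command with 12 replicas; adjust the ladder if exchanges between neighbours are rarely accepted.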
