For an in-depth understanding of the model representation and the intricacies of the knowledge-based force field, please refer to the detailed descriptions provided in the SURPASS publication(s).
- A. E. Dawid, D. Gront, and A. Kolinski, SURPASS Low-Resolution Coarse-Grained Protein Modeling, J. Chem. Theory Comput. 2017, 13 (11), 5766–5779; DOI
- A. E. Dawid, D. Gront, and A. Kolinski, Coarse-grained modeling of the interplay between secondary structure propensities and protein fold assembly, J. Chem. Theory Comput. 2018, 14 (4), 2277–2287; DOI
- A. E. Badaczewska-Dawid, A. Kolinski, S. Kmiecik, Protocols for fast simulations of protein structure flexibility using CABS-flex and SURPASS, Protein Structure Prediction. Methods in Molecular Biology. 2020, 2165, 337–353; DOI
NOTE: Parts of the source code in `src/core` were adapted from the BioShell structural biology library, the birthplace of the SURPASS model. This (aedawid/surpass) repository contains only the essential source code needed for SURPASS (compilation time: ~1m30s). It preserves the model in its initial form, as detailed in the original publications, and retains the performance originally reported (2017-2018).
WARNING: The `surpass` code in the BioShell library has since branched off in a different direction and should now be considered a separate tool; its purpose, outcomes and performance may differ from the original.
Ensure that you have `git`, a C++ compiler such as `g++`, `cmake` and `make` installed on your system.
- You can install these on a Debian-based system (like Ubuntu) using:
sudo apt update
sudo apt install git g++ cmake make
- For Red Hat-based systems (like Fedora), you can use:
sudo dnf install git gcc-c++ cmake make
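Before moving on, it can help to confirm that the prerequisites are actually visible on your `PATH`. The following is a minimal sketch in Python, not part of the SURPASS distribution; the helper name `missing_tools` is hypothetical, and `g++` is assumed to be your C++ compiler (adjust the list if you use another one).

```python
import shutil

# Tools named in the prerequisites above; g++ stands in for "a C++ compiler".
REQUIRED_TOOLS = ["git", "g++", "cmake", "make"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All build prerequisites were found on PATH.")
```

An empty result only means the executables were found; it does not check their versions.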
- Navigate to the desired location
Open a terminal and navigate to the directory where you want to download this project.
Use the `cd` command to change directories. For example:
cd /path/to/your/desired/directory
Replace /path/to/your/desired/directory with the actual path where you want to install the software.
- Clone the repository
Clone the GitHub repository using `git`:
git clone https://github.com/aedawid/surpass.git
- Navigate to the project directory
Once the repository is cloned, navigate into the project directory:
cd surpass
- Check for installation instructions
Look for a README or INSTALL file in the directory. These files contain up-to-date instructions for building and installing the SURPASS project. You can view the file in the terminal using:
cat README.md
- Prepare the build environment
This project is built with `cmake`.
An out-of-source build keeps the generated build files separate from the source files, which avoids cluttering your source directory.
Start by creating a build directory and then enter it:
mkdir build
cd build
- Generate build configuration
Once inside the build directory, generate the build configuration using `cmake`. This will use the `CMakeLists.txt` file in the parent directory to configure the project.
cmake ..
- Compile the project
Compile the project using `make`. This will use the `Makefile` generated by cmake to compile all the necessary files and link them to create executables in the bin directory.
make
- Navigate to the binaries directory
After compiling the project, the executables (binaries) are located in an automatically created bin directory in the project root, one level above the build directory. Navigate to this directory to access the compiled executables:
cd ../bin
You are now in the directory where the executables are located. You can list the contents of the directory to see the compiled binaries:
ls
- Verify the installation
You can now run the executables directly from the bin directory. Verify the installation with a command:
./surpass -h
- Cleanup (optional)
After installation, you may want to clean up the build files. From the build directory, you can usually do this with:
make clean
Or, if you want to clear all files generated by `cmake` and start fresh, you can simply delete and recreate the build directory.
After navigating to the bin directory, where the `surpass` executable is located, you can display its available options and help message with the command:
./surpass -h
You should see a list of the available options, each with a short description:
-help :print help message
-verbose :set the verbosity level
-in:database :path to parameters directory
-sample:seed :sets random generator seed for MC sampling
-sample:mc_outer_cycles :the number of large MC cycles (outer MC loop) to perform
-sample:mc_inner_cycles :the number of small MC cycles (inner MC loop) to perform
-sample:mc_cycle_factor :make each MC cycle N times longer
-sample::perturb::range :sets the maximum move range for a Cartesian perturbation mover
-sample::n_perturb::range :sets the maximum move range for a Cartesian N-residues perturbation mover
-sample::n_perturb::n :sets the number of residues (N) for a Cartesian N-residues perturbation mover
-in:pdb :provide an input protein structure(s) in PDB format
-in:pdb:native :provide the native (or reference) protein structure in PDB format
-in:ss2 :provide an input secondary structure in PsiPred's SS2 format
-out:pdb :provide an output file to write structure in PDB format
-out:pdb:min_en :provide an output file to write low-energy structures in PDB format
-out:pdb:min_en::fraction :say 0.15 to record structures worse by 15% of energy than the currently lowest
-out:pdb:min_en::value :the highest energy value for a structure to be recorded with -out:pdb:min_en option
-sample:t_start :initial temperature of the simulation
-sample:t_end :final temperature of the simulation
-sample:t_steps :the number of isothermal steps to make
-sample:replicas :temperatures for replicas in REMC simulation (the number of temperature values defines the number of replicas)
-sample:replicas:observation_mode :observation mode: ISOTHERMAL - same temperature (default); ISOTEMPORAL - contiguous time trajectory
-sample:exchanges :the number of my_sampler exchanges
SURPASS supports several Monte Carlo (MC) sampling configurations, so you can tailor the simulation setup to your needs. You can use SURPASS for:
- isothermal MC simulations, where the temperature remains constant,
- simulated annealing, where the temperature is gradually decreased to explore energy landscapes, and
- replica exchange (RE) MC simulations, which involve multiple replicas at different temperatures or parameters to enhance sampling efficiency.
To use one of these configurations, set the temperature options from the help message accordingly: `-sample:t_start`, `-sample:t_end` and `-sample:t_steps` for the temperature schedule, and `-sample:replicas` for the replica temperatures.
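To make the idea of a temperature schedule concrete, here is a small Python sketch that lists the temperatures of the isothermal stages between a start and an end temperature. It is only an illustration: the function name `annealing_schedule` is hypothetical, and the linear spacing with both end points included is an assumption, not a statement about how SURPASS interprets `-sample:t_start`, `-sample:t_end` and `-sample:t_steps` internally.

```python
def annealing_schedule(t_start, t_end, t_steps):
    """Return t_steps temperatures spaced linearly from t_start to t_end.

    Assumption: linear spacing, both end points included.
    """
    if t_steps < 1:
        raise ValueError("t_steps must be at least 1")
    if t_steps == 1:
        # A single stage is simply an isothermal run at t_start.
        return [float(t_start)]
    step = (t_end - t_start) / (t_steps - 1)
    return [t_start + i * step for i in range(t_steps)]


if __name__ == "__main__":
    # Cooling from 2.0 to 1.0 in five isothermal stages.
    print(annealing_schedule(2.0, 1.0, 5))
```

The same function describes a heating schedule when `t_start` is lower than `t_end`, and an isothermal run when both values are equal.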
For any kind of simulation with SURPASS, two minimal inputs are required:
- the starting PDB conformation in SURPASS representation, and
- the corresponding secondary structure assignment in PsiPred `.ss2` format.
With the `-in:pdb:native` option, you can provide a third input:
- a reference protein structure in PDB format, already converted to SURPASS representation, which enables comparative analyses (e.g. calculating the RMSD for the entire pseudo-trajectory).
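As an illustration of such a comparison, the sketch below computes a distance RMSD (dRMSD) between two conformations given as lists of bead coordinates. Note that this is deliberately a different, superposition-free measure, not the coordinate RMSD that SURPASS reports: it compares all pairwise bead distances, so no structural alignment is needed. The function name `drmsd` and the plain-list input format are assumptions made for this example.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two conformations with the same number of beads.

    Each conformation is a sequence of (x, y, z) tuples in the same bead order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both conformations must have the same number of beads")
    if len(coords_a) < 2:
        raise ValueError("at least two beads are required")
    total = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        dist_a = math.dist(coords_a[i], coords_a[j])
        dist_b = math.dist(coords_b[i], coords_b[j])
        total += (dist_a - dist_b) ** 2
        n_pairs += 1
    return math.sqrt(total / n_pairs)


if __name__ == "__main__":
    reference = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    # The same shape rotated by 90 degrees about z and shifted: dRMSD stays ~0.
    moved = [(-y + 5, x - 3, z + 2) for x, y, z in reference]
    print(drmsd(reference, moved))
```

Because dRMSD depends only on internal distances, it cannot distinguish a structure from its mirror image, which is one reason coordinate RMSD after superposition is usually preferred for final analyses.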
In the bin directory, following the project compilation, you will find several additional executables designed to assist in preparing the necessary inputs for your simulation, provided you possess your protein's all-atom structure in PDB format.
WARNING: If such a structure is unavailable, for instance, if you only have the protein sequence, it is essential to generate a random chain that includes at least all heavy atoms of the amino acids present in your protein.
executable | generates input for | application |
---|---|---|
pdb_to_fasta.cc | none | Reads an all-atom structure from a PDB file and produces a FASTA sequence file. |
surpass_representation.cc | -in:pdb, -in:pdb:native | Reads an (all-atom) structure from a PDB file and produces a structure in SURPASS representation. |
dssp_to_ss2.cc | -in:ss2 | Reads the output from DSSP and produces the secondary structure assignment (in PsiPred format). |
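DSSP distinguishes more secondary structure classes than the three states used by SURPASS, so a conversion has to collapse them. The Python sketch below shows one widely used reduction (H, G, I to helix; E, B to strand; everything else to coil). This mapping is an assumption for illustration; the mapping actually applied by `dssp_to_ss2` is not documented here and may differ.

```python
# Assumed reduction of DSSP classes to three states; not taken from the SURPASS code.
DSSP_TO_3STATE = {
    "H": "H",  # alpha helix
    "G": "H",  # 3-10 helix
    "I": "H",  # pi helix
    "E": "E",  # extended strand
    "B": "E",  # isolated beta bridge
}


def reduce_dssp(dssp_string):
    """Map a per-residue DSSP string to H/E/C; unlisted symbols become coil."""
    return "".join(DSSP_TO_3STATE.get(symbol, "C") for symbol in dssp_string)


if __name__ == "__main__":
    print(reduce_dssp("HHGG TTEEBS-"))
```

Other conventions exist, for example treating isolated bridges (B) or short 3-10 helices (G) as coil, so it is worth checking which one suits your protein.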
Input conformation
Depending on your simulation objectives, the choice of the starting conformation for the SURPASS simulation can vary significantly.
- If the goal is to study the dynamics near a specific conformational state, it's advisable to use that particular state as the input.
- Conversely, for studies focused on protein folding, it's more appropriate to commence with a denatured or unfolded conformation.
- If you only possess the native (folded) structure, you can first generate a random conformation by unfolding it in a SURPASS simulated annealing simulation (going from low to high temperature), which gives a suitable starting point for subsequent folding studies.
Input secondary structure assignment
The input secondary structure plays a pivotal role in determining the interactions during a SURPASS simulation, as it is the sole sequence-dependent information used in the SURPASS forcefield. In this model, the amino acid sequence is simplified to just three types of beads: H (helix), E (strand), and C (coil), without explicitly utilizing any other properties of the 20 standard amino acids.
Consequently, the real amino acid sequence is not explicitly recognized during the simulation, underscoring the critical importance of meticulous preparation of the secondary structure assignment. While the model has been proven to be robust to variations in secondary structure assignment, it is crucial to avoid crude errors, such as misidentifying a helix as a beta sheet or merging two shorter elements into an unphysically long one, to ensure the accuracy and reliability of the simulation results.
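A quick automated check can catch the crude errors mentioned above before a simulation is started. The following Python sketch splits a three-state string into elements, flags helices or strands longer than a user-chosen limit, and lists positions where two assignments disagree between helix and strand. All function names and any length limits you pass in are hypothetical, chosen for illustration rather than taken from the SURPASS publications.

```python
from itertools import groupby


def segments(ss):
    """Split an H/E/C string into (state, start_index, length) runs."""
    runs, start = [], 0
    for state, group in groupby(ss):
        length = len(list(group))
        runs.append((state, start, length))
        start += length
    return runs


def long_elements(ss, max_len):
    """Return runs whose length exceeds the per-state limit given in max_len."""
    return [run for run in segments(ss)
            if run[0] in max_len and run[2] > max_len[run[0]]]


def helix_strand_conflicts(ss_a, ss_b):
    """Return positions where one assignment says H and the other says E."""
    if len(ss_a) != len(ss_b):
        raise ValueError("assignments must have the same length")
    return [i for i, (a, b) in enumerate(zip(ss_a, ss_b)) if {a, b} == {"H", "E"}]


if __name__ == "__main__":
    dssp_like = "CCHHHHHHCCEEEECC"
    predicted = "CCHHHHHHCCHHHHCC"
    print(segments(dssp_like))
    print(helix_strand_conflicts(dssp_like, predicted))
```

Comparing two independent sources, for instance a DSSP-derived string and a prediction, is a cheap way to spot helix/strand swaps; differences between coil and either element are usually less critical.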
The assignment of secondary structure in simulations typically derives from three common sources, each applicable depending on the available data about the protein:
- Header of the PDB File: If a reference PDB structure is known, the secondary structure can be extracted directly from the header of the PDB file. This information is authored by the researchers who solved the structure and often provides a reliable assignment based on experimental data.
- Assignment Using the DSSP Algorithm: For any protein conformation with known atom coordinates, the DSSP (Define Secondary Structure of Proteins) algorithm can be employed. This computational method analyzes the hydrogen bonds in the protein to assign secondary structures, making it a versatile option for a wide range of conformations.
- Prediction from PsiPred or Similar Tools: When only the protein sequence is known, without any structural data, secondary structure prediction tools like PsiPred can be used. These tools use machine learning models trained on known protein structures to predict the likelihood of each amino acid being part of a helix, strand, or coil, providing valuable insights even in the absence of experimental structure data.
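Whichever source you use, the result ends up in an SS2 file, and it is useful to be able to read it back for inspection. The Python sketch below parses SS2-formatted text into a sequence string and a three-state string. It assumes the usual PsiPred vertical layout (a `#` header line, then one line per residue with index, residue letter, state, and three probabilities); the sample content and the function name `parse_ss2` are made up for this example.

```python
# Illustrative SS2 content; the values are invented for this example.
SAMPLE_SS2 = """# PSIPRED VFORMAT (PSIPRED V4.0)

   1 M C   0.998  0.001  0.001
   2 K H   0.100  0.850  0.050
   3 V E   0.050  0.020  0.930
"""


def parse_ss2(text):
    """Return (sequence, secondary_structure) from SS2-formatted text."""
    sequence, structure = [], []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the '#' header.
        if not fields or fields[0].startswith("#"):
            continue
        # Expected fields: index, residue, state, then three probabilities.
        if len(fields) < 3 or fields[2] not in {"H", "E", "C"}:
            raise ValueError(f"unexpected SS2 line: {line!r}")
        sequence.append(fields[1])
        structure.append(fields[2])
    return "".join(sequence), "".join(structure)


if __name__ == "__main__":
    seq, ss = parse_ss2(SAMPLE_SS2)
    print(seq)
    print(ss)
```

To read a real file, pass its contents, for example `parse_ss2(open("secondary_structure_dssp.ss2").read())`.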
./surpass -verbose=FINE \
-in:database=./ \
-in:pdb=start_surpass.pdb -in:ss2=secondary_structure_dssp.ss2 -in:pdb:native=reference_surpass.pdb \
-out:pdb=tra.pdb \
-sample:mc_outer_cycles=5 \
-sample:mc_inner_cycles=100 \
-sample:exchanges=100000 \
-sample:replicas:observation_mode=1 \
-sample:replicas=1.3,1.35,1.4,1.45,1.5,1.6,1.7,1.8,2.0,2.2,2.4,2.6
PRO TIP: To set up a Replica Exchange Monte Carlo (REMC) simulation, choose the replica temperatures (reduced temperature factors) carefully, so that their distribution allows adequate sampling and efficient exchanges; see the `-sample:replicas` option.
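One common starting point for a replica temperature set is a geometric ladder, in which neighbouring temperatures differ by a constant ratio. The Python sketch below generates such a ladder and formats it as a value for `-sample:replicas`. Geometric spacing is a general REMC heuristic assumed here for illustration, not the procedure used to obtain the temperatures in the example command above; the function names are hypothetical.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures from t_min to t_max with a constant ratio."""
    if n_replicas < 2:
        raise ValueError("at least two replicas are required")
    if not 0 < t_min < t_max:
        raise ValueError("temperatures must satisfy 0 < t_min < t_max")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


def replicas_option(temperatures, digits=2):
    """Format temperatures as a comma-separated -sample:replicas argument."""
    values = ",".join(f"{t:.{digits}f}" for t in temperatures)
    return f"-sample:replicas={values}"


if __name__ == "__main__":
    # Twelve replicas spanning the same range as the example command.
    print(replicas_option(geometric_ladder(1.3, 2.6, 12)))
```

Treat the generated ladder as a first guess: the temperatures usually need to be adjusted, with denser spacing near the folding transition, until exchanges between neighbouring replicas are accepted at a reasonable rate.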