✍️ Contribution period: Pradnya #628

Pradnya2203 · 2023-03-08T14:04:39Z

Pradnya2203 · 2023-03-08T14:55:05Z

Motivation Statement:

I first heard about Outreachy from a friend and was truly pleased by this idea of supporting diversity and encouraging the under-represented groups from all around the world. I am a sophomore at IIT Roorkee and am also a part of various student technical clubs related to software development and data science.

I was quite excited to know that my application was approved, and while going through the projects I came across Ersilia, which appealed to me for multiple reasons. Firstly, the cause: providing medical resources to under-developed countries. I have always wanted to help people using my skills and would be glad to contribute to such a cause. Secondly, the tech stack suits me and would help me with my future goal of pursuing a career in data science.

I have worked with various languages such as Python, JavaScript, C++, PHP, and MATLAB, and would like to get a strong hold on Python during this internship period.

Ersilia will be a great opportunity to improve my skills as well as work for the betterment of society. I am really looking forward to contributing to this project and learning a lot in the process.

GemmaTuron · 2023-03-08T15:41:27Z

Hi @Pradnya2203

Thanks for your interest and welcome to Ersilia! If you have successfully installed Ersilia and run a test model, please report it here and also let us know which system you are using.
Thanks!

Pradnya2203 · 2023-03-08T16:21:56Z

Hey
I am using Ubuntu 22.04 and ran the sample model. We are supposed to fork the repository and then start contributing, right?

GemmaTuron · 2023-03-09T08:37:18Z

Hi @Pradnya2203 !

Please read the guidelines for the contribution period. This time around in order to be able to better provide support to all applicants we have set up a set of defined tasks to be completed each week.
https://ersilia.gitbook.io/ersilia-book/contributors/internships/outreachy-summer-2023

In addition, we will be handing out specific tasks to interns as soon as we know everyone is set up

GemmaTuron · 2023-03-09T09:05:44Z

Hi @Pradnya2203

As you will see in issue #343 this model seems to present some issues at fetch time.
Could you please test it using both the CLI and the Google Colab template (use the template provided in /notebooks), and report whether it works in either system, together with the log files?
When fetching the model, please collect the log files and try to identify the source of any errors.

Thanks!

Pradnya2203 · 2023-03-10T07:21:58Z

eos3ae_error.log

I don't know exactly why I am getting the error "ModuleNotFoundError: No module named 'yaml'". I tried installing pyyaml but it didn't change anything. I'll keep trying to solve it.
I tested using the Google Colab template as well, but the model still doesn't work.

AhmedYusuff · 2023-03-11T21:07:17Z

Hi @Pradnya2203. Your error log shows ('Connection aborted.', OSError(0, 'Error')). It looks like your connection was dropped by the host, probably due to a system error on your end.

I also tried fetching the model on Ubuntu 22.04, but I had to terminate the process because it was taking too long.

neww.log

Pradnya2203 · 2023-03-12T07:38:06Z

Hey @AhmedYusuff, I was not actually facing that error anymore. I was able to get around it, but I uploaded the old log file by mistake. I have now uploaded the new log file. Thanks a lot :)

AhmedYusuff · 2023-03-12T13:37:05Z

You are welcome @Pradnya2203.

In your log file I can see that your model failed when it tried to import yaml: ModuleNotFoundError: No module named 'yaml'

You can use pip show pyyaml to see if you have pyyaml installed on your system.

Pradnya2203 · 2023-03-12T15:45:55Z

Yes I have tried that as well @AhmedYusuff

GemmaTuron · 2023-03-13T07:59:19Z

Hi @Pradnya2203

Important: did you activate the conda environment of the model before installing yaml? You should first run:
conda activate eos3ae
and then
pip show pyyaml
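
A complementary check, sketched here in Python, is to ask the interpreter itself whether it can see the module. `pip show` reports on the environment that `pip` belongs to, which is not always the interpreter that runs the model. The function name `module_status` is made up for this illustration; run the script with the model's environment activated.

```python
import importlib.util
import sys


def module_status(name):
    """Report whether `name` is importable by the interpreter running this script."""
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
        "interpreter": sys.executable,
    }


if __name__ == "__main__":
    # "yaml" is the import name provided by the pyyaml package.
    for key, value in module_status("yaml").items():
        print(f"{key}: {value}")
```

If the printed interpreter path does not point inside the eos3ae environment, the package was probably installed somewhere else.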

Pradnya2203 · 2023-03-13T13:05:06Z

Hey @GemmaTuron
I installed the module after activating the conda environment of the model and checked it using pip show pyyaml, but I'm still getting the same error when I run the model, and when I check again I see no pyyaml in the model's conda environment. I'll try to fix it.

GemmaTuron · 2023-03-13T21:33:51Z

Hi @Pradnya2203 !
Thanks. I'd suggest first focusing on the week 2 tasks, and if those are completed on time, we'll then tackle the extra tasks assigned to you :)

Pradnya2203 · 2023-03-14T18:10:49Z

The model I chose for week 2 was Smiles To IUPAC Translator. This model was particularly interesting to me as it converts a simplified representation of a molecule (SMILES) into a standardized format for naming chemical compounds (IUPAC). This type of translator would be extremely useful in the field of drug discovery, where understanding the chemical structure of molecules is crucial for developing new drugs.
By being able to accurately translate SMILES into IUPAC, researchers can obtain important information about a molecule's properties. This information is essential for identifying potential drug targets, predicting how a molecule will interact with other compounds in the body, and designing new drug molecules that can better target specific diseases.

Pradnya2203 · 2023-03-14T18:13:20Z

I was able to fetch and serve it from the Ersilia Model Hub and got the following output:

{
    "input": {
        "key": "POLCUAVZOMRGSN-UHFFFAOYSA-N",
        "input": "CCCOCCC",
        "text": "CCCOCCC"
    },
    "output": {
        "outcome": [
            "1-propoxypropane"
        ]
    }
}
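
To pull the predicted name out of such an output programmatically, the record can be parsed as JSON. This is a minimal sketch that assumes a single record shaped exactly like the one shown above; the variable names are illustrative.

```python
import json

# Output copied from the comment above, wrapped in its enclosing braces.
raw = """
{
    "input": {
        "key": "POLCUAVZOMRGSN-UHFFFAOYSA-N",
        "input": "CCCOCCC",
        "text": "CCCOCCC"
    },
    "output": {
        "outcome": [
            "1-propoxypropane"
        ]
    }
}
"""

record = json.loads(raw)
smiles = record["input"]["input"]     # the SMILES that was submitted
names = record["output"]["outcome"]   # list of predicted IUPAC names
print(f"{smiles} -> {names[0]}")
```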

Pradnya2203 · 2023-03-14T18:24:25Z

I then tried to install and run the original open-source model, which is https://github.com/Kohulan/Smiles-TO-iUpac-Translator#simple-usage
To run the model I created a new file, app.py, with the following code:


from STOUT import translate_forward, translate_reverse

# SMILES to IUPAC name translation

SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
IUPAC_name = translate_forward(SMILES)
print("IUPAC name of "+SMILES+" is: "+IUPAC_name)

# IUPAC name to SMILES translation

IUPAC_name = "1,3,7-trimethylpurine-2,6-dione"
SMILES = translate_reverse(IUPAC_name)
print("SMILES of "+IUPAC_name+" is: "+SMILES)

I edited this file to take "1-propoxypropane" as input and got the following result:

SMILES of 1-propoxypropane is: CCCOCCC.CCCOCCC

I ran into some issues. Initially I couldn't figure out how to run it, and when I did I got the error "[Errno 0] JVM DLL not found".
I solved this error using sudo apt install default-jre

Pradnya2203 · 2023-03-14T20:06:37Z

After running the model I used the given dataset to get the output. To use the dataset, I first extracted the IUPAC names of the molecules into an array of strings and then used a for loop to run the model on each name. I got the following output:
translate_reverse.txt

Pradnya2203 · 2023-03-14T23:28:44Z

The STOUT model has two functions: translate_forward and translate_reverse. translate_forward converts SMILES to IUPAC names and, conversely, translate_reverse converts IUPAC names to SMILES. The comment above used translate_reverse. Here translate_forward is applied to the can_smiles column of the given dataset, with the following output:
translate_forward.txt
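
The column-then-loop approach described above can be sketched as follows. Only the can_smiles column name comes from the thread. The two-row sample, the `drug` column, and `fake_translate` are stand-ins so the sketch runs without STOUT installed; with STOUT available, `translate_forward` would be passed in instead.

```python
import csv
import io

# Made-up two-row sample; only the can_smiles column name comes from the thread.
SAMPLE_CSV = """drug,can_smiles
caffeine,CN1C=NC2=C1C(=O)N(C(=O)N2C)C
dipropyl ether,CCCOCCC
"""


def read_column(csv_text, column):
    """Return the non-empty values of one named column from CSV text."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:
            values.append(value)
    return values


def translate_all(inputs, translate):
    """Apply `translate` to every input and keep the results in input order."""
    return [(item, translate(item)) for item in inputs]


def fake_translate(smiles):
    """Hypothetical stand-in for STOUT's translate_forward."""
    known = {
        "CN1C=NC2=C1C(=O)N(C(=O)N2C)C": "1,3,7-trimethylpurine-2,6-dione",
        "CCCOCCC": "1-propoxypropane",
    }
    return known.get(smiles, "<unknown>")


for smiles, name in translate_all(read_column(SAMPLE_CSV, "can_smiles"), fake_translate):
    print(f"IUPAC name of {smiles} is: {name}")
```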

GemmaTuron · 2023-03-15T08:58:58Z

Hi @Pradnya2203

Great, thanks for this work!
Can I ask you, as an extra task, to install the NCATS models (use the development branch of the repo) and test out the Human Cytosolic Stability model? @pauline-banye did a lot of work in the previous internship to implement the different NCATS models and I want to make sure those are all working :)

Many thanks!

Pradnya2203 · 2023-03-15T12:25:20Z

The last step was to run the model from the Ersilia Model Hub on the dataset. For that I fetched and served the model "STOUT: SMILES to IUPAC name translator". To iterate over the entire dataset (https://raw.githubusercontent.com/ersilia-os/ersilia/master/notebooks/eml_canonical.csv), I first processed the data and chose the can_smiles column as my input. I then wrote a bash script, ran it from my CLI, and got the following output.
Output file:
ersilia_output.txt
The bash script was:

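#!/bin/bash
# s holds the can_smiles values from the dataset (contents omitted here)
s=()
ersilia serve smiles2iupac
for n in "${s[@]}"; do
    # Quote the SMILES so the shell does not treat characters like [ ] as patterns
    ersilia api -i "$n"
done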

Here s contained the full array of can_smiles strings.
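
The same loop can be sketched in Python with subprocess. Passing each SMILES as its own list element means no shell is involved, so characters such as brackets are never interpreted as patterns. The function names are invented for this illustration, only the `ersilia api -i` form used in the script above is assumed, and the model is expected to have been served beforehand with `ersilia serve smiles2iupac`.

```python
import shlex
import subprocess


def build_api_commands(smiles_list):
    """Build one `ersilia api -i <SMILES>` argument list per input."""
    return [["ersilia", "api", "-i", smiles] for smiles in smiles_list]


def run_all(smiles_list, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in build_api_commands(smiles_list):
        print(shlex.join(command))  # shlex.join needs Python 3.8+
        if not dry_run:
            subprocess.run(command, check=False)


# Dry run: shows the commands without calling Ersilia.
run_all(["CCCOCCC", "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"])
```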

Pradnya2203 · 2023-03-15T12:43:48Z

Running the Smiles To IUPAC Translator from the original source code and from the Ersilia Model Hub gives the results posted above. Comparing the two, for example for the input Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1,
we get the output IUPAC name of Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 is: [(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol
from the original source code and

{
    "input": {
        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
        "input": "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1",
        "text": "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"
    },
    "output": {
        "outcome": [
            "[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
        ]
    }
}

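from the Ersilia Model Hub.

The two outputs are close but not identical: the original source code gives the (1S,4R) stereodescriptor and a 6-(cyclopropylamino)purine, while the Ersilia Model Hub output gives (1R,4R) and a 4-(cyclopropylamino)-4H-purine. Other inputs can be compared in the same way using the files posted above.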
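
Names this long are easy to misread, so an exact string comparison is a quick sanity check. The sketch below compares the two names quoted above and lists the character spans where they differ; `differing_segments` is a helper written for this illustration using the standard difflib module.

```python
import difflib

# The two names quoted above for the same input SMILES.
original = "[(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol"
ersilia = "[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"


def differing_segments(a, b):
    """Return (text_in_a, text_in_b) pairs for every span where a and b differ."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print("identical:", original == ersilia)
for left, right in differing_segments(original, ersilia):
    print(repr(left), "->", repr(right))
```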

Pradnya2203 · 2023-03-15T12:56:26Z

Problems I ran into while running the model with both the original source code and the Ersilia Model Hub:

It was time-consuming. The dataset was large and iterating through it took a long time, especially when running the bash script.
I got quite a few connection errors due to network changes, which are also visible in the ersilia_output.txt file posted above.
Other than this, the model did not run properly on a few of the inputs, and I had to remove those because they crashed the whole run.
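
One way to keep a long run alive through dropped connections and bad inputs is to retry transient errors and record, rather than remove, the inputs that fail. This is only a sketch: `demo_model` is a stand-in for the real model call, and which exception types are worth retrying is an assumption that depends on how the model is invoked.

```python
import time


def run_with_retries(func, item, attempts=3, delay=1.0,
                     retry_on=(ConnectionError, TimeoutError)):
    """Call func(item), retrying on transient errors with a growing pause."""
    for attempt in range(1, attempts + 1):
        try:
            return func(item)
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)


def process_all(items, func, **retry_options):
    """Run func on every item; collect failures instead of stopping the loop."""
    results, failures = {}, {}
    for item in items:
        try:
            results[item] = run_with_retries(func, item, **retry_options)
        except Exception as error:  # keep going and report the input later
            failures[item] = repr(error)
    return results, failures


def demo_model(smiles):
    """Stand-in for a real model call; rejects one input on purpose."""
    if smiles == "not-a-smiles":
        raise ValueError("cannot parse input")
    return len(smiles)


results, failures = process_all(["CCCOCCC", "not-a-smiles"], demo_model, delay=0)
print("ok:", results)
print("failed:", failures)
```

The failures dictionary then doubles as a list of inputs worth reporting to the model maintainers.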

Pradnya2203 · 2023-03-15T13:30:02Z

Hey @GemmaTuron,
I have completed the week 2 tasks using the Smiles To IUPAC Translator model. I have documented all the issues I faced while completing the tasks and have posted the results as well. Apart from this model, I also tried to run the NCATS model but was unable to set up the conda environment for it: it took a long time to set up and then failed with an error related to pip and the HTTP connection. I got the same error after retrying and making sure that the network connection was strong enough. I will try to set it up again and continue the task as per your instructions. Also, do I need to make any changes to my task 2 submission?
Thank you

GemmaTuron · 2023-03-16T07:10:03Z

Hi @Pradnya2203

The tasks are fine. You can reach out to Masroor or Zakia, who have also been working on the NCATS model. If you are having issues, I suggest following the environment.yml file manually: instead of running conda env create --prefix ./env -f environment.yml, open the .yml file and install the dependencies manually, one by one. This will tell you which ones are causing issues (go in order, and create a conda env with the right Python version).
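
The one-by-one approach can be semi-automated by listing the dependencies and printing one install command per package. The sketch below uses a deliberately simple line-based reader instead of a YAML library, so it only handles the common layout in which the pip sub-list comes last. `SAMPLE_ENV` is a made-up example, not the real NCATS environment file.

```python
# Made-up example file contents; not the real NCATS environment.yml.
SAMPLE_ENV = """name: example
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - pip
  - pip:
    - flask
"""


def list_dependencies(env_text):
    """Split the dependencies section into (conda_packages, pip_packages)."""
    conda, pip = [], []
    section = None  # None, "conda" or "pip"
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line.startswith((" ", "-")):
            # A top-level key either opens the dependencies section or closes it.
            section = "conda" if stripped == "dependencies:" else None
            continue
        if section is None or not stripped.startswith("- "):
            continue
        entry = stripped[2:].strip()
        if entry == "pip:":
            section = "pip"
        elif section == "pip":
            pip.append(entry)
        else:
            conda.append(entry)
    return conda, pip


def install_commands(conda, pip):
    """One command per package, in file order, so failures are easy to attribute."""
    return [f"conda install {pkg}" for pkg in conda] + [f"pip install {pkg}" for pkg in pip]


conda_packages, pip_packages = list_dependencies(SAMPLE_ENV)
for command in install_commands(conda_packages, pip_packages):
    print(command)
```

Running the printed commands by hand, in order, shows which package is the first to fail.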

Pradnya2203 · 2023-03-16T15:07:30Z

Update: I was able to create the conda environment. My earlier mistake was not setting up chemprop. But app.py is now giving errors.

Loading RLM graph convolutional neural network model
Traceback (most recent call last):
  File "app.py", line 20, in <module>
    from predictors.rlm.rlm_predictor import RLMPredictior
  File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 177, in <module>
    rlm_gcnn_scaler, rlm_gcnn_model, rlm_gcnn_model_version = load_gcnn_model()
  File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 148, in load_gcnn_model
    rlm_gcnn_scaler, _ = load_scalers(rlm_gcnn_scaler_path)
  File "/home/pradnya/ncats-adme/server/./predictors/chemprop/chemprop/utils.py", line 132, in load_scalers
    state = torch.load(path, map_location=lambda storage, loc: storage)
  File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

After searching a bit about the error, I realized that it's an error with the model, so I tried solving it by making sure that chemprop was running well. It took some time, as the packages were not compatible with each other and there were errors installing certain modules. I was able to fix them all and made sure that chemprop runs, but I am still facing the same error with app.py. I will try to fix it soon.

Pradnya2203 · 2023-03-16T16:10:21Z

I think the issue is with accessing the models from the NCATS servers. On clicking any of the models I am redirected to this page

and on visiting the site mentioned I find this

GemmaTuron · 2023-03-17T07:36:06Z

Hi @Pradnya2203 !

For the local implementation, you need to make sure you download the right model and place it in the folder manually, since the models cannot be accessed from the server (they apparently stopped maintaining it). Use the links provided in the development branch.

emmakodes · 2023-03-17T08:06:45Z

Hello @Pradnya2203, I found a fix for this. Download the model files manually from here:

and place them in their respective directories inside the models directory, like this:
..\ncats-adme\server\models\rlm
..\ncats-adme\server\models\pampa

then run:
python app.py

Pradnya2203 · 2023-03-17T13:46:47Z

Update: I manually downloaded the model files, placed them in the right folders, and installed the right version of every package needed, but I'm still getting the same error.

Pradnya2203 · 2023-03-18T23:06:47Z

Update: I was finally able to run the ncats-adme model after a lot of struggle. I was repeatedly getting the same error:

Traceback (most recent call last):
  File "app.py", line 20, in <module>
    from predictors.rlm.rlm_predictor import RLMPredictior
  File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 177, in <module>
    rlm_gcnn_scaler, rlm_gcnn_model, rlm_gcnn_model_version = load_gcnn_model()
  File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 148, in load_gcnn_model
    rlm_gcnn_scaler, _ = load_scalers(rlm_gcnn_scaler_path)
  File "/home/pradnya/ncats-adme/server/./predictors/chemprop/chemprop/utils.py", line 132, in load_scalers
    state = torch.load(path, map_location=lambda storage, loc: storage)
  File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

I tried everything, from manually installing every package to digging deep into the code to find the source of the error. Finally I realized that the solution was really simple: an auto-downloaded corrupt file was the root cause, and just deleting it solved the problem. This might seem like a trivial issue, but I think it causes huge inconvenience, because the file is auto-downloaded, the error says barely anything about it, and we just keep getting UnpicklingError.
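
The '<' in "invalid load key, '<'" is the first byte the unpickler read. A pickle does not start with that byte, but an HTML page does, so one plausible explanation (an assumption, since the thread does not show the file's contents) is that a web page was saved under the model's file name. A quick check is to look at the first bytes of the file before loading it. The function and file names below are invented for this sketch.

```python
import pickle
import tempfile
from pathlib import Path


def looks_like_markup(head):
    """True when the first non-blank byte is '<', as in an HTML or XML page."""
    return head.lstrip().startswith(b"<")


def probe_file(path, probe_size=64):
    """Read the first bytes of a file and report whether they look like markup."""
    with open(path, "rb") as handle:
        return looks_like_markup(handle.read(probe_size))


# Demonstration with two throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as folder:
    good = Path(folder) / "good_model.pt"
    bad = Path(folder) / "bad_model.pt"
    good.write_bytes(pickle.dumps({"weights": [0.1, 0.2]}))
    bad.write_bytes(b"<!DOCTYPE html><html><body>Not Found</body></html>")
    for path in (good, bad):
        verdict = "looks like a web page" if probe_file(path) else "looks binary"
        print(f"{path.name}: {verdict}")
```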

Pradnya2203 · 2023-03-22T18:40:50Z

Week 3: Model Proposal one

Model Name:

ADMET_XGBoost

Model Description:

The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. In this work, we applied an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. The model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, the model is ranked first in 18 tasks and top 3 in 21 tasks.

Task:

Accurate ADMET prediction

Package Dependencies:

python=3.7
rdkit
deepchem
scikit-learn
PyTDC
xgboost
mordred
gensim
tensorflow~=2.4
PubChemPy

License:

GNU General Public License v3.0

Pradnya2203 · 2023-03-22T20:08:02Z

Week 3: Model Proposal two

Model Name:

AI-Bind

Model Description:

Identifying novel drug-target interactions (DTI) is a critical and rate limiting step in drug discovery. AI-Bind is a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. AI-Bind predicted drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. These predictions are also validated via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential of becoming a powerful tool in drug discovery.

Package Dependencies:

requirements.txt

Publication:

https://paperswithcode.com/paper/ai-bind-improving-binding-predictions-for

Supplementary Information:

https://arxiv.org/pdf/2112.13168v5.pdf

Source Code:

https://github.com/chatterjeeayan/ai-bind

Data files:

https://zenodo.org/record/7226641

License:

MIT License

Pradnya2203 · 2023-03-22T20:32:01Z

Week 3: Model Proposal Three

Model Name:

OpenChem

Model Description:

OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend. The goal of OpenChem is to make Deep Learning models an easy-to-use tool for Computational Chemistry and Drug Design Researchers.

Main Features:

Modular design with unified API, modules can be easily combined with each other.
OpenChem is easy-to-use: new models are built with only configuration file.
Fast training with multi-gpu support.
Utilities for data preprocessing.
Tensorboard support.

Package Dependencies:

numpy
pyyaml
scipy
ipython
mkl
scikit-learn
six
pytest
pytest-cov

Tasks:

Classification (binary or multi-class)
Regression
Multi-task (such as N binary classification tasks)
Generative models

Publication:

https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00971

Supplementary Information:

https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c00971

Source Code:

https://github.com/Mariewelt/OpenChem

License:

MIT License

GemmaTuron · 2023-03-23T07:37:44Z

Hi @Pradnya2203 !

Similar to OpenMM, which @samuelmaina pointed to, OpenChem is a framework for developing models, not a model in itself, so we could not incorporate it directly in the Hub; we would have to use it to train models and then incorporate those. I also don't like the fact that NVIDIA GPUs are required to run OpenChem, since most computers do not have them.
But thanks for the suggestion, looking forward to your next ones!

GemmaTuron · 2023-03-23T11:27:45Z

Hi @Pradnya2203 !

Sorry, I missed the above:
ADMET_XGBoost: good catch, it looks interesting, but I fail to see the model checkpoints or the data to retrain the models. Is any of this available?
AI-Bind: I did not know about this tool. It seems they are developing it intensively (5 updates on arXiv thus far!). It looks like a promising approach, but at the moment we cannot incorporate it in the Hub because we cannot pass proteins as input, and I see in the requirements that it needs a GPU to run (we try to avoid serving models that require NVIDIA GPUs, because most people won't have access to them). I'll keep an eye on the tool and see if we can use it!

GemmaTuron · 2023-03-23T14:20:53Z

@Pradnya2203 ,

As next steps,

Check if ADMET-XGBoost has the checkpoints available
Have a look at REDIAL-2020, which @emmakodes suggested, and try to see if it would be easy to run!
Start preparing the final application

Pradnya2203 · 2023-03-24T13:12:34Z

Hey @GemmaTuron,

I tried to run REDIAL-2020. It was fairly easy to run, and I used their own sample dataset:
sample_data.csv

and got the following results:
3CL-sample_data-consensus.csv
ACE2-sample_data-consensus.csv
AlphaLISA-sample_data-consensus.csv
CoV1-PPE_cs-sample_data-consensus.csv
CoV1-PPE-sample_data-consensus.csv
CPE-sample_data-consensus.csv
cytotox-sample_data-consensus.csv
hCYTOX-sample_data-consensus.csv
MERS-PPE_cs-sample_data-consensus.csv
MERS-PPE-sample_data-consensus.csv
TruHit-sample_data-consensus.csv

REDIAL-2020 is an open-source, open-access machine learning suite for estimating anti-SARS-CoV-2 activities from molecular structure. By leveraging data available from NCATS, eleven categorical machine learning models are developed: CPE, cytotox, AlphaLISA, TruHit, ACE2, 3CL, CoV-PPE, CoV-PPE_cs, MERS-PPE, MERS-PPE_cs and hCYTOX. These models are exposed on the REDIAL-2020 portal, and the output of a similarity search using input data as a query is provided for every submitted molecule. The top-ten most similar molecules to the query molecule from the existing COVID-19 databases, together with associated experimental data, are displayed. This allows users to evaluate the confidence of the machine learning predictions.

Pradnya2203 · 2023-03-24T13:43:01Z

As for ADMET-XGBoost, I tried running it as well. It does have the dataset available, but I fail to see any checkpoints. I looked for them in the documentation too but could not find any.

GemmaTuron · 2023-03-24T14:32:44Z

Is REDIAL running on a webserver, or do you have access to the model checkpoints? If the latter, we could try to incorporate it in the Hub!

Pradnya2203 · 2023-03-24T15:09:48Z

I did not run REDIAL on a webserver, although it is hosted on a website: http://drugcentral.org/Redial.
We do have access to the model checkpoints.

GemmaTuron · 2023-03-24T15:14:29Z

Hi @Pradnya2203

That's great. Did you run it through the webserver or did you install the model? If you didn't install it, could you try downloading the checkpoints and running predictions?
If you did, I think we could try to incorporate this in the Hub. What do you think?

Pradnya2203 · 2023-03-24T15:26:18Z

I installed the model and then ran the predictions using the sample data file available in their repository, which gave the results posted above. The checkpoints (.pkl files) were installed along with the model. I think we can try to incorporate this in the Hub.

GemmaTuron · 2023-03-24T15:32:08Z

Cool, feel free to go ahead and open a model request issue!
Outreachy interns from the last round prepared a nice document about the whole process, which you can find in our docs: https://ersilia.gitbook.io/ersilia-book. Make sure to read it.

Pradnya2203 · 2023-03-24T17:00:55Z

@GemmaTuron, I have opened a model request issue and will read the documents now. Thank you

Pradnya2203 · 2023-03-24T19:23:35Z

I did find some other model suggestions as well.

Model Name:

ATC_CNN

Model Description:

Anatomical Therapeutic Chemical (ATC) classification for compounds/drugs plays an important role in drug development and basic research. However, previous methods depend on interactions extracted from the STITCH dataset, which may make them depend on lab experiments. ATC_CNN presents a pilot study to explore the possibility of conducting ATC prediction based solely on molecular structures. The motivation is to eliminate the reliance on costly lab experiments so that the characteristics of a drug can be pre-assessed for better decision-making and effort-saving before the actual development.

Package Dependencies:

torch
numpy
pandas
tensorflow
importlib
time
utils
tensorboardX

Slug:

ATC-CNN

Publication:

https://academic.oup.com/bib/article/23/5/bbac346/6677124

Supplementary Information:

Source Code:

https://github.com/lookwei/ATC_CNN

License:

None

Pradnya2203 · 2023-03-25T00:29:23Z

Model Name:

Reinvent

Model Description:

The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains including the field of drug design. A range of architectures have been devised to find the optimal way of generating chemical compounds by using either graph- or string (SMILES)-based representations. Reinvent aims to offer the community a production-ready tool for de novo design. It can be effectively applied on drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chemical space.

Package Dependencies:

requirements.txt

Slug:

reinvent

Publications:

https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00915#

Source Code:

https://github.com/MolecularAI/Reinvent

License:

Apache License 2.0

GemmaTuron · 2023-03-27T05:52:51Z

Hi @Pradnya2203 !

Thanks, can you add the ATC-CNN model to our list?
For the Reinvent, we are already using it, though it is not in the Hub due to its complexity.
Let's focus on the model incorporation

Pradnya2203 · 2023-03-27T10:52:50Z

@GemmaTuron, thanks I will now focus on model incorporation

Pradnya2203 · 2023-03-28T21:42:36Z

Hey @GemmaTuron
I tried to incorporate REDIAL-2020 into the Ersilia Model Hub but I am facing some issues. Here are the steps I followed:

I forked the model repository (https://github.com/Pradnya2203/eos8fth.git) and cloned it to my system.
I edited the Dockerfile and metadata.json with the relevant information.
I placed the relevant files in the respective directories of eos8fth/model and edited main.py.
I changed the paths wherever necessary. The model started to run but is now unable to store the output.

This is my main.py file (I think it needs some changes):
main.txt
(I copied it to a .txt file because .py files are not supported as attachments here)

and this is the error

Traceback (most recent call last):
  File "main.py", line 152, in <module>
    get_predictions(temp_dir, results, csv_file)
  File "main.py", line 110, in get_predictions
    features_dictn = automate(temp_dir, csv_file)
  File "main.py", line 72, in automate
    features_rdkit = fg.get_fingerprints(stand_df, k, 'rdkDes', 'dummy_split', 'dummpy_numpy_folder')
  File "/home/pradnya/eos8fth/model/framework/code/get_features.py", line 66, in get_fingerprints
    X = rdkDes_scaler.transform(X)
  File "/home/pradnya/miniconda3/envs/redial-2020/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 414, in transform
    X *= self.scale_
ValueError: operands could not be broadcast together with shapes (13,208) (200,) (13,208)

This is how far the model runs:
output.txt
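
The shapes in the traceback say that the feature matrix has 13 rows and 208 columns, while the fitted scaler holds 200 scale factors, so the element-wise multiplication cannot broadcast. One plausible cause (an assumption, the thread only says later that the conda environment was at fault) is that the installed descriptor library produces a different number of descriptors than the version used when the scaler was fitted. A minimal NumPy sketch of the mismatch and of a clearer up-front check, with invented function names:

```python
import numpy as np


def feature_count_matches(features, scale):
    """True when each row of `features` has one value per entry of `scale`."""
    return features.ndim == 2 and features.shape[1] == scale.shape[0]


def apply_scale(features, scale):
    """Scale the columns, failing early with a readable message on a mismatch."""
    if not feature_count_matches(features, scale):
        raise ValueError(
            f"scaler expects {scale.shape[0]} features per row, got {features.shape[1]}"
        )
    return features * scale


features = np.zeros((13, 208))  # shape taken from the traceback
scale = np.ones(200)            # shape taken from the traceback

try:
    apply_scale(features, scale)
except ValueError as error:
    print(error)
```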

Pradnya2203 · 2023-03-28T21:51:41Z

I also added ATC-CNN model to the suggestions list.

Pradnya2203 · 2023-03-28T23:02:48Z

I was able to solve that error and run the model using main.py. There was an issue with the conda environment. Now I am trying to fetch the model.
This is the error:

Traceback (most recent call last):
  File "pack.py", line 2, in <module>
    from src.service import load_model
  File "/home/pradnya/eos/dest/eos8fth/src/service.py", line 3, in <module>
    from bentoml import BentoService, api, artifacts
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/__init__.py", line 28, in <module>
    from bentoml.service import (  # noqa: E402
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/service/__init__.py", line 38, in <module>
    from bentoml.service.inference_api import InferenceAPI
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/service/inference_api.py", line 24, in <module>
    import flask
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/flask/__init__.py", line 14, in <module>
    from jinja2 import escape
ImportError: cannot import name 'escape' from 'jinja2' (/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/jinja2/__init__.py)

04:25:02 | DEBUG    | Activation done
04:25:02 | DEBUG    | Previous command successfully run inside eos8fth conda environment
04:25:02 | DEBUG    | Now trying to establish symlinks
04:25:02 | DEBUG    | BentoML location is None
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

expected str, bytes or os.PathLike object, not NoneType
If this error message is not helpful, open an issue at:
 - https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
 - hello[at]ersilia.io

If you haven't, try to run your command in verbose mode (-v in the CLI)
 - You will find the console log file in: /home/pradnya/eos/current.log

I tried installing jinja2 and changing the versions of both flask and jinja2, but I am still facing this error.

This is the entire log file:
current.log

GemmaTuron · 2023-03-29T06:16:48Z

Hi @Pradnya2203 !

Seems that there is a versioning issue: https://stackoverflow.com/questions/71718167/importerror-cannot-import-name-escape-from-jinja2
You can also try bumping the whole model to Python 3.8 or above.
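
For background, `escape` was removed from the jinja2 package in Jinja2 3.1, while older Flask releases still import it from there, so the installed pair of versions decides whether this import works. The sketch below is a hypothetical helper that checks a version string against that cut-off and prints what is installed in the current environment. `importlib.metadata` needs Python 3.8+, so on 3.7 the installed versions are reported as None. Whether pinning Jinja2 below 3.1 or moving to a newer Flask suits this model's BentoML version is not something the thread confirms.

```python
from itertools import takewhile


def parse_version(text):
    """Turn '3.0.3' into (3, 0, 3), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits or 0))
    return tuple(parts)


def jinja2_exports_escape(jinja2_version):
    """`from jinja2 import escape` only works before Jinja2 3.1."""
    return parse_version(jinja2_version) < (3, 1)


def installed_version(package):
    """Return the installed version string, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package in ("jinja2", "flask"):
    print(package, installed_version(package))
```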

Pradnya2203 · 2023-03-30T18:59:46Z

I tried the solution posted on Stack Overflow and also tried changing the Python version, but I am still facing exactly the same error.

samuelmaina · 2023-03-31T09:17:28Z

Hi @Pradnya2203!
I have looked at your logs, and you don't have Jinja2 in your Dockerfile, so it won't be installed, hence the error. Add it to the Dockerfile and see if the error persists.

Pradnya2203 · 2023-03-31T11:31:44Z

Hey @samuelmaina, thank you. I did add it to my Dockerfile as well, but I'm still facing the same issue.

GemmaTuron · 2023-03-31T13:07:47Z

Hi @Pradnya2203

Was the model developed in Python 3.7? I would try a newer version if possible.

Pradnya2203 · 2023-04-01T14:15:28Z

Yes, it was developed in Python 3.7. I'll try that, thanks.

Pradnya2203 · 2023-04-02T11:58:22Z

Hey @GemmaTuron, I tried quite a few things (the Stack Overflow solution, changing the versions of Python, jinja2, and flask, and making some changes to the output CSV file and the Dockerfile), but I get the same error every time. Shall I make a pull request for it? You can check from your end as well. Also, redial-2020 has 11 model types, and I have tried to output the results for only one of them. What else can I do?

GemmaTuron · 2023-04-03T05:21:54Z

Hi @Pradnya2203

Thanks for your work. Let's pause here, as the contribution period is coming to an end! I'll review the work and try to identify a solution.

GemmaTuron closed this as completed Apr 4, 2023

