
✍️ Contribution period: Pradnya #628

Closed
12 of 13 tasks
Pradnya2203 opened this issue Mar 8, 2023 · 71 comments

@Pradnya2203

Pradnya2203 commented Mar 8, 2023

Week 1 - Get to know the community

  • Join the communication channels
  • Open a GitHub issue (this one!)
  • Install the Ersilia Model Hub and test the simplest model
  • Write a motivation statement to work at Ersilia
  • Submit your first contribution to the Outreachy site

Week 2 - Install and run an ML model

  • Select a model from the suggested list
  • Install the model in your system
  • Run predictions for the EML
  • Compare results with the Ersilia Model Hub implementation!

Week 3 - Propose new models

  • Suggest a new model and document it (1)
  • Suggest a new model and document it (2)
  • Suggest a new model and document it (3)

Week 4 - Prepare your final application

  • Submit the final application in the Outreachy website
@Pradnya2203
Author

Pradnya2203 commented Mar 8, 2023

Motivation Statement:

I first heard about Outreachy from a friend and was truly pleased by the idea of supporting diversity and encouraging under-represented groups from all around the world. I am a sophomore at IIT Roorkee and a member of several student technical clubs related to software development and data science.

I was excited to learn that my application was approved, and while going through the projects I came across Ersilia, which appealed to me for multiple reasons. First, the cause: providing medical resources to under-developed countries. I have always wanted to help people using my skills and would be delighted to contribute to such a cause. Second, the tech stack suits me and would help with my future goal of pursuing a career in data science.

I have worked with various languages such as Python, JavaScript, C++, PHP, and MATLAB, and would like to get a strong hold on Python during this internship period.

Ersilia will be a great opportunity to improve my skills while working for the betterment of society. I am really looking forward to contributing to this project and learning a lot in the process.

@GemmaTuron
Member

Hi @Pradnya2203

Thanks for your interest and welcome to Ersilia! Please, if you have successfully installed Ersilia and run a test model, report it here, and also let us know which system you are using.
Thanks!

@Pradnya2203
Author

Hey,
I am using Ubuntu 22.04 and ran the sample model. We are supposed to fork the repository and then start contributing, right?

@GemmaTuron
Member

Hi @Pradnya2203 !

Please read the guidelines for the contribution period. This time around, in order to better support all applicants, we have set up a set of defined tasks to be completed each week.
https://ersilia.gitbook.io/ersilia-book/contributors/internships/outreachy-summer-2023

In addition, we will hand out specific tasks to interns as soon as we know everyone is set up.

@GemmaTuron
Member

Hi @Pradnya2203

As you will see in issue #343, this model seems to present some issues at fetch time.
Please test it both using the CLI and the Google Colab template (use the template provided in /notebooks), and report whether it works in either system, along with the log files.
When fetching the model, please collect the log files and try to identify the source of the error, if there is any.

Thanks!

@Pradnya2203
Author

Pradnya2203 commented Mar 10, 2023

eos3ae_error.log

I don't exactly know why I am getting the error "ModuleNotFoundError: No module named 'yaml'". I tried installing pyyaml, but it didn't change anything; I'll keep trying to solve it.
I tested using the Google Colab template as well, but the model still doesn't work.

@AhmedYusuff

AhmedYusuff commented Mar 11, 2023

Hi @Pradnya2203. From your error log, ('Connection aborted.', OSError(0, 'Error')) looks like your connection was dropped by the host, probably due to a system error on your end.

I also tried fetching the model on Ubuntu 22.04, but I had to terminate the process because it was taking too long.

neww.log

@Pradnya2203
Author

Hey @AhmedYusuff, I was not actually facing that error; I was able to get around that one, but I uploaded the old log file by mistake. I have now uploaded the new log file. Thanks a lot :)

@AhmedYusuff

AhmedYusuff commented Mar 12, 2023

You are welcome @Pradnya2203.

In your log file I can see that the model failed when it tried to import yaml: ModuleNotFoundError: No module named 'yaml'

You can use pip show pyyaml to check whether yaml is installed on your system.
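A quick way to narrow this down is to check, from the exact interpreter that runs the model, whether yaml is importable at all. This is a minimal diagnostic sketch of my own (the environment-mismatch diagnosis is an assumption, not something taken from the log): if sys.executable points outside the model's conda environment, pyyaml was likely installed into a different environment.

```python
# Minimal sketch: which interpreter is active, and can it see yaml?
import importlib.util
import sys

print("interpreter:", sys.executable)
print("yaml importable:", importlib.util.find_spec("yaml") is not None)
```

Run this inside the activated model environment; if the interpreter path is unexpected, reinstall pyyaml with that interpreter's pip.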

@Pradnya2203
Author

Yes, I have tried that as well, @AhmedYusuff.

@GemmaTuron
Member

Hi @Pradnya2203

Important: did you activate the model's conda environment before installing yaml? You should first run:
conda activate eos3ae
and then:
pip show pyyaml

@Pradnya2203
Author

Hey @GemmaTuron
I installed the module after activating the model's conda environment and checked it using pip show pyyaml, but I'm still getting the same error when I run the model, and when I check again, pyyaml is no longer in the model's conda environment. I'll try to fix it.

@GemmaTuron
Member

> Hey @GemmaTuron I installed the module after activating the conda environment of the model, and checked it using pip show pyyaml, but I'm still getting the same error when I run the model and when I check again I see no pyyaml in the conda environment of the model. I'll try to fix it.

Hi @Pradnya2203 !
Thanks! I'd suggest first focusing on the week 2 tasks; if those are completed on time, we'll tackle the extra tasks assigned to you :)

@Pradnya2203
Author

The model I chose for week 2 was Smiles To IUPAC Translator. This model was particularly interesting to me as it converts a simplified representation of a molecule (SMILES) into a standardized format for naming chemical compounds (IUPAC). This type of translator would be extremely useful in the field of drug discovery, where understanding the chemical structure of molecules is crucial for developing new drugs.
By being able to accurately translate SMILES into IUPAC, researchers can obtain important information about a molecule's properties. This information is essential for identifying potential drug targets, predicting how a molecule will interact with other compounds in the body, and designing new drug molecules that can better target specific diseases.

@Pradnya2203
Author

I was able to fetch and serve it from the Ersilia Model Hub and got the following output:

{
    "input": {
        "key": "POLCUAVZOMRGSN-UHFFFAOYSA-N",
        "input": "CCCOCCC",
        "text": "CCCOCCC"
    },
    "output": {
        "outcome": [
            "1-propoxypropane"
        ]
    }
}

@Pradnya2203
Author

I then tried to install and run the original open-source model, https://github.com/Kohulan/Smiles-TO-iUpac-Translator#simple-usage
To run the model I created a new file app.py with the following code:


from STOUT import translate_forward, translate_reverse

# SMILES to IUPAC name translation

SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
IUPAC_name = translate_forward(SMILES)
print("IUPAC name of "+SMILES+" is: "+IUPAC_name)

# IUPAC name to SMILES translation

IUPAC_name = "1,3,7-trimethylpurine-2,6-dione"
SMILES = translate_reverse(IUPAC_name)
print("SMILES of "+IUPAC_name+" is: "+SMILES)

I edited this file to take "1-propoxypropane" as input and got the following result:

SMILES of 1-propoxypropane is: CCCOCCC.CCCOCCC

I ran into certain issues: initially I couldn't figure out how to run it, and when I did, I got the error "[Errno 0] JVM DLL not found".
I solved this error with sudo apt install default-jre.

@Pradnya2203
Author

Pradnya2203 commented Mar 14, 2023

After running the model, I used the given dataset to get the output. To use the dataset, I first filtered out the IUPAC names of the molecules, created an array of strings, and used a for loop to run the model on all the IUPAC names. I got the following output:
translate_reverse.txt
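The filtering-and-looping step can be sketched roughly like this. This is a hypothetical helper written for illustration, not the exact script I used; the translator callable is injected so the loop logic can be shown without STOUT installed (with STOUT available, translate would be its translate_reverse function).

```python
# Hypothetical sketch of the dataset loop: read the EML CSV text, pull
# one column, and run a translator callable over every non-empty value.
import csv
import io

def translate_column(csv_text, column, translate):
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:
            results.append((value, translate(value)))
    return results

# Tiny illustrative run with a stub translator:
sample = "drugs,iupac\nexample-drug,example-name\n"
print(translate_column(sample, "iupac", lambda name: "SMILES-for-" + name))
```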

@Pradnya2203
Author

Pradnya2203 commented Mar 14, 2023

The STOUT model has two functionalities: translate_forward and translate_reverse. translate_forward converts SMILES to IUPAC names, and conversely translate_reverse converts IUPAC names to SMILES. In the comment above, translate_reverse was used. Now we use translate_forward on can_smiles from the given dataset and get the following output:
translate_forward.txt

@GemmaTuron
Member

Hi @Pradnya2203

Great, thanks for this work!
Can I ask you, as an extra task, to install the NCATS models (use the development branch of the repo) and test out the Human Cytosolic Stability model? @pauline-banye did a lot of work in the previous internship to implement the different NCATS models, and I want to make sure those are all working :)

Many thanks!

@Pradnya2203
Author

Pradnya2203 commented Mar 15, 2023

The last step was to run the model from the Ersilia Model Hub on the dataset. For that I fetched and served the model "STOUT: SMILES to IUPAC name translator". To iterate over the entire dataset (https://raw.githubusercontent.com/ersilia-os/ersilia/master/notebooks/eml_canonical.csv), I first processed the data and chose the can_smiles column as my input. I wrote a bash script, ran it in my CLI, and got the following output.
output file:
ersilia_output.txt
The bash script was:

#!/bin/bash
s=()   # filled with all the can_smiles strings
ersilia serve smiles2iupac
for n in "${s[@]}";
do
    ersilia api -i "$n"
done

Here s contained the whole array of can_smiles strings.
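The same loop can be written in Python with subprocess, which avoids the word-splitting pitfalls of unquoted bash variables. This is a sketch under the assumption that the ersilia CLI is on the PATH; the command tuple is injectable purely for illustration and testing.

```python
# Hypothetical sketch: call the CLI once per SMILES string and collect
# stdout. With Ersilia installed, cmd would be ("ersilia", "api").
import subprocess

def run_predictions(smiles_list, cmd=("ersilia", "api")):
    outputs = []
    for smiles in smiles_list:
        result = subprocess.run(
            [*cmd, "-i", smiles],
            capture_output=True, text=True, check=True,
        )
        outputs.append(result.stdout)
    return outputs
```

Passing the argument list (rather than a shell string) means SMILES containing characters like parentheses or @ need no extra quoting.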

@Pradnya2203
Author

Pradnya2203 commented Mar 15, 2023

The two outputs of the SMILES to IUPAC translator, from the original source code and from the Ersilia Model Hub, are posted above. Comparing the two, we see the following. For example, for the input Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 we get the output
IUPAC name of Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 is: [(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol
from the original source code, and

{
    "input": {
        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
        "input": "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1",
        "text": "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1"
    },
    "output": {
        "outcome": [
            "[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
        ]
    }
}

from the Ersilia Model Hub.

The two outputs closely match, though not exactly: the stereodescriptor and substituent locants differ slightly ((1S,4R) and 6-(cyclopropylamino)purin from the source code versus (1R,4R) and 4-(cyclopropylamino)-4H-purin from the Hub). Other inputs can be checked similarly using the files posted above.
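One quick way to pinpoint exactly where two long IUPAC strings diverge is a character-level diff; a minimal sketch with difflib, using the two names quoted above:

```python
# Sketch: show exactly where the two IUPAC strings diverge.
import difflib

source_name = ("[(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]"
               "cyclopent-2-en-1-yl]methanol")
hub_name = ("[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]"
            "cyclopent-2-en-1-yl]methanol")

matcher = difflib.SequenceMatcher(None, source_name, hub_name)
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    if op != "equal":
        print(op, repr(source_name[i1:i2]), "->", repr(hub_name[j1:j2]))
```

This makes stereodescriptor or locant differences visible at a glance instead of relying on eyeballing the full names.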

@Pradnya2203
Author

Problems I ran into while running the model, both from the original source code and via the Ersilia Model Hub:

  • It was time-consuming: the dataset is large, and iterating through it took a huge amount of time, especially when running the bash script.
  • I got quite a few connection errors due to network changes, which are also visible in the ersilia_output.txt file posted above.
  • The model did not run properly on a few of the inputs, and I had to remove those because they crashed the whole run.

@Pradnya2203
Author

Hey @GemmaTuron,
I have completed the week 2 tasks using the SMILES to IUPAC translator model. I have documented all the issues I faced while completing the tasks and posted the results as well. Apart from this model, I also tried to run the NCATS model, but I was unable to set up its conda environment: it took a long time and failed with an error related to pip and the HTTP connection. I got the same error even after retrying and making sure the network connection was strong enough. I will try to set it up again and continue the task as per your instructions. Also, do I need to make any changes to my task 2 submission?
Thank you

@GemmaTuron
Member

Hi @Pradnya2203

The tasks are fine. You can reach out to Masroor or Zakia, who have also been working on the NCATS model. If you are having issues, I suggest following the environment.yml file manually: instead of running conda env create --prefix ./env -f environment.yml, open the .yml file and install the dependencies one by one. This will tell you which ones are giving issues (go in order, and create a conda env with the right Python version).
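The "go dependency by dependency" approach can be semi-automated; here is a small sketch that lists the entries of a simple environment.yml so each can be installed in order. It is a naive parser written purely for illustration, assuming the flat dependency-list layout of typical conda env files, and it deliberately avoids requiring pyyaml.

```python
# Naive sketch: extract the dependency list from a flat environment.yml
# without requiring pyyaml itself.
def list_dependencies(yml_text):
    deps, in_deps = [], False
    for line in yml_text.splitlines():
        stripped = line.strip()
        if stripped == "dependencies:":
            in_deps = True
        elif in_deps and stripped.startswith("- "):
            deps.append(stripped[2:])
        elif in_deps and stripped and not stripped.startswith("-"):
            in_deps = False  # left the dependencies block
    return deps

sample = "name: env\ndependencies:\n  - python=3.8\n  - pip\n"
print(list_dependencies(sample))  # -> ['python=3.8', 'pip']
```

Nested pip sub-lists would need extra handling; for those, installing manually as Gemma suggests is simplest.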

@Pradnya2203
Author

Update: I was able to create the conda environment. The mistake I made before was not setting up chemprop. But app.py is giving errors.

Loading RLM graph convolutional neural network model
Traceback (most recent call last):
  File "app.py", line 20, in <module>
    from predictors.rlm.rlm_predictor import RLMPredictior
  File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 177, in <module>
    rlm_gcnn_scaler, rlm_gcnn_model, rlm_gcnn_model_version = load_gcnn_model()
  File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 148, in load_gcnn_model
    rlm_gcnn_scaler, _ = load_scalers(rlm_gcnn_scaler_path)
  File "/home/pradnya/ncats-adme/server/./predictors/chemprop/chemprop/utils.py", line 132, in load_scalers
    state = torch.load(path, map_location=lambda storage, loc: storage)
  File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

After searching a bit about the error, I realized that it's an error with the model, so I tried solving it by making sure that chemprop runs well. It took some time, as the packages were not compatible with each other and there were errors installing certain modules, but I was able to fix them all and confirmed that chemprop runs. However, I am still facing the same error with app.py. I will try to fix it soon.

@Pradnya2203
Author

I think the issue is with accessing the models from the NCATS servers. On clicking any of the models I am redirected to this page:
(screenshot)
and on visiting the site mentioned there I find this:
(screenshot)

@GemmaTuron
Member

Hi @Pradnya2203 !

For the local implementation, you need to make sure you download the right model and place it in the folder manually, since the models cannot be accessed from the server (they apparently stopped maintaining it). Use the links provided in the development branch.

@emmakodes
Contributor

> Update: I was able to create the conda environment. The mistake I did before was not setting up chemprop. But app.py is giving errors.
>
> Loading RLM graph convolutional neural network model
> Traceback (most recent call last):
>   File "app.py", line 20, in <module>
>     from predictors.rlm.rlm_predictor import RLMPredictior
>   File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 177, in <module>
>     rlm_gcnn_scaler, rlm_gcnn_model, rlm_gcnn_model_version = load_gcnn_model()
>   File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 148, in load_gcnn_model
>     rlm_gcnn_scaler, _ = load_scalers(rlm_gcnn_scaler_path)
>   File "/home/pradnya/ncats-adme/server/./predictors/chemprop/chemprop/utils.py", line 132, in load_scalers
>     state = torch.load(path, map_location=lambda storage, loc: storage)
>   File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
>     return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
>   File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
>     magic_number = pickle_module.load(f, **pickle_load_args)
> _pickle.UnpicklingError: invalid load key, '<'.
>
> After searching a bit about the error I realized that it's an error with the model so I tried solving it by making sure that chemprop is running well, it took some time as the packages were not compatible with each other and there were some errors in installing certain modules. But I was able to fix them all and made sure that chemprop is running. But am still facing the same error with app.py. I will try to fix it soon.

Hello @Pradnya2203, I found a fix for this. Download the model files manually from here:

  1. RLM - https://opendata.ncats.nih.gov/public/adme/models/archived/rlm/gcnn_model-20230201.pt
  2. PAMPA 7.4 - https://opendata.ncats.nih.gov/public/adme/models/archived/pampa/gcnn_model-20230201.pt
  3. SOL - https://opendata.ncats.nih.gov/public/adme/models/archived/solubility/gcnn_model-20230201.pt

and place them in their respective directories inside the models directory, like this:
..\ncats-adme\server\models\rlm
..\ncats-adme\server\models\pampa

then run:
python app.py

@Pradnya2203
Author

Update: I manually downloaded the model files, placed them in the right folders, and installed the right version of every single package needed, but I'm still getting the same error.

@Pradnya2203
Author

Update: I was finally able to run the ncats-adme model after a lot of struggle. I was repeatedly getting the same error:

Traceback (most recent call last):
  File "app.py", line 20, in <module>
    from predictors.rlm.rlm_predictor import RLMPredictior
  File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 177, in <module>
    rlm_gcnn_scaler, rlm_gcnn_model, rlm_gcnn_model_version = load_gcnn_model()
  File "/home/pradnya/ncats-adme/server/predictors/rlm/__init__.py", line 148, in load_gcnn_model
    rlm_gcnn_scaler, _ = load_scalers(rlm_gcnn_scaler_path)
  File "/home/pradnya/ncats-adme/server/./predictors/chemprop/chemprop/utils.py", line 132, in load_scalers
    state = torch.load(path, map_location=lambda storage, loc: storage)
  File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/pradnya/ncats-adme/server/env/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

I tried everything, from manually installing every package to digging into the depths of the code to find the source of the error. Finally I realized the solution was really simple: there was an auto-downloaded corrupt file that was the root cause, and just deleting it solved the problem. This might seem like a trivial issue, but it causes huge inconvenience, because the file is auto-downloaded and the error message says almost nothing about it; you just keep getting UnpicklingError.

@Pradnya2203
Author

Week 3: Model Proposal one

Model Name:

ADMET_XGBoost

Model Description:

The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. In this work, we applied an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. The model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, the model is ranked first in 18 tasks and top 3 in 21 tasks.

Task:

Accurate ADMET prediction

Package Dependencies:

python=3.7
rdkit
deepchem
scikit-learn
PyTDC
xgboost
mordred
gensim
tensorflow~=2.4
PubChemPy

Publication:

https://paperswithcode.com/paper/accurate-admet-prediction-with-xgboost

Supplementary Information

https://arxiv.org/pdf/2204.07532v3.pdf

Source Code:

https://github.com/smu-tao-group/ADMET_XGBoost

License

GNU General Public License v3.0

@Pradnya2203
Author

Week 3: Model Proposal two

Model Name:

AI-Bind

Model Description:

Identifying novel drug-target interactions (DTI) is a critical and rate limiting step in drug discovery. AI-Bind is a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. AI-Bind predicted drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. These predictions are also validated via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential of becoming a powerful tool in drug discovery.

Package Dependencies:

requirements.txt

Publication:

https://paperswithcode.com/paper/ai-bind-improving-binding-predictions-for

Supplementary Information:

https://arxiv.org/pdf/2112.13168v5.pdf

Source Code:

https://github.com/chatterjeeayan/ai-bind

Data files:

https://zenodo.org/record/7226641

License:

MIT License

@Pradnya2203
Author

Pradnya2203 commented Mar 22, 2023

Week 3: Model Proposal Three

Model Name:

OpenChem

Model Description:

OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend. The goal of OpenChem is to make Deep Learning models an easy-to-use tool for Computational Chemistry and Drug Design Researchers.

Main Features:

  • Modular design with a unified API; modules can be easily combined with each other.
  • OpenChem is easy to use: new models are built with only a configuration file.
  • Fast training with multi-GPU support.
  • Utilities for data preprocessing.
  • Tensorboard support.

Package Dependencies:

numpy
pyyaml
scipy
ipython
mkl
scikit-learn
six
pytest
pytest-cov

Tasks:

Classification (binary or multi-class)
Regression
Multi-task (such as N binary classification tasks)
Generative models

Publication:

https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00971

Supplementary Information:

https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c00971

Source Code:

https://github.com/Mariewelt/OpenChem

License:

MIT License

@GemmaTuron
Member

Hi @Pradnya2203 !

Similar to OpenMM, which @samuelmaina pointed to, OpenChem is a framework for developing models, not a model in itself, so we could not directly incorporate it in the Hub; we would use it to train models and then incorporate those in the Hub. I also don't like that Nvidia GPUs are required to run OpenChem, since most computers do not have them.
But thanks for the suggestion, looking forward to your next ones!

@GemmaTuron
Member

Hi @Pradnya2203 !

Sorry, I missed the above:
ADMET_XGBoost: good catch, it looks interesting, but I fail to see the model checkpoints or the data to retrain the models; is any of this available?
AI-Bind: I did not know about this tool; they seem to be developing it intensively (5 updates on arXiv thus far!). It looks like a promising approach, but at this moment we cannot incorporate it in the Hub because we cannot pass proteins as input, and I see in the requirements that it will need a GPU to run (we try to avoid serving models that require Nvidia GPUs, because most people won't have access to them). But I'll keep an eye on the tool and see if we can use it!

@GemmaTuron
Member

@Pradnya2203 ,

As next steps,

  • Check if ADMET-XGBoost has the checkpoints available
  • Have a look at REDIAL-2020, which @emmakodes suggested, and try to see if it would be easy to run!
  • Start preparing the final application

@Pradnya2203
Author

Pradnya2203 commented Mar 24, 2023

Hey @GemmaTuron,

I tried to run REDIAL-2020; it was fairly easy to run, and I used their own sample dataset:
sample_data.csv

and got the following results:
3CL-sample_data-consensus.csv
ACE2-sample_data-consensus.csv
AlphaLISA-sample_data-consensus.csv
CoV1-PPE_cs-sample_data-consensus.csv
CoV1-PPE-sample_data-consensus.csv
CPE-sample_data-consensus.csv
cytotox-sample_data-consensus.csv
hCYTOX-sample_data-consensus.csv
MERS-PPE_cs-sample_data-consensus.csv
MERS-PPE-sample_data-consensus.csv
TruHit-sample_data-consensus.csv

REDIAL-2020 is an open-source, open-access machine learning suite for estimating anti-SARS-CoV-2 activities from molecular structure. By leveraging data available from NCATS, eleven categorical machine learning models are developed: CPE, cytotox, AlphaLISA, TruHit, ACE2, 3CL, CoV-PPE, CoV-PPE_cs, MERS-PPE, MERS-PPE_cs and hCYTOX. These models are exposed on the REDIAL-2020 portal, and the output of a similarity search using input data as a query is provided for every submitted molecule. The top-ten most similar molecules to the query molecule from the existing COVID-19 databases, together with associated experimental data, are displayed. This allows users to evaluate the confidence of the machine learning predictions.

@Pradnya2203
Author

As for ADMET-XGBoost, I tried running it as well; it does have the dataset available, but I fail to see any checkpoints. I tried finding them in the documentation as well but was unable to.

@GemmaTuron
Member

Is REDIAL running on a webserver, or do you have access to the model checkpoints? If the latter, we could try to incorporate it in the Hub!

@Pradnya2203
Author

REDIAL is not running on a webserver, though it is hosted on a website: http://drugcentral.org/Redial.
We do have access to the model checkpoints.

@GemmaTuron
Member

Hi @Pradnya2203

That's great! Did you run it through the webserver, or did you install the model? If you didn't install it, could you try downloading the checkpoints and running predictions?
If you did, I think we could try to incorporate this in the Hub. What do you think?

@Pradnya2203
Author

I installed the model and then ran the predictions using the sample data file available in their repository, and got the results posted above. The checkpoints (.pkl files) were installed along with the model. I think we can try to incorporate this in the Hub.

@GemmaTuron
Member

Cool, feel free to go ahead and open a model request issue!
Outreachy interns from the last round prepared a nice document about the whole process, which you can read in our docs: https://ersilia.gitbook.io/ersilia-book - make sure to read it.

@Pradnya2203
Author

@GemmaTuron, I did open a model request issue and will read the documents now. Thank you!

@Pradnya2203
Author

Pradnya2203 commented Mar 24, 2023

I did find some other model suggestions as well.

Model Name:

ATC_CNN

Model Description:

Anatomical Therapeutic Chemical (ATC) classification of compounds/drugs plays an important role in drug development and basic research. However, previous methods depend on interactions extracted from the STITCH dataset, which makes them depend on lab experiments. ATC_CNN presents a pilot study exploring the possibility of ATC prediction based solely on molecular structures. The motivation is to eliminate the reliance on costly lab experiments so that the characteristics of a drug can be pre-assessed for better decision-making and effort-saving before the actual development.

Package Dependencies:

torch
numpy
pandas
tensorflow
importlib
time
utils
tensorboardX

Slug:

ATC-CNN

Publication:

https://academic.oup.com/bib/article/23/5/bbac346/6677124

Supplementary Information:

Source Code:

https://github.com/lookwei/ATC_CNN

License:

None

@Pradnya2203
Author

Model Name:

Reinvent

Model Description:

The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains including the field of drug design. A range of architectures have been devised to find the optimal way of generating chemical compounds by using either graph- or string (SMILES)-based representations. Reinvent aims to offer the community a production-ready tool for de novo design. It can be effectively applied on drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chemical space.

Package Dependencies:

requirements.txt

Slug:

reinvent

Publications:

https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00915#

Source Code:

https://github.com/MolecularAI/Reinvent

License:

Apache License 2.0

@GemmaTuron
Member

Hi @Pradnya2203 !

Thanks! Can you add the ATC-CNN model to our list?
As for Reinvent, we are already using it, though it is not in the Hub due to its complexity.
Let's focus on the model incorporation.

@Pradnya2203
Author

@GemmaTuron, thanks, I will now focus on model incorporation.

@Pradnya2203
Author

Pradnya2203 commented Mar 28, 2023

Hey @GemmaTuron,
I tried to incorporate redial-2020 into the Ersilia Model Hub, but I am facing some issues. Here are the steps I followed:

  • I forked the model repository (https://github.com/Pradnya2203/eos8fth.git) and cloned it on my system.
  • Edited the Dockerfile and metadata.json with the relevant information.
  • Placed the relevant files in the respective directories of eos8fth/model and edited main.py.
  • I changed the paths wherever necessary; the model started to run but is now unable to store the output.

This is my main.py file (I think this needs some changes):
main.txt
(copied it to a .txt file because GitHub doesn't support attaching .py files)

and this is the error

Traceback (most recent call last):
  File "main.py", line 152, in <module>
    get_predictions(temp_dir, results, csv_file)
  File "main.py", line 110, in get_predictions
    features_dictn = automate(temp_dir, csv_file)
  File "main.py", line 72, in automate
    features_rdkit = fg.get_fingerprints(stand_df, k, 'rdkDes', 'dummy_split', 'dummpy_numpy_folder')
  File "/home/pradnya/eos8fth/model/framework/code/get_features.py", line 66, in get_fingerprints
    X = rdkDes_scaler.transform(X)
  File "/home/pradnya/miniconda3/envs/redial-2020/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 414, in transform
    X *= self.scale_
ValueError: operands could not be broadcast together with shapes (13,208) (200,) (13,208) 

This is till where the model is running
output.txt
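Reading the traceback: the fitted scaler expects 200 features, while the descriptor matrix being transformed has 208 columns, i.e. the fingerprint/descriptor set computed at inference differs from the one the scaler was fitted on. A hypothetical pre-flight check makes this mismatch explicit before sklearn raises (the helper below is for illustration; in the real code the two counts would come from X.shape[1] and rdkDes_scaler.scale_.shape[0]):

```python
# Sketch: fail early, with a readable message, when the descriptor
# matrix width does not match what the fitted scaler expects.
def check_feature_count(n_matrix_cols, n_scaler_features):
    if n_matrix_cols != n_scaler_features:
        raise ValueError(
            f"descriptor mismatch: matrix has {n_matrix_cols} columns "
            f"but the scaler was fitted on {n_scaler_features} features"
        )

check_feature_count(208, 208)  # fine; (208, 200) would raise
```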

@Pradnya2203
Author

I also added ATC-CNN model to the suggestions list.

@Pradnya2203
Author

I was able to solve that error and run the model using main.py; there was an issue with the conda environment. Now I am trying to fetch the model.
This is the error:

Traceback (most recent call last):
  File "pack.py", line 2, in <module>
    from src.service import load_model
  File "/home/pradnya/eos/dest/eos8fth/src/service.py", line 3, in <module>
    from bentoml import BentoService, api, artifacts
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/__init__.py", line 28, in <module>
    from bentoml.service import (  # noqa: E402
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/service/__init__.py", line 38, in <module>
    from bentoml.service.inference_api import InferenceAPI
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/bentoml/service/inference_api.py", line 24, in <module>
    import flask
  File "/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/flask/__init__.py", line 14, in <module>
    from jinja2 import escape
ImportError: cannot import name 'escape' from 'jinja2' (/home/pradnya/miniconda3/envs/eos8fth/lib/python3.7/site-packages/jinja2/__init__.py)

04:25:02 | DEBUG    | Activation done
04:25:02 | DEBUG    | Previous command successfully run inside eos8fth conda environment
04:25:02 | DEBUG    | Now trying to establish symlinks
04:25:02 | DEBUG    | BentoML location is None
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

expected str, bytes or os.PathLike object, not NoneType
If this error message is not helpful, open an issue at:
 - https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
 - hello[at]ersilia.io

If you haven't, try to run your command in verbose mode (-v in the CLI)
 - You will find the console log file in: /home/pradnya/eos/current.log

I tried installing jinja2 and changing the versions of both flask and jinja2, but I am still facing this error.

This is the entire log file:
current.log
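For context, this ImportError typically appears when Jinja2 >= 3.1 (which removed escape; it now lives in markupsafe) is paired with an older Flask that still does `from jinja2 import escape`. Pinning jinja2 below 3.1 in the model environment (e.g. pip install "jinja2<3.1") is the usual workaround; a tiny version gate for illustration (a hypothetical helper, not Ersilia code):

```python
# Sketch: Jinja2 dropped `escape` in 3.1; older Flask imports break there.
def jinja2_still_exports_escape(version):
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (3, 1)

print(jinja2_still_exports_escape("3.0.3"))  # True: old Flask works
print(jinja2_still_exports_escape("3.1.2"))  # False: ImportError expected
```

Note the pin must land in the model's own conda environment (and in the Dockerfile), not the base environment, or it won't take effect.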

@GemmaTuron
Member

Hi @Pradnya2203 !

It seems there is a versioning issue: https://stackoverflow.com/questions/71718167/importerror-cannot-import-name-escape-from-jinja2
You can also try to bump the whole model to Python 3.8 or above.

@Pradnya2203
Author

Pradnya2203 commented Mar 30, 2023

I tried the solution posted on Stack Overflow, but I am still facing the same error. I also tried to change the Python version, but I still get the exact same error.

@samuelmaina
Contributor

samuelmaina commented Mar 31, 2023

Hi @Pradnya2203!
I have looked at your logs, and you don't have Jinja2 in your Dockerfile, so it won't be installed, hence the error. Add it to the Dockerfile and see if the error persists.

@Pradnya2203
Author

Hey @samuelmaina, thank you. I did add it to my Dockerfile as well, but I'm still facing the same issue.

@GemmaTuron
Member

Hi @Pradnya2203

Was the model developed in Python 3.7? I would try a newer version if possible.

@Pradnya2203
Author

Yes, it was developed in Python 3.7. I'll try that, thanks!

@Pradnya2203
Author

Pradnya2203 commented Apr 2, 2023

Hey @GemmaTuron, I tried quite a few things (the Stack Overflow fix, changing the versions of Python, jinja2, and flask, and also making some changes to the output CSV file and the Dockerfile), but I get the same error every time. Shall I make a pull request for it? You can check from your end as well. Also, redial-2020 has 11 model types, and I have only tried to output the results of one of them. What else can I do?

@GemmaTuron
Member

Hi @Pradnya2203

Thanks for your work! Let's pause it here, as the contribution period is coming to an end. I'll review the work and try to identify a solution.
