✍️ Contribution period: Pradnya #628
Motivation Statement: I first heard about Outreachy from a friend and was truly pleased by its idea of supporting diversity and encouraging under-represented groups from all around the world. I am a sophomore at IIT Roorkee and am part of several student technical clubs related to software development and data science. I was excited to learn that my application was approved, and while going through the projects I came across Ersilia, which appealed to me for several reasons. First, the cause: providing medical resources to under-developed countries. I have always wanted to help people using my skills and would be glad to contribute to such a cause. Second, the tech stack suits me and would help me toward my goal of a career in data science. I have worked with various languages, including Python, JavaScript, C++, PHP and MATLAB, and would like to get a strong hold on Python during this internship period. Ersilia will be a great opportunity to improve my skills as well as work for the betterment of society. I am really looking forward to contributing to this project and learning a lot in the process. |
Hi @Pradnya2203 Thanks for your interest and welcome to Ersilia! If you have successfully installed Ersilia and run a test model, please report it here and also let us know which system you are using. |
Hey |
Hi @Pradnya2203 ! Please read the guidelines for the contribution period. This time around in order to be able to better provide support to all applicants we have set up a set of defined tasks to be completed each week. In addition, we will be handing out specific tasks to interns as soon as we know everyone is set up |
Hi @Pradnya2203 As you will see in issue #343 this model seems to present some issues at fetch time. Thanks! |
I don't know exactly why I am getting the error "ModuleNotFoundError: No module named 'yaml'". I tried installing pyyaml, but it didn't change anything. I'll keep trying to solve it. |
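One way to narrow down an error like this is to check which Python interpreter is actually running and whether it can see the module. The sketch below is only an illustration: the helper name `describe_module` is made up here, and it assumes the script is run with the same interpreter (for example, inside the model's conda environment) that raised the error.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable by the interpreter running this script."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # The package is installed as "pyyaml" but imported as "yaml".
    for key, value in describe_module("yaml").items():
        print(f"{key}: {value}")
```

If `found` is False while pyyaml shows up in another environment, the package was probably installed for a different interpreter than the one reported here.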
Hi @Pradnya2203. From your error log I also tried fetching the model on Ubuntu 22.04, but I had to terminate the process because it was taking too long. |
Hey @AhmedYusuff, I was not actually facing that error anymore. I was able to get around it, but I uploaded the old log file by mistake. I have now uploaded the new log file. Thanks a lot :) |
You are welcome @Pradnya2203. In your log file I can see that the model failed when it tried to import yaml. You can use |
Yes I have tried that as well @AhmedYusuff |
Hi @Pradnya2203 Important: did you activate the conda environment of the model before installing yaml? You should first: |
Hey @GemmaTuron |
Hi @Pradnya2203 ! |
The model I chose for week 2 was Smiles To IUPAC Translator. This model was particularly interesting to me as it converts a simplified representation of a molecule (SMILES) into a standardized format for naming chemical compounds (IUPAC). This type of translator would be extremely useful in the field of drug discovery, where understanding the chemical structure of molecules is crucial for developing new drugs. |
I was able to fetch and serve it from the Ersilia Model Hub and get the following output
|
I then tried to install and run the original open-source model, which is at https://github.com/Kohulan/Smiles-TO-iUpac-Translator#simple-usage
I edited this file to take "1-propoxypropane" as input and got the following result
I ran into a few issues. Initially I couldn't figure out how to run it, and when I did, I got the error "[Errno 0] JVM DLL not found" |
After running the model I used the given dataset to get the output. To use the dataset I first extracted the IUPAC names of the molecules into an array of strings, then used a for loop to run the model on each IUPAC name. I got the following output |
The STOUT model has two functions: translate_forward and translate_reverse. translate_forward converts SMILES to IUPAC, and translate_reverse converts IUPAC to SMILES. The comment above used translate_reverse. Here I use translate_forward on can_smiles from the given dataset and get the following output. |
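The loop described in these two comments can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: `translate_all` is a hypothetical helper, and the translator is passed in as a plain callable so that either of the STOUT functions named above (or a stand-in) can be used.

```python
from typing import Callable, Dict, Iterable


def translate_all(inputs: Iterable[str], translate: Callable[[str], str]) -> Dict[str, str]:
    """Apply `translate` to each input, recording failures instead of stopping."""
    results = {}
    for item in inputs:
        try:
            results[item] = translate(item)
        except Exception as exc:  # keep going if one molecule fails
            results[item] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    # Stand-in translator so the sketch runs without STOUT installed.
    # With STOUT available, pass translate_forward or translate_reverse instead.
    def fake_translate(text: str) -> str:
        return text.upper()

    print(translate_all(["cco", "ccoc"], fake_translate))
```

Catching the exception per item means a single failing molecule does not discard the results already computed for the rest of the dataset.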
Hi @Pradnya2203 Great, thanks for this work! Many thanks! |
The last step was to run the model from the Ersilia Model Hub on the dataset. For that I fetched and served the model "STOUT: SMILES to IUPAC name translator". To make the model iterate over the entire dataset, which is https://raw.githubusercontent.com/ersilia-os/ersilia/master/notebooks/eml_canonical.csv, I first processed the data and chose the can_smiles column as my input. I then wrote a bash script, which I ran from my CLI, and it gave the following output.
Here s contained the whole array of can_smiles strings |
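The thread does this step with a bash script that is not shown. As a rough equivalent, the column extraction could be sketched in Python like this. Only the can_smiles column name comes from the thread; the helper name `read_column` and the sample rows are made up for illustration.

```python
import csv
import io


def read_column(csv_text: str, column: str = "can_smiles"):
    """Return the non-empty values of one column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        # DictReader yields None for missing trailing fields, so guard against it.
        value = (row.get(column) or "").strip()
        if value:
            values.append(value)
    return values


if __name__ == "__main__":
    # Hypothetical sample; the real file has its own header and rows.
    sample = "name,can_smiles\nethanol,CCO\nblank,\nether,CCOCC\n"
    print(read_column(sample))
```

The resulting list plays the role of the array `s` mentioned above and can be fed to the model one entry at a time.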
The outputs of the Smiles To IUPAC Translator from the original source code and from the Ersilia Model Hub are posted above. Comparing the two gives the following results:
for the Ersilia Model Hub code. We can see that the two outputs match. Other inputs can be checked the same way using the files posted above |
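Checking every input by eye gets tedious, so the comparison could also be automated. The sketch below assumes each run has been collected into a dictionary mapping input to output; the helper name `compare_outputs` and the example values are illustrative and not taken from the posted files.

```python
def compare_outputs(reference: dict, candidate: dict) -> dict:
    """Return inputs whose outputs differ, or that are missing from either run."""
    mismatches = {}
    for key in sorted(set(reference) | set(candidate)):
        ref = reference.get(key)
        cand = candidate.get(key)
        # Treat surrounding whitespace as irrelevant; everything else must match.
        if ref is None or cand is None or ref.strip() != cand.strip():
            mismatches[key] = (ref, cand)
    return mismatches


if __name__ == "__main__":
    source_run = {"CCO": "ethanol", "CCC": "propane"}
    hub_run = {"CCO": "ethanol ", "CCC": "propene"}
    print(compare_outputs(source_run, hub_run))
```

An empty result means the two runs agree on every input that either of them produced.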
Problems I ran into while running the model with both the original source code and the Ersilia Model Hub:
|
Hey @GemmaTuron, |
Hi @Pradnya2203 The tasks are fine, you can reach out to Masroor or Zakia who have also been working on the NCATS model. What I can suggest if you are having issues is to follow the environment.yml file manually, instead of running |
Update: I was able to create the conda environment. The mistake I made before was not setting up chemprop. But app.py is giving errors.
After searching a bit about the error I realized that it is an error with the model, so I tried to solve it by making sure that chemprop runs correctly. That took some time, as the packages were not compatible with each other and some modules failed to install. I was able to fix all of that and confirmed that chemprop is running, but I am still facing the same error with app.py. I will try to fix it soon. |
Hi @Pradnya2203 ! for the local implementation, you need to make sure you download the right model and place it in the folder manually, since the models cannot be accessed from the server (they stopped maintenance apparently). Use the links provided in the development branch |
Hello @Pradnya2203 I found a fix for this. Download the model file manually from here:
and place them in their respective directory which is inside the models directory like this: then run: |
Update: I manually downloaded the model file and placed it in the right folders and also installed the right version of every single package needed and I'm still getting the same error. |
Update: I was finally able to run the ncats-adme model after a lot of struggle. I was repeatedly getting the same error which is
I tried everything, from manually installing every package to digging deep into the code to find the source of the error. Finally I realized that the solution was really simple. An auto-downloaded file had somehow been corrupted, which was the root cause of the error, and just deleting it solved the problem. This might seem like a trivial issue, but I think it causes a lot of inconvenience: the file is downloaded automatically, the error message says almost nothing about it, and we just keep getting UnpicklingError. |
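A quick way to hunt for a file like this is to try loading every candidate file and list the ones that fail. The sketch below rests on assumptions: the thread does not say which file was corrupt or what format it had, so the `*.pkl` pattern, the default `pickle.load` loader and the helper name `find_unreadable` are all placeholders. Files saved by another library would need that library's own loader passed in.

```python
import pickle
from pathlib import Path


def find_unreadable(folder, pattern="*.pkl", loader=pickle.load):
    """Return (path, error) pairs for files under `folder` that `loader` cannot read."""
    bad = []
    for path in sorted(Path(folder).rglob(pattern)):
        try:
            # Only unpickle files from sources you trust: loading runs code.
            with open(path, "rb") as handle:
                loader(handle)
        except Exception as exc:
            bad.append((path, f"{type(exc).__name__}: {exc}"))
    return bad


if __name__ == "__main__":
    # "models" is a placeholder directory name.
    for path, error in find_unreadable("models"):
        print(f"{path}: {error}")
```

Any file reported here is a candidate for deleting and downloading again.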
Week 3: Model Proposal One
Model Name: ADMET_XGBoost
Model Description: The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. In this work, we applied an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. The model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, the model is ranked first in 18 tasks and top 3 in 21 tasks.
Task: Accurate ADMET prediction
Package Dependencies: python=3.7
Publication: https://paperswithcode.com/paper/accurate-admet-prediction-with-xgboost
Supplementary Information: https://arxiv.org/pdf/2204.07532v3.pdf
Source Code: https://github.com/smu-tao-group/ADMET_XGBoost
License: GNU General Public License v3.0 |
Week 3: Model Proposal Two
Model Name:
Model Description: Identifying novel drug-target interactions (DTI) is a critical and rate limiting step in drug discovery. AI-Bind is a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. AI-Bind predicted drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. These predictions are also validated via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential of becoming a powerful tool in drug discovery.
Package Dependencies:
Publication: https://paperswithcode.com/paper/ai-bind-improving-binding-predictions-for
Supplementary Information: https://arxiv.org/pdf/2112.13168v5.pdf
Source Code: https://github.com/chatterjeeayan/ai-bind
Data files: https://zenodo.org/record/7226641
License: MIT License |
Week 3: Model Proposal Three
Model Name: OpenChem
Model Description: OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend. The goal of OpenChem is to make Deep Learning models an easy-to-use tool for Computational Chemistry and Drug Design Researchers.
Main Features: Modular design with unified API, modules can be easily combined with each other.
Package Dependencies: numpy
Tasks: Classification (binary or multi-class)
Publication: https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00971
Supplementary Information: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c00971
Source Code: https://github.com/Mariewelt/OpenChem
License: MIT License |
Hi @Pradnya2203 ! Similar to OpenMM, which @samuelmaina has pointed to, OpenChem is a framework for developing models, not a model in itself, so we could not incorporate it directly in the Hub. We would have to use it to train models and then incorporate those in the Hub. I also don't like the fact that Nvidia GPUs are required to run OpenChem, since most computers do not have them. |
Hi @Pradnya2203 ! Sorry, I missed the above: |
As next steps,
|
Hey @GemmaTuron, I tried to run REDIAL-2020. It was fairly easy to run, and I used their own sample dataset and got the following results: REDIAL-2020 is an open-source, open-access machine learning suite for estimating anti-SARS-CoV-2 activities from molecular structure. By leveraging data available from NCATS, eleven categorical machine learning models are developed: CPE, cytotox, AlphaLISA, TruHit, ACE2, 3CL, CoV-PPE, CoV-PPE_cs, MERS-PPE, MERS-PPE_cs and hCYTOX. These models are exposed on the REDIAL-2020 portal, and the output of a similarity search using input data as a query is provided for every submitted molecule. The top-ten most similar molecules to the query molecule from the existing COVID-19 databases, together with associated experimental data, are displayed. This allows users to evaluate the confidence of the machine learning predictions. |
I tried running ADMET-XGBoost as well. The dataset is available, but I could not find any checkpoints. I looked for them in their documentation too, without success. |
Is REDIAL running on a webserver, or do you have access to the model checkpoints? If the latter, we could try to incorporate it in the hub! |
REDIAL is not running on a webserver, though it is hosted on a website, which is http://drugcentral.org/Redial. |
Hi @Pradnya2203 That's great. Did you run it through the webserver or did you install the model? If you didn't, could you try downloading the checkpoints and running predictions? |
I installed the model and then ran the predictions using the sample data file available in their repository, and got the results posted above. The checkpoints (.pkl files) were installed along with the model. I think we can try to incorporate this in the hub. |
cool, feel free to go ahead and open a model request issue! |
@GemmaTuron, I did open a model request issue, will read the documents now. Thank you |
I did find some other model suggestions as well.
Model Name: ATC_CNN
Model Description: Anatomical Therapeutic Chemical (ATC) classification for compounds/drugs plays an important role in drug development and basic research. However, previous methods depend on interactions extracted from STITCH dataset which may make it depend on lab experiments. ATC_CNN presents a pilot study to explore the possibility of conducting the ATC prediction solely based on the molecular structures. The motivation is to eliminate the reliance on the costly lab experiments so that the characteristics of a drug can be pre-assessed for better decision-making and effort-saving before the actual development.
Package Dependencies: torch
Slug: ATC-CNN
Publication: https://academic.oup.com/bib/article/23/5/bbac346/6677124
Supplementary Information:
Source Code: https://github.com/lookwei/ATC_CNN
License: None |
Model Name: Reinvent
Model Description: The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains including the field of drug design. A range of architectures have been devised to find the optimal way of generating chemical compounds by using either graph- or string (SMILES)-based representations. Reinvent aims to offer the community a production-ready tool for de novo design. It can be effectively applied on drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chemical space.
Package Dependencies:
Slug: reinvent
Publications: https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00915#
Source Code: https://github.com/MolecularAI/Reinvent
License: Apache License 2.0 |
Hi @Pradnya2203 ! Thanks, can you add the ATC-CNN model to our list? |
@GemmaTuron, thanks I will now focus on model incorporation |
Hey @GemmaTuron
This is my main.py file (I think it needs some changes) and this is the error
This is as far as the model runs |
I also added ATC-CNN model to the suggestions list. |
I was able to solve that error and run the model using main.py. There was an issue with the conda environment. Now I am trying to fetch it.
I tried installing jinja2 and changing the versions of both flask and jinja2, but I am still facing this error. This is the entire log file |
Hi @Pradnya2203 ! Seems that there is a versioning issue: https://stackoverflow.com/questions/71718167/importerror-cannot-import-name-escape-from-jinja2 |
I tried the solution posted on stackoverflow but I am still facing the same error. I tried to change the python version but I am still facing the exact same error. |
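For reference, the versioning issue behind the linked Stack Overflow question is, as far as I know, that Jinja2 3.1 removed `escape` from its top-level namespace, while Flask releases before 2.0 still import it from there. A small sketch of that rule follows; the function names are made up for illustration, the version strings would come from `pip list` in the model's environment, and the sketch says nothing about other incompatibilities that may also be present.

```python
import re


def parse_version(text):
    """Turn a version string such as '3.1.2' into a (major, minor) tuple."""
    numbers = re.findall(r"\d+", text)
    return tuple(int(n) for n in numbers[:2])


def escape_import_breaks(flask_version, jinja2_version):
    """True when an old Flask would import `escape` from a Jinja2 that dropped it."""
    old_flask = parse_version(flask_version) < (2, 0)
    new_jinja2 = parse_version(jinja2_version) >= (3, 1)
    return old_flask and new_jinja2


if __name__ == "__main__":
    # Example version pairs, not taken from the log file in this thread.
    for flask_v, jinja_v in [("1.1.4", "3.1.2"), ("1.1.4", "3.0.3"), ("2.2.5", "3.1.2")]:
        print(flask_v, jinja_v, escape_import_breaks(flask_v, jinja_v))
```

If the pair in use is flagged, the usual options are pinning Jinja2 below 3.1 or moving to Flask 2.0 or later. If the error persists after that, the environment that actually runs the model may differ from the one that was edited.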
Hi @Pradnya2203! |
Hey @samuelmaina, thank you. I did add it in my Dockerfile as well, but I'm still facing the same issue. |
Hi @Pradnya2203 Was the model developed in Python 3.7? I would try a newer version if possible |
Yes, it was developed in Python 3.7. I'll try that, thanks |
Hey @GemmaTuron, I tried quite a few things (the Stack Overflow solution, changing the versions of Python, jinja2 and flask, and making some changes to the output csv file and the Dockerfile), but I get the same error every time. Shall I make a pull request for it, so you can check from your end as well? Also, REDIAL-2020 has 11 model types, and so far I have tried to output results for only one of them. What else can I do? |
Hi @Pradnya2203 Thanks for your work, let's pause it here as the contribution period is coming to an end! I'll review the work and try to identify a solution |
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application