🐛 Bug: eos3ae7 repeatedly fails to fetch #343

Closed
Cee-tech21 opened this issue Oct 15, 2022 · 35 comments

Comments

@Cee-tech21

Describe the bug.

Fetching of eos3ae7 repeatedly fails, with the following error message logged:

"Model API eos3ae7:predict did not produce an output"

Describe the steps to reproduce the behavior

Run the following command:
ersilia -v fetch eos3ae7 | tee -a eos3ae7_fetch.log 2>&1

Expected behavior.

After running the "fetch" command, the model eos3ae7 is meant to be downloaded from the remote repository to the local computer.

Screenshots.

eos3ae7_fetch.log

Operating environment

Linux Mint 19

Additional context

No response

@Cee-tech21 Cee-tech21 added the bug Something isn't working label Oct 15, 2022
@Zainab-ik
Collaborator

@Cee-tech21 try it again with a stable internet connection. Sometimes this can be caused by a dropped connection.

@GemmaTuron
Member

Hi @Cee-tech21 !

The model is working on my Linux machine. I think the |tee command is not saving the full error log, so I can't see what's going on. Please try again and save the log directly, without the |tee -a.
I have seen the same thing in #355 and #344.
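
A likely reason the log was incomplete: in `ersilia -v fetch eos3ae7 | tee -a eos3ae7_fetch.log 2>&1`, the `2>&1` applies to `tee`, so only the standard output of `ersilia` goes through the pipe, while its standard error (where Python tracebacks are usually written) never reaches the file. The sketch below illustrates the same effect in Python. It uses a hypothetical stand-in child process, not the real `ersilia` command, and `run_child` is an illustrative helper name.

```python
import subprocess
import sys

# Hypothetical stand-in for a CLI that writes progress to stdout and errors to stderr.
CHILD = "import sys; print('progress'); print('traceback', file=sys.stderr)"


def run_child(merge_stderr):
    """Run the stand-in child and return the text captured from its stdout pipe."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        # STDOUT merges stderr into the pipe (like `cmd > log 2>&1`);
        # DEVNULL drops it, standing in for text that never reaches the log.
        stderr=subprocess.STDOUT if merge_stderr else subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout


stdout_only = run_child(merge_stderr=False)
merged = run_child(merge_stderr=True)
print("stdout only:", stdout_only.split())
print("merged:", merged.split())
```

Only the merged capture contains the error text, which is why redirecting with `> eos3ae7.log 2>&1` tends to give a more useful log.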

@Jona-Bvunza

Jona-Bvunza commented Oct 17, 2022

Hi @Cee-tech21 !
Was the error corrected? I have similar issues with my models.

@GemmaTuron
Member

> Hi @Cee-tech21 ! Was the error corrected ?, I have similar issues with my models

Hello @Jona-Bvunza, please check the Slack channel for a more in-depth explanation. The fact that you get an "empty output error" might come from a very different issue, so please open your own issue and paste the log file.

@GemmaTuron
Member

Hi @Cee-tech21 !

> The model is working in my linux machine. I think the |tee command is not saving all the error log as we need it, so I can't see what's going on. Please try again and save directly without the |tee -a I have seen it in #355 and #344

I am seeing that this model is also using the sqlalchemy package, maybe linked to what we are seeing in #338.
@Cee-tech21 can you do the same test as @Femme-js? (check the version in the conda environment of the model, and try to run the model in colab)

@miquelduranfrigola do you think the problem might be in the sqlalchemy versions?

@Cee-tech21
Author

The model has now been run on Google Colab, and the same error noted in this issue occurs there too.
chizi_e_cee-tech

@Cee-tech21
Author

Since this model fails to fetch both on my local computer and on Colab, I intend to close this issue on the presumption that there is a problem preventing the model from being fetched.

@GemmaTuron
Member

@Cee-tech21
I can reproduce the same error; I need some time to check what the issue might be.
Can you please leave the issue open but change the title to "eos3ae7 fails at fetching time"?
I will add some tags to help us locate it.
Mark the issue on excel and move on!

@GemmaTuron GemmaTuron added help wanted Extra attention is needed model-bug labels Oct 18, 2022
@Cee-tech21 Cee-tech21 changed the title 🐛 Bug: fetching of eos3ae7 repeatedly fails 🐛 Bug: eos3ae7 repeatedly fails to fetch Oct 18, 2022
@miquelduranfrigola
Member

@GemmaTuron what is the current status of this?

@Cee-tech21
Author

Cee-tech21 commented Nov 21, 2022

Hi all!
Fetching this model here. Fetch still fails.
Will update this post once fetch is successful.

@GemmaTuron
Member

> @GemmaTuron what is the current status of this?

Hello @miquelduranfrigola

The issues tagged with "help wanted" and "model-bug" are models that consistently encountered problems at fetch time. We will work with the Outreachy interns during the internship period to make sure they run consistently.

@Cee-tech21 let us know if you are trying again, thanks

@Cee-tech21
Author

Cee-tech21 commented Nov 21, 2022

Hi @GemmaTuron, I have just tried to fetch model "eos3ae7" again. Fetching of eos3ae7 still fails.

@miquelduranfrigola
Member

Thanks @Cee-tech21 - we are compiling a list of problematic models and we will address them in one batch before Christmas. Will keep you posted.

@GemmaTuron GemmaTuron moved this to Suggested in Ersilia Model Hub Dec 4, 2022
@paulinebanye paulinebanye self-assigned this Dec 6, 2022
@paulinebanye paulinebanye moved this from Suggested to Accepted in Ersilia Model Hub Dec 6, 2022
@paulinebanye paulinebanye moved this from Accepted to In Progress in Ersilia Model Hub Dec 6, 2022
@paulinebanye
Contributor

paulinebanye commented Dec 7, 2022

Hi @GemmaTuron @miquelduranfrigola

This model fails to fetch using both the CLI and Colab. It returns an EmptyOutputError.

System

Windows 10

Conda version

conda 22.9.0

Pip version

pip 22.3.1

Python version

Python 3.7.13

SQLAlchemy version

Version: 1.3.24

Steps to reproduce the behavior

ersilia -v fetch eos3ae7 > eos3ae7.log 2>&1

error on CLI

error log - eos3ae7.log

error on colab

Attempts to resolve the error
Based on similar errors,

  • Reinstalled git LFS
  • Reinstalled git CLI
  • Updated pip
  • Currently updating conda

@paulinebanye
Contributor

paulinebanye commented Dec 8, 2022

Just a quick update regarding the status of this model.
I continued working on #343 and started on #369, as they both return the same EmptyOutputError, but I came across issues with the dependencies.

  • Cleared the tmp file.
  • Updated Ubuntu
  • Reinstalled & updated conda
  • Cloned the isaura repo again
  • Cloned the Ersilia repo again
  • Reinstalled dependencies but encountered errors with the versions of sqlalchemy and bentoml. However, updating or downgrading the versions leads to dependency conflicts with other installs, e.g. isaura

Sqlalchemy
sqlalchemy error

Bentoml
bentoml error

@GemmaTuron
Member

Hi @pauline-banye
If you can paste the full error logs here it would be helpful. I assume other models work fine on your system? Are you on a WSL or a Ubuntu machine?
Thanks!

@paulinebanye
Contributor

> Hi @pauline-banye If you can paste the full error logs here it would be helpful. I assume other models work fine on your system? Are you on a WSL or a Ubuntu machine? Thanks!

Hi @GemmaTuron, I am on a WSL machine. I have tested 3 of the models with issues: eos3ae7, eos4tcc and eos1579.
eos3ae7.log
eos4tcc.log

I'm in the process of testing the models on colab as well.
So far I have tested eos4tcc on colab and it returns an EmptyOutputError as well.

I will update you once I have tested the other models on Colab.

@paulinebanye
Contributor

Update @GemmaTuron @miquelduranfrigola. I was able to resolve the issue with my system not fetching any model.

Steps I took were:

  • Uninstalled and reinstalled ubuntu 20.04
  • Installed pip 22.3.1
  • Downgraded to python 3.7.15
  • Installed anaconda 4.14.
  • Cloned the Ersilia repo again
  • Did not clone Isaura

I fetched the model multiple times and, on different occasions, encountered errors relating to dependencies: "no module named pandas", "no module named keras", "no module named tensorflow". These were resolved by running:

  • "pip install pandas"
  • "pip install keras"
  • "pip install tensorflow"

The current error returned is ModuleNotFoundError: No module named 'keras.layers.recurrent' which I tried to resolve with pip install keras.layers.recurrent.

keras

eos3ae7.log

@miquelduranfrigola
Member

Many thanks, @pauline-banye. This is extremely helpful and I really appreciate the great reporting. This looks like an issue related to Isaura, which now uses poetry to manage dependencies. I am testing it today and will keep you updated.

@paulinebanye
Contributor

> Many thanks, @pauline-banye. This is extremely helpful and I really appreciate the great reporting. This looks like an issue related to Isaura, which now uses poetry to manage dependencies. I am testing it today and will keep you updated.

Thank you @miquelduranfrigola 😊. I will be updating the reports on the other two models I tested as well.

@GemmaTuron
Member

Hi,

Hoping to bring some extra information to this issue. I have installed WSL on my Windows machine to make sure I can reproduce @pauline-banye's settings. I have taken special care to ensure that the python path is set to the Anaconda python, so conda environments should be directed to the right place. Just to be clear, there is no Python installed outside Conda in the WSL system. This could be a source of problems, though it shouldn't be.

When I run $ echo -e ${PATH//:/\\n} the first lines are:
/home/gturon/anaconda3/condabin
/home/gturon/.vscode-server/bin/5235c6bb189b60b01b1f49062f4ffa42384f8c91/bin/remote-cli
/usr/local/sbin
/usr/local/bin

When fetching the model eos3ae7, I get the following error:

Detailed error:
Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/home/gturon/eos/repository/eos3ae7/20221212224955_5D39E0/eos3ae7/artifacts/framework/code/main.py", line 7, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

So, pandas is not found, but when I do:
$ conda activate eos3ae7
$ conda list
I find pandas installed (version 1.3.5).
The package is imported without problems, so it IS in the environment. For eos4tcc, it is basically the same, but the module not found is joblib (which, again, IS in the conda environment, version 1.1.0).
This is suspiciously similar to the issue we were encountering in Google Colab when the pythonpath was not properly set, as @carcablop identified.
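
One way to confirm a path or interpreter mismatch of this kind is to ask the interpreter that actually runs the model which executable it is and which modules it can locate. This is only a minimal sketch: `missing_modules` is a hypothetical helper, not part of Ersilia, and the module names are the ones reported in this thread.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Run this with the interpreter that actually executes the model code.
print("interpreter:", sys.executable)
print("missing:", missing_modules(["pandas", "joblib", "yaml"]))
```

If the printed interpreter is not the one inside the model's conda environment, a package can be listed by `conda list` and still be reported as missing at run time.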

@GemmaTuron GemmaTuron moved this from In Progress to Stuck in Ersilia Model Hub Dec 13, 2022
@Cee-tech21
Author

Cee-tech21 commented Dec 15, 2022

Hello everyone! Great job!!!
Quick update here!!

I tried again to fetch model eos3ae7 using google colab but I'm getting the error message below after the fetch code executes for around 10 minutes:

Detailed error:
Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/root/eos/repository/eos3ae7/20221215160804_62CE4B/eos3ae7/artifacts/framework/code/main.py", line 7, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

The pandas-related error message should not appear, as pandas was successfully imported and called before issuing the fetch command. Have a look at the Colab link:

https://colab.research.google.com/drive/1I4pmrDjXS_XXwRRWyTSI-Kf5m76SXPR9?usp=sharing

@GemmaTuron
Member

I've been checking if the latest updates on the pythonpaths 70bcf54 would solve this issue but it seems we still lack some packages, in this latest test (in colab): "yaml"

@GemmaTuron
Member

And the latest updates we did to the pythonpaths seem to be breaking the code somewhere else on the CLI (see log file attached)
eos3ae7.txt

@samuelmaina
Contributor

I ran the model in WSL2 (using Ubuntu 20.04.5) and I get the same package-not-found error, but in this case it is "yaml". I have confirmed that 'yaml' is not installed in the eos3ae7 env, but pandas is. I tried to install it manually, but the model still didn't work.

Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/home/samuelmayna/eos/repository/eos3ae7/20230328090007_475C16/eos3ae7/artifacts/framework/code/main.py", line 10, in <module>
    from chemvae.vae_utils import VAEUtils
  File "/home/samuelmayna/eos/repository/eos3ae7/20230328090007_475C16/eos3ae7/artifacts/framework/code/chemvae/vae_utils.py", line 4, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'

More Error logs can be found at
eos3ae7_fetch.log

@GemmaTuron
Member

Hi @samuelmaina

If you clone the repository to your local system, and modify their installation requirements to add the yaml package, does it work?
You then need to call the model using the --repo_path <path_to_cloned_repo> flag at the end of the fetch command

@samuelmaina
Contributor

samuelmaina commented Mar 29, 2023

@GemmaTuron Pandas is not detected in the remote repo.
I added pandas and pyyaml (also tried with PyYAML) to the Dockerfile so that they are installed.
dockerfile_change
I got a "pandas not installed" error.
pandas_error

Pandas was not in the eos3ae7 env, but ruamel-yaml was.
I looked at the script.sh generated to run the installation commands, from the line Running bash /tmp/ersilia-1k_bwc4b/script.sh > /tmp/ersilia-_wtlkhjr/command_outputs.log 2>&1. After running all the installation commands, the script was downloading code from https://github.com/ersilia-os/bentoml-ersilia. I looked at its setup.py and found that the yaml in "required include" is "ruamel.yaml", which is incompatible with import yaml. It is used as

    from ruamel.yaml import YAML

    yaml = YAML(typ='safe')   # default, if not specified, is 'rt' (round-trip)
    yaml.load(doc)            # doc: a YAML string or stream to parse

as seen from here. The required yaml package is pyyaml. My guess is that there are some automated workflows that are uninstalling pandas.
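
The distinction matters because `import yaml` is satisfied by the PyYAML distribution, whereas ruamel.yaml installs its code under the `ruamel.yaml` module path. A quick check of which of the two an environment provides could look like the sketch below; `has_module` is a hypothetical helper written for illustration.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the current interpreter can locate `dotted_name`."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `ruamel`) is itself missing.
        return False


# `import yaml` needs PyYAML; ruamel.yaml only provides `ruamel.yaml`.
print("yaml (PyYAML):", has_module("yaml"))
print("ruamel.yaml:", has_module("ruamel.yaml"))
```

An environment where only the second line prints True would reproduce the "No module named 'yaml'" error seen above.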

@samuelmaina
Contributor

I have tested the model with a single conda-forge entry (I had two in the Dockerfile in the previous comment) and the results are the same.

@GemmaTuron
Member

Hi @samuelmaina !

Thanks, that is a very good catch! I'll need to see why we are using ruamel.yaml in bentoml. Maybe it will be easier to change pyyaml to ruamel.yaml in the model itself, since the bentoml package is used by all Ersilia models? What do you think?
I need some time to think about it, but your work has been great to point us in the right direction, many thanks

@samuelmaina
Contributor

I am really grateful. I think it's a good idea to install pyyaml for the local model; there is no need to break the others. Migrating to pyyaml would be hectic, but you can consult.

@samuelmaina
Contributor

Hi everyone!
@GemmaTuron
I tried the authors' recommended versions: tensorflow=1.10.0 and keras ('Keras>=2.0.0,<=2.0.7'), together with pyyaml and pandas. tensorflow=1.10.0 couldn't be found, but the second command in git_hub_issue installed it. The 1.10.0 version required a numpy version range below what the other modules need, resulting in an installation error. I then tried version 1.15.0 but got this error:

Ersilia exception class:
EmptyOutputError

Detailed error:
Model API eos3ae7:predict did not produce an outputUsing TensorFlow backend.
From /home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:439: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

From /home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3540: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

Traceback (most recent call last):
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406105955_71665A/eos3ae7/artifacts/framework/code/main.py", line 16, in <module>
    vae = VAEUtils()
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406105955_71665A/eos3ae7/artifacts/framework/code/chemvae/vae_utils.py", line 43, in __init__
    self.enc = load_encoder(self.params)
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406105955_71665A/eos3ae7/artifacts/framework/code/chemvae/models.py", line 79, in load_encoder
    return load_model(params['encoder_weights_file'])
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/models.py", line 239, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/models.py", line 313, in model_from_config
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/layers/__init__.py", line 54, in deserialize
    printable_module_name='layer')
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/utils/generic_utils.py", line 139, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/engine/topology.py", line 2497, in from_config
    process_node(layer, node_data)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/engine/topology.py", line 2454, in process_node
    layer(input_tensors[0], **kwargs)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/engine/topology.py", line 575, in __call__
    self.build(input_shapes[0])
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/layers/convolutional.py", line 134, in build
    constraint=self.kernel_constraint)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/engine/topology.py", line 399, in add_weight
    constraint=constraint)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 323, in variable
    v.constraint = constraint
AttributeError: can't set attribute

Pandas and yaml were imported correctly, since they are imported before VAEUtils() is called.

After some research, I found that the error was emerging from using "Keras 2.0.7". I then upgraded to the latest version, 2.12.0, but it requires Python 3.8. I used conda install keras to install the best available version, but I got this error:

12:14:45 | DEBUG    | [{'input': {'key': 'LUHMMHZLDLBAKX-UHFFFAOYSA-N', 'input': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O'}, 'output': None}, {'input': {'key': 'QRXWMOHMRWLFEY-UHFFFAOYSA-N', 'input': 'C1=CN=CC=C1C(=O)NN', 'text': 'C1=CN=CC=C1C(=O)NN'}, 'output': None}]
12:14:56 | ERROR    | Ersilia exception class:
EmptyOutputError

Detailed error:
Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406121243_219A53/eos3ae7/artifacts/framework/code/main.py", line 10, in <module>
    from chemvae.vae_utils import VAEUtils
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406121243_219A53/eos3ae7/artifacts/framework/code/chemvae/vae_utils.py", line 5, in <module>
    from .models import load_encoder, load_decoder, load_property_predictor
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406121243_219A53/eos3ae7/artifacts/framework/code/chemvae/models.py", line 1, in <module>
    from keras.layers import Input, Lambda
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/__init__.py", line 20, in <module>
    from keras import distribute
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/distribute/__init__.py", line 18, in <module>
    from keras.distribute import sidecar_evaluator
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/distribute/sidecar_evaluator.py", line 22, in <module>
    from keras.optimizers.optimizer_experimental import (
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/optimizers/__init__.py", line 25, in <module>
    from keras import backend
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend.py", line 32, in <module>
    from keras import backend_config
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend_config.py", line 33, in <module>
    @tf.__internal__.dispatch.add_dispatch_support
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.compat.v2' has no attribute '__internal__'

I researched a bit and found a solution on stackoverflow. After setting keras=2.1.6, I got a lot of inner dependency conflict errors, as can be seen in keras_2_1_6_error.txt, and both pandas and yaml were not installed due to the conflicts.
I looked at the original repo, and users are requesting the exact dependencies, as can be seen in chemical_vae_issue. Someone could come up with the working versions for the model, but I think it will take a lot of time.
I hope this research will shed some more light.
PyYAML is used at the higher level of this model. The bentoml-ersilia setup.py is fine.
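
Because each failure above depends on the exact TensorFlow, Keras and numpy combination, it may help to record the installed versions next to every log. The following is a minimal sketch, assuming Python 3.8+ or the `importlib_metadata` backport on Python 3.7; `installed_versions` is a hypothetical helper and the distribution names are simply those mentioned in this thread.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    import importlib_metadata as metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["tensorflow", "Keras", "numpy", "pandas", "PyYAML"])
for name, version in report.items():
    print(name, version)
```

Run inside the model's conda environment, the output gives a compact snapshot to compare against the versions requested in the original repository.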

@miquelduranfrigola
Member

Hi @GemmaTuron and @samuelmaina - this one is on hold. What is the current status?

@samuelmaina
Contributor

@miquelduranfrigola last time I tested, it was not working due to dependency issues. If @GemmaTuron isn't done with it, I can try to resolve the issue again.

@GemmaTuron
Member

Hi @miquelduranfrigola and @samuelmaina
Let's focus on this model once we get to it? (family of generative models)

@GemmaTuron
Member

This has been solved now! Check the repo of this model for more :)

@GemmaTuron GemmaTuron moved this from Stuck to Done in Ersilia Model Hub Aug 10, 2023