The myoHandReorient environment appears unstable #267

Open
jamesheald opened this issue Nov 5, 2024 · 0 comments

@jamesheald
Contributor

I am training a policy with SAC on the myoHandReorient environments. Throughout training, warnings regularly appear saying that the simulation is unstable:

```
WARNING:absl:Nan, Inf or huge value in QACC at DOF 27. The simulation is unstable. Time = 0.2480.
Simulation couldn't be stepped as intended. Issuing a reset
WARNING:absl:Nan, Inf or huge value in QACC at DOF 26. The simulation is unstable. Time = 0.4640.
Simulation couldn't be stepped as intended. Issuing a reset
```
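To see how often these warnings fire and which DOFs they involve, one option is to attach a counting handler to Python's `logging` module. This is a minimal sketch that assumes the warnings are routed through the standard-library logger named `absl` (as the `WARNING:absl:` prefix suggests); `InstabilityCounter` and `attach_counter` are hypothetical helpers, not part of MyoSuite or absl.

```python
import logging
import re
from collections import Counter

# Matches the bad-QACC warning text exactly as it appears in the log above.
_BAD_QACC = re.compile(r"in QACC at DOF (\d+)\. The simulation is unstable")


class InstabilityCounter(logging.Handler):
    """Hypothetical helper: tallies instability warnings per DOF index."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.per_dof = Counter()

    def emit(self, record):
        match = _BAD_QACC.search(record.getMessage())
        if match:
            self.per_dof[int(match.group(1))] += 1


def attach_counter(logger_name="absl"):
    """Attach a counter to the (assumed) 'absl' logger and return it."""
    counter = InstabilityCounter()
    logging.getLogger(logger_name).addHandler(counter)
    return counter


if __name__ == "__main__":
    counter = attach_counter()
    log = logging.getLogger("absl")
    log.warning("Nan, Inf or huge value in QACC at DOF 27. The simulation is unstable. Time = 0.2480.")
    log.warning("Nan, Inf or huge value in QACC at DOF 26. The simulation is unstable. Time = 0.4640.")
    print(dict(counter.per_dof))
```

With `SubprocVecEnv`, each worker process logs independently, so counts recorded inside the workers are not visible in the parent process; the counter is most useful in a single-process rollout (for example a bare `gym.make` environment or a `DummyVecEnv`).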

When I analyse the contents of the replay buffer, I notice that observations[29:32], which correspond to the last 3 elements of obs_dict['obj_vel'], occasionally take on huge values, on the order of +/- 100K. (I also notice that the muscle forces can take on values on the order of -500, though I'm not sure whether this is expected.)
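To locate the offending transitions rather than only the global max and min, the buffer can be scanned for rows where the object-velocity columns exceed a magnitude threshold. The sketch below assumes the Stable-Baselines3 `ReplayBuffer.observations` layout `(buffer_size, n_envs, obs_dim)`; the function name `find_velocity_outliers` and the `1e3` threshold are illustrative choices, not part of any library.

```python
import numpy as np


def find_velocity_outliers(observations, n_filled, cols=slice(29, 32), threshold=1e3):
    """Return (step, env) index pairs whose selected columns exceed `threshold` in magnitude.

    Assumes `observations` has shape (buffer_size, n_envs, obs_dim). Only the first
    `n_filled` rows are inspected so that unwritten (all-zero) slots are ignored.
    """
    block = np.abs(observations[:n_filled, :, cols])
    mask = (block > threshold).any(axis=-1)  # shape (n_filled, n_envs)
    return np.argwhere(mask)                 # rows of [step_index, env_index]
```

With the buffer loaded as in the reproduction script, `n_filled` can be taken as `buffer.buffer_size if buffer.full else buffer.pos`; the returned indices can then be used to inspect the actions and neighbouring observations around each blow-up.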

The problem occurs in both myoHandReorient8-v0 and myoHandReorient100-v0. Another user has noticed that myoHandPoseRandom-v0 is unstable too (see the comment in #250), which suggests that the problem lies with the hand model.
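One way to check whether the instability comes from the model itself rather than from SAC is to roll out uniformly random actions with no learning involved and count steps whose observations are non-finite or very large. This sketch assumes `myosuite.utils.gym` environments follow the Gymnasium-style API (`reset` returns `(obs, info)`, `step` returns a 5-tuple); `random_rollout_stats` and its threshold are hypothetical.

```python
import numpy as np


def random_rollout_stats(make_env, n_steps=1000, threshold=1e3, seed=0):
    """Count steps with non-finite or huge observations under random actions.

    Assumes a Gymnasium-style environment returned by `make_env()`.
    """
    env = make_env()
    env.action_space.seed(seed)
    obs, _ = env.reset(seed=seed)
    bad_steps = 0
    for _ in range(n_steps):
        action = env.action_space.sample()
        obs, _, terminated, truncated, _ = env.step(action)
        obs = np.asarray(obs, dtype=np.float64)
        # Flag the step if any entry is NaN/Inf or exceeds the magnitude threshold.
        if not np.all(np.isfinite(obs)) or np.max(np.abs(obs)) > threshold:
            bad_steps += 1
        if terminated or truncated:
            obs, _ = env.reset()
    env.close()
    return bad_steps
```

For example, `random_rollout_stats(lambda: gym.make('myoHandReorient8-v0'))` could be compared against the same call for `myoHandPoseRandom-v0`; nonzero counts without any policy training would point toward the model rather than the learning setup.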

Code to reproduce the issue:

```python
from myosuite.utils import gym
from sbx import SAC
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.vec_env import SubprocVecEnv, VecMonitor, VecNormalize
import pickle

save_path = "my_directory/"

def make_env(env_id):
    def _init():
        env = gym.make(env_id)
        return env
    return _init

num_cpu = 20
env_id = 'myoHandReorient8-v0'
env = SubprocVecEnv([make_env(env_id) for i in range(num_cpu)], start_method='fork')
env = VecMonitor(env)
env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)

# save_freq counts callback calls (one per vectorized step), so divide by num_cpu
# to checkpoint roughly every 50k total environment steps
checkpoint_callback = CheckpointCallback(
    save_freq=max(50_000 // num_cpu, 1),
    save_path=save_path,
    save_replay_buffer=True,
    save_vecnormalize=True,
)
model = SAC("MlpPolicy", env)
model.learn(total_timesteps=100_000, progress_bar=False, callback=checkpoint_callback)

# the buffer stores the unnormalized observations
with open(save_path + "rl_model_replay_buffer_100000_steps.pkl", 'rb') as f:
    buffer = pickle.load(f)

# obj vel
print(buffer.observations[:, :, 29:32].max())
print(buffer.observations[:, :, 29:32].min())

# muscle forces
print(buffer.observations[:, :, 161:].min())
```