The myoHandReorient environment appears unstable #267

jamesheald · 2024-11-05T17:36:26Z

I am training a policy using SAC on the myoHandReorient environments. Regularly throughout training, messages appear saying that the simulation is unstable:

WARNING:absl:Nan, Inf or huge value in QACC at DOF 27. The simulation is unstable. Time = 0.2480.
Simulation couldn't be stepped as intended. Issuing a reset
WARNING:absl:Nan, Inf or huge value in QACC at DOF 26. The simulation is unstable. Time = 0.4640.
Simulation couldn't be stepped as intended. Issuing a reset
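
To see which degrees of freedom trigger these warnings most often, the log output can be tallied. The following is only a minimal sketch: it assumes the warnings always follow the format quoted above, and `tally_unstable_dofs` and `sample_log` are hypothetical names introduced for illustration, not part of MyoSuite or MuJoCo.

```python
import re
from collections import Counter

# Matches the warning format quoted above (assumed to be stable across messages).
QACC_WARNING = re.compile(r"huge value in QACC at DOF (\d+)\..*?Time = (\d+\.\d+)")


def tally_unstable_dofs(log_text):
    """Return (Counter mapping DOF index -> warning count, list of (dof, time) events)."""
    events = [
        (int(match.group(1)), float(match.group(2)))
        for match in QACC_WARNING.finditer(log_text)
    ]
    counts = Counter(dof for dof, _ in events)
    return counts, events


# The two warnings reported in this issue, used as sample input.
sample_log = """\
WARNING:absl:Nan, Inf or huge value in QACC at DOF 27. The simulation is unstable. Time = 0.2480.
Simulation couldn't be stepped as intended. Issuing a reset
WARNING:absl:Nan, Inf or huge value in QACC at DOF 26. The simulation is unstable. Time = 0.4640.
Simulation couldn't be stepped as intended. Issuing a reset
"""

counts, events = tally_unstable_dofs(sample_log)
print(counts.most_common())
```

Run over a full training log, a tally like this would show whether the instability is concentrated in a few DOFs or spread across the model.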

When I analyse the contents of the replay buffer, I notice that observations[29:32], which correspond to the last three elements of obs_dict['obj_vel'], occasionally take on huge values, on the order of ±100,000. (I also notice that the muscle forces can take on values on the order of -500, though I'm not sure whether this is expected.)

The problem occurs in both myoHandReorient8-v0 and myoHandReorient100-v0. Another user has noticed that myoHandPoseRandom-v0 is unstable too (see #250 (comment)), which suggests that the problem lies in the hand model rather than in a specific task.

Code to reproduce the issue:

from myosuite.utils import gym
from sbx import SAC
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.vec_env import SubprocVecEnv, VecMonitor, VecNormalize
import pickle

save_path = "my_directory/"

def make_env(env_id):
    # return a factory so that each subprocess builds its own environment
    def _init():
        env = gym.make(env_id)
        return env
    return _init

num_cpu = 20
env_id = 'myoHandReorient8-v0'
env = SubprocVecEnv([make_env(env_id) for i in range(num_cpu)], start_method='fork')
env = VecMonitor(env)
env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)

# save_freq counts callback calls, each of which advances num_cpu timesteps
checkpoint_callback = CheckpointCallback(
    save_freq=max(50_000 // num_cpu, 1),
    save_path=save_path,
    save_replay_buffer=True,
    save_vecnormalize=True,
)
model = SAC("MlpPolicy", env)
model.learn(total_timesteps=100_000, progress_bar=False, callback=checkpoint_callback)

# the buffer stores the unnormalized observations
with open(save_path + "rl_model_replay_buffer_100000_steps.pkl", "rb") as f:
    buffer = pickle.load(f)

# object velocity: last 3 elements of obs_dict['obj_vel']
print(buffer.observations[:,:,29:32].max())
print(buffer.observations[:,:,29:32].min())

# muscle forces
print(buffer.observations[:,:,161:].min())
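
The max/min printout above shows that outliers exist but not where they occur. A rough sketch of how the offending transitions could be located follows. It assumes the observations array has shape (steps, envs, obs_dim), as the indexing above implies. `find_outliers` is a hypothetical helper, the 1e3 threshold is an arbitrary choice, and the synthetic `fake` array stands in for `buffer.observations`.

```python
import numpy as np


def find_outliers(observations, columns=slice(29, 32), threshold=1e3):
    """Return an array of (step, env) index pairs where any selected
    observation column is non-finite or larger in magnitude than threshold."""
    selected = np.asarray(observations)[:, :, columns]
    with np.errstate(invalid="ignore"):  # NaN comparisons are handled by isfinite
        bad = ~np.isfinite(selected) | (np.abs(selected) > threshold)
    return np.argwhere(bad.any(axis=-1))


# Synthetic stand-in for buffer.observations: 5 steps, 2 envs, 40 obs dims.
fake = np.zeros((5, 2, 40))
fake[3, 1, 30] = 1e5     # a "huge" object-velocity entry
fake[1, 0, 29] = np.nan  # a non-finite entry

outliers = find_outliers(fake)
print(outliers)
```

With the real buffer, calling `find_outliers(buffer.observations)` would give the step and environment indices to inspect, for example to check whether the huge values immediately precede the resets reported in the warnings.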
