I am training a policy using SAC on the myoHandReorient environments. Regularly throughout training, messages appear saying that the simulation is unstable:
WARNING:absl:Nan, Inf or huge value in QACC at DOF 27. The simulation is unstable. Time = 0.2480.
Simulation couldn't be stepped as intended. Issuing a reset
WARNING:absl:Nan, Inf or huge value in QACC at DOF 26. The simulation is unstable. Time = 0.4640.
Simulation couldn't be stepped as intended. Issuing a reset
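The warning above corresponds to MuJoCo's sanity check on joint accelerations (qacc). As a rough illustration of what is being flagged (my own sketch, not the actual MuJoCo/myosuite code; the threshold constant is an assumption):

```python
import numpy as np

QACC_WARN_THRESHOLD = 1e10  # assumed magnitude cutoff, not necessarily MuJoCo's constant

def qacc_is_unstable(qacc):
    """Flag NaN, Inf, or huge accelerations, mirroring the warning text above."""
    qacc = np.asarray(qacc, dtype=float)
    return bool(np.any(~np.isfinite(qacc)) or np.any(np.abs(qacc) > QACC_WARN_THRESHOLD))

print(qacc_is_unstable([0.1, np.inf, 2.0]))  # blows up at one DOF
print(qacc_is_unstable([0.1, 0.5]))          # all finite and bounded
```

When this check trips, the environment resets mid-episode, which is how the bad transitions end up in the replay buffer.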
When I analyse the contents of the replay buffer, I notice that observations[29:32], which correspond to the last 3 elements of obs_dict['obj_vel'] occasionally take on huge values, on the order of +/- 100K. (I also notice the muscle forces can take on values on the order of -500, though I'm not sure if this is expected or not).
The problem occurs in both myoHandReorient8-v0 and myoHandReorient100-v0. Another user has noticed that myoHandPoseRandom-v0 is unstable too #250 (comment), which seems to suggest that the problem is with the hand model.
Code to reproduce the issue:
from myosuite.utils import gym
from sbx import SAC
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.vec_env import SubprocVecEnv, VecMonitor, VecNormalize
import pickle

save_path = "my_directory/"

def make_env(env_id):
    def _init():
        env = gym.make(env_id)
        return env
    return _init

num_cpu = 20
env_id = 'myoHandReorient8-v0'
env = SubprocVecEnv([make_env(env_id) for _ in range(num_cpu)], start_method='fork')
env = VecMonitor(env)
env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)

checkpoint_callback = CheckpointCallback(
    save_freq=max(50_000 // num_cpu, 1),
    save_path=save_path,
    save_replay_buffer=True,
    save_vecnormalize=True,
)

model = SAC("MlpPolicy", env)
model.learn(total_timesteps=100_000, progress_bar=False, callback=checkpoint_callback)

# the buffer stores the unnormalized observations
with open(save_path + "rl_model_replay_buffer_100000_steps.pkl", 'rb') as f:
    buffer = pickle.load(f)

# obj vel
print(buffer.observations[:, :, 29:32].max())
print(buffer.observations[:, :, 29:32].min())
# muscle forces
print(buffer.observations[:, :, 161:].min())
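To quantify how often the instability leaks into the buffer, a small helper can count samples whose obj_vel components exceed a threshold (a sketch assuming the same observation layout as above; `count_outliers` and the cutoff value are my own, not part of myosuite or SB3):

```python
import numpy as np

def count_outliers(observations, start, stop, threshold=1000.0):
    """Count samples whose observation slice [start:stop] contains
    any component with magnitude above `threshold`."""
    window = observations[..., start:stop]
    mask = np.abs(window) > threshold
    # collapse the feature axis: a sample is an outlier if any component blows up
    return int(mask.any(axis=-1).sum())

# synthetic example: 100 samples x 1 env x 32-dim observations, two injected spikes
obs = np.zeros((100, 1, 32))
obs[10, 0, 30] = 1e5
obs[42, 0, 31] = -9e4
print(count_outliers(obs, 29, 32))  # → 2
```

Running the same scan on `buffer.observations` (e.g. `count_outliers(buffer.observations, 29, 32)`) shows whether the +/- 100K values are rare spikes around the forced resets or a persistent feature of the data.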