Language Wrapper for the NetHack Learning Environment (NLE) and MiniHack
This wrapper inherits from the Gym Wrapper class and translates the non-language observations from NLE tasks into equivalent language representations. Actions can optionally be provided in text form, in which case they are converted to the discrete actions of NLE.
Inventory:
a: a blessed +1 mace (weapon in hand)
b: a +0 robe (being worn)
c: a blessed +0 small shield (being worn)
d: 4 potions of holy water
e: a clove of garlic
f: a sprig of wolfsbane
g: a spellbook of stone to flesh
h: a spellbook of identify
Stats:
Strength:15/15
Dexterity:10
Constitution:12
Intelligence:12
Wisdom:18
Charisma:9
Depth:1
Gold:0
HP:14/14
Energy:6/6
AC:7
XP:1/0
Time:1
Position:46|14
Hunger:Not Hungry
Monster Level:0
Encumbrance:Unencumbered
Dungeon Number:0
Level Number:1
Score:0
Alignment:Neutral
Condition:None
Cursor:Yourself a priestess
Observation:
vertical closed door far westnorthwest
horizontal wall near north and northwest
vertical wall very near northeast and east
vertical closed door very near eastnortheast
southeast corner very near southeast
horizontal wall very near south and southwest
tame kitten adjacent northeast
Message:
Hello Agent, welcome to NetHack! You are a neutral human Priestess.
The environment converts the NLE observations glyphs, blstats, tty_chars, inv_letters, inv_strs, and tty_cursor to text equivalents.
text_glyphs: A compressed textual representation of the surroundings.
dark area far west
vertical wall near east and southeast
horizontal wall near south and southwest
horizontal closed door near southsouthwest
black onyx ring near westsouthwest
doorway near west
egg very near east
horizontal wall adjacent north, northeast, and northwest
tame little dog adjacent southwest
This corresponds to the following visual display:
---------
.....@.%|
|...d...|
|.......|
|=......|
----+----
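The text_glyphs lines shown above appear to follow a regular "name, distance, directions" layout, so they can be split into structured records. The following is a minimal sketch under that assumption, not part of the wrapper: the names Sighting and parse_text_glyphs are hypothetical, the distance vocabulary is taken from the samples in this document, and "very far" is included only as a guess.

```python
import re
from typing import List, NamedTuple

# Distance words seen in the samples above; "very far" is an assumption.
_DISTANCES = ("adjacent", "very near", "near", "very far", "far")
_LINE_RE = re.compile(
    r"^(?P<name>.+?) (?P<distance>" + "|".join(_DISTANCES) + r") (?P<directions>.+)$"
)
# Separators between directions: ", and ", ", " or " and ".
_SPLIT_RE = re.compile(r",\s*and\s+|,\s*|\s+and\s+")


class Sighting(NamedTuple):
    """Hypothetical record for one line of a text_glyphs observation."""
    name: str
    distance: str
    directions: List[str]


def parse_text_glyphs(text: str) -> List[Sighting]:
    """Split each 'name distance directions' line into a Sighting."""
    sightings = []
    for line in text.splitlines():
        match = _LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        directions = _SPLIT_RE.split(match.group("directions"))
        sightings.append(
            Sighting(match.group("name"), match.group("distance"), directions)
        )
    return sightings


sample = """dark area far west
vertical wall near east and southeast
horizontal wall near south and southwest
horizontal closed door near southsouthwest
black onyx ring near westsouthwest
doorway near west
egg very near east
horizontal wall adjacent north, northeast, and northwest
tame little dog adjacent southwest"""

sightings = parse_text_glyphs(sample)
for sighting in sightings:
    print(sighting)
```

An object whose name itself contains a distance word would be split in the wrong place, so treat this as a starting point rather than a complete parser.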
text_message: The current message. Same as message from NLE, but also includes menus when present.
Aloha Agent, welcome to NetHack! You are a neutral female human Tourist.
text_blstats: Text version of the bottom-line stats and auxiliary stats included with NLE.
Strength:11/11
Dexterity:12
Constitution:14
Intelligence:16
Wisdom:9
Charisma:14
Depth:1
Gold:241
HP:10/10
Energy:2/2
AC:10
XP:1/0
Time:1
Position:48|2
Hunger:Not Hungry
Monster Level:0
Encumbrance:Unencumbered
Dungeon Number:0
Level Number:1
Score:0
Alignment:Neutral
Condition:None
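Because each text_blstats line in the sample above has the form Key:Value, it can be turned into a dictionary with a few lines of Python. This is a minimal sketch that assumes the observation arrives as a newline-separated string; parse_text_blstats is a hypothetical helper and not part of the wrapper.

```python
from typing import Dict


def parse_text_blstats(text: str) -> Dict[str, str]:
    """Map each 'Key:Value' line to a dictionary entry (values stay strings)."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            stats[key.strip()] = value.strip()
    return stats


# A few lines copied from the sample above.
sample = "Strength:11/11\nGold:241\nHP:10/10\nPosition:48|2\nHunger:Not Hungry"
stats = parse_text_blstats(sample)

# Composite fields can be split further where numbers are needed.
hp, max_hp = (int(part) for part in stats["HP"].split("/"))
position = tuple(int(part) for part in stats["Position"].split("|"))
print(stats["Gold"], hp, max_hp, position)
```

Values are kept as strings because fields such as Hunger and Alignment are not numeric.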
text_inventory: Current inventory with letters.
$: 241 gold pieces
a: 22 +2 darts (at the ready)
b: 6 uncursed food rations
c: 3 uncursed tripe rations
d: an uncursed egg
e: 2 uncursed fortune cookies
f: 2 uncursed potions of extra healing
g: 2 uncursed scrolls of magic mapping
h: 2 blessed scrolls of magic mapping
i: an uncursed +0 Hawaiian shirt (being worn)
j: an expensive camera (0:68)
k: an uncursed credit card
text_cursor: Description of glyph currently under cursor.
Yourself a tourist
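The five text fields described above can be combined into a single block of text laid out like the example at the top of this document. The sketch below assumes the observation behaves like a mapping from these key names to plain strings, which should be checked against the wrapper's actual return type; render_observation is a hypothetical helper, and the sample values are abbreviated by hand.

```python
from typing import Mapping


def render_observation(obs: Mapping[str, str]) -> str:
    """Join the text fields in the order used by the example at the top."""
    return "\n".join(
        [
            "Inventory:",
            obs["text_inventory"],
            "Stats:",
            obs["text_blstats"],
            "Cursor:" + obs["text_cursor"],
            "Observation:",
            obs["text_glyphs"],
            "Message:",
            obs["text_message"],
        ]
    )


# Hand-written stand-in for an observation, abbreviated from the samples above.
obs = {
    "text_inventory": "$: 241 gold pieces\nj: an expensive camera (0:68)",
    "text_blstats": "HP:10/10\nGold:241",
    "text_cursor": "Yourself a tourist",
    "text_glyphs": "egg very near east\ntame little dog adjacent southwest",
    "text_message": "Aloha Agent, welcome to NetHack!",
}
rendered = render_observation(obs)
print(rendered)
```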
Actions are by default text actions such as wait, apply, north, etc. The corresponding key presses are supported as well, e.g. west is the same as h and kick is the same as ^d. Alternatively, the standard discrete action space from NLE can be used by passing use_language_action=False to the wrapper.
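The equivalence between action names and key presses can be pictured as a lookup onto key codes. The sketch below is illustrative only and does not reproduce the wrapper's internal table: ACTION_KEYCODES and resolve are hypothetical names, and the bindings are the default NetHack keys for the actions named above, with ^d read as Ctrl-d.

```python
def ctrl(char: str) -> int:
    """Return the key code produced by holding Ctrl with `char`."""
    return ord(char) & 0x1F


# Hypothetical lookup table, limited to the actions named in the text above.
ACTION_KEYCODES = {
    "wait": ord("."),
    "apply": ord("a"),
    "north": ord("k"),
    "west": ord("h"),
    "kick": ctrl("d"),
}


def resolve(action: str) -> int:
    """Translate an action name, a single key, or a ^x chord to a key code."""
    if action in ACTION_KEYCODES:
        return ACTION_KEYCODES[action]
    if len(action) == 2 and action[0] == "^":
        return ctrl(action[1])
    if len(action) == 1:
        return ord(action)
    raise ValueError(f"unknown action: {action!r}")


# Names and their key presses resolve to the same code.
print(resolve("west"), resolve("h"))
print(resolve("kick"), resolve("^d"))
```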
ssh [email protected]
to immediately try out the wrapper on a MiniHack or NetHackChallenge task using the included play.py.
The wrapper has been tested on macOS 12.5, Ubuntu 20.04 natively, and on Windows using WSL.
Note: The agent component uses Sample Factory, which does not support Windows WSL and requires a PyTorch-supported GPU.
Requires python>=3.7 and cmake>=3.15.
To install
CMake can be installed on macOS using Homebrew
brew install cmake
Alternatively, and for other platforms, follow the instructions at https://cmake.org/install/
On Ubuntu you may also require additional dependencies; follow the steps at https://github.com/facebookresearch/nle#installation
You can install the package by cloning the repository and then installing it with pip
git clone --recursive https://github.com/Pervasive-AI-Lab/nle-language-wrapper.git
cd nle-language-wrapper
pip install .
The wrapper can be installed in Google Colab after installing the following dependencies
!sudo apt-get install -y build-essential autoconf libtool pkg-config \
python3-dev python3-pip python3-numpy git libncurses5-dev \
libzmq3-dev flex bison
!git clone https://github.com/google/flatbuffers.git
!cd flatbuffers && cmake -G "Unix Makefiles" && make -j2 && sudo make install
!pip install cmake==3.15.3
For an example Google Colab notebook, see NLE-Language-Wrapper-Example.ipynb
For development on the wrapper, clone the repository and install it in development mode.
git clone https://github.com/ngoodger/nle-language-wrapper --recursive
cd nle-language-wrapper
pip install -e ".[dev]"
To update the library after changes to the C++ code, recompile by running
python -m setup develop
The included Makefile defines useful targets for development.
To run the test suite
make test
Format the code using black, isort, and clang-format
make format-python
make format-cpp
Check the code formatting with black, isort, clang-format, and pylint
make format-python-check
make format-cpp-check
The wrapper can be used simply by instantiating a base environment from NLE or MiniHack and passing it to the wrapper constructor.
import gym
import nle
from nle_language_wrapper import NLELanguageWrapper
env = NLELanguageWrapper(gym.make("NetHackChallenge-v0"))
obsv = env.reset()
obsv, reward, done, info = env.step("wait")
Alternatively, to use the discrete actions rather than language actions, specify use_language_action=False.
env = NLELanguageWrapper(gym.make("NetHackChallenge-v0"), use_language_action=False)
obsv = env.reset()
wait_action = 17
obsv, reward, done, info = env.step(wait_action)
A script is provided to select an NLE or MiniHack task and interact directly with the environment.
python -m nle_language_wrapper.scripts.play
An included Sample Factory based agent achieves 730 reward after 700M frames. This agent uses a small transformer model to encode the language observations for the policy and value function models. The algorithm used is Asynchronous Proximal Policy Optimization (APPO), described in "Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning".
The default configuration was tested on an Nvidia 3090 with 24 GB of memory and a Ryzen 1700 CPU. Training runs at approximately 4k FPS. To train on a GPU with less memory, configure a smaller model, a shorter max token length, or a smaller batch size. These parameters can be passed when running the training script nle_language_wrapper/agents/sample_factory/train.py, e.g.
--transformer_hidden_size 64
--transformer_hidden_layers 2
--transformer_attention_heads 2
--max_token_length 256
--batch_size 1024
The pre-trained agent checkpoints are included in the train_dir. Clone the repository and run the following script to test it.
python nle_language_wrapper/agents/sample_factory/enjoy.py \
--env nle_language_env \
--encoder_custom nle_language_transformer_encoder \
--experiment nle_language_agent \
--algo APPO \
--fps 1
To train a new agent, run the following script and set the experiment name to the desired value.
python nle_language_wrapper/agents/sample_factory/train.py \
--env nle_language_env \
--encoder_custom nle_language_transformer_encoder \
--experiment nle_language_agent_1 \
--algo APPO \
--batch_size 2048 \
--num_envs_per_worker 24 \
--num_workers 8 \
--reward_scale 0.1 \
--train_for_env_steps=1000000000
MIT License