- Prerequisites
- Create TorchServe docker image
- Create TorchServe docker image from source
- Create torch-model-archiver from container
- Running TorchServe docker image in production
- docker - Refer to the official docker installation guide
- git - Refer to the official git set-up guide
- For base Ubuntu with GPU, install the NVIDIA container toolkit and the NVIDIA driver
- NOTE - Dockerfiles have not been tested on the Windows native platform.
1. If you have not cloned the TorchServe source yet, clone it:
git clone https://github.com/pytorch/serve.git
2. cd serve/docker
To create a CPU-based image:
DOCKER_BUILDKIT=1 docker build --file Dockerfile -t torchserve:latest .
To create a GPU-based image with the latest CUDA version PyTorch supports (e.g. CUDA 10.2 as of Oct 2020):
DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04 -t torchserve:latest .
To create a GPU-based image with an older CUDA version (e.g. CUDA 10.1), make sure that --build-arg CUDA_VERSION=<version> is specified. The version uses the short form shown in the command below, e.g. "cu92" or "cu101":
DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --build-arg CUDA_VERSION=cu101 -t torchserve:latest .
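The three variants above differ only in their build arguments. As a minimal sketch, the helper below prints the matching command instead of running it, so it can be inspected before use. The function name torchserve_build_cmd and its arguments are illustrative and not part of the TorchServe repository.

```shell
#!/bin/sh
# Hypothetical helper, not part of the TorchServe repo: print the docker build
# command for a CPU image, or for a GPU image on a given CUDA base image.
# Usage: torchserve_build_cmd [base_image] [cuda_version]
torchserve_build_cmd() {
  base_image="${1:-}"     # e.g. nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
  cuda_version="${2:-}"   # e.g. cu101; only needed for older CUDA versions
  args=""
  if [ -n "$base_image" ]; then
    args="$args --build-arg BASE_IMAGE=$base_image"
  fi
  if [ -n "$cuda_version" ]; then
    args="$args --build-arg CUDA_VERSION=$cuda_version"
  fi
  echo "DOCKER_BUILDKIT=1 docker build --file Dockerfile$args -t torchserve:latest ."
}

# Print the CPU command, then the CUDA 10.1 command from the text above.
torchserve_build_cmd
torchserve_build_cmd nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 cu101
```

Printing rather than executing keeps the sketch safe to run anywhere; pipe the output to sh from serve/docker once the command looks right.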
The following examples start the container with ports 8080 and 8081 exposed on localhost.
For the latest version, you can use the latest tag:
docker run --rm -it -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest
For specific versions you can pass in the specific tag to use (ex: pytorch/torchserve:0.1.1-cpu):
docker run --rm -it -p 8080:8080 -p 8081:8081 pytorch/torchserve:0.1.1-cpu
For the latest GPU image with GPU devices 1 and 2:
docker run --rm -it --gpus '"device=1,2"' -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu
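The nested quoting in the --gpus value is there so that docker receives the double quotes literally, which keeps the comma-separated device list together as a single value. A small sketch of building that value from a device list follows; gpus_value is a hypothetical helper name, not a docker or TorchServe command.

```shell
#!/bin/sh
# Sketch: build the --gpus value for a list of device ids.
# gpus_value is a hypothetical helper; it prints the argument exactly as docker
# should receive it, including the literal double quotes.
gpus_value() {
  printf '"device=%s"\n' "$1"
}

# Show the value for devices 1 and 2.
gpus_value 1,2
```

It could then be used as `docker run --rm -it --gpus "$(gpus_value 1,2)" -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu`, which passes the same argument as the single-quoted form above.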
For specific versions you can pass in the specific tag to use (ex: 0.1.1-cuda10.1-cudnn7-runtime):
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 pytorch/torchserve:0.1.1-cuda10.1-cudnn7-runtime
For the latest version with all GPUs, you can use the latest-gpu tag:
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu
TorchServe's inference and management APIs can be accessed on localhost over ports 8080 and 8081 respectively. Example:
curl http://localhost:8080/ping
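The server may need a few seconds after the container starts before it answers. As a rough sketch, the loop below polls the ping endpoint until it responds or a retry budget runs out. The function name, the default of 30 attempts, and the 2 second delay are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: poll the inference API's ping endpoint until it answers or we give up.
# wait_for_torchserve is a hypothetical helper; defaults are arbitrary choices.
# Usage: wait_for_torchserve [url] [attempts] [delay_seconds]
wait_for_torchserve() {
  url="${1:-http://localhost:8080/ping}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "TorchServe is responding at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "TorchServe did not respond after $attempts attempts" >&2
  return 1
}
```

Call wait_for_torchserve from a second terminal or a script after starting the container; it returns a non-zero status if the endpoint never answers.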
The following examples show how to use the build_image.sh script to build Docker images from source to support CPU or GPU inference.
To build the TorchServe image for a CPU device using the master branch, use the following command:
./build_image.sh
Alternatively, you can use the following direct commands (assuming you have followed the steps in Clone serve source):
For CPU:
DOCKER_BUILDKIT=1 docker build --file Dockerfile.dev -t torchserve:dev .
For GPU:
DOCKER_BUILDKIT=1 docker build --file Dockerfile.dev -t torchserve:dev --build-arg MACHINE_TYPE=gpu --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 .
To create a Docker image for a specific branch, use the following command:
./build_image.sh -b <branch_name>
To create a Docker image for a specific branch and specific tag, use the following command:
./build_image.sh -b <branch_name> -t <tagname:latest>
To create a Docker image for a GPU device, use the following command:
./build_image.sh --gpu
To create a Docker image for a GPU device with a specific branch, use the following command:
./build_image.sh -b <branch_name> --gpu
To create a Docker image for a GPU device with CUDA 10.1, use the following command:
./build_image.sh --gpu --cudaversion cuda101
To run your TorchServe Docker image and start TorchServe inside the container with a pre-registered resnet-18 image classification model, use the following command:
./start.sh
For GPU, run the following command:
./start.sh --gpu
For GPU with specific GPU device ids, run the following command:
./start.sh --gpu_devices 1,2,3
Alternatively, you can use the direct commands described in Start a container with a TorchServe image above for CPU and GPU by changing the image name.
To create a mar (model archive) file for TorchServe deployment, use the following steps:
- Start a container that shares your local model-store directory (create it if it does not exist) and any directory containing custom or example mar contents
docker run --rm -it -p 8080:8080 -p 8081:8081 --name mar -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples torchserve:latest
- List your containers, or skip this step if you know the container name
docker ps
- Attach to the running container and get a bash prompt
docker exec -it <container_name> /bin/bash
You will land in /home/model-server/.
- Download the model weights if you have not done so already (they are not part of the repo)
curl -o /home/model-server/examples/image_classifier/densenet161-8d451a50.pth https://download.pytorch.org/models/densenet161-8d451a50.pth
- Now execute the torch-model-archiver command, e.g.
torch-model-archiver --model-name densenet161 --version 1.0 --model-file /home/model-server/examples/image_classifier/densenet_161/model.py --serialized-file /home/model-server/examples/image_classifier/densenet161-8d451a50.pth --export-path /home/model-server/model-store --extra-files /home/model-server/examples/image_classifier/index_to_name.json --handler image_classifier
Refer to torch-model-archiver for details.
- The densenet161.mar file should be present at /home/model-server/model-store
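Because the container was started with $(pwd)/model-store mounted at /home/model-server/model-store, the archive should also be visible on the host. A minimal sketch of that check follows, assuming it runs from the directory where docker run was invoked; check_mar is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm that a model archive exists and is non-empty in a model store.
# check_mar is a hypothetical helper.
# Usage: check_mar <model_name> [store_dir]
check_mar() {
  model_name="$1"
  store_dir="${2:-./model-store}"   # host side of the bind mount
  mar_path="$store_dir/$model_name.mar"
  if [ -s "$mar_path" ]; then
    echo "found $mar_path"
  else
    echo "missing or empty: $mar_path" >&2
    return 1
  fi
}
```

For the example above, `check_mar densenet161` looks for ./model-store/densenet161.mar and returns a non-zero status if the archiver did not produce it.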
You may want to consider the following aspects and docker options when deploying TorchServe in production with Docker.
- Shared Memory Size
  - shm-size - The shm-size parameter allows you to specify the shared memory that a container can use. It enables memory-intensive containers to run faster by giving more access to allocated memory.
- User Limits for System Resources
  - --ulimit memlock=-1 : Maximum locked-in-memory address space.
  - --ulimit stack : Linux stack size
  The current ulimit values can be viewed by executing ulimit -a. A more exhaustive set of options for resource constraining can be found in the Docker documentation.
- Exposing specific ports / volumes between the host & docker env.
  - -p8080:8080 -p8081:8081 - TorchServe uses default ports 8080 / 8081 for inference & management APIs. You may want to expose these ports to the host for HTTP requests between Docker & host.
  - The model store is passed to torchserve with the --model-store option. You may want to consider using a shared volume if you prefer pre-populating models in the model-store directory.
For example,
docker run --rm --shm-size=1g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-p8080:8080 \
-p8081:8081 \
--mount type=bind,source=/path/to/model/store,target=/tmp/models <container> torchserve --model-store=/tmp/models
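The stack limit in the example is a byte count: 67108864 is 64 MiB. The sketch below shows that arithmetic and prints the same command with the sizes held in variables, so they can be tuned in one place. The variable names are illustrative assumptions, and <container> is kept as the placeholder from the example above.

```shell
#!/bin/sh
# Sketch: derive the stack limit from a size in MiB and print the docker run
# command from the example. Variable names are illustrative assumptions.
STACK_MIB=64
STACK_BYTES=$((STACK_MIB * 1024 * 1024))   # 64 MiB expressed in bytes
SHM_SIZE=1g
MODEL_STORE=/path/to/model/store           # host directory holding .mar files

echo "stack limit: $STACK_BYTES bytes"

# Unquoted heredoc: variables are expanded, and \\ prints a single backslash.
cat <<EOF
docker run --rm --shm-size=$SHM_SIZE \\
  --ulimit memlock=-1 \\
  --ulimit stack=$STACK_BYTES \\
  -p8080:8080 \\
  -p8081:8081 \\
  --mount type=bind,source=$MODEL_STORE,target=/tmp/models <container> torchserve --model-store=/tmp/models
EOF
```

The script only prints the command; replace the placeholders and run the printed command once the values suit the deployment.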