Max width of ensemble fan out #7256
zachary-mcpher
Hello,
I am working on a project to serve an encoder-based model in a Triton Inference Server ensemble. The ensemble has a preprocessing node that feeds directly into an encoder (which generates an embedding feature from roberta-base), followed by a fan-out to K lightweight classification heads (think N linear layers).
How far can I reasonably push K? Would the ensemble scheduler be able to handle inference with K=100 classifiers?
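To make the topology concrete, here is a rough sketch of how the ensemble `config.pbtxt` could be generated for a given K. The model names (`preprocess`, `encoder`, `head_0` ... `head_{K-1}`), the tensor names, the batch size, and the output dimensions are all placeholders I made up for illustration; only the overall `ensemble_scheduling` / `step` / `input_map` / `output_map` layout follows Triton's ensemble config format.

```python
def _step(model_name, input_map, output_map):
    """Render one ensemble step; the maps go from model tensor name to ensemble tensor name."""
    lines = ["    {", f'      model_name: "{model_name}"', "      model_version: -1"]
    for key, value in input_map.items():
        lines.append(f'      input_map {{ key: "{key}" value: "{value}" }}')
    for key, value in output_map.items():
        lines.append(f'      output_map {{ key: "{key}" value: "{value}" }}')
    lines.append("    }")
    return "\n".join(lines)


def build_ensemble_config(k, max_batch_size=8, num_classes=2):
    """Return config.pbtxt text for preprocess -> encoder -> K heads (all names hypothetical)."""
    steps = [
        _step("preprocess",
              {"TEXT": "TEXT"},
              {"INPUT_IDS": "input_ids", "ATTENTION_MASK": "attention_mask"}),
        _step("encoder",
              {"INPUT_IDS": "input_ids", "ATTENTION_MASK": "attention_mask"},
              {"EMBEDDING": "embedding"}),
    ]
    # Fan-out: every head reads the same intermediate "embedding" tensor
    # and writes its own ensemble output tensor.
    for i in range(k):
        steps.append(_step(f"head_{i}", {"EMBEDDING": "embedding"}, {"LOGITS": f"logits_{i}"}))

    outputs = ",\n".join(
        f'  {{ name: "logits_{i}" data_type: TYPE_FP32 dims: [ {num_classes} ] }}'
        for i in range(k)
    )
    return "\n".join([
        'name: "encoder_fanout"',
        'platform: "ensemble"',
        f"max_batch_size: {max_batch_size}",
        "input [",
        '  { name: "TEXT" data_type: TYPE_STRING dims: [ 1 ] }',
        "]",
        "output [",
        outputs,
        "]",
        "ensemble_scheduling {",
        "  step [",
        ",\n".join(steps),
        "  ]",
        "}",
    ])


if __name__ == "__main__":
    # K=100 gives 102 steps in total: preprocess, encoder, and 100 heads.
    print(build_ensemble_config(k=100))
```

With K=100 this produces one ensemble with 102 steps and 100 output tensors, which is the configuration I am asking about.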