# Hugging Face Pipeline Models

Hugging Face Pipeline deployment lets you serve almost any model from the Hugging Face Hub. It wraps Hugging Face Pipelines, so you can deploy and scale models with a few lines of code.
Tip

To use the HF Pipeline deployment, install the required libraries with `pip install transformers`, or include the extra dependencies with `pip install aana[transformers]`.
`HfPipelineConfig` is used to configure the Hugging Face Pipeline deployment.

### `aana.deployments.hf_pipeline_deployment.HfPipelineConfig`
Attributes:

- `model_id` (`str`) – The model ID on Hugging Face.
- `task` (`str | None`) – The task name. If not provided, the task is inferred from the model ID. Defaults to `None`.
- `model_kwargs` (`CustomConfig`) – The model keyword arguments. Defaults to `{}`.
- `pipeline_kwargs` (`CustomConfig`) – The pipeline keyword arguments. Defaults to `{}`.
- `generation_kwargs` (`CustomConfig`) – The generation keyword arguments. Defaults to `{}`.
## Example Configurations

As an example, let's configure the Hugging Face Pipeline deployment to serve the Salesforce BLIP-2 OPT-2.7b model.
```python
from transformers import BitsAndBytesConfig

from aana.deployments.hf_pipeline_deployment import HfPipelineConfig, HfPipelineDeployment

deployment = HfPipelineDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},
    user_config=HfPipelineConfig(
        model_id="Salesforce/blip2-opt-2.7b",
        task="image-to-text",
        model_kwargs={
            "quantization_config": BitsAndBytesConfig(
                load_in_8bit=False, load_in_4bit=True
            ),
        },
    ).model_dump(mode="json"),
)
```
`model_id` is the Hugging Face model ID. `task` is one of the Hugging Face Pipelines tasks that the model can perform. We deploy the model with 4-bit quantization by setting `quantization_config` in the `model_kwargs` dictionary; any extra arguments for the model can be passed through the same dictionary.
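The same pattern applies to the other keyword-argument dictionaries from the attribute list above. As a sketch, the config below forwards pipeline and generation settings; the specific values (`device_map`, `max_new_tokens`) are illustrative assumptions, not library defaults:

```python
from aana.deployments.hf_pipeline_deployment import HfPipelineConfig

# Sketch: pipeline_kwargs are intended for the pipeline constructor and
# generation_kwargs for generation; the values here are only examples.
config = HfPipelineConfig(
    model_id="Salesforce/blip2-opt-2.7b",
    task="image-to-text",
    pipeline_kwargs={"device_map": "auto"},
    generation_kwargs={"max_new_tokens": 64},
)
```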