# Hugging Face Pipeline Models

Hugging Face Pipeline deployment lets you serve almost any model from the Hugging Face Hub. It wraps Hugging Face Pipelines, so you can deploy and scale models with a few lines of code.
Tip

To use the HF Pipeline deployment, install the required libraries with `pip install transformers`, or include the extra dependencies with `pip install aana[transformers]`.
`HfPipelineConfig` is used to configure the Hugging Face Pipeline deployment.

### `aana.deployments.hf_pipeline_deployment.HfPipelineConfig`
Attributes:

- `model_id` (`str`) – The model ID on Hugging Face.
- `task` (`str | None`) – The task name. If not provided, the task is inferred from the model ID. Defaults to `None`.
- `model_kwargs` (`CustomConfig`) – The model keyword arguments. Defaults to `{}`.
- `pipeline_kwargs` (`CustomConfig`) – The pipeline keyword arguments. Defaults to `{}`.
- `generation_kwargs` (`CustomConfig`) – The generation keyword arguments. Defaults to `{}`.
## Example Configurations

As an example, let's configure the Hugging Face Pipeline deployment to serve the Salesforce BLIP-2 OPT-2.7b model.
```python
from transformers import BitsAndBytesConfig

from aana.deployments.hf_pipeline_deployment import HfPipelineConfig, HfPipelineDeployment

deployment = HfPipelineDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},
    user_config=HfPipelineConfig(
        model_id="Salesforce/blip2-opt-2.7b",
        task="image-to-text",
        model_kwargs={
            "quantization_config": BitsAndBytesConfig(
                load_in_8bit=False, load_in_4bit=True
            ),
        },
    ).model_dump(mode="json"),
)
```
`model_id` is the Hugging Face model ID. `task` is one of the Hugging Face Pipelines tasks that the model can perform. We deploy the model with 4-bit quantization by setting `quantization_config` in the `model_kwargs` dictionary; any extra arguments for the model can be passed through the same dictionary.
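The same pattern applies to the other keyword-argument dictionaries from the attribute list above. As a sketch, the config below forwards pipeline and generation settings; the specific values (`device_map`, `max_new_tokens`) are illustrative assumptions, not library defaults:

```python
from aana.deployments.hf_pipeline_deployment import HfPipelineConfig

# Sketch: pipeline_kwargs are intended for the pipeline constructor and
# generation_kwargs for generation; the values here are only examples.
config = HfPipelineConfig(
    model_id="Salesforce/blip2-opt-2.7b",
    task="image-to-text",
    pipeline_kwargs={"device_map": "auto"},
    generation_kwargs={"max_new_tokens": 64},
)
```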