Hugging Face Pipeline Models

Hugging Face Pipeline deployment lets you serve almost any model from the Hugging Face Hub. It is a wrapper around Hugging Face Pipelines, so you can deploy and scale models with just a few lines of code.

Tip

To use the HF Pipeline deployment, install the required libraries with `pip install transformers`, or install them as extra dependencies with `pip install aana[transformers]`.

HfPipelineConfig is used to configure the Hugging Face Pipeline deployment.

aana.deployments.hf_pipeline_deployment.HfPipelineConfig

Attributes:

- `model_id` (str) – The model ID on Hugging Face.
- `task` (str | None) – The task name. If not provided, the task is inferred from the model ID. Defaults to None.
- `model_kwargs` (CustomConfig) – The model keyword arguments. Defaults to `{}`.
- `pipeline_kwargs` (CustomConfig) – The pipeline keyword arguments. Defaults to `{}`.
- `generation_kwargs` (CustomConfig) – The generation keyword arguments. Defaults to `{}`.
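Because `task` is inferred from the model ID when omitted, a minimal configuration only needs `model_id`. The sketch below leaves every other attribute at its default; the model ID and resource settings are illustrative, not prescriptive:

```python
from aana.deployments.hf_pipeline_deployment import HfPipelineConfig, HfPipelineDeployment

# Minimal config: task, model_kwargs, pipeline_kwargs, and generation_kwargs
# all fall back to their defaults, and the task is inferred from the model ID.
deployment = HfPipelineDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},  # illustrative resource request
    user_config=HfPipelineConfig(
        model_id="distilbert-base-uncased-finetuned-sst-2-english",
    ).model_dump(mode="json"),
)
```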

Example Configurations

As an example, let's configure the Hugging Face Pipeline deployment to serve the Salesforce BLIP-2 OPT-2.7b model.

BLIP-2 OPT-2.7b

```python
from transformers import BitsAndBytesConfig

from aana.deployments.hf_pipeline_deployment import HfPipelineConfig, HfPipelineDeployment

deployment = HfPipelineDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},
    user_config=HfPipelineConfig(
        model_id="Salesforce/blip2-opt-2.7b",
        task="image-to-text",
        model_kwargs={
            "quantization_config": BitsAndBytesConfig(
                load_in_8bit=False, load_in_4bit=True
            ),
        },
    ).model_dump(mode="json"),
)
```

`model_id` is the Hugging Face model ID, and `task` is one of the Hugging Face Pipelines tasks that the model can perform. Here we deploy the model with 4-bit quantization by setting `quantization_config` in the `model_kwargs` dictionary; any extra arguments for the model can be passed the same way.
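`generation_kwargs` works the same way for parameters applied at generation time. As a sketch, the BLIP-2 deployment above could cap caption length like this (the specific values are illustrative assumptions, not recommended settings):

```python
from transformers import BitsAndBytesConfig

from aana.deployments.hf_pipeline_deployment import HfPipelineConfig, HfPipelineDeployment

deployment = HfPipelineDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},
    user_config=HfPipelineConfig(
        model_id="Salesforce/blip2-opt-2.7b",
        task="image-to-text",
        model_kwargs={
            "quantization_config": BitsAndBytesConfig(load_in_4bit=True),
        },
        # Forwarded to the pipeline at generation time; values are illustrative.
        generation_kwargs={"max_new_tokens": 64},
    ).model_dump(mode="json"),
)
```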