Image-to-Text Models

Aana SDK has two deployments to serve image-to-text models:

  • Idefics2Deployment: used to deploy the Idefics2 models. Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs.

  • HFBlip2Deployment: used to deploy the BLIP-2 models. HFBlip2Deployment only supports the image captioning capability of BLIP-2.

Tip

To use the Idefics2 or BLIP-2 deployments, install the required libraries with pip install transformers or install Aana SDK with the extra dependencies using pip install aana[transformers].

Idefics2 Deployment

Idefics2Config is used to configure the Idefics2 deployment.

aana.deployments.idefics_2_deployment.Idefics2Config

Attributes:

  • model_id (str) –

    The model ID on HuggingFace.

  • dtype (Dtype) –

    The data type. Defaults to Dtype.AUTO.

  • enable_flash_attention_2 (bool | None) –

    Use Flash Attention 2. If None, Flash Attention 2 will be enabled if available. Defaults to None.

  • model_kwargs (CustomConfig) –

    The extra model keyword arguments. Defaults to {}.

  • processor_kwargs (CustomConfig) –

    The extra processor keyword arguments. Defaults to {}.

  • default_sampling_params (SamplingParams) –

    The default sampling parameters. Defaults to SamplingParams(temperature=1.0, max_tokens=256).

Example Configurations

As an example, let's see how to configure the Idefics2 deployment for the Hugging Face Idefics2 8B model.

Hugging Face Idefics2 8B

from aana.core.models.types import Dtype
from aana.deployments.idefics_2_deployment import Idefics2Config, Idefics2Deployment

Idefics2Deployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.85},
    user_config=Idefics2Config(
        model_id="HuggingFaceM4/idefics2-8b",
        dtype=Dtype.FLOAT16,
    ).model_dump(mode="json"),
)

model_id is the Hugging Face model ID. dtype=Dtype.FLOAT16 specifies the data type used to load the model. Idefics2 also supports Dtype.BFLOAT16, which is generally faster but is not supported by all GPUs. You can pass other model arguments through the model_kwargs dictionary, as shown in the sketch below.
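
If you need finer control, the remaining Idefics2Config fields can be set in the same way. The following is an illustrative sketch rather than a recommended configuration: the model_kwargs and processor_kwargs values are example options forwarded to the Hugging Face model and processor, and the import path for SamplingParams (aana.core.models.sampling) is an assumption.

from aana.core.models.sampling import SamplingParams  # assumed import path
from aana.core.models.types import Dtype
from aana.deployments.idefics_2_deployment import Idefics2Config, Idefics2Deployment

Idefics2Deployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.85},
    user_config=Idefics2Config(
        model_id="HuggingFaceM4/idefics2-8b",
        dtype=Dtype.FLOAT16,
        enable_flash_attention_2=False,  # keep Flash Attention 2 off even if it is installed
        model_kwargs={"low_cpu_mem_usage": True},  # illustrative kwarg forwarded to the model
        processor_kwargs={"do_image_splitting": False},  # illustrative kwarg forwarded to the processor
        default_sampling_params=SamplingParams(temperature=0.7, max_tokens=128),
    ).model_dump(mode="json"),
)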

BLIP-2 Deployment

HFBlip2Config is used to configure the BLIP-2 deployment.

aana.deployments.hf_blip2_deployment.HFBlip2Config

Attributes:

  • model_id (str) –

    The model ID on HuggingFace.

  • dtype (Dtype) –

    The data type. Defaults to Dtype.AUTO.

  • batch_size (int) –

    The batch size. Defaults to 1.

  • num_processing_threads (int) –

    The number of processing threads. Defaults to 1.

  • max_new_tokens (int) –

    The maximum number of tokens to generate. Defaults to 64.

Example Configurations

As an example, let's see how to configure the BLIP-2 deployment for the Salesforce BLIP-2 OPT-2.7b model.

BLIP-2 OPT-2.7b

from aana.core.models.types import Dtype
from aana.deployments.hf_blip2_deployment import HFBlip2Config, HFBlip2Deployment

HFBlip2Deployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},
    user_config=HFBlip2Config(
        model_id="Salesforce/blip2-opt-2.7b",
        dtype=Dtype.FLOAT16,
        batch_size=2,
        num_processing_threads=2,
    ).model_dump(mode="json"),
)

model_id is the Hugging Face model ID. dtype=Dtype.FLOAT16 loads the model in half precision for faster inference and lower memory usage. batch_size sets the batch size (the bigger the batch size, the more memory is required), and num_processing_threads sets the number of processing threads.
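
Once configured, a deployment is registered with an Aana application so that endpoints can use it. The snippet below is a minimal sketch assuming the AanaSDK class and its register_deployment method from aana.sdk; the application and deployment names are arbitrary placeholders.

from aana.core.models.types import Dtype
from aana.deployments.hf_blip2_deployment import HFBlip2Config, HFBlip2Deployment
from aana.sdk import AanaSDK  # assumed import path for the application class

# Configure the BLIP-2 deployment as above.
blip2_deployment = HFBlip2Deployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},
    user_config=HFBlip2Config(
        model_id="Salesforce/blip2-opt-2.7b",
        dtype=Dtype.FLOAT16,
    ).model_dump(mode="json"),
)

# Register the deployment under a name that endpoints can reference.
aana_app = AanaSDK(name="captioning_app")
aana_app.register_deployment("blip2_deployment", blip2_deployment)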