
Aana SDK


mobiusml/aana_sdk

Models

Media Models

Aana SDK provides models for media types such as audio, video, and images. These models make it easy to work with media files: download them, convert them from other formats, and more.

Don't use these media models (for example, Image) as request or response bodies in an Endpoint definition, because they are not JSON-serializable. Instead, use the corresponding input models (for example, ImageInput) and call their convert_input_to_object() method to convert them to the appropriate media object.

For example, in the following code snippet, the ImageInput model is used in the endpoint definition and is then converted to an Image object.

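```python
from aana.api.api_generation import Endpoint
from aana.core.models.image import Image, ImageInput

class ImageClassificationEndpoint(Endpoint):
    async def run(self, image: ImageInput) -> ImageClassificationOutput:  # output type defined by you
        # ImageInput is JSON-serializable; convert it to an Image object for processing
        image_obj: Image = image.convert_input_to_object()
        ...
```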
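The idea behind the input models can be sketched without the SDK: a JSON-friendly input describes where the media comes from, and a conversion method turns it into an in-memory object that holds raw bytes. Everything below (`MediaInputSketch`, `LoadedMedia`, the `path` and `content` fields, and the exactly-one-source rule) is a hypothetical illustration that only mirrors the method name used above; it is not the SDK's ImageInput.

```python
import base64
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class LoadedMedia:
    """Hypothetical in-memory media object; raw bytes are not JSON-serializable."""

    data: bytes


@dataclass
class MediaInputSketch:
    """Hypothetical JSON-friendly input: exactly one source must be given."""

    path: Optional[str] = None
    content: Optional[str] = None  # base64-encoded bytes, so the request stays valid JSON

    def convert_input_to_object(self) -> LoadedMedia:
        sources = [s for s in (self.path, self.content) if s is not None]
        if len(sources) != 1:
            raise ValueError("provide exactly one of path or content")
        if self.path is not None:
            return LoadedMedia(Path(self.path).read_bytes())
        return LoadedMedia(base64.b64decode(self.content))


# A request body arrives as JSON, is parsed into the input model, then converted.
request_body = json.loads('{"content": "aGVsbG8="}')
media = MediaInputSketch(**request_body).convert_input_to_object()
print(media.data)  # prints b'hello'
```

Keeping the endpoint signature on the JSON-friendly side means request validation and API schema generation never have to deal with raw bytes or arrays.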

Automatic Speech Recognition (ASR) Models

Models for working with automatic speech recognition (ASR). They represent the output of ASR models such as Whisper: the transcription, its segments, the individual words, and so on.
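To make the shape of this data concrete, here is a minimal sketch using plain dataclasses. The classes `Word`, `Segment`, and `Transcription` and their fields are hypothetical simplifications, not the SDK's ASR models; the sketch only shows how a full transcription can be assembled from timed segments, each holding timed words.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for ASR output models; names and fields are illustrative only.
@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float  # seconds


@dataclass
class Segment:
    text: str
    start: float
    end: float
    words: list[Word] = field(default_factory=list)


@dataclass
class Transcription:
    text: str


def build_transcription(segments: list[Segment]) -> Transcription:
    """Join segment texts into a single transcription in chronological order."""
    ordered = sorted(segments, key=lambda s: s.start)
    return Transcription(text=" ".join(s.text.strip() for s in ordered))


segments = [
    Segment("world.", 1.0, 1.6, [Word("world.", 1.0, 1.6)]),
    Segment("Hello", 0.0, 0.9, [Word("Hello", 0.0, 0.9)]),
]
print(build_transcription(segments).text)  # prints "Hello world."
```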

Caption Models

Models for working with captions. These models represent the output of image captioning models such as BLIP-2.

Chat Models

Models for working with chat models. They represent the input and output of chat models, and include models for the OpenAI-compatible API.
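OpenAI-compatible chat APIs exchange a dialog as a list of messages, each a JSON object with a `role` (`system`, `user`, or `assistant`) and a `content` string. The sketch below is a hypothetical illustration of that mapping; `ChatMessage` and `to_openai_messages` are names chosen for the example, not classes or functions confirmed in the SDK.

```python
import json
from dataclasses import dataclass
from typing import Literal

Role = Literal["system", "user", "assistant"]


# Hypothetical chat message model, for illustration only.
@dataclass
class ChatMessage:
    role: Role
    content: str


def to_openai_messages(dialog: list[ChatMessage]) -> list[dict[str, str]]:
    """Convert a dialog to the OpenAI-style list of {"role", "content"} objects."""
    return [{"role": m.role, "content": m.content} for m in dialog]


dialog = [
    ChatMessage("system", "You are a helpful assistant."),
    ChatMessage("user", "What is the capital of France?"),
]
print(json.dumps(to_openai_messages(dialog), indent=2))
```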

Image Chat Models

Models for working with visual chat models. They represent chat input that combines text and images, which a vision-language model (VLM) uses to describe visual content.
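A multimodal message typically carries a list of content parts rather than a single string. As a hedged sketch, the code below models a user message holding one image part and one text part and serializes it in the OpenAI-style format, in which each part has a `type` of `text` or `image_url`. The class names and the choice to reference images by URL are assumptions for illustration; the SDK's image chat models may represent images differently.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical content parts, for illustration only.
@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    url: str  # assumption: the image is referenced by URL


Part = Union[TextPart, ImagePart]


@dataclass
class MultimodalMessage:
    role: str
    content: list[Part]

    def to_openai(self) -> dict:
        """Serialize to an OpenAI-style message with a list of typed content parts."""
        parts = []
        for part in self.content:
            if isinstance(part, TextPart):
                parts.append({"type": "text", "text": part.text})
            else:
                parts.append({"type": "image_url", "image_url": {"url": part.url}})
        return {"role": self.role, "content": parts}


message = MultimodalMessage(
    role="user",
    content=[ImagePart("https://example.com/cat.jpg"), TextPart("Describe this image.")],
)
print(message.to_openai())
```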

Custom Config

The Custom Config model can be used to pass an arbitrary configuration to a deployment.

Sampling Models

Contains the Sampling Parameters model, which can be used to pass sampling parameters to LLMs.
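Sampling parameters control how an LLM picks each next token. The sketch below is a hypothetical dataclass, not the SDK's Sampling Parameters model; the field names `temperature`, `top_p`, and `max_tokens` follow common LLM-API conventions and are assumptions here. It shows the kind of range checks such a model typically enforces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplingParamsSketch:
    """Hypothetical sampling parameters; field names follow common LLM-API conventions."""

    temperature: float = 1.0  # commonly, 0 means greedy decoding; higher values add randomness
    top_p: float = 1.0  # nucleus sampling: keep the smallest token set with this total probability
    max_tokens: Optional[int] = None  # None means no explicit limit

    def __post_init__(self) -> None:
        if self.temperature < 0:
            raise ValueError("temperature must be >= 0")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.max_tokens is not None and self.max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")


params = SamplingParamsSketch(temperature=0.7, top_p=0.9, max_tokens=256)
```

Validating in `__post_init__` rejects a bad value when the object is created, rather than when the request reaches the model.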

Time Models

Contains time models like TimeInterval.
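A time interval is essentially a pair of start and end timestamps. As a minimal sketch, assuming times in seconds, the hypothetical `Interval` class below (a stand-in, not the SDK's TimeInterval) adds a duration and an overlap check, two operations that are handy when relating segments on a shared timeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a time interval; times are in seconds."""

    start: float
    end: float

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("end must not be before start")

    @property
    def duration(self) -> float:
        return self.end - self.start

    def overlaps(self, other: "Interval") -> bool:
        """True if the two intervals share a stretch of positive length."""
        return self.start < other.end and other.start < self.end


a = Interval(0.0, 2.5)
b = Interval(2.0, 4.0)
print(a.duration, a.overlaps(b))  # prints 2.5 True
```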

Types Models

Contains types models like Dtype.

VAD Models

Contains Voice Activity Detection (VAD) models like VadParams, VadSegment, and VadSegments.
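VAD output is usually a list of short speech segments, which are commonly merged into longer chunks before transcription. The sketch below illustrates one simple greedy merge: consecutive segments are combined while the merged chunk stays within a maximum length. The function name, the `(start, end)` tuple representation, and the `max_chunk` parameter are hypothetical; the SDK's VadParams and VadSegments may expose different options and use a different merging strategy.

```python
def merge_segments(
    segments: list[tuple[float, float]], max_chunk: float
) -> list[tuple[float, float]]:
    """Greedily merge (start, end) speech segments into chunks of at most max_chunk seconds.

    A single segment longer than max_chunk is kept as its own chunk.
    """
    chunks: list[tuple[float, float]] = []
    for start, end in sorted(segments):
        if chunks and end - chunks[-1][0] <= max_chunk:
            # Extend the current chunk to cover this segment (including the gap before it).
            chunks[-1] = (chunks[-1][0], max(chunks[-1][1], end))
        else:
            chunks.append((start, end))
    return chunks


speech = [(0.0, 4.0), (5.0, 9.0), (12.0, 20.0), (21.0, 40.0)]
print(merge_segments(speech, max_chunk=10.0))  # prints [(0.0, 9.0), (12.0, 20.0), (21.0, 40.0)]
```

Merging trades a little silence inside each chunk for fewer, longer model calls; the chunk limit keeps each call within the model's input window.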

Video Models

Contains video models like VideoMetadata, VideoStatus, and VideoParams.
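One typical role of video parameters is to set how densely frames are sampled. As a hedged sketch, assuming a target sampling rate in frames per second (the parameter name `extract_fps` is an assumption here, not a confirmed field of VideoParams), the hypothetical helper below computes which frame indices to keep from a video with a known native frame rate and frame count.

```python
import math


def frame_indices(num_frames: int, native_fps: float, extract_fps: float) -> list[int]:
    """Return indices of frames to keep when sampling at extract_fps from a video at native_fps.

    If extract_fps exceeds native_fps, every frame is kept (no duplicates).
    """
    if native_fps <= 0 or extract_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(native_fps / extract_fps, 1.0)  # native frames between consecutive samples
    count = math.ceil(num_frames / step)
    # min() guards against floating-point rounding pushing the last index out of range.
    return [min(int(i * step), num_frames - 1) for i in range(count)]


print(frame_indices(num_frames=300, native_fps=30.0, extract_fps=1.0))
# prints [0, 30, 60, 90, 120, 150, 180, 210, 240, 270]
```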

Whisper Models

Contains models for working with Whisper models, such as WhisperParams.
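Whisper decoding options are usually forwarded as keyword arguments to the underlying model. The sketch below is a hypothetical parameter object, not the SDK's WhisperParams, showing one way to keep the options in one place and forward only those a caller actually set. The field names `language`, `beam_size`, `word_timestamps`, and `temperature` are modeled on common Whisper implementations and are assumptions here.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class WhisperParamsSketch:
    """Hypothetical Whisper decoding options; None means "use the model's default"."""

    language: Optional[str] = None  # e.g. "en"; None typically lets the model detect the language
    beam_size: Optional[int] = None
    word_timestamps: Optional[bool] = None
    temperature: Optional[float] = None

    def to_kwargs(self) -> dict[str, Any]:
        """Return only the options that were explicitly set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


params = WhisperParamsSketch(language="en", beam_size=5)
print(params.to_kwargs())  # prints {'language': 'en', 'beam_size': 5}
```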