Models

Media Models

The Aana SDK provides models for media types such as audio, video, and images. These models make it easy to work with media files: downloading them, converting them from other formats, and more.

Do not use the media models above in an Endpoint definition (as a request or response body), because they are not JSON-serializable. Instead, use the corresponding input models (such as ImageInput) in the endpoint signature and call the convert_input_to_object() method to convert them to the appropriate media type.

For example, the following endpoint declares the ImageInput model in its signature and then converts it to an Image object.

class ImageClassificationEndpoint(Endpoint):
    async def run(self, image: ImageInput) -> ImageClassificationOutput:
        image_obj: Image = image.convert_input_to_object()
        ...
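
The same split between a serializable input model and a rich media object can be illustrated without the SDK. The sketch below is not Aana code: MediaInput and MediaObject are hypothetical stand-ins for ImageInput and Image, built only on the standard library, and the conversion fabricates placeholder bytes instead of loading a file. It only shows why the input model can cross a JSON boundary while the converted object cannot.

```python
import json
from dataclasses import asdict, dataclass


class MediaObject:
    """Hypothetical stand-in for a rich media type such as Image.

    It holds raw bytes, so it cannot be passed through json.dumps.
    """

    def __init__(self, content: bytes, source: str):
        self.content = content
        self.source = source


@dataclass
class MediaInput:
    """Hypothetical JSON-friendly input model (plays the role of ImageInput)."""

    path: str

    def convert_input_to_object(self) -> MediaObject:
        # A real implementation would load or download the file here;
        # this sketch uses placeholder bytes instead.
        return MediaObject(content=b"\x00\x01", source=self.path)


def is_json_serializable(value) -> bool:
    """Return True if json.dumps accepts the value."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


request = MediaInput(path="cat.jpg")
media = request.convert_input_to_object()

input_ok = is_json_serializable(asdict(request))  # plain string fields only
object_ok = is_json_serializable(vars(media))     # contains raw bytes
```

Keeping the conversion as a method on the input model means the endpoint signature stays serializable while the endpoint body works with the richer type.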

Automatic Speech Recognition (ASR) Models

Models for working with automatic speech recognition (ASR). They represent the output of ASR models such as Whisper: the transcription, its segments, and the individual words.
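
As a rough illustration of how a transcription, its segments, and its words can nest, here is a minimal standard-library sketch. The class names (WordSketch, SegmentSketch, TranscriptionSketch) and their fields are assumptions made for this example and are not the SDK's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordSketch:
    """Hypothetical word-level result with timestamps in seconds."""

    text: str
    start: float
    end: float


@dataclass
class SegmentSketch:
    """Hypothetical segment: a span of speech made up of words."""

    text: str
    start: float
    end: float
    words: List[WordSketch] = field(default_factory=list)


@dataclass
class TranscriptionSketch:
    """Hypothetical full transcription text."""

    text: str


def join_segments(segments: List[SegmentSketch]) -> TranscriptionSketch:
    # Concatenate the segment texts, in order, into one transcription.
    return TranscriptionSketch(text=" ".join(s.text.strip() for s in segments))


segments = [
    SegmentSketch(
        text="Hello world.",
        start=0.0,
        end=1.2,
        words=[WordSketch("Hello", 0.0, 0.5), WordSketch("world.", 0.6, 1.2)],
    ),
    SegmentSketch(text=" How are you?", start=1.5, end=2.4),
]
transcription = join_segments(segments)
```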

Caption Models

Models for working with captions. They represent the output of image captioning models such as BLIP-2.

Chat Models

Models for working with chat models. They represent the input and output of chat models and include models for the OpenAI-compatible API.

Image Chat Models

Models for working with visual chat models. They represent chat input that combines text and images, used to prompt a vision-language model (VLM) to describe visual content.

Custom Config

The Custom Config model can be used to pass arbitrary configuration to a deployment.

Sampling Models

Contains the Sampling Parameters model, which can be used to pass sampling parameters to LLMs.
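
To make the idea of a sampling-parameters model concrete, the sketch below defines a small validated container using only the standard library. SamplingSketch and its fields (temperature, top_p, top_k, max_tokens) are assumptions based on parameters commonly exposed by LLM backends; they are not taken from the SDK's own model.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SamplingSketch:
    """Hypothetical sampling parameters; unset optional fields are omitted."""

    temperature: float = 1.0
    top_p: float = 1.0
    top_k: Optional[int] = None
    max_tokens: Optional[int] = None

    def __post_init__(self):
        # Reject values that make no sense for sampling.
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.top_k is not None and self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if self.max_tokens is not None and self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")

    def to_kwargs(self) -> dict:
        # Keep only the fields that were actually set.
        return {k: v for k, v in asdict(self).items() if v is not None}


params = SamplingSketch(temperature=0.7, top_p=0.9)
kwargs = params.to_kwargs()
```

Validating at construction time means a bad value fails at the request boundary rather than inside the model backend.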

Time Models

Contains time models like TimeInterval.

Types Models

Contains type models such as Dtype.

VAD Models

Contains Voice Activity Detection (VAD) models like VadParams, VadSegment, and VadSegments.

Video Models

Contains video models like VideoMetadata, VideoStatus, and VideoParams.

Whisper Models

Contains models for working with Whisper models, such as WhisperParams.