VAD Models

aana.core.models.vad

VadSegments

VadSegments = list[VadSegment]

List of VadSegment objects.

VadParams

Bases: BaseModel

A model for the Voice Activity Detection model parameters.

ATTRIBUTE DESCRIPTION
chunk_size

The maximum length of each VAD output chunk.

TYPE: float

merge_onset

Onset to be used for the merging operation.

TYPE: float

merge_offset

Optional offset to be used for the merging operation.

TYPE: float
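To make the role of chunk_size concrete, here is a minimal sketch of a greedy merge that closes a chunk whenever adding the next voiced span would push it past chunk_size. It is an illustration only: VadParamsSketch and merge_voiced are hypothetical names, the parameter values are placeholders (this reference does not list defaults), and the sketch does not model merge_onset or merge_offset or claim to match the library's actual merging logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VadParamsSketch:
    """Hypothetical stand-in mirroring the three documented attributes."""

    chunk_size: float
    merge_onset: float
    merge_offset: Optional[float] = None


def merge_voiced(spans, params):
    """Greedily group (start, end) spans so no group exceeds chunk_size.

    Assumption: spans are sorted by start time. Illustrative only.
    """
    merged = []
    cur_start, cur_end, cur_parts = None, None, []
    for start, end in spans:
        # Close the current chunk if this span would make it too long.
        if cur_parts and end - cur_start > params.chunk_size:
            merged.append(
                {"start": cur_start, "end": cur_end, "segments": cur_parts}
            )
            cur_parts = []
        if not cur_parts:
            cur_start = start
        cur_end = end
        cur_parts.append((start, end))
    if cur_parts:
        merged.append({"start": cur_start, "end": cur_end, "segments": cur_parts})
    return merged


# Placeholder values, not library defaults.
params = VadParamsSketch(chunk_size=30.0, merge_onset=0.5)
chunks = merge_voiced([(0.0, 10.0), (12.0, 25.0), (26.0, 40.0)], params)
```

With these inputs the first two spans fit in one 25-second chunk, and the third starts a new chunk because including it would span 40 seconds.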

VadSegment

Bases: BaseModel

Pydantic schema for Segment from Voice Activity Detection model.

ATTRIBUTE DESCRIPTION
time_interval

The start and end time of the segment.

TYPE: TimeInterval

segments

Smaller voiced segments within a merged VAD segment.

TYPE: list[tuple[float, float]]

to_whisper_dict

to_whisper_dict()

Generate a dictionary with start, end, and segments keys from a VadSegment, for use with faster-whisper.

RETURNS DESCRIPTION
dict

Dictionary with start, end and segments keys

TYPE: dict

Source code in aana/core/models/vad.py
def to_whisper_dict(self) -> dict:
    """Generate dictionary with start, end and segments keys from VADSegment for faster whisper.

    Returns:
        dict: Dictionary with start, end and segments keys
    """
    return {
        "start": self.time_interval.start,
        "end": self.time_interval.end,
        "segments": self.segments,
    }
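
A usage sketch of the shape this method returns. To keep it runnable without the aana package, it uses hypothetical dataclass stand-ins (TimeIntervalSketch, VadSegmentSketch) in place of the real Pydantic models. Only the start and end fields of the time interval and the method body are taken from the source above; everything else is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TimeIntervalSketch:
    """Stand-in for TimeInterval; only start/end are known from the source."""

    start: float
    end: float


@dataclass
class VadSegmentSketch:
    """Stand-in for VadSegment (the real class is a Pydantic BaseModel)."""

    time_interval: TimeIntervalSketch
    segments: list[tuple[float, float]] = field(default_factory=list)

    def to_whisper_dict(self) -> dict:
        # Same body as the documented method.
        return {
            "start": self.time_interval.start,
            "end": self.time_interval.end,
            "segments": self.segments,
        }


segment = VadSegmentSketch(
    time_interval=TimeIntervalSketch(start=0.0, end=25.0),
    segments=[(0.0, 10.0), (12.0, 25.0)],
)
whisper_input = segment.to_whisper_dict()
```

The result is a plain dict, so a VadSegments list can be converted with a list comprehension such as [s.to_whisper_dict() for s in vad_segments].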