
ASR Models

aana.core.models.asr

AsrWord

Bases: BaseModel

Pydantic schema for a word produced by an ASR model.

ATTRIBUTE DESCRIPTION
word

The word text.

TYPE: str

speaker

Speaker label for the word.

TYPE: str | None

time_interval

Time interval of the word.

TYPE: TimeInterval

alignment_confidence

Alignment confidence of the word, >= 0.0 and <= 1.0.

TYPE: float

from_whisper

from_whisper(whisper_word)

Convert WhisperWord to AsrWord.

PARAMETER DESCRIPTION
whisper_word

The WhisperWord from faster-whisper.

TYPE: Word

RETURNS DESCRIPTION
AsrWord

The converted AsrWord.

TYPE: AsrWord

Source code in aana/core/models/asr.py
@classmethod
def from_whisper(cls, whisper_word: "WhisperWord") -> "AsrWord":
    """Convert WhisperWord to AsrWord.

    Args:
        whisper_word (WhisperWord): The WhisperWord from faster-whisper.

    Returns:
        AsrWord: The converted AsrWord.
    """
    return cls(
        speaker=None,
        word=whisper_word.word,
        time_interval=TimeInterval(start=whisper_word.start, end=whisper_word.end),
        alignment_confidence=whisper_word.probability,
    )
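
The field mapping above can be tried without installing aana or faster-whisper. The following is a minimal sketch under stated assumptions: `TimeInterval` and `AsrWord` are dataclass stand-ins for the real Pydantic models, and `fake_word` is a hypothetical object exposing only the four attributes that `from_whisper` reads (`word`, `start`, `end`, `probability`).

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class TimeInterval:
    """Stand-in for aana's TimeInterval (assumed fields: start, end)."""

    start: float
    end: float


@dataclass
class AsrWord:
    """Dataclass stand-in for the Pydantic AsrWord schema (illustration only)."""

    word: str
    speaker: Optional[str]
    time_interval: TimeInterval
    alignment_confidence: float

    @classmethod
    def from_whisper(cls, whisper_word) -> "AsrWord":
        # Same field mapping as the source listing: the speaker is left unset
        # and the word probability is reused as the alignment confidence.
        return cls(
            speaker=None,
            word=whisper_word.word,
            time_interval=TimeInterval(start=whisper_word.start, end=whisper_word.end),
            alignment_confidence=whisper_word.probability,
        )


# Hypothetical faster-whisper word; values are made up for the example.
fake_word = SimpleNamespace(word=" hello", start=0.5, end=0.9, probability=0.87)
asr_word = AsrWord.from_whisper(fake_word)
```

Note that `speaker` is always `None` after conversion, so speaker labels have to be filled in by a later step.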

AsrSegment

Bases: BaseModel

Pydantic schema for a segment produced by an ASR model.

ATTRIBUTE DESCRIPTION
text

The text of the segment (transcript/translation).

TYPE: str

time_interval

Time interval of the segment.

TYPE: TimeInterval

confidence

Confidence of the segment.

TYPE: float | None

no_speech_confidence

Probability that the segment contains no speech (silence).

TYPE: float | None

words

List of words in the segment. Default is [].

TYPE: list[AsrWord]

speaker

Speaker label. Default is None.

TYPE: str | None

from_whisper

from_whisper(whisper_segment)

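Convert WhisperSegment to AsrSegment. The confidence is computed as exp(avg_logprob). If the segment has no avg_logprob or no_speech_prob attribute, the corresponding field is set to None; if words is missing or None, the word list is empty.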

Source code in aana/core/models/asr.py
@classmethod
def from_whisper(cls, whisper_segment: "WhisperSegment") -> "AsrSegment":
    """Convert WhisperSegment to AsrSegment."""
    time_interval = TimeInterval(
        start=whisper_segment.start, end=whisper_segment.end
    )
    try:
        avg_logprob = whisper_segment.avg_logprob
        confidence = np.exp(avg_logprob)
    except AttributeError:
        confidence = None

    try:
        words = [AsrWord.from_whisper(word) for word in whisper_segment.words]
    except TypeError:  # "None type object is not iterable"
        words = []
    except AttributeError:  # "'StreamSegment' object has no attribute 'words'"
        words = []
    try:
        no_speech_confidence = whisper_segment.no_speech_prob
    except AttributeError:
        no_speech_confidence = None

    return cls(
        text=whisper_segment.text,
        time_interval=time_interval,
        confidence=confidence,
        no_speech_confidence=no_speech_confidence,
        words=words,
        speaker=None,
    )
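
The fallback behaviour in the listing above is easiest to see on two inputs: one segment with all attributes and one without the optional ones. This is a rough sketch, not the library code: `segment_fields` is a hypothetical helper that returns a plain dict instead of an `AsrSegment`, it uses `math.exp` where the source uses `np.exp`, and both input objects are made-up stand-ins for faster-whisper segments.

```python
import math
from types import SimpleNamespace


def segment_fields(whisper_segment) -> dict:
    """Mirror the try/except fallbacks of AsrSegment.from_whisper (illustration only)."""
    try:
        # avg_logprob is an average log-probability, so exp() maps it into (0, 1].
        confidence = math.exp(whisper_segment.avg_logprob)
    except AttributeError:
        confidence = None

    try:
        words = [word.word for word in whisper_segment.words]
    except (TypeError, AttributeError):
        # TypeError: words is None; AttributeError: the segment has no words attribute.
        words = []

    try:
        no_speech_confidence = whisper_segment.no_speech_prob
    except AttributeError:
        no_speech_confidence = None

    return {
        "text": whisper_segment.text,
        "confidence": confidence,
        "no_speech_confidence": no_speech_confidence,
        "words": words,
    }


# Hypothetical segment with all attributes present but no word timestamps.
full_segment = SimpleNamespace(
    text=" hi there", start=0.0, end=1.2,
    avg_logprob=-0.25, no_speech_prob=0.02, words=None,
)
# Hypothetical streaming-style segment that lacks the optional attributes.
stream_segment = SimpleNamespace(text=" hi", start=0.0, end=0.4)

full_fields = segment_fields(full_segment)
stream_fields = segment_fields(stream_segment)
```

In the first case the confidence is `exp(-0.25)` and the word list is empty because `words` is `None`; in the second case every optional field falls back to `None` or `[]`.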

AsrTranscriptionInfo

Bases: BaseModel

Pydantic schema for TranscriptionInfo.

ATTRIBUTE DESCRIPTION
language

Language of the transcription.

TYPE: str

language_confidence

Confidence of the language detection, >= 0.0 and <= 1.0. Default is 0.0.

TYPE: float

from_whisper

from_whisper(transcription_info)

Convert WhisperTranscriptionInfo to AsrTranscriptionInfo.

PARAMETER DESCRIPTION
transcription_info

The WhisperTranscriptionInfo from faster-whisper.

TYPE: TranscriptionInfo

RETURNS DESCRIPTION
AsrTranscriptionInfo

The converted AsrTranscriptionInfo.

TYPE: AsrTranscriptionInfo

Source code in aana/core/models/asr.py
@classmethod
def from_whisper(
    cls, transcription_info: "WhisperTranscriptionInfo"
) -> "AsrTranscriptionInfo":
    """Convert WhisperTranscriptionInfo to AsrTranscriptionInfo.

    Args:
        transcription_info (WhisperTranscriptionInfo): The WhisperTranscriptionInfo from faster-whisper.

    Returns:
        AsrTranscriptionInfo: The converted AsrTranscriptionInfo.
    """
    return cls(
        language=transcription_info.language,
        language_confidence=transcription_info.language_probability,
    )
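
As a small illustration of the mapping and of the documented range for `language_confidence`, the sketch below uses a dataclass stand-in with a manual range check. The real class is a Pydantic model, and how it enforces the bounds is not shown on this page, so the `__post_init__` check is an assumption made for the example. `fake_info` is a hypothetical object with the two attributes that `from_whisper` reads.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class AsrTranscriptionInfo:
    """Dataclass stand-in for the Pydantic AsrTranscriptionInfo schema."""

    language: str
    language_confidence: float = 0.0  # documented default

    def __post_init__(self) -> None:
        # Assumed enforcement of the documented range [0.0, 1.0].
        if not 0.0 <= self.language_confidence <= 1.0:
            raise ValueError("language_confidence must be between 0.0 and 1.0")

    @classmethod
    def from_whisper(cls, transcription_info) -> "AsrTranscriptionInfo":
        # Same mapping as the source listing: language_probability -> language_confidence.
        return cls(
            language=transcription_info.language,
            language_confidence=transcription_info.language_probability,
        )


# Hypothetical faster-whisper transcription info; values are made up.
fake_info = SimpleNamespace(language="en", language_probability=0.98)
info = AsrTranscriptionInfo.from_whisper(fake_info)
```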

AsrTranscription

Bases: BaseModel

Pydantic schema for Transcription/Translation.

ATTRIBUTE DESCRIPTION
text

The text of the transcription/translation. Default is "".

TYPE: str