ASR Models

aana.core.models.asr

AsrWord

Bases: BaseModel

Pydantic schema for a word produced by an ASR model.

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `word` | The word text. TYPE: `str` |
| `speaker` | Speaker label for the word. TYPE: `str \| None` |
| `time_interval` | Time interval of the word. TYPE: `TimeInterval` |
| `alignment_confidence` | Alignment confidence of the word, >= 0.0 and <= 1.0. TYPE: `float` |

from_whisper

from_whisper(whisper_word)

Convert WhisperWord to AsrWord.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `whisper_word` | The WhisperWord from faster-whisper. TYPE: `Word` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `AsrWord` | The converted AsrWord. TYPE: `AsrWord` |

Source code in aana/core/models/asr.py

```python
@classmethod
def from_whisper(cls, whisper_word: "WhisperWord") -> "AsrWord":
    """Convert WhisperWord to AsrWord.

    Args:
        whisper_word (WhisperWord): The WhisperWord from faster-whisper.

    Returns:
        AsrWord: The converted AsrWord.
    """
    return cls(
        speaker=None,
        word=whisper_word.word,
        time_interval=TimeInterval(start=whisper_word.start, end=whisper_word.end),
        alignment_confidence=whisper_word.probability,
    )
```
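
As a rough illustration of the field mapping in `AsrWord.from_whisper`, the sketch below mirrors the conversion with standard-library dataclasses rather than the real Pydantic models. `FakeWhisperWord`, `TimeIntervalSketch`, and `AsrWordSketch` are hypothetical stand-ins that are assumed to expose only the fields used in the source (`word`, `start`, `end`, `probability`); they are not part of aana or faster-whisper, and they perform no validation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeWhisperWord:
    """Hypothetical stand-in for the word object returned by faster-whisper."""

    word: str
    start: float
    end: float
    probability: float


@dataclass
class TimeIntervalSketch:
    """Hypothetical stand-in for TimeInterval."""

    start: float
    end: float


@dataclass
class AsrWordSketch:
    """Hypothetical stand-in for AsrWord (no Pydantic validation)."""

    word: str
    speaker: Optional[str]
    time_interval: TimeIntervalSketch
    alignment_confidence: float

    @classmethod
    def from_whisper(cls, whisper_word: FakeWhisperWord) -> "AsrWordSketch":
        # Same mapping as the source: the speaker starts out unset and the
        # word probability is stored as the alignment confidence.
        return cls(
            speaker=None,
            word=whisper_word.word,
            time_interval=TimeIntervalSketch(whisper_word.start, whisper_word.end),
            alignment_confidence=whisper_word.probability,
        )


asr_word = AsrWordSketch.from_whisper(
    FakeWhisperWord(word=" hello", start=0.0, end=0.42, probability=0.93)
)
```

Note that `speaker` is always `None` after conversion, so any speaker label would have to be filled in by a later step.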

AsrSegment

Bases: BaseModel

Pydantic schema for a segment produced by an ASR model.

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `text` | The text of the segment (transcript/translation). TYPE: `str` |
| `time_interval` | Time interval of the segment. TYPE: `TimeInterval` |
| `confidence` | Confidence of the segment. TYPE: `float \| None` |
| `no_speech_confidence` | Probability that the segment contains no speech (silence). TYPE: `float \| None` |
| `words` | List of words in the segment. Default is []. TYPE: `list[AsrWord]` |
| `speaker` | Speaker label. Default is None. TYPE: `str \| None` |

from_whisper

from_whisper(whisper_segment)

Convert WhisperSegment to AsrSegment.

Source code in aana/core/models/asr.py

```python
@classmethod
def from_whisper(cls, whisper_segment: "WhisperSegment") -> "AsrSegment":
    """Convert WhisperSegment to AsrSegment."""
    time_interval = TimeInterval(
        start=whisper_segment.start, end=whisper_segment.end
    )
    try:
        avg_logprob = whisper_segment.avg_logprob
        confidence = np.exp(avg_logprob)
    except AttributeError:
        confidence = None

    try:
        words = [AsrWord.from_whisper(word) for word in whisper_segment.words]
    except TypeError:  # "None type object is not iterable"
        words = []
    except AttributeError:  # "'StreamSegment' object has no attribute 'words'"
        words = []
    try:
        no_speech_confidence = whisper_segment.no_speech_prob
    except AttributeError:
        no_speech_confidence = None

    return cls(
        text=whisper_segment.text,
        time_interval=time_interval,
        confidence=confidence,
        no_speech_confidence=no_speech_confidence,
        words=words,
        speaker=None,
    )
```
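
The fallback behaviour of `AsrSegment.from_whisper` can be sketched in isolation. The helper `segment_fields` below is hypothetical and only mirrors the three `try`/`except` blocks from the source; it uses `math.exp` in place of `np.exp` to stay within the standard library, skips the per-word conversion, and uses `SimpleNamespace` objects as assumed stand-ins for faster-whisper segments.

```python
import math
from types import SimpleNamespace


def segment_fields(whisper_segment):
    """Hypothetical helper mirroring the fallbacks in AsrSegment.from_whisper."""
    # avg_logprob is a mean log-probability; exponentiating a non-positive
    # value yields a confidence in (0, 1].
    try:
        confidence = math.exp(whisper_segment.avg_logprob)
    except AttributeError:
        confidence = None

    # `words` may be None (TypeError when iterated) or missing entirely
    # (AttributeError); both cases fall back to an empty list.
    try:
        words = list(whisper_segment.words)
    except (TypeError, AttributeError):
        words = []

    try:
        no_speech_confidence = whisper_segment.no_speech_prob
    except AttributeError:
        no_speech_confidence = None

    return {
        "text": whisper_segment.text,
        "confidence": confidence,
        "no_speech_confidence": no_speech_confidence,
        "words": words,
        "speaker": None,
    }


# A segment with scores but without word timestamps (words is None).
full = SimpleNamespace(
    text=" hi", start=0.0, end=1.0, avg_logprob=-0.5, no_speech_prob=0.02, words=None
)
# A minimal segment that carries only text and timing.
minimal = SimpleNamespace(text=" hi", start=0.0, end=1.0)

full_fields = segment_fields(full)
minimal_fields = segment_fields(minimal)
```

With these inputs, `full_fields["confidence"]` is `exp(-0.5)`, roughly 0.61, while every optional field of `minimal_fields` falls back to `None` or an empty list.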

AsrTranscriptionInfo

Bases: BaseModel

Pydantic schema for TranscriptionInfo.

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `language` | Language of the transcription. TYPE: `str` |
| `language_confidence` | Confidence of the language detection, >= 0.0 and <= 1.0. Default is 0.0. TYPE: `float` |

from_whisper

from_whisper(transcription_info)

Convert WhisperTranscriptionInfo to AsrTranscriptionInfo.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `transcription_info` | The WhisperTranscriptionInfo from faster-whisper. TYPE: `TranscriptionInfo` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `AsrTranscriptionInfo` | The converted AsrTranscriptionInfo. TYPE: `AsrTranscriptionInfo` |

Source code in aana/core/models/asr.py

```python
@classmethod
def from_whisper(
    cls, transcription_info: "WhisperTranscriptionInfo"
) -> "AsrTranscriptionInfo":
    """Convert WhisperTranscriptionInfo to AsrTranscriptionInfo.

    Args:
        transcription_info (WhisperTranscriptionInfo): The WhisperTranscriptionInfo from faster-whisper.

    Returns:
        AsrTranscriptionInfo: The converted AsrTranscriptionInfo.
    """
    return cls(
        language=transcription_info.language,
        language_confidence=transcription_info.language_probability,
    )
```
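
A minimal sketch of this conversion, including the documented 0.0 to 1.0 bound on `language_confidence`, might look as follows. `AsrTranscriptionInfoSketch` is a hypothetical dataclass stand-in, and the hand-written range check in `__post_init__` is an assumption about how the bound could be enforced; the real model is a Pydantic schema and may validate differently.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class AsrTranscriptionInfoSketch:
    """Hypothetical stand-in for AsrTranscriptionInfo."""

    language: str
    language_confidence: float = 0.0

    def __post_init__(self):
        # Assumed stand-in for the documented ">= 0.0 and <= 1.0" constraint.
        if not 0.0 <= self.language_confidence <= 1.0:
            raise ValueError("language_confidence must be within [0.0, 1.0]")

    @classmethod
    def from_whisper(cls, transcription_info) -> "AsrTranscriptionInfoSketch":
        # Same mapping as the source: language_probability is stored
        # as language_confidence.
        return cls(
            language=transcription_info.language,
            language_confidence=transcription_info.language_probability,
        )


# SimpleNamespace stands in for the faster-whisper TranscriptionInfo object.
info = AsrTranscriptionInfoSketch.from_whisper(
    SimpleNamespace(language="en", language_probability=0.98)
)
```

In this sketch, constructing the stand-in with a value outside the range, such as `language_confidence=1.5`, raises `ValueError`.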

AsrTranscription

Bases: BaseModel

Pydantic schema for Transcription/Translation.

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `text` | The text of the transcription/translation. Default is "". TYPE: `str` |