Speaker Recognition

Speaker Diarization (SD) Models

PyannoteSpeakerDiarizationDeployment lets you diarize audio, that is, determine which speaker is talking when, using pyannote models. The deployment is based on the pyannote.audio library.

Tip

To use the Pyannote Speaker Diarization deployment, install the required libraries with pip install pyannote-audio, or include the extra dependencies with pip install aana[asr].

PyannoteSpeakerDiarizationConfig is used to configure the Speaker Diarization deployment.

aana.deployments.pyannote_speaker_diarization_deployment.PyannoteSpeakerDiarizationConfig

Attributes:

  • model_id (str) –

    The name of the speaker diarization pipeline, for example pyannote/speaker-diarization-3.1.

  • sample_rate (int) –

    The sample rate of the audio. Defaults to 16000.

Accessing Gated Models

The pyannote speaker diarization models are gated, so you must request access before you can download them. To use these models:

  1. Request Access:
    Visit the pyannote Speaker Diarization 3.1 model page and the pyannote Speaker Segmentation 3.0 model page on Hugging Face. Log in, fill out the forms, and request access.

  2. Approval:

    • If automatic, access is granted immediately.
    • If manual, wait for the model authors to approve your request.
  3. Set Up the SDK:
    After approval, add your Hugging Face access token to your .env file by setting the HF_TOKEN variable:

    HF_TOKEN=your_huggingface_access_token
    

    To get your Hugging Face access token, visit the Settings - Tokens page on Hugging Face.
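
Before starting the deployment, it can help to confirm that the token is actually visible to your process. The following is a minimal sketch using only the Python standard library; it assumes a simple KEY=VALUE .env file, and the helper names load_env_file and require_hf_token are hypothetical, not part of the SDK. In practice a library such as python-dotenv, or the SDK's own settings loading, would normally handle this.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; it does not handle multi-line
    values, `export` prefixes, or variable interpolation.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_hf_token(path: str = ".env") -> str:
    """Return HF_TOKEN from the environment or the .env file, or raise."""
    token = os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; gated pyannote models cannot be downloaded."
        )
    return token
```

Calling require_hf_token() at startup fails early with a clear message, rather than later with a download error from Hugging Face.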

Example Configurations

As an example, let's see how to configure the Pyannote Speaker Diarization deployment for the Speaker Diarization-3.1 model.

Speaker diarization-3.1

from aana.deployments.pyannote_speaker_diarization_deployment import PyannoteSpeakerDiarizationDeployment, PyannoteSpeakerDiarizationConfig

PyannoteSpeakerDiarizationDeployment.options(
    num_replicas=1,
    max_ongoing_requests=1000,
    ray_actor_options={"num_gpus": 0.05},
    user_config=PyannoteSpeakerDiarizationConfig(
        model_id="pyannote/speaker-diarization-3.1",
        sample_rate=16000,
    ).model_dump(mode="json"),
)

Diarized ASR

Speaker diarization output can be combined with ASR to generate a transcription with speaker information. Further details and a code snippet are available in the ASR model hub.
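
To illustrate the general idea of combining the two outputs, here is a minimal sketch that labels each transcribed segment with the speaker whose diarization turns overlap it the most in time. This is not the SDK's post-processing; the Turn and AsrSegment types and the assign_speakers function are hypothetical names introduced for this example, and the timestamps are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization turn: a speaker label over a time span (hypothetical type)."""
    start: float
    end: float
    speaker: str


@dataclass
class AsrSegment:
    """One transcribed segment with timestamps (hypothetical type)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: list[AsrSegment], turns: list[Turn], unknown: str = "UNKNOWN"
) -> list[tuple[str, str]]:
    """Label each ASR segment with the speaker that overlaps it the longest."""
    labeled: list[tuple[str, str]] = []
    for seg in segments:
        # Total overlap per speaker, since one speaker may have several turns.
        totals: dict[str, float] = {}
        for turn in turns:
            duration = overlap(seg.start, seg.end, turn.start, turn.end)
            if duration > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + duration
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


turns = [Turn(0.0, 2.0, "SPEAKER_00"), Turn(2.0, 5.0, "SPEAKER_01")]
segments = [AsrSegment(0.1, 1.8, "hello"), AsrSegment(1.5, 4.0, "hi there")]
for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Segment-level assignment is the simplest choice; when a segment spans a speaker change, word-level timestamps would give a finer split.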