stt
AWS Transcribe Speech-to-Text service implementation.
This module provides a WebSocket-based connection to AWS Transcribe for real-time speech-to-text transcription with support for multiple languages and audio formats.
- class pipecat.services.aws.stt.AWSTranscribeSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, language: Language | str | None | _NotGiven = <factory>)
Bases: STTSettings
Settings for AWSTranscribeSTTService.
- class pipecat.services.aws.stt.AWSTranscribeSTTService(*, api_key: str | None = None, aws_access_key_id: str | None = None, aws_session_token: str | None = None, region: str | None = None, sample_rate: int | None = None, language: Language | None = None, settings: AWSTranscribeSTTSettings | None = None, ttfs_p99_latency: float | None = 1.9, **kwargs)
Bases: WebsocketSTTService
AWS Transcribe Speech-to-Text service using WebSocket streaming.
Provides real-time speech transcription using AWS Transcribe’s streaming API. Supports multiple languages, configurable sample rates, and both interim and final transcription results.
- Settings
alias of AWSTranscribeSTTSettings
- __init__(*, api_key: str | None = None, aws_access_key_id: str | None = None, aws_session_token: str | None = None, region: str | None = None, sample_rate: int | None = None, language: Language | None = None, settings: AWSTranscribeSTTSettings | None = None, ttfs_p99_latency: float | None = 1.9, **kwargs)
Initialize the AWS Transcribe STT service.
- Parameters:
api_key – AWS secret access key. If None, falls back to environment variables and the default boto3 credential chain (instance profiles, IRSA, ECS task roles, SSO, etc.).
aws_access_key_id – AWS access key ID. Same fallback behaviour as api_key.
aws_session_token – AWS session token for temporary credentials.
region – AWS region for the service.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate. AWS Transcribe only supports 8000 or 16000 Hz; other values are clamped to 16000 Hz at connect time.
language – Language for transcription. Deprecated since version 0.0.105: Use settings=AWSTranscribeSTTService.Settings(language=...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to parent STTService class.
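The fallback rules in the parameter list can be illustrated with a small standalone sketch. It does not import pipecat: `resolve_sample_rate` and `resolve_secret` are hypothetical helpers written for this example, and the sketch covers only the environment-variable step of the credential fallback (using the standard `AWS_SECRET_ACCESS_KEY` variable), not the full boto3 credential chain.

```python
import os
from typing import Optional

# The two rates the sample_rate description names as supported.
SUPPORTED_RATES = (8000, 16000)


def resolve_sample_rate(sample_rate: Optional[int], pipeline_rate: int) -> int:
    """Hypothetical helper: choose the rate to use at connect time."""
    # None means "use the pipeline sample rate".
    rate = sample_rate if sample_rate is not None else pipeline_rate
    # Any value other than 8000 or 16000 Hz falls back to 16000 Hz.
    return rate if rate in SUPPORTED_RATES else 16000


def resolve_secret(api_key: Optional[str]) -> Optional[str]:
    """Hypothetical helper: explicit argument first, then the environment."""
    return api_key or os.getenv("AWS_SECRET_ACCESS_KEY")


print(resolve_sample_rate(None, 8000))     # pipeline rate is used: 8000
print(resolve_sample_rate(44100, 48000))   # unsupported value: 16000
```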
- can_generate_metrics() → bool
Check if this service can generate processing metrics.
- Returns:
True, as AWS Transcribe STT supports metrics generation.
- get_service_encoding(encoding: str) → str
Convert internal encoding format to AWS Transcribe format.
- Parameters:
encoding – Internal encoding format string.
- Returns:
AWS Transcribe compatible encoding format.
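As an illustration of this kind of conversion, the sketch below maps an internal encoding name to a Transcribe media encoding with a dictionary lookup. The single `linear16` to `pcm` entry and the pass-through behaviour for unknown names are assumptions made for this example; `to_transcribe_encoding` is a hypothetical stand-in, not the method itself.

```python
# Assumed mapping for illustration only; check the service source for the real table.
ENCODING_MAP = {"linear16": "pcm"}


def to_transcribe_encoding(encoding: str) -> str:
    """Hypothetical stand-in for get_service_encoding."""
    # In this sketch, names without a mapping pass through unchanged.
    return ENCODING_MAP.get(encoding, encoding)


print(to_transcribe_encoding("linear16"))
```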
- async start(frame: StartFrame)
Initialize the connection when the service starts.
- Parameters:
frame – Start frame signaling service initialization.
- async stop(frame: EndFrame)
Stop the service and disconnect from AWS Transcribe.
- Parameters:
frame – End frame signaling service shutdown.
- async cancel(frame: CancelFrame)
Cancel the service and disconnect from AWS Transcribe.
- Parameters:
frame – Cancel frame signaling service cancellation.
- async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]
Process audio data and send to AWS Transcribe.
- Parameters:
audio – Raw audio bytes to transcribe.
- Yields:
ErrorFrame – If processing fails or connection issues occur.
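The shape of such a generator can be sketched without pipecat. Everything below is hypothetical scaffolding: `ErrorFrame` is a stand-in dataclass, `FakeConnection` simulates the WebSocket, and yielding `None` on success is an assumption based on the `Frame | None` return type (transcripts would arrive separately on the receive side).

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Optional


@dataclass
class ErrorFrame:
    """Hypothetical stand-in for pipecat's ErrorFrame."""
    error: str


class FakeConnection:
    """Hypothetical connection that can be made to fail on send."""

    def __init__(self, healthy: bool = True):
        self.healthy = healthy
        self.sent = []

    async def send(self, audio: bytes) -> None:
        if not self.healthy:
            raise ConnectionError("socket closed")
        self.sent.append(audio)


async def run_stt(
    conn: FakeConnection, audio: bytes
) -> AsyncGenerator[Optional[ErrorFrame], None]:
    try:
        await conn.send(audio)
        yield None  # nothing to emit yet; results come from the receive loop
    except Exception as exc:
        # Report the failure as a frame instead of raising into the pipeline.
        yield ErrorFrame(error=f"send failed: {exc}")


async def collect(conn: FakeConnection, audio: bytes) -> List[Optional[ErrorFrame]]:
    return [frame async for frame in run_stt(conn, audio)]


print(asyncio.run(collect(FakeConnection(healthy=False), b"\x00\x00")))
```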
- language_to_service_language(language: Language) → str | None
Convert internal language enum to AWS Transcribe language code.
All language codes that support streaming are included.
Source: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
- Parameters:
language – Internal language enumeration value.
- Returns:
AWS Transcribe compatible language code, or None if unsupported.
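A minimal sketch of the lookup-or-None behaviour follows. The `Lang` enum is a hypothetical stand-in for pipecat's `Language`, and the table holds only two example entries (`en-US` and `fr-FR`, both AWS Transcribe language codes) rather than the full streaming list.

```python
from enum import Enum
from typing import Optional


class Lang(Enum):
    """Hypothetical stand-in for pipecat's Language enum."""
    EN_US = "en-US"
    FR = "fr"
    XX = "xx"  # placeholder for a language with no streaming support


# Partial table for illustration; the real method covers every streaming language.
LANGUAGE_MAP = {
    Lang.EN_US: "en-US",
    Lang.FR: "fr-FR",
}


def to_transcribe_language(language: Lang) -> Optional[str]:
    """Hypothetical stand-in for language_to_service_language."""
    # dict.get returns None when the language has no entry.
    return LANGUAGE_MAP.get(language)


print(to_transcribe_language(Lang.FR))
print(to_transcribe_language(Lang.XX))
```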
- async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)
Push a frame downstream, tracking TranscriptionFrame timestamps for TTFB.
Stores the timestamp of each TranscriptionFrame for TTFB calculation. If the frame is marked as finalized (via request_finalize/confirm_finalize), reports TTFB immediately and cancels any pending timeout. Otherwise, TTFB is reported after a timeout.
- Parameters:
frame – The frame to push.
direction – The direction to push the frame.
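The finalize-or-timeout behaviour described for push_frame can be modelled in a few lines of asyncio. `TTFBTracker` is a hypothetical class written for this example; in particular, restarting the timer on every non-finalized transcript is an assumption of the sketch, not a documented detail.

```python
import asyncio
import time
from typing import List, Optional


class TTFBTracker:
    """Hypothetical model: report at once when finalized, else after a timeout."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_transcript_time: Optional[float] = None
        self.reported: List[Optional[float]] = []
        self._pending: Optional[asyncio.Task] = None

    async def on_transcription(self, finalized: bool) -> None:
        # Store the timestamp of each transcript for the later calculation.
        self.last_transcript_time = time.monotonic()
        # Cancel any report that is still waiting on the timeout.
        if self._pending is not None:
            self._pending.cancel()
            self._pending = None
        if finalized:
            self._report()
        else:
            self._pending = asyncio.create_task(self._report_later())

    async def _report_later(self) -> None:
        await asyncio.sleep(self.timeout)
        self._pending = None
        self._report()

    def _report(self) -> None:
        self.reported.append(self.last_transcript_time)


async def demo() -> int:
    tracker = TTFBTracker(timeout=0.05)
    await tracker.on_transcription(finalized=False)  # starts the timeout
    await tracker.on_transcription(finalized=True)   # cancels it, reports now
    await asyncio.sleep(0.1)                         # the cancelled timer never fires
    return len(tracker.reported)


print(asyncio.run(demo()))  # 1
```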
- async stop_ttfb_metrics(*, end_time: float | None = None)
Stop time-to-first-byte metrics collection and push results.
- Parameters:
end_time – Optional timestamp to use as the end time. If None, uses the current time.
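The `end_time` default can be pictured as follows. `ttfb_seconds` is a hypothetical helper for this example, and the use of `time.time()` as the clock is an assumption.

```python
import time
from typing import Optional


def ttfb_seconds(start_time: float, end_time: Optional[float] = None) -> float:
    """Hypothetical helper: elapsed seconds, with the end defaulting to now."""
    end = end_time if end_time is not None else time.time()
    return end - start_time


# An explicit end_time gives a reproducible measurement.
print(ttfb_seconds(10.0, 11.5))  # 1.5
```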