tts
Smallest AI text-to-speech service implementation.
This module provides a WebSocket-based integration with Smallest AI’s Waves API for real-time text-to-speech synthesis.
- class pipecat.services.smallest.tts.SmallestTTSModel(*values)[source]
Bases: StrEnum
Available Smallest AI TTS models.
- LIGHTNING_V3_1 = 'lightning_v3.1'
- LIGHTNING_V3_1_PRO = 'lightning_v3.1_pro'
- pipecat.services.smallest.tts.language_to_smallest_tts_language(language: Language) → str[source]
Convert a Language enum to a Smallest TTS language string.
- Parameters:
language – The Language enum value to convert.
- Returns:
The corresponding Smallest language code. If language is not in the verified mapping, falls back to the base language code (e.g., en from en-US) and logs a warning (via resolve_language(..., use_base_code=True)).
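The fallback described above can be sketched as follows. This is a minimal illustration, not the module's implementation: it works on plain strings rather than the Language enum, and VERIFIED_LANGUAGES and to_service_language are hypothetical names with a made-up subset of codes.

```python
import logging

logger = logging.getLogger("smallest_tts_sketch")

# Hypothetical subset of verified codes; the real mapping lives in the module.
VERIFIED_LANGUAGES = {"en": "en", "hi": "hi"}


def to_service_language(code: str) -> str:
    """Return a verified code, else fall back to the base language code."""
    if code in VERIFIED_LANGUAGES:
        return VERIFIED_LANGUAGES[code]
    # Strip the region suffix, e.g. "en-US" becomes "en".
    base = code.split("-")[0].lower()
    logger.warning("Language %s is not verified, falling back to %s", code, base)
    return base
```

With this sketch, a regional code such as en-US resolves to en and a warning is logged, while a code already in the mapping is returned without a warning.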
- class pipecat.services.smallest.tts.SmallestTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, ~typing.Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, speed: float | None | _NotGiven = <factory>)[source]
Bases: TTSSettings
Settings for SmallestTTSService.
- Parameters:
speed – Speech speed multiplier (0.5–2.0).
- speed: float | None | _NotGiven
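The documented speed range can be enforced on the caller side before building the settings. The sketch below is an assumption about how an application might do this; the reference does not state whether the service itself clamps or rejects out-of-range values, and clamp_speed, MIN_SPEED, and MAX_SPEED are hypothetical names.

```python
from typing import Optional

# Bounds taken from the documented range for the speed setting.
MIN_SPEED = 0.5
MAX_SPEED = 2.0


def clamp_speed(speed: Optional[float]) -> Optional[float]:
    """Clamp a speed multiplier into [0.5, 2.0]; pass None through unchanged."""
    if speed is None:
        return None
    return max(MIN_SPEED, min(MAX_SPEED, speed))
```

The clamped value can then be passed as the speed field when constructing the settings object.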
- class pipecat.services.smallest.tts.SmallestTTSService(*, api_key: str, base_url: str = 'wss://api.smallest.ai', sample_rate: int | None = None, output_format: str = 'pcm', word_timestamps: bool = True, settings: SmallestTTSSettings | None = None, **kwargs)[source]
Bases: InterruptibleTTSService
Smallest AI real-time text-to-speech service using WebSocket streaming.
Provides real-time text-to-speech synthesis using Smallest AI’s WebSocket API. Supports streaming audio generation with configurable voice parameters and language settings. Handles interruptions by reconnecting the WebSocket.
Example:
    tts = SmallestTTSService(
        api_key="your-api-key",
        settings=SmallestTTSService.Settings(
            voice="sophia",
            language=Language.EN,
            speed=1.0,
        ),
    )
- Settings
alias of SmallestTTSSettings
- __init__(*, api_key: str, base_url: str = 'wss://api.smallest.ai', sample_rate: int | None = None, output_format: str = 'pcm', word_timestamps: bool = True, settings: SmallestTTSSettings | None = None, **kwargs)[source]
Initialize the Smallest AI WebSocket TTS service.
- Parameters:
api_key – Smallest AI API key for authentication.
base_url – Base WebSocket URL for the Smallest API.
sample_rate – Audio sample rate in Hz. If None, uses default.
output_format – Audio format returned by the API. One of pcm, mp3, wav, ulaw, alaw. Defaults to pcm, which is what Pipecat expects internally. Fixed at init time.
word_timestamps – Whether to request per-word timing events, enabled by default. When True, the server interleaves word_timestamp messages and the service emits aligned per-word TTSTextFrames. Supported on base-queue English + Hindi voices (meher, devansh, kartik, maithili, liam, avery); other voices silently emit no word events, so leaving this on is safe regardless of voice. Fixed at init time because it determines whether text frames are produced from word timing or pushed whole.
settings – Runtime-updatable settings for the TTS service.
**kwargs – Additional arguments passed to parent InterruptibleTTSService.
- can_generate_metrics() → bool[source]
Check if this service can generate processing metrics.
- Returns:
True, as the Smallest service supports metrics generation.
- language_to_service_language(language: Language) → str | None[source]
Convert a Language enum to Smallest service language format.
- Parameters:
language – The language to convert.
- Returns:
The Smallest-specific language code, or None if not supported.
- async start(frame: StartFrame)[source]
Start the Smallest TTS service.
- Parameters:
frame – The start frame containing initialization parameters.
- async stop(frame: EndFrame)[source]
Stop the Smallest TTS service.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)[source]
Cancel the Smallest TTS service.
- Parameters:
frame – The cancel frame.
- async on_turn_context_created(context_id: str)[source]
Reset the word-timestamp offset at the start of each turn.
Each LLM turn gets a fresh audio context, so the per-request offset accumulated for the previous turn must not carry over.
- Parameters:
context_id – The newly created turn context ID.
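The offset bookkeeping described above can be sketched as follows. This is an illustrative model under stated assumptions, not the service's code: it assumes word timings arrive relative to the start of each request as (word, start, end) tuples in seconds, and WordOffsetTracker is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class WordOffsetTracker:
    """Tracks how much audio earlier requests in the same turn have produced."""

    offset: float = 0.0

    def align(self, words):
        """Shift request-relative (word, start, end) tuples to turn-relative starts."""
        aligned = [(word, self.offset + start) for word, start, _end in words]
        if words:
            # The next request in this turn starts after the last word of this one.
            self.offset += max(end for _word, _start, end in words)
        return aligned

    def reset(self):
        """Called at the start of a new turn so the old offset does not carry over."""
        self.offset = 0.0
```

Within one turn, each call to align shifts the new words by the accumulated offset; calling reset at turn creation makes the first word of the next turn start from its own request-relative time again.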
- async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None][source]
Generate speech from text using Smallest’s WebSocket streaming API.
- Parameters:
text – The text to synthesize into speech.
context_id – Unique identifier for this TTS context.
- Yields:
Frame – Audio arrives via WebSocket receive task.
- async setup(setup: FrameProcessorSetup)
Set up the processor with required components.
- Parameters:
setup – Configuration object containing setup parameters.