tts

Smallest AI text-to-speech service implementation.

This module provides a WebSocket-based integration with Smallest AI’s Waves API for real-time text-to-speech synthesis.

class pipecat.services.smallest.tts.SmallestTTSModel(*values)[source]

Bases: StrEnum

Available Smallest AI TTS models.

LIGHTNING_V3_1 = 'lightning_v3.1'

LIGHTNING_V3_1_PRO = 'lightning_v3.1_pro'
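
Because SmallestTTSModel derives from StrEnum, its members can be used wherever a plain string is expected. The sketch below illustrates this with a local stand-in enum that mirrors the two documented values; it is not the pipecat class itself. The parse_model helper is hypothetical, and the fallback branch is only there for Python versions older than 3.11, which lack enum.StrEnum.

```python
from enum import Enum

try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:
    class StrEnum(str, Enum):
        """Minimal fallback for older Python versions."""

        def __str__(self) -> str:
            return str(self.value)


class SmallestTTSModel(StrEnum):
    """Local stand-in mirroring the documented values (not the pipecat class)."""

    LIGHTNING_V3_1 = "lightning_v3.1"
    LIGHTNING_V3_1_PRO = "lightning_v3.1_pro"


def parse_model(name: str) -> SmallestTTSModel:
    """Hypothetical helper: look up a member by its string value.

    Raises ValueError if the name is not one of the defined values.
    """
    return SmallestTTSModel(name)
```

Looking a model up by value this way rejects unknown names early, rather than passing an unchecked string through to the API.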

pipecat.services.smallest.tts.language_to_smallest_tts_language(language: Language) → str[source]

Convert a Language enum to a Smallest TTS language string.

Parameters:: language – The Language enum value to convert.
Returns:: The corresponding Smallest language code. If language is not in the verified mapping, falls back to the base language code (e.g., en from en-US) and logs a warning (via resolve_language(..., use_base_code=True)).
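
The fallback described above can be sketched as follows. This is a simplified illustration, not the library's implementation: it works on plain language-code strings instead of the Language enum, the VERIFIED_LANGUAGES table is a made-up two-entry subset, and to_smallest_language is a hypothetical name.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical subset of a verified mapping; the real table lives in pipecat.
VERIFIED_LANGUAGES = {"en": "en", "hi": "hi"}


def to_smallest_language(code: str) -> str:
    """Return the verified code, else fall back to the base language code."""
    if code in VERIFIED_LANGUAGES:
        return VERIFIED_LANGUAGES[code]
    # "en-US" -> "en": keep only the part before the region suffix.
    base = code.split("-")[0].lower()
    logger.warning("Language %s is not verified; falling back to %s", code, base)
    return base
```

Under these assumptions, to_smallest_language("en-US") returns "en" and logs a warning, while to_smallest_language("hi") returns "hi" without one.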

class pipecat.services.smallest.tts.SmallestTTSSettings

Bases: TTSSettings

Settings for SmallestTTSService.

Parameters:: speed – Speech speed multiplier (0.5–2.0).

speed: float | None | _NotGiven
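
The documented range for speed is 0.5 to 2.0. Whether the service validates or clamps out-of-range values is not stated here, so a caller may want to check the value before building the settings. The helper below is a hypothetical sketch of such a check; validate_speed and the two constants are illustrative names, not part of pipecat.

```python
from typing import Optional

# Bounds taken from the documented range for the speed setting.
MIN_SPEED = 0.5
MAX_SPEED = 2.0


def validate_speed(speed: Optional[float]) -> Optional[float]:
    """Hypothetical helper: reject values outside the documented range.

    None is passed through unchanged so that an unset speed stays unset.
    """
    if speed is None:
        return None
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    return speed
```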

class pipecat.services.smallest.tts.SmallestTTSService(*, api_key: str, base_url: str = 'wss://api.smallest.ai', sample_rate: int | None = None, output_format: str = 'pcm', word_timestamps: bool = True, settings: SmallestTTSSettings | None = None, **kwargs)[source]

Bases: InterruptibleTTSService

Smallest AI real-time text-to-speech service using WebSocket streaming.

Provides real-time text-to-speech synthesis using Smallest AI’s WebSocket API. Supports streaming audio generation with configurable voice parameters and language settings. Handles interruptions by reconnecting the WebSocket.

Example:

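from pipecat.services.smallest.tts import SmallestTTSService
from pipecat.transcriptions.language import Language

tts = SmallestTTSService(
    api_key="your-api-key",
    settings=SmallestTTSService.Settings(
        voice="sophia",
        language=Language.EN,
        speed=1.0,
    ),
)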

Settings: alias of SmallestTTSSettings

__init__(*, api_key: str, base_url: str = 'wss://api.smallest.ai', sample_rate: int | None = None, output_format: str = 'pcm', word_timestamps: bool = True, settings: SmallestTTSSettings | None = None, **kwargs)[source]

Initialize the Smallest AI WebSocket TTS service.

Parameters:

api_key – Smallest AI API key for authentication.
base_url – Base WebSocket URL for the Smallest API.
sample_rate – Audio sample rate in Hz. If None, uses default.
output_format – Audio format returned by the API. One of pcm, mp3, wav, ulaw, alaw. Defaults to pcm, which is what Pipecat expects internally. Fixed at init time.
word_timestamps – Whether to request per-word timing events; enabled by default. When True, the server interleaves word_timestamp messages and the service emits aligned per-word TTSTextFrames. Supported on base-queue English and Hindi voices (meher, devansh, kartik, maithili, liam, avery); other voices silently emit no word events, so leaving this on is safe regardless of voice. Fixed at init time because it determines whether text frames are produced from word timing or pushed whole.
settings – Runtime-updatable settings for the TTS service.
**kwargs – Additional arguments passed to parent InterruptibleTTSService.

can_generate_metrics() → bool[source]

Check if this service can generate processing metrics.

Returns:: True, as the Smallest service supports metrics generation.

async flush_audio(context_id: str | None = None)[source]

Flush any pending audio data.

language_to_service_language(language: Language) → str | None[source]

Convert a Language enum to Smallest service language format.

Parameters:: language – The language to convert.
Returns:: The Smallest-specific language code, or None if not supported.

async start(frame: StartFrame)[source]

Start the Smallest TTS service.

Parameters:: frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)[source]

Stop the Smallest TTS service.

Parameters:: frame – The end frame.

async cancel(frame: CancelFrame)[source]

Cancel the Smallest TTS service.

Parameters:: frame – The cancel frame.

async on_turn_context_created(context_id: str)[source]

Reset the word-timestamp offset at the start of each turn.

Each LLM turn gets a fresh audio context, so the per-request offset accumulated for the previous turn must not carry over.

Parameters:: context_id – The newly created turn context ID.
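
The reason for the reset can be shown with a small sketch. It assumes, purely for illustration, that word times arrive relative to the start of each synthesis request and that the offset advances by each request's audio duration; neither detail is confirmed by this reference. WordOffsetTracker and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordOffsetTracker:
    """Hypothetical tracker: shifts per-request word times onto a turn timeline."""

    offset: float = 0.0  # seconds of audio already accounted for in this turn

    def reset(self) -> None:
        """Call at the start of each turn so earlier offsets do not carry over."""
        self.offset = 0.0

    def align(
        self, words: List[Tuple[str, float]], request_duration: float
    ) -> List[Tuple[str, float]]:
        """Shift request-relative start times by the accumulated offset."""
        aligned = [(word, start + self.offset) for word, start in words]
        # Later requests in the same turn start after this request's audio.
        self.offset += request_duration
        return aligned
```

Without the reset, the first word of a new turn would be stamped as if it followed all the audio of the previous turn.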

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None][source]

Generate speech from text using Smallest’s WebSocket streaming API.

Parameters:

text – The text to synthesize into speech.
context_id – Unique identifier for this TTS context.

Yields:

Frame – Audio arrives via WebSocket receive task.

async setup(setup: FrameProcessorSetup)

Set up the processor with required components.

Parameters:: setup – Configuration object containing setup parameters.