tts

xAI text-to-speech service implementation.

Provides two TTS services against xAI’s voice API:

XAIHttpTTSService uses the batch HTTP endpoint at https://api.x.ai/v1/tts.
XAITTSService uses the streaming WebSocket endpoint at wss://api.x.ai/v1/tts.

See https://docs.x.ai/developers/rest-api-reference/inference/voice.

pipecat.services.xai.tts.language_to_xai_language(language: Language) → str[source]

Convert a Language enum to xAI language code.

Parameters: language – The Language enum value to convert.
Returns: The corresponding xAI language code. If the language is not in the verified mapping, the function falls back to the base language code (for example, en for en-US) and logs a warning, via resolve_language(..., use_base_code=True).
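The fallback described above can be illustrated with a minimal, standard-library sketch. It does not call pipecat. The VERIFIED_LANGUAGES table and the to_service_language helper are hypothetical stand-ins, and the real mapping and resolve_language implementation may differ.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical verified mapping, for illustration only.
# The real table lives inside pipecat and may contain different entries.
VERIFIED_LANGUAGES = {"en": "en", "es": "es", "fr": "fr"}


def to_service_language(language_tag: str) -> str:
    """Return the verified code, else fall back to the base language code."""
    if language_tag in VERIFIED_LANGUAGES:
        return VERIFIED_LANGUAGES[language_tag]
    # Unverified tag: keep only the part before the region, e.g. "en-US" -> "en".
    base_code = language_tag.split("-")[0].lower()
    logger.warning(
        "Language %s is not verified; falling back to %s", language_tag, base_code
    )
    return base_code
```

In this sketch a regional tag such as en-US resolves to en, and a warning is logged so the fallback is visible in application logs.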

class pipecat.services.xai.tts.XAITTSSettings

Bases: TTSSettings

Settings for XAIHttpTTSService.

Parameters:

speed – Speech speed multiplier from 0.7 to 1.5 (1.0 is normal).
optimize_streaming_latency – Latency optimization level (0, 1, or 2).
text_normalization – Whether to normalize text before synthesis.

speed: float | None | _NotGiven

optimize_streaming_latency: int | None | _NotGiven

text_normalization: bool | None | _NotGiven
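The documented ranges for these fields can be checked before a request is made. The sketch below is a hypothetical stand-in written with the standard library only. SpeechOptions is not a pipecat class, and this page does not state whether pipecat or xAI rejects out-of-range values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechOptions:
    """Hypothetical mirror of the three documented fields; None means "not set"."""

    speed: Optional[float] = None
    optimize_streaming_latency: Optional[int] = None
    text_normalization: Optional[bool] = None

    def validate(self) -> None:
        """Raise ValueError when a set field is outside its documented range."""
        # Documented range: 0.7 to 1.5, with 1.0 as normal speed.
        if self.speed is not None and not 0.7 <= self.speed <= 1.5:
            raise ValueError(f"speed must be within 0.7 to 1.5, got {self.speed}")
        # Documented levels: 0, 1, or 2.
        level = self.optimize_streaming_latency
        if level is not None and level not in (0, 1, 2):
            raise ValueError(
                f"optimize_streaming_latency must be 0, 1, or 2, got {level}"
            )
```

Calling SpeechOptions(speed=1.2, optimize_streaming_latency=1).validate() passes silently, while a speed of 2.0 raises ValueError.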

class pipecat.services.xai.tts.XAIHttpTTSService(*, api_key: str, base_url: str = 'https://api.x.ai/v1/tts', sample_rate: int | None = None, encoding: str | None = 'pcm', aiohttp_session: ClientSession | None = None, settings: XAITTSSettings | None = None, **kwargs)[source]

Bases: TTSService

xAI HTTP text-to-speech service.

By default the service requests raw PCM audio, so emitted TTSAudioRawFrame objects match Pipecat’s downstream expectations without extra decoding.

Settings: alias of XAITTSSettings

__init__(*, api_key: str, base_url: str = 'https://api.x.ai/v1/tts', sample_rate: int | None = None, encoding: str | None = 'pcm', aiohttp_session: ClientSession | None = None, settings: XAITTSSettings | None = None, **kwargs)[source]

Initialize the xAI TTS service.

Parameters:

api_key – xAI API key for authentication.
base_url – xAI TTS endpoint. Defaults to https://api.x.ai/v1/tts.
sample_rate – Output audio sample rate in Hz. If None, uses the pipeline default.
encoding – Output encoding format. Defaults to “pcm”.
aiohttp_session – Optional shared aiohttp session.
settings – Runtime-updatable settings.
**kwargs – Additional keyword arguments passed to TTSService.

can_generate_metrics() → bool[source]: Check if this service can generate processing metrics.

language_to_service_language(language: Language) → str | None[source]

Convert a Language enum to xAI language format.

Parameters: language – The language to convert.
Returns: The xAI-specific language code, or None if not supported.

async start(frame)[source]: Start the xAI TTS service.

async stop(frame)[source]: Stop the xAI TTS service.

async cancel(frame)[source]: Cancel the xAI TTS service.

async cleanup()[source]: Release xAI TTS resources at teardown.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None][source]: Generate speech from text using xAI’s TTS API.

async setup(setup: FrameProcessorSetup)

Set up the processor with required components.

Parameters: setup – Configuration object containing setup parameters.

class pipecat.services.xai.tts.XAIWebsocketTTSSettings

Bases: TTSSettings

Settings for XAITTSService (WebSocket streaming).

Parameters:

speed – Speech speed multiplier from 0.7 to 1.5 (1.0 is normal).
optimize_streaming_latency – Latency optimization level (0, 1, or 2).
text_normalization – Whether to normalize text before synthesis.
with_timestamps – Whether to request character timings. When enabled, the service converts them into per-word TTSTextFrame objects.

speed: float | None | _NotGiven

optimize_streaming_latency: int | None | _NotGiven

text_normalization: bool | None | _NotGiven

with_timestamps: bool | None | _NotGiven

class pipecat.services.xai.tts.XAITTSService(*, api_key: str, base_url: str = 'wss://api.x.ai/v1/tts', sample_rate: int | None = None, codec: str = 'pcm', settings: XAIWebsocketTTSSettings | None = None, **kwargs)[source]

Bases: WebsocketTTSService

xAI streaming text-to-speech service.

Connects to xAI’s WebSocket TTS endpoint and streams audio chunks back as they are synthesized. Text can be sent incrementally via text.delta messages and each utterance is terminated with text.done. The server responds with audio.delta chunks followed by an audio.done message.
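The message flow described above can be sketched without a network connection. The four message type names come from the description. The JSON envelope, the "type" and "delta" field names, and base64-encoded audio payloads are assumptions made for illustration, not the confirmed wire format.

```python
import base64
import json
from typing import Iterable, List


def build_utterance_messages(chunks: Iterable[str]) -> List[str]:
    """Encode text chunks as text.delta messages, closed by one text.done."""
    # Assumed envelope: {"type": ..., "delta": ...}; the real schema may differ.
    messages = [json.dumps({"type": "text.delta", "delta": chunk}) for chunk in chunks]
    messages.append(json.dumps({"type": "text.done"}))
    return messages


def collect_audio(raw_messages: Iterable[str]) -> bytes:
    """Concatenate audio.delta payloads until audio.done arrives."""
    audio = bytearray()
    for raw in raw_messages:
        message = json.loads(raw)
        if message["type"] == "audio.delta":
            # Assumption: audio bytes arrive base64-encoded in "delta".
            audio.extend(base64.b64decode(message["delta"]))
        elif message["type"] == "audio.done":
            break  # The utterance is complete; ignore anything after it.
    return bytes(audio)
```

A client would send the output of build_utterance_messages over the socket and feed incoming messages to collect_audio. The actual service pushes audio frames as chunks arrive rather than buffering the whole utterance.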

Audio parameters (voice, language, codec, sample rate) are passed as query string parameters on the WebSocket URL; changing any of them at runtime reconnects the WebSocket. With with_timestamps enabled, xAI’s per-character timings are converted into per-word TTSTextFrame objects.

Note that xAI delivers timestamps in batches that are decoupled from the audio stream (a batch can cover several seconds of speech and arrive in one message), so the word TTSTextFrame objects are pushed in bursts rather than evenly spread across playback. Each frame still carries an accurate pts, so consumers should schedule off pts rather than arrival time.
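The two ideas above, grouping per-character timings into words and scheduling by timestamp rather than arrival time, can be sketched as follows. The (character, start time in seconds) tuple shape and both helper names are assumptions. The real payload format and the units of pts are not stated on this page.

```python
from typing import List, Tuple

# Assumed shape: (character, start time in seconds from utterance start).
CharTiming = Tuple[str, float]
WordTiming = Tuple[str, float]


def chars_to_words(timings: List[CharTiming]) -> List[WordTiming]:
    """Group character timings into (word, start) pairs, splitting on whitespace."""
    words: List[WordTiming] = []
    current = ""
    start = 0.0
    for char, at in timings:
        if char.isspace():
            if current:
                words.append((current, start))
                current = ""
            continue
        if not current:
            start = at  # The first character of a word fixes the word's start.
        current += char
    if current:
        words.append((current, start))
    return words


def display_delays(words: List[WordTiming], playback_elapsed: float) -> List[WordTiming]:
    """Return (word, seconds to wait) relative to how much audio has played."""
    # Words whose start time has already passed are due immediately (delay 0).
    return [(word, max(0.0, start - playback_elapsed)) for word, start in words]
```

Because a whole batch can arrive in one message, a consumer that displayed words on arrival would show them all at once. Computing the delay from each word's timestamp spreads them across playback.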

Settings: alias of XAIWebsocketTTSSettings

__init__(*, api_key: str, base_url: str = 'wss://api.x.ai/v1/tts', sample_rate: int | None = None, codec: str = 'pcm', settings: XAIWebsocketTTSSettings | None = None, **kwargs)[source]

Initialize the xAI WebSocket TTS service.

Parameters:

api_key – xAI API key for authentication.
base_url – xAI TTS WebSocket endpoint. Defaults to wss://api.x.ai/v1/tts.
sample_rate – Output audio sample rate in Hz. If None, uses the pipeline default.
codec – Output audio codec. One of pcm, wav, mulaw, alaw. Defaults to pcm so emitted TTSAudioRawFrame objects need no decoding downstream.
settings – Runtime-updatable settings.
**kwargs – Additional arguments passed to parent WebsocketTTSService.
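Because the class description says audio parameters travel as query string parameters on the WebSocket URL, a change to any of them produces a different URL and therefore a reconnect. The sketch below shows that idea with the standard library. The query parameter names (voice, language, codec, sample_rate), the 24000 Hz default, and the voice value are assumptions, not confirmed xAI names or defaults.

```python
from urllib.parse import urlencode


def build_tts_url(
    base_url: str,
    *,
    voice: str,
    language: str,
    codec: str = "pcm",
    sample_rate: int = 24000,
) -> str:
    """Append the audio parameters to the endpoint as a query string."""
    # Parameter names here are illustrative; check the xAI docs for the real ones.
    query = urlencode(
        {
            "voice": voice,
            "language": language,
            "codec": codec,
            "sample_rate": sample_rate,
        }
    )
    return f"{base_url}?{query}"


def needs_reconnect(current_url: str, new_url: str) -> bool:
    """A changed URL means the existing socket no longer matches the settings."""
    return current_url != new_url
```

Switching the codec from pcm to mulaw, for example, yields a different URL, so needs_reconnect returns True and the connection would be reopened with the new parameters.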

can_generate_metrics() → bool[source]: Check if this service can generate processing metrics.

language_to_service_language(language: Language) → str | None[source]: Convert a Language enum to xAI language format.

async start(frame: StartFrame)[source]: Start the xAI WebSocket TTS service.

async flush_audio(context_id: str | None = None)[source]: Signal end-of-utterance so xAI begins synthesizing what it has buffered.

async on_audio_context_interrupted(context_id: str)[source]: Cancel the current xAI utterance on barge-in without reconnecting.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None][source]: Generate TTS audio from text using xAI’s streaming WebSocket API.

async setup(setup: FrameProcessorSetup)

Set up the processor with required components.

Parameters: setup – Configuration object containing setup parameters.