tts
xAI text-to-speech service implementation.
Provides two TTS services against xAI’s voice API:
- XAIHttpTTSService uses the batch HTTP endpoint at https://api.x.ai/v1/tts.
- XAITTSService uses the streaming WebSocket endpoint at wss://api.x.ai/v1/tts.
See https://docs.x.ai/developers/rest-api-reference/inference/voice.
- pipecat.services.xai.tts.language_to_xai_language(language: Language) str[source]
Convert a Language enum to xAI language code.
- Parameters:
language – The Language enum value to convert.
- Returns:
The corresponding service language code. If language is not in the verified mapping, falls back to the base language code (e.g., en from en-US) and logs a warning (via resolve_language(..., use_base_code=True)).
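The fallback described above can be illustrated with a minimal, self-contained sketch. The Language enum below is a hypothetical stand-in with only a few values, and the VERIFIED table is an assumed example; the real mapping and the resolve_language helper live in Pipecat and may behave differently in detail.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class Language(str, Enum):
    """Hypothetical stand-in for Pipecat's Language enum (a few values only)."""

    EN = "en"
    EN_US = "en-US"
    FR = "fr"
    FR_CA = "fr-CA"


# Assumed "verified" mapping for illustration; not the library's actual table.
VERIFIED = {Language.EN: "en", Language.FR: "fr"}


def to_service_language(language: Language) -> str:
    """Return the verified code, else fall back to the base language code."""
    code = VERIFIED.get(language)
    if code is not None:
        return code
    # "en-US" -> "en": keep only the part before the region suffix.
    base = language.value.split("-")[0].lower()
    logger.warning("Language %s not verified; falling back to %r", language.value, base)
    return base
```

Under these assumptions, a regional variant such as Language.EN_US resolves to its base code and a warning is logged, while a verified entry is returned directly.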
- class pipecat.services.xai.tts.XAITTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)[source]
Bases: TTSSettings
Settings for XAIHttpTTSService.
- class pipecat.services.xai.tts.XAIHttpTTSService(*, api_key: str, base_url: str = 'https://api.x.ai/v1/tts', sample_rate: int | None = None, encoding: str | None = 'pcm', aiohttp_session: ClientSession | None = None, settings: XAITTSSettings | None = None, **kwargs)[source]
Bases: TTSService
xAI HTTP text-to-speech service.
The service requests raw PCM audio so emitted TTSAudioRawFrame objects match Pipecat’s downstream expectations without extra decoding.
- Settings
alias of XAITTSSettings
- __init__(*, api_key: str, base_url: str = 'https://api.x.ai/v1/tts', sample_rate: int | None = None, encoding: str | None = 'pcm', aiohttp_session: ClientSession | None = None, settings: XAITTSSettings | None = None, **kwargs)[source]
Initialize the xAI TTS service.
- Parameters:
api_key – xAI API key for authentication.
base_url – xAI TTS endpoint. Defaults to https://api.x.ai/v1/tts.
sample_rate – Audio sample rate. If None, uses default.
encoding – Output encoding format. Defaults to “pcm”.
aiohttp_session – Optional shared aiohttp session.
settings – Runtime-updatable settings.
**kwargs – Additional keyword arguments passed to TTSService.
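As a rough illustration of how these constructor arguments fit together, the sketch below collects the documented keyword arguments into a dict and builds the service from it. The helper names http_tts_kwargs and make_http_tts are hypothetical, and the service import is done lazily so the helper can be exercised without Pipecat installed.

```python
from typing import Any, Optional


def http_tts_kwargs(
    api_key: str,
    *,
    sample_rate: Optional[int] = None,
    encoding: str = "pcm",
    aiohttp_session: Any = None,
) -> dict:
    """Collect the documented keyword arguments for XAIHttpTTSService."""
    if not api_key:
        raise ValueError("api_key is required")
    kwargs = {"api_key": api_key, "encoding": encoding}
    # Only pass optional values that were actually provided, so the
    # service's own defaults apply otherwise.
    if sample_rate is not None:
        kwargs["sample_rate"] = sample_rate
    if aiohttp_session is not None:
        kwargs["aiohttp_session"] = aiohttp_session
    return kwargs


def make_http_tts(api_key: str, **options: Any):
    """Build the service; requires pipecat to be installed (imported lazily)."""
    from pipecat.services.xai.tts import XAIHttpTTSService

    return XAIHttpTTSService(**http_tts_kwargs(api_key, **options))
```

Passing a shared aiohttp session through aiohttp_session lets several services reuse one connection pool; leaving it out keeps the call to the minimum of api_key plus the default encoding.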
- language_to_service_language(language: Language) str | None[source]
Convert a Language enum to xAI language format.
- Parameters:
language – The language to convert.
- Returns:
The xAI-specific language code, or None if not supported.
- async run_tts(text: str, context_id: str) AsyncGenerator[Frame | None, None][source]
Generate speech from text using xAI’s TTS API.
- async setup(setup: FrameProcessorSetup)
Set up the processor with required components.
- Parameters:
setup – Configuration object containing setup parameters.
- class pipecat.services.xai.tts.XAIWebsocketTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)[source]
Bases: TTSSettings
Settings for XAITTSService (WebSocket streaming).
- class pipecat.services.xai.tts.XAITTSService(*, api_key: str, base_url: str = 'wss://api.x.ai/v1/tts', sample_rate: int | None = None, codec: str = 'pcm', settings: XAIWebsocketTTSSettings | None = None, **kwargs)[source]
Bases: InterruptibleTTSService
xAI streaming text-to-speech service.
Connects to xAI’s WebSocket TTS endpoint and streams audio chunks back as they are synthesized. Text can be sent incrementally via text.delta messages, and each utterance is terminated with text.done. The server responds with audio.delta chunks followed by an audio.done message.
Audio parameters (voice, language, codec, sample rate, bit rate) are passed as query string parameters on the WebSocket URL; changing any of them at runtime reconnects the WebSocket.
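The message exchange described above can be sketched without a network connection. The four message type names come from the description; the JSON layout, the "delta" field name, and the base64 encoding of audio payloads are assumptions made for illustration and may not match xAI's actual wire format.

```python
import base64
import json
from typing import Iterable, Iterator, List


def client_messages(text_chunks: Iterable[str]) -> Iterator[str]:
    """Yield one text.delta message per chunk, then a closing text.done."""
    for chunk in text_chunks:
        # The "delta" field name is an assumption, not taken from xAI's docs.
        yield json.dumps({"type": "text.delta", "delta": chunk})
    yield json.dumps({"type": "text.done"})


def collect_audio(server_messages: Iterable[str]) -> bytes:
    """Concatenate audio.delta payloads until audio.done arrives."""
    chunks: List[bytes] = []
    for raw in server_messages:
        message = json.loads(raw)
        if message["type"] == "audio.delta":
            # Assumes the payload is base64-encoded audio in a "delta" field.
            chunks.append(base64.b64decode(message["delta"]))
        elif message["type"] == "audio.done":
            break
    return b"".join(chunks)
```

In a real client the audio chunks would be forwarded as they arrive rather than collected, which is the point of the streaming endpoint; the collection here only keeps the sketch easy to verify.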
- Settings
alias of XAIWebsocketTTSSettings
- __init__(*, api_key: str, base_url: str = 'wss://api.x.ai/v1/tts', sample_rate: int | None = None, codec: str = 'pcm', settings: XAIWebsocketTTSSettings | None = None, **kwargs)[source]
Initialize the xAI WebSocket TTS service.
- Parameters:
api_key – xAI API key for authentication.
base_url – xAI TTS WebSocket endpoint. Defaults to wss://api.x.ai/v1/tts.
sample_rate – Output audio sample rate in Hz. If None, uses the pipeline default.
codec – Output audio codec. One of pcm, wav, mulaw, alaw. Defaults to pcm so emitted TTSAudioRawFrame objects need no decoding downstream.
settings – Runtime-updatable settings.
**kwargs – Additional arguments passed to parent InterruptibleTTSService.
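Because the audio parameters travel in the WebSocket URL, a change to any of them implies a new connection. The sketch below shows one way that could look; the query parameter names (voice, codec, sample_rate, language) and the helper names are assumptions for illustration, not the service's actual implementation.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def _active(params: Dict[str, Optional[object]]) -> Dict[str, str]:
    """Drop unset (None) parameters and stringify the rest."""
    return {key: str(value) for key, value in params.items() if value is not None}


def build_ws_url(base_url: str, params: Dict[str, Optional[object]]) -> str:
    """Append the set audio parameters to the WebSocket URL as a query string."""
    query = _active(params)
    return f"{base_url}?{urlencode(query)}" if query else base_url


def needs_reconnect(current: dict, updated: dict) -> bool:
    """True when any URL-level audio parameter differs between the two settings."""
    return _active(current) != _active(updated)
```

For example, switching the codec from pcm to mulaw changes the URL and so would call for a reconnect, whereas re-applying identical settings would not.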
- language_to_service_language(language: Language) str | None[source]
Convert a Language enum to xAI language format.
- async start(frame: StartFrame)[source]
Start the xAI WebSocket TTS service.
- async cancel(frame: CancelFrame)[source]
Cancel the xAI WebSocket TTS service.
- async flush_audio(context_id: str | None = None)[source]
Signal end-of-utterance so xAI begins synthesizing what it has buffered.
- async run_tts(text: str, context_id: str) AsyncGenerator[Frame | None, None][source]
Generate TTS audio from text using xAI’s streaming WebSocket API.
- async setup(setup: FrameProcessorSetup)
Set up the processor with required components.
- Parameters:
setup – Configuration object containing setup parameters.