llm

OpenAI Realtime LLM service implementation with WebSocket support.

class pipecat.services.openai.realtime.llm.CurrentAudioResponse(item_id: str, content_index: int, start_time_ms: int, total_size: int = 0, response_id: str = '')[source]

Bases: object

Tracks the current audio response from the assistant.

Parameters:
  • item_id – Unique identifier for the audio response item.

  • content_index – Index of the audio content within the item.

  • start_time_ms – Timestamp when the audio response started in milliseconds.

  • total_size – Total size of audio data received in bytes. Defaults to 0.

  • response_id – ID of the server response the item belongs to. Defaults to an empty string.

item_id: str
content_index: int
start_time_ms: int
total_size: int = 0
response_id: str = ''
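
As an illustration of how these fields might be combined, the sketch below estimates how much of a response has been played at a given moment, for example when an interruption arrives. It uses a local stand-in dataclass rather than the pipecat class, and it assumes 16-bit mono PCM at 24 kHz; the helper name played_ms and that audio format are assumptions, not something this reference states.

```python
from dataclasses import dataclass


@dataclass
class AudioResponseSketch:
    """Local stand-in mirroring the documented fields (not the pipecat class)."""

    item_id: str
    content_index: int
    start_time_ms: int
    total_size: int = 0
    response_id: str = ""


def played_ms(resp: AudioResponseSketch, now_ms: int, sample_rate: int = 24000) -> int:
    """Estimate the audio played so far, capped at what has been received.

    Assumes 16-bit mono PCM (2 bytes per sample); this is an assumption.
    """
    received_ms = resp.total_size * 1000 // (2 * sample_rate)
    elapsed_ms = max(0, now_ms - resp.start_time_ms)
    return min(elapsed_ms, received_ms)


resp = AudioResponseSketch(item_id="item_1", content_index=0, start_time_ms=1_000)
for chunk in (b"\x00" * 4800, b"\x00" * 4800):  # two 100 ms chunks at 24 kHz
    resp.total_size += len(chunk)

print(played_ms(resp, now_ms=1_150))  # 150 ms elapsed, 200 ms received
```
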
class pipecat.services.openai.realtime.llm.OpenAIRealtimeLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, session_properties: SessionProperties | _NotGiven = <factory>)[source]

Bases: LLMSettings

Settings for OpenAIRealtimeLLMService.

Parameters:

session_properties – OpenAI Realtime session properties (modalities, audio config, tools, etc.). Its model and instructions fields are synced bidirectionally with the top-level model and system_instruction fields.

session_properties: SessionProperties | _NotGiven
apply_update(delta: OpenAIRealtimeLLMSettings) dict[str, Any][source]

Merge a delta, keeping model and system_instruction in sync with session_properties (SP).

When the delta contains session_properties, it replaces the stored SP wholesale (matching legacy behaviour). Top-level field values always take precedence over conflicting SP values.
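
The merge rules above can be sketched with plain dicts. This is only a model of the documented semantics, not the library's implementation: merge_sketch and SYNCED are hypothetical names, settings are represented as dicts instead of settings objects, and the sketch returns the merged state rather than whatever apply_update itself returns.

```python
from typing import Any

# Top-level field -> SessionProperties key kept in sync with it.
SYNCED = {"model": "model", "system_instruction": "instructions"}


def merge_sketch(stored: dict[str, Any], delta: dict[str, Any]) -> dict[str, Any]:
    """Return a new settings dict with `delta` merged into `stored`."""
    merged = {k: v for k, v in stored.items() if k != "session_properties"}
    if "session_properties" in delta:
        # Wholesale replacement: stored SP keys absent from the delta are dropped.
        sp = dict(delta["session_properties"])
        # SP values flow up to the matching top-level fields.
        for top, sp_key in SYNCED.items():
            if sp_key in sp:
                merged[top] = sp[sp_key]
    else:
        sp = dict(stored.get("session_properties", {}))
    # Top-level delta values are applied last, so they win any conflict.
    for key, value in delta.items():
        if key != "session_properties":
            merged[key] = value
    # Push the winning top-level values back down into SP.
    for top, sp_key in SYNCED.items():
        if top in merged:
            sp[sp_key] = merged[top]
    merged["session_properties"] = sp
    return merged


stored = {
    "model": "model-a",
    "system_instruction": "Be brief.",
    "session_properties": {"model": "model-a", "instructions": "Be brief.", "tools": ["t1"]},
}
delta = {
    "system_instruction": "Be formal.",
    "session_properties": {"instructions": "Be casual."},
}
result = merge_sketch(stored, delta)
print(result)
```

In this example the stored tools entry disappears because the delta's session_properties replaces the stored one, and the top-level system_instruction overrides the conflicting instructions value.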

classmethod from_mapping(settings: Mapping[str, Any]) OpenAIRealtimeLLMSettings[source]

Build a delta from a plain dict, routing SessionProperties (SP) keys into session_properties.

Keys that correspond to SessionProperties fields (except model) are collected into a nested session_properties value. The model key is always routed to the top-level field. Unknown keys go to extra.
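
A rough sketch of that routing with plain dicts follows. The top-level names come from the Settings signature above; SESSION_FIELDS is a small, assumed subset of SessionProperties field names used only for illustration, and route_sketch is a hypothetical helper, not the classmethod itself.

```python
from typing import Any, Mapping

# Top-level fields, taken from the documented Settings signature.
TOP_LEVEL_FIELDS = {
    "model", "system_instruction", "temperature", "max_tokens", "top_p", "top_k",
    "frequency_penalty", "presence_penalty", "seed",
    "filter_incomplete_user_turns", "user_turn_completion_config",
}
# Assumed subset of SessionProperties field names (illustrative only).
SESSION_FIELDS = {"model", "instructions", "tools"}


def route_sketch(settings: Mapping[str, Any]) -> dict[str, Any]:
    """Split a flat mapping into top-level, session_properties, and extra."""
    routed: dict[str, Any] = {}
    session: dict[str, Any] = {}
    extra: dict[str, Any] = {}
    for key, value in settings.items():
        if key == "model":
            routed[key] = value   # model always stays at the top level
        elif key in SESSION_FIELDS:
            session[key] = value  # SP keys are nested
        elif key in TOP_LEVEL_FIELDS:
            routed[key] = value
        else:
            extra[key] = value    # unknown keys
    if session:
        routed["session_properties"] = session
    if extra:
        routed["extra"] = extra
    return routed


routed = route_sketch({"model": "m", "instructions": "Hi", "temperature": 0.5, "foo": 1})
print(routed)
```
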

class pipecat.services.openai.realtime.llm.OpenAIRealtimeLLMService(*, api_key: str, model: str | None = None, base_url: str = 'wss://api.openai.com/v1/realtime', session_properties: SessionProperties | None = None, settings: OpenAIRealtimeLLMSettings | None = None, start_audio_paused: bool = False, start_video_paused: bool = False, video_frame_detail: str = 'auto', user_audio_preroll_secs: float | None = None, **kwargs)[source]

Bases: LLMService[OpenAIRealtimeLLMAdapter]

OpenAI Realtime LLM service providing real-time audio and text communication.

Implements the OpenAI Realtime API with WebSocket communication for low-latency bidirectional audio and text interactions. Supports function calling, conversation management, and real-time transcription.
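
Because the model is a connection-level value carried in the WebSocket URL query string (see the model parameter of __init__), the connection URL can be pictured as the base URL plus a model query parameter. The helper below is a minimal sketch under that assumption; realtime_url is a hypothetical name and "example-model" is a placeholder, and the service may build its URL differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def realtime_url(base_url: str, model: str) -> str:
    """Append (or override) the model query parameter on the WebSocket URL."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query["model"] = model
    return urlunsplit(parts._replace(query=urlencode(query)))


url = realtime_url("wss://api.openai.com/v1/realtime", "example-model")
print(url)
```
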

Emits UserStartedSpeakingFrame / UserStoppedSpeakingFrame from OpenAI’s server-side VAD events, so pipeline processors that depend on those frames (RTVI client speech events, TurnTrackingObserver, AudioBufferProcessor turn recording, UserIdleController, user mute strategies, voicemail detector) work out of the box. Pair with LLMContextAggregatorPair(..., realtime_service_mode=True) so context writes are decoupled from those frames; see the examples/realtime/realtime-openai.py example.

If you wire local VAD (LLMUserAggregatorParams.vad_analyzer) on top of this service, disable OpenAI’s server-side turn detection first (turn_detection=False); otherwise both sources broadcast duplicate user-turn frames. See examples/realtime/realtime-openai-locally-driven-turns.py.

Settings

alias of OpenAIRealtimeLLMSettings

adapter_class

alias of OpenAIRealtimeLLMAdapter

__init__(*, api_key: str, model: str | None = None, base_url: str = 'wss://api.openai.com/v1/realtime', session_properties: SessionProperties | None = None, settings: OpenAIRealtimeLLMSettings | None = None, start_audio_paused: bool = False, start_video_paused: bool = False, video_frame_detail: str = 'auto', user_audio_preroll_secs: float | None = None, **kwargs)[source]

Initialize the OpenAI Realtime LLM service.

Parameters:
  • api_key – OpenAI API key for authentication.

  • model

    OpenAI model name.

    Deprecated since version 0.0.105: Use settings=OpenAIRealtimeLLMService.Settings(model=...) instead.

    This is a connection-level parameter set via the WebSocket URL query parameter and cannot be changed during the session.

  • base_url – WebSocket base URL for the realtime API. Defaults to “wss://api.openai.com/v1/realtime”.

  • session_properties

    Configuration properties for the realtime session. If None, uses default SessionProperties.

    Deprecated since version 0.0.105: Use settings=OpenAIRealtimeLLMService.Settings(session_properties=...) instead.

  • settings – Runtime-updatable settings for this service.

  • start_audio_paused – Whether to start with audio input paused. Defaults to False.

  • start_video_paused – Whether to start with video input paused. Defaults to False.

  • video_frame_detail – Detail level for video processing. Can be “auto”, “low”, or “high”. This sets the image_detail parameter in the OpenAI Realtime API. “auto” lets the model decide, “low” is faster and uses fewer tokens, “high” provides more detail. Defaults to “auto”.

  • user_audio_preroll_secs – In manual turn-detection mode (turn_detection=False, locally-driven turns), the amount of recent audio, in seconds, to replay after an interruption clears the input audio buffer, so that the speech onset isn’t lost. Defaults to None, which auto-sizes the value to the upstream VAD’s start_secs plus a small margin and falls back to DEFAULT_USER_AUDIO_PREROLL_SECS when no VAD is present. Auto-sizing assumes that VAD drives turn starts (the default VADUserTurnStartStrategy); set this explicitly if you use a non-VAD turn-start strategy. Has no effect when server-side turn detection is enabled.

  • **kwargs – Additional arguments passed to parent LLMService.
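
To make the user_audio_preroll_secs behaviour concrete, here is a small self-contained sketch of a pre-roll buffer and of the documented sizing rule. Everything in it is illustrative: the class and function names, the 0.1 s margin, the 0.5 s fallback, and the 16-bit mono 24 kHz audio format are assumptions, and the service's own buffering may differ.

```python
from collections import deque
from typing import Optional

FALLBACK_PREROLL_SECS = 0.5  # placeholder; the real default's value is not given here
MARGIN_SECS = 0.1            # hypothetical "small margin"


def resolve_preroll_secs(explicit: Optional[float], vad_start_secs: Optional[float]) -> float:
    """Explicit value wins; else VAD start_secs plus margin; else the fallback."""
    if explicit is not None:
        return explicit
    if vad_start_secs is not None:
        return vad_start_secs + MARGIN_SECS
    return FALLBACK_PREROLL_SECS


class PrerollBuffer:
    """Keeps roughly the most recent `secs` of 16-bit mono PCM audio."""

    def __init__(self, secs: float, sample_rate: int = 24000):
        self._max_bytes = int(secs * sample_rate) * 2
        self._chunks = deque()  # recent audio chunks (bytes), oldest first
        self._size = 0

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Drop whole chunks from the front while the rest still covers the window.
        while self._chunks and self._size - len(self._chunks[0]) >= self._max_bytes:
            self._size -= len(self._chunks.popleft())

    def replay(self) -> bytes:
        """Return at most the last `secs` of audio, oldest bytes first."""
        data = b"".join(self._chunks)
        return data[-self._max_bytes:] if self._max_bytes else b""


buf = PrerollBuffer(resolve_preroll_secs(None, vad_start_secs=0.4))
for i in range(10):  # ten 100 ms chunks, each filled with its own index
    buf.append(bytes([i]) * 4800)
preroll = buf.replay()  # audio to resend after the input buffer is cleared
print(len(preroll))
```
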

can_generate_metrics() bool[source]

Check if the service can generate usage metrics.

Returns:

True if metrics generation is supported.

set_audio_input_paused(paused: bool)[source]

Set whether audio input is paused.

Parameters:

paused – True to pause audio input, False to resume.

set_video_input_paused(paused: bool)[source]

Set whether video input is paused.

Parameters:

paused – True to pause video input, False to resume.

set_video_frame_detail(detail: str)[source]

Set the detail level for video processing.

Parameters:

detail – Detail level: “auto”, “low”, or “high”.

async retrieve_conversation_item(item_id: str)[source]

Retrieve a conversation item by ID from the server.

Parameters:

item_id – The ID of the conversation item to retrieve.

Returns:

The retrieved conversation item.

async start(frame: StartFrame)[source]

Start the service and establish WebSocket connection.

Parameters:

frame – The start frame triggering service initialization.

async stop(frame: EndFrame)[source]

Stop the service and close WebSocket connection.

Parameters:

frame – The end frame triggering service shutdown.

async cancel(frame: CancelFrame)[source]

Cancel the service and close WebSocket connection.

Parameters:

frame – The cancel frame triggering service cancellation.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process incoming frames from the pipeline.

Parameters:
  • frame – The frame to process.

  • direction – The direction of frame flow in the pipeline.

async send_client_event(event: ClientEvent)[source]

Send a client event to the OpenAI Realtime API.

Parameters:

event – The client event to send.

async handle_evt_input_audio_transcription_completed(evt)[source]

Handle completion of input audio transcription.

Parameters:

evt – The transcription completed event.

async reset_conversation()[source]

Reset the conversation by disconnecting and reconnecting.

This is the safest way to start a new conversation. Note that this will fail if called from the receive task.
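
The reference does not say why the call fails from the receive task. One common reason in designs like this is that reconnecting cancels and awaits the receive task, which a task cannot do to itself. The sketch below shows that pattern with a hypothetical stand-in class; it is an assumption about the general shape of the problem, not a description of the service's internals.

```python
import asyncio


class ConnectionSketch:
    """Hypothetical stand-in for a service with a background receive task."""

    def __init__(self):
        self._receive_task = None
        self.resets = 0

    async def connect(self):
        self._receive_task = asyncio.create_task(self._receive_loop())

    async def _receive_loop(self):
        await asyncio.sleep(3600)  # stands in for reading server events

    async def reset(self):
        # Guard: a task cannot cancel and then await itself.
        if asyncio.current_task() is self._receive_task:
            raise RuntimeError("reset() must not run inside the receive task")
        self._receive_task.cancel()
        try:
            await self._receive_task
        except asyncio.CancelledError:
            pass  # expected: the old receive task was cancelled
        self.resets += 1
        await self.connect()


async def main() -> int:
    conn = ConnectionSketch()
    await conn.connect()
    await conn.reset()  # fine: called from outside the receive task
    conn._receive_task.cancel()  # tidy up the sketch's background task
    return conn.resets


resets = asyncio.run(main())
print(resets)
```

Under this assumption, code that runs inside the receive task (such as a server-event handler) would schedule the reset as a separate task instead of awaiting it directly.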

async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)

Push a frame.

Parameters:
  • frame – The frame to push.

  • direction – The direction of frame pushing.

async stop_ttfb_metrics(*, end_time: float | None = None)

Stop time-to-first-byte metrics collection and push results.

Parameters:

end_time – Optional timestamp to use as the end time. If None, uses the current time.
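
The end_time fallback can be pictured as follows. This is a sketch of the documented default only; ttfb_seconds is a hypothetical helper, and the clock used here (time.time) is an assumption about how timestamps are taken.

```python
import time
from typing import Optional


def ttfb_seconds(start_time: float, end_time: Optional[float] = None) -> float:
    """Return elapsed seconds, using the current time when end_time is None."""
    end = time.time() if end_time is None else end_time
    return end - start_time


explicit = ttfb_seconds(10.0, end_time=10.25)  # caller-supplied end time
implicit = ttfb_seconds(time.time())           # falls back to "now"
print(explicit)
```
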