llm
Inception LLM service implementation using OpenAI-compatible interface.
- class pipecat.services.inception.llm.InceptionLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven | NotGiven = <factory>, max_tokens: int | None | _NotGiven | NotGiven = <factory>, top_p: float | None | _NotGiven | NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven | NotGiven = <factory>, presence_penalty: float | None | _NotGiven | NotGiven = <factory>, seed: int | None | _NotGiven | NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, max_completion_tokens: int | None | _NotGiven | NotGiven = <factory>, reasoning_effort: Literal['instant', 'low', 'medium', 'high'] | None | ~pipecat.services.settings._NotGiven=<factory>, realtime: bool | None | _NotGiven = <factory>)[source]
Bases:
OpenAILLMSettingsSettings for InceptionLLMService.
- Parameters:
reasoning_effort – Controls how much reasoning the model applies. One of “instant”, “low”, “medium”, or “high”. When unset, the parameter is omitted and Inception’s server-side default applies.
realtime – When True, reduces time to first diffusion block (TTFT). Defaults to True.
- reasoning_effort: Literal['instant', 'low', 'medium', 'high'] | None | _NotGiven
- realtime: bool | None | _NotGiven
- class pipecat.services.inception.llm.InceptionLLMService(*, api_key: str, base_url: str = 'https://api.inceptionlabs.ai/v1', settings: InceptionLLMSettings | None = None, **kwargs)[source]
Bases:
OpenAILLMServiceA service for interacting with Inception’s API using the OpenAI-compatible interface.
This service extends OpenAILLMService to connect to Inception’s API endpoint while maintaining full compatibility with OpenAI’s interface and functionality. Supports Mercury-2, Inception’s diffusion-based reasoning model.
- supports_developer_role = False
Whether this service’s API supports the “developer” message role.
OpenAI’s native API supports it, but some OpenAI-compatible services (e.g. Cerebras) do not. Subclasses that don’t support it should set this to
False, which causes the adapter to convert “developer” messages to “user” messages before sending them to the API.
- Settings
alias of
InceptionLLMSettings
- __init__(*, api_key: str, base_url: str = 'https://api.inceptionlabs.ai/v1', settings: InceptionLLMSettings | None = None, **kwargs)[source]
Initialize the Inception LLM service.
- Parameters:
api_key – The API key for accessing Inception’s API.
base_url – The base URL for Inception API. Defaults to “https://api.inceptionlabs.ai/v1”.
settings – Runtime-updatable settings.
**kwargs – Additional keyword arguments passed to OpenAILLMService.
- create_client(api_key=None, base_url=None, **kwargs)[source]
Create OpenAI-compatible client for Inception API endpoint.
- Parameters:
api_key – The API key for authentication. If None, uses instance default.
base_url – The base URL for the API. If None, uses instance default.
**kwargs – Additional keyword arguments for client configuration.
- Returns:
An OpenAI-compatible client configured for Inception’s API.
- build_chat_completion_params(params_from_context: OpenAILLMInvocationParams) dict[source]
Build parameters for Inception chat completion request.
Extends the base OpenAI parameters with Inception-specific options such as reasoning_effort and realtime.
- Parameters:
params_from_context – Parameters, derived from the LLM context, to use for the chat completion. Contains messages, tools, and tool choice.
- Returns:
Dictionary of parameters for the chat completion request.