ui_tools
Opt-in tool mixin for UIWorker.
Ships ReplyToolMixin: a single bundled reply tool (a required spoken
answer plus the standard UI actions) covering the common app shapes, for
subclasses that don’t need a custom tool schema. See the class for details.
- class pipecat.workers.ui.ui_tools.ReplyToolMixin
Bases: object
Expose a reply tool covering the full standard action set.
A single bundled LLM tool with a required spoken answer plus optional visual and state-changing actions. One tool call per turn, no chaining; the required answer argument is enforced by the API schema, so the model cannot omit the terminator (the spoken answer that ends the turn).
Compose alongside UIWorker:
    class MyUIWorker(ReplyToolMixin, UIWorker): ...
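The guarantee that answer cannot be omitted comes from marking it as required in the tool's parameter schema. The sketch below is hypothetical: REPLY_SCHEMA only approximates what a JSON-Schema-style definition for the reply signature could look like (the schema pipecat actually generates is not shown in this reference, and "additionalProperties": False is an assumption), and the two helpers are a tiny hand-rolled check, not a full JSON Schema validator.

```python
from typing import Any

# Hypothetical JSON-Schema-style parameters for the reply tool.
# Only "answer" is required; every action field may be absent or null.
REPLY_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "scroll_to": {"type": ["string", "null"]},
        "highlight": {"type": ["array", "null"], "items": {"type": "string"}},
        "select_text": {"type": ["string", "null"]},
        "fills": {
            "type": ["array", "null"],
            "items": {
                "type": "object",
                "properties": {"ref": {"type": "string"}, "value": {"type": "string"}},
                "required": ["ref", "value"],
            },
        },
        "click": {"type": ["array", "null"], "items": {"type": "string"}},
    },
    "required": ["answer"],
    "additionalProperties": False,  # assumption: unknown fields are rejected
}


def missing_required(args: dict[str, Any], schema: dict[str, Any] = REPLY_SCHEMA) -> list[str]:
    """Return required top-level keys that are absent from a tool call's arguments."""
    return [key for key in schema["required"] if key not in args]


def unknown_keys(args: dict[str, Any], schema: dict[str, Any] = REPLY_SCHEMA) -> list[str]:
    """Return argument keys that the schema does not declare, sorted for stable output."""
    return sorted(set(args) - set(schema["properties"]))
```

A call such as {"highlight": ["e1"]} fails the required check because it carries no answer, which is the situation the required field rules out at the API level.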
Covers pointing apps (scroll_to + highlight), reading apps (scroll_to + select_text), form apps (fills + click), and any blend (e.g. a document review with selection-based deixis AND voice-driven note-taking). The LLM uses whichever fields fit the user’s request in a given turn; unused fields stay null and don’t affect behavior.
Delivers answer as verbatim TTS (respond_to_job(answer, tts_speak=True)): the worker speaks the exact phrase. Apps that want a minimal schema (only the fields they actually use, or app-specific commands), or that want the requester’s voice LLM to phrase the reply instead, should write their own @tool reply directly on the UIWorker subclass, using the helper methods on UIWorker plus send_command to dispatch the underlying UI commands.
The host class must provide scroll_to, highlight, select_text, click, set_input_value, and respond_to_job (UIWorker does), and it must be the target of @tool discovery on the LLM pipeline.
- async reply(params: FunctionCallParams, answer: str, scroll_to: str | None = None, highlight: list[str] | None = None, select_text: str | None = None, fills: list[dict] | None = None, click: list[str] | None = None)
Reply to the user. Optionally point at content and act on inputs.
Always called exactly once per turn.
answer is required; the action fields are optional and may be combined.
Visual / pointing actions (draw the user’s attention):
scroll_to brings an element into view (single ref).
highlight flashes elements briefly (list of refs). Best for short emphasis such as a button or a fact.
select_text puts the page’s text selection on an element (single ref). Best for “this paragraph” or “the section about X”, so the user sees exactly what was meant. The selection persists until the user clicks elsewhere.
State-changing actions (modify form / app state):
fills writes values into inputs (a list of {"ref": ..., "value": ...} objects, so several inputs can be filled in one turn).
click clicks elements (a list of refs, clicked in order). Use it for checkboxes, radios, and submit buttons.
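Because every action field is optional, a turn's arguments only need to carry what the request calls for. The argument dicts below are illustrative: the snapshot refs ("e3", "e7", "e12", "e15") and the answers are invented, and active_actions is a hypothetical helper, not part of the mixin. They show a pointing turn, a reading turn, and a form turn, with the helper listing which actions each turn would actually perform.

```python
from typing import Any

# Illustrative tool-call arguments; all snapshot refs are made up.
POINTING_TURN = {
    "answer": "The export button is up here.",
    "scroll_to": "e3",
    "highlight": ["e3"],
    "select_text": None,  # unused fields may be sent as null...
    "fills": None,
    "click": None,
}
READING_TURN = {  # ...or simply left out
    "answer": "This paragraph covers refunds.",
    "scroll_to": "e7",
    "select_text": "e7",
}
FORM_TURN = {
    "answer": "I filled in your name and accepted the terms.",
    "fills": [{"ref": "e12", "value": "Ada Lovelace"}],
    "click": ["e15"],
}

ACTION_FIELDS = ("scroll_to", "highlight", "select_text", "fills", "click")


def active_actions(args: dict[str, Any]) -> list[str]:
    """Hypothetical helper: action fields a turn uses, skipping absent or null ones."""
    return [name for name in ACTION_FIELDS if args.get(name) is not None]
```

Treating an absent field and an explicit null the same way mirrors the statement that unused fields don’t affect behavior.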
Order of dispatch within a turn:
scroll_to, then highlight, then select_text, then fills, then click, then speak the answer.
- Parameters:
params – Framework-provided tool invocation context.
answer – The spoken reply in plain language. One short sentence. No markdown, no symbols.
scroll_to – Optional snapshot ref. Scrolls the element into view before speaking.
highlight – Optional list of snapshot refs. Visually pulses each element.
select_text – Optional snapshot ref. Places the page’s text selection on that element.
fills – Optional list of {"ref": "eN", "value": "..."} objects. Writes each value into the input at ref.
click – Optional list of snapshot refs to click, in order.
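The documented dispatch order can be sketched with a stand-in host that records each call. This is not pipecat's implementation: RecordingHost and dispatch_reply are hypothetical. The method names follow the helpers the mixin requires from its host, but their argument shapes (one set_input_value call per fill, one click call per ref, highlight taking the whole list) are assumptions, and params is omitted. Only respond_to_job(answer, tts_speak=True) is taken from the text above.

```python
import asyncio
from typing import Any, Optional


class RecordingHost:
    """Hypothetical stand-in for a UIWorker: each helper only records that it ran.

    Method names follow the mixin's stated requirements; argument shapes are assumed.
    """

    def __init__(self) -> None:
        self.calls: list[tuple[Any, ...]] = []

    async def scroll_to(self, ref: str) -> None:
        self.calls.append(("scroll_to", ref))

    async def highlight(self, refs: list[str]) -> None:
        self.calls.append(("highlight", tuple(refs)))

    async def select_text(self, ref: str) -> None:
        self.calls.append(("select_text", ref))

    async def set_input_value(self, ref: str, value: str) -> None:
        self.calls.append(("set_input_value", ref, value))

    async def click(self, ref: str) -> None:
        self.calls.append(("click", ref))

    async def respond_to_job(self, answer: str, tts_speak: bool = False) -> None:
        self.calls.append(("respond_to_job", answer, tts_speak))


async def dispatch_reply(
    host: RecordingHost,
    answer: str,
    scroll_to: Optional[str] = None,
    highlight: Optional[list[str]] = None,
    select_text: Optional[str] = None,
    fills: Optional[list[dict]] = None,
    click: Optional[list[str]] = None,
) -> None:
    """Sketch of the documented order: scroll_to, highlight, select_text, fills, click, speak."""
    if scroll_to is not None:
        await host.scroll_to(scroll_to)
    if highlight:
        await host.highlight(highlight)
    if select_text is not None:
        await host.select_text(select_text)
    for fill in fills or []:
        await host.set_input_value(fill["ref"], fill["value"])
    for ref in click or []:  # clicks happen in list order
        await host.click(ref)
    # The answer is always delivered last, as verbatim TTS.
    await host.respond_to_job(answer, tts_speak=True)


host = RecordingHost()
asyncio.run(dispatch_reply(host, "Done.", click=["e15"], fills=[{"ref": "e12", "value": "Ada"}], scroll_to="e12"))
```

Even though click and fills are passed before scroll_to in the call, the recorded sequence is scroll_to, set_input_value, click, and finally respond_to_job, because the order is fixed by the dispatcher rather than by the model's argument order.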