task Natural-language goal for the browser session to complete.
Agent Actions is our upcoming API for autonomous browser automation. Just pass in a natural language task and a URL. We'll execute the steps, stream the progress in real-time, and return the structured data you asked for.
API sketch, subject to change. Not a live contract yet.
Stop writing brittle CSS selectors. Just describe what you want the browser to accomplish, set your boundaries, and listen to the progress stream.
{
"task": "Find order BC-1042 and return its delivery date.",
"url": "https://example-shop.test/orders",
"options": {
"maxSteps": 12,
"humanIntervention": "allowed",
"secretValues": { "crmApiKey": "{{ vault ref }}" },
"outputSchema": {
"orderId": "string",
"status": "string",
"deliveryDate": "string"
}
}
} You control the execution. Inject secrets securely, define the exact JSON schema you want back, and decide if the agent is allowed to ask humans for help when it gets stuck.
task Natural-language goal for the browser session to complete.
url | sessionId Start from a URL or attach the task to an existing BrowserCity session.
options.maxSteps Bound how many observe-plan-act loops the task can spend.
options.humanIntervention Allow a human handoff event when the task needs approval, credentials, or judgment.
options.detectElements Ask the agent to enrich observations with interactable element hints.
options.vision Enable visual observations for pages where DOM text is not enough.
options.agent Select the planned task-runner profile when multiple agents are available.
options.extendedSystemMessage Add task-specific steering that should not live in the reusable prompt.
options.secretValues Planned task-scoped secrets: redacted from stream events and cleaned up after completion.
options.outputSchema Optional JSON Schema for structured final output.
event: agent_action.started
data: {"id":"act_01j...","status":"running"}
event: agent_action.step
data: {"index":1,"intent":"Open order lookup"}
event: agent_action.output
data: {"orderId":"BC-1042","deliveryDate":"2026-06-02"}
event: agent_action.completed
data: {"status":"completed","steps":8} Every navigation, observation, and action is streamed back via Server-Sent Events (SSE). Hook directly into your UI or logging system without writing polling loops.
agent_action.startedagent_action.stepagent_action.observationagent_action.human_interventionagent_action.outputagent_action.completedagent_action.failed
Use Humanized REST /v1/do/* for shipped browser actions like open, navigate, click, type, extract markdown, capture screenshots, and close sessions.
Start for free. No credit card required. Private sessions by default.