Skill v1.0.0
Status: current
Automated scan: 100/100
inclusionai/aworld/agent-browser
Details
Published: June 22, 2026 at 02:44 PM
Content hash: sha256:d1af996f1f0339d9...
Git SHA: e71aaa6ed42e
SKILL.md · 254 lines · 8.6 KB
version: "1.0.0"
name: agent-browser
description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
allowed-tools: Bash(agent-browser:*)
Browser Automation with agent-browser
Quick start
bash
agent-browser open <url>        # Navigate to page
agent-browser snapshot -i       # Get interactive elements with refs
agent-browser click @e1         # Click element by ref
agent-browser fill @e2 "text"   # Fill input by ref
agent-browser close             # Close browser
Core workflow
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
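The four steps above can be wrapped in small shell helpers. This is a minimal sketch, not part of agent-browser: the function names `open_and_snapshot` and `click_and_resnapshot` are hypothetical, and the only CLI calls used are the `open`, `snapshot -i`, and `click` commands listed in this document.

```bash
# Hypothetical helper: navigate, then list interactive elements with refs.
open_and_snapshot() {
  local url="$1"
  agent-browser open "$url" || return 1
  agent-browser snapshot -i
}

# Hypothetical helper: click a ref, then take a fresh snapshot.
# Refs from the previous snapshot are not reused after the click.
click_and_resnapshot() {
  local ref="$1"
  agent-browser click "$ref" || return 1
  agent-browser snapshot -i
}
```

A typical call sequence would be `open_and_snapshot https://example.com` followed by `click_and_resnapshot @e1`, reading the refs for the next step from the newest snapshot.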
Commands
Navigation
bash
agent-browser open <url>   # Navigate to URL
agent-browser back         # Go back
agent-browser forward      # Go forward
agent-browser reload       # Reload page
agent-browser close        # Close browser
Snapshot (page analysis)
bash
agent-browser snapshot             # Full accessibility tree
agent-browser snapshot -i          # Interactive elements only (recommended)
agent-browser snapshot -c          # Compact output
agent-browser snapshot -d 3        # Limit depth to 3
agent-browser snapshot -s "#main"  # Scope to CSS selector
Interactions (use @refs from snapshot)
bash
agent-browser click @e1            # Click
agent-browser dblclick @e1         # Double-click
agent-browser focus @e1            # Focus element
agent-browser fill @e2 "text"      # Clear and type
agent-browser type @e2 "text"      # Type without clearing
agent-browser press Enter          # Press key
agent-browser press Control+a      # Key combination
agent-browser keydown Shift        # Hold key down
agent-browser keyup Shift          # Release key
agent-browser hover @e1            # Hover
agent-browser check @e1            # Check checkbox
agent-browser uncheck @e1          # Uncheck checkbox
agent-browser select @e1 "value"   # Select dropdown
agent-browser scroll down 500      # Scroll page
agent-browser scrollintoview @e1   # Scroll element into view
agent-browser drag @e1 @e2         # Drag and drop
agent-browser upload @e1 file.pdf  # Upload files
Get information
bash
agent-browser get text @e1       # Get element text
agent-browser get html @e1       # Get innerHTML
agent-browser get value @e1      # Get input value
agent-browser get attr @e1 href  # Get attribute
agent-browser get title          # Get page title
agent-browser get url            # Get current URL
agent-browser get count ".item"  # Count matching elements
agent-browser get box @e1        # Get bounding box
Check state
bash
agent-browser is visible @e1  # Check if visible
agent-browser is enabled @e1  # Check if enabled
agent-browser is checked @e1  # Check if checked
Screenshots & PDF
bash
agent-browser screenshot           # Screenshot to stdout
agent-browser screenshot path.png  # Save to file
agent-browser screenshot --full    # Full page
agent-browser pdf output.pdf       # Save as PDF
Video recording
bash
agent-browser record start ./demo.webm     # Start recording (uses current URL + state)
agent-browser click @e1                    # Perform actions
agent-browser record stop                  # Stop and save video
agent-browser record restart ./take2.webm  # Stop current + start new recording
Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it automatically returns to your current page. For smooth demos, explore first, then start recording.
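If the recorded steps are scripted, it can help to make sure `record stop` always runs, even when a step fails. The wrapper below is a sketch under that assumption; `record_session` is a hypothetical name, and it uses only the `record start` and `record stop` commands shown above.

```bash
# Hypothetical helper: record_session <output.webm> <command> [args...]
# Starts a recording, runs the given command, then always stops the recording.
# Returns the exit status of the wrapped command.
record_session() {
  local out="$1"
  shift
  local status=0
  agent-browser record start "$out" || return 1
  "$@" || status=$?
  agent-browser record stop
  return "$status"
}
```

For example, `record_session ./demo.webm agent-browser click @e1` records a single click. A shell function containing several steps can be passed in place of the single command.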
Wait
bash
agent-browser wait @e1                 # Wait for element
agent-browser wait 2000                # Wait milliseconds
agent-browser wait --text "Success"    # Wait for text
agent-browser wait --url "**/dashboard"  # Wait for URL pattern
agent-browser wait --load networkidle  # Wait for network idle
agent-browser wait --fn "window.ready" # Wait for JS condition
Mouse control
bash
agent-browser mouse move 100 200  # Move mouse
agent-browser mouse down left     # Press button
agent-browser mouse up left       # Release button
agent-browser mouse wheel 100     # Scroll wheel
Semantic locators (alternative to refs)
bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
Browser settings
bash
agent-browser set viewport 1920 1080       # Set viewport size
agent-browser set device "iPhone 14"       # Emulate device
agent-browser set geo 37.7749 -122.4194    # Set geolocation
agent-browser set offline on               # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}'  # Extra HTTP headers
agent-browser set credentials user pass    # HTTP basic auth
agent-browser set media dark               # Emulate color scheme
Cookies & Storage
bash
agent-browser cookies                # Get all cookies
agent-browser cookies set name value # Set cookie
agent-browser cookies clear          # Clear cookies
agent-browser storage local          # Get all localStorage
agent-browser storage local key      # Get specific key
agent-browser storage local set k v  # Set value
agent-browser storage local clear    # Clear all
Network
bash
agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body '{}'  # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests
Tabs & Windows
bash
agent-browser tab            # List tabs
agent-browser tab new [url]  # New tab
agent-browser tab 2          # Switch to tab
agent-browser tab close      # Close tab
agent-browser window new     # New window
Frames
bash
agent-browser frame "#iframe"  # Switch to iframe
agent-browser frame main       # Back to main frame
Dialogs
bash
agent-browser dialog accept [text]  # Accept dialog
agent-browser dialog dismiss        # Dismiss dialog
JavaScript
bash
agent-browser eval "document.title" # Run JavaScript
Example: Form submission
bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result
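When scripting against snapshot output, the refs can be pulled out of the text with standard tools. The sketch below assumes refs appear as `[ref=eN]`, as in the sample output above; `list_refs` is a hypothetical helper, not an agent-browser command, and the real snapshot format may contain more than this sample shows.

```bash
# Hypothetical helper: read snapshot text on stdin and print each ref
# in the @eN form that the interaction commands accept, one per line.
list_refs() {
  grep -o '\[ref=e[0-9][0-9]*]' | sed 's/^\[ref=/@/; s/]$//'
}
```

Piping the sample line from the example through it, for instance with `agent-browser snapshot -i | list_refs`, would print `@e1`, `@e2`, and `@e3` on separate lines.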
Example: Authentication with saved state
bash
# Login once
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later sessions: load saved state
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
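The "login once, reuse later" pattern can be folded into one helper that picks the right branch based on whether the state file exists. This is a sketch only: `ensure_auth` is a hypothetical name, the login steps are passed in as a command, and the helper does not check whether a saved state is still valid.

```bash
# Hypothetical helper: ensure_auth <state-file> <login-command> [args...]
# If the state file exists, load it. Otherwise run the login command and,
# if it succeeds, save the resulting state for later sessions.
ensure_auth() {
  local state_file="$1"
  shift
  if [ -f "$state_file" ]; then
    agent-browser state load "$state_file"
  else
    "$@" && agent-browser state save "$state_file"
  fi
}
```

With the login steps from the example wrapped in a shell function named, say, `login`, the call would be `ensure_auth auth.json login`, followed by `agent-browser open https://app.example.com/dashboard`.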
Sessions (parallel browsers)
bash
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
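Several named sessions can be opened from one loop. The sketch below assumes that separate sessions can be driven concurrently from the shell with background jobs, which this document does not state explicitly; `open_in_sessions` is a hypothetical helper that takes `name=url` pairs.

```bash
# Hypothetical helper: open_in_sessions name1=url1 name2=url2 ...
# Opens each URL in its own named session as a background job,
# waits for all of them, then lists the sessions.
open_in_sessions() {
  local pair name url
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first '='
    url="${pair#*=}"     # text after the first '='
    agent-browser --session "$name" open "$url" &
  done
  wait
  agent-browser session list
}
```

The equivalent of the commands above would be `open_in_sessions test1=site-a.com test2=site-b.com`. Dropping the `&` and the `wait` runs the sessions one after another instead.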
JSON output (for parsing)
Add --json for machine-readable output:
bash
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Debugging
bash
agent-browser open example.com --headed  # Show browser window
agent-browser --cdp 9222 snapshot        # Connect via CDP
agent-browser console                    # View console messages
agent-browser console --clear            # Clear console
agent-browser errors                     # View page errors
agent-browser errors --clear             # Clear errors
agent-browser highlight @e1              # Highlight element
agent-browser record start ./debug.webm  # Record from current page
agent-browser record stop                # Save recording
agent-browser trace start                # Start recording trace
agent-browser trace stop trace.zip       # Stop and save trace