Features
- Built on AgentD – a runtime daemon which exposes a REST API for interacting with the desktop.
- Implements the DeviceBay Protocol.
- Provides a CLI and a Python library.
- Desktops can run locally or in the cloud.
Motivation
Why do we want this? Simple. APIs are not always available, and they can be incredibly expensive to use. Agents that can use GUIs with ease have a massive advantage when operating mobile phones, desktops and SaaS applications. They can work with these interfaces just as a human would. GUI navigation makes any program accessible and programmable to an agent, which offers tremendous potential to gather information, automate complex, open-ended tasks and control your desktop.
Almost all the work in this area is currently focused on helping agents work in browsers, but many apps aren’t available on the web. That’s why we created AgentDesk. It allows you to run VMs locally and in the cloud, and to control them using a Python SDK and CLI. This gives you a solid foundation for advanced GUI-controlling agents.
Check out an example of a complex GUI-based agent here. Read on to learn how to use AgentDesk.
Installation
pip install agentdesk
To run local VMs, you need Docker, which runs the containers that provide the desktop GUI.
You also need QEMU if you are creating QEMU desktops instead of Docker desktops.
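Before creating a local desktop, it can help to confirm that these prerequisites are on your PATH. The following is a minimal sketch using only the Python standard library; it is not part of AgentDesk. The helper name `missing_prerequisites` is hypothetical, and the binary names `docker` and `qemu-system-x86_64` are assumptions that may differ on your platform.

```python
import shutil


def missing_prerequisites(binaries=("docker", "qemu-system-x86_64")):
    """Return the names from `binaries` that are not found on PATH.

    The default binary names are assumptions; adjust them for your system
    (for example, a different QEMU target architecture).
    """
    return [name for name in binaries if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Docker and QEMU binaries found.")
```

If you only plan to use Docker desktops, pass `("docker",)` so that a missing QEMU binary is not reported.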
Quick Start: local run
Running in GCP and in AWS
Explore Further
Simple Example
Playing a simple browser game
Advanced Example
Using GPT-4V to navigate through a UI
CLI Documentation
Find out how to use AgentDesk via the CLI
API Reference
Find out how to use the AgentDesk Python library

