Introduction

AgentDesk provides full-featured desktop environments that can be programmatically controlled by AI agents.

Features

Built on AgentD – a runtime daemon which exposes a REST API for interacting with the desktop.
Implements the DeviceBay Protocol.
Provides a CLI and a Python library.
Desktops can run locally or in the cloud.

Motivation

Why do we want this? APIs are not always available, and they can be expensive to use. Agents that can operate GUIs have a major advantage on mobile phones, desktops and SaaS applications: they can work with them just like a human would. GUI navigation makes any program accessible and programmable to an agent, which opens up the potential to gather information, automate complex, open-ended tasks and control your desktop.

Almost all the work in this area currently focuses on helping agents work in browsers, but many apps aren’t available on the web. That’s why we created AgentDesk. It allows you to run VMs locally and in the cloud, and to control them using a Python SDK and CLI. This gives you a solid foundation for advanced GUI-controlling agents.

Check out an example of a complex GUI-based agent here. Read on to learn how to use AgentDesk.

Installation

pip install agentdesk

If you run local VMs, you need Docker to run the containers with the desktop GUI. You also need QEMU if you are creating QEMU desktops instead of Docker desktops.
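Before creating a local desktop, it can help to confirm that the prerequisites above are actually on your PATH. The following is a minimal sketch, not part of AgentDesk: the helper name `check_prerequisites` is hypothetical, and the binary names `docker` and `qemu-system-x86_64` are assumptions that may differ on your platform.

```python
import shutil
from typing import Dict, Iterable


def check_prerequisites(
    binaries: Iterable[str] = ("docker", "qemu-system-x86_64"),
) -> Dict[str, bool]:
    """Report which of the given executables can be found on PATH.

    The default binary names are assumptions; adjust them for your system.
    """
    return {name: shutil.which(name) is not None for name in binaries}


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing QEMU binary only matters if you plan to create QEMU desktops rather than Docker desktops.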

Quick Start: local run

from agentdesk import Desktop

# Create a local VM
desktop = Desktop.local()

# Launch the UI for it
desktop.view(background=True)

# Open a browser to Google
desktop.open_url("https://google.com")

# Take actions on the desktop
desktop.move_mouse(500, 500)
desktop.click()
img = desktop.take_screenshot()
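The quick start calls can be composed into a small reusable helper. The sketch below uses only the method names shown above (`move_mouse`, `click`, `take_screenshot`); the `click_points` helper and the `RecordingDesktop` stand-in are hypothetical, and the stand-in exists only so the logic can be tried without starting a VM. The return type of `take_screenshot` is not specified here, so the helper treats it as opaque.

```python
from typing import Any, List, Protocol, Tuple


class DesktopLike(Protocol):
    """The subset of desktop methods used in the quick start."""

    def move_mouse(self, x: int, y: int) -> Any: ...
    def click(self) -> Any: ...
    def take_screenshot(self) -> Any: ...


def click_points(desktop: DesktopLike, points: List[Tuple[int, int]]) -> List[Any]:
    """Click each (x, y) point in order, taking a screenshot after each click."""
    screenshots: List[Any] = []
    for x, y in points:
        desktop.move_mouse(x, y)
        desktop.click()
        screenshots.append(desktop.take_screenshot())
    return screenshots


class RecordingDesktop:
    """Hypothetical stand-in that records calls instead of driving a real VM."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def move_mouse(self, x: int, y: int) -> None:
        self.calls.append(("move_mouse", x, y))

    def click(self) -> None:
        self.calls.append(("click",))

    def take_screenshot(self) -> str:
        self.calls.append(("take_screenshot",))
        return f"screenshot-{len(self.calls)}"


if __name__ == "__main__":
    fake = RecordingDesktop()
    click_points(fake, [(500, 500), (100, 200)])
    print(fake.calls)
```

With a real desktop, the same helper would be called as `click_points(desktop, [(500, 500)])`, assuming the object returned by `Desktop.local()` exposes these three methods as shown above.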

Running in GCP and in AWS

# Create a desktop VM on GCP
desktop = Desktop.gce()

# Create a desktop VM on AWS
desktop = Desktop.aws()
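If the target environment is chosen at runtime, the three constructors shown on this page can sit behind one function. This is a minimal sketch under assumptions: `create_desktop` and `SUPPORTED_PROVIDERS` are hypothetical names, and the sketch assumes each constructor can be called without arguments, as in the snippets above. Real cloud usage may need credentials or other configuration that this page does not cover.

```python
from typing import Any

# Constructor names that appear on this page; anything else is rejected.
SUPPORTED_PROVIDERS = ("local", "gce", "aws")


def create_desktop(desktop_cls: Any, provider: str = "local") -> Any:
    """Call the constructor on `desktop_cls` that matches `provider`.

    `desktop_cls` is expected to expose `local()`, `gce()` and `aws()`,
    as the `Desktop` class does in the examples above.
    """
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"Unknown provider {provider!r}; expected one of {SUPPORTED_PROVIDERS}"
        )
    return getattr(desktop_cls, provider)()


# Example usage (requires agentdesk to be installed):
#   from agentdesk import Desktop
#   desktop = create_desktop(Desktop, "gce")
```

Passing the class in as an argument keeps the helper easy to try with a stand-in class before any VM is launched.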

Explore Further

Simple Example

Playing a simple browser game

Advanced Example

Using GPT-4V to navigate through a UI

CLI Documentation

Find out how to use AgentDesk via the CLI

API Reference

Find out how to use the AgentDesk Python library
