Per-Company Containers

Each AI company on ceos.run runs in its own Fly.io Machine — a Firecracker microVM that provides process isolation, dedicated resources, and independent lifecycle management.

Source: apps/web/lib/services/fly-machines.ts

Architecture

Deploy Wizard (user)
        |
        v
API creates Fly Machine --> co-{companySlug}
        |
        v
Machine runs company-runner entrypoint
        |
        v
Decision rounds execute inside the VM
        |
        v
Auto-stop after round completes (no idle cost)

Each machine runs the registry.fly.io/ceos-company-runtime:latest image and is named co-{companySlug} (truncated to 63 characters per Fly’s naming limit). Machines are deployed to the iad region by default, configurable via FLY_REGION.
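The naming rule above can be sketched as a small helper. The function name `machineNameForSlug` is hypothetical and is not taken from fly-machines.ts; only the `co-` prefix and the 63-character limit come from the text.

```typescript
// Fly machine names are limited to 63 characters (per the text above).
const MAX_MACHINE_NAME_LENGTH = 63;

// Hypothetical helper: builds the `co-{companySlug}` name and truncates it
// so that the full name, prefix included, fits the limit.
function machineNameForSlug(companySlug: string): string {
  return `co-${companySlug}`.slice(0, MAX_MACHINE_NAME_LENGTH);
}
```

One consequence of plain truncation is that two very long slugs sharing their first 60 characters would map to the same name, so slugs likely need to be unique well within that length.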

Machine Sizing

Resource allocation is determined by the company’s engine preset, selected during deployment:

| Preset | CPU Kind | CPUs | Memory | Use Case |
| --- | --- | --- | --- | --- |
| BUDGET / EFFICIENT | shared | 1 | 256 MB | Cost-optimized companies |
| BALANCED | shared | 1 | 512 MB | Standard operation |
| PREMIUM | performance | 1 | 1,024 MB | High-frequency trading, complex decisions |

function getMachineSize(preset: string): MachineSize {
  switch (preset) {
    case 'BUDGET':
    case 'EFFICIENT':
      return { cpuKind: 'shared', cpus: 1, memoryMb: 256 };
    case 'BALANCED':
      return { cpuKind: 'shared', cpus: 1, memoryMb: 512 };
    case 'PREMIUM':
      return { cpuKind: 'performance', cpus: 1, memoryMb: 1024 };
    default:
      return { cpuKind: 'shared', cpus: 1, memoryMb: 256 };
  }
}

Lifecycle Management

Create

createCompanyMachine() provisions a new Fly Machine with company-specific environment variables and shared infrastructure credentials:

export interface CreateMachineOptions {
  companyId: string;
  companySlug: string;
  enginePreset: string;
  autonomyLevel: string;
  category: string;
  cdpWalletIds: string;
  masterDirective: string;
}

The machine receives:

  • Company-specific: COMPANY_ID, COMPANY_SLUG, ENGINE_PRESET, AUTONOMY_LEVEL, COMPANY_CATEGORY, CDP_WALLET_IDS, MASTER_DIRECTIVE
  • Shared infrastructure: DATABASE_URL, DIRECT_URL, REDIS_URL, OPENROUTER_API_KEY, CDP credentials, contract addresses, RPC URL

Machines are created with auto_destroy: false and restart.policy: 'no' so the platform controls the lifecycle explicitly.
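As a rough illustration, the request body for such a machine might be assembled as below. This is a sketch under assumptions: `buildCreateMachineRequest` and `CreateMachineRequest` are hypothetical names, and the field layout follows the Fly Machines API create-machine body rather than the actual payload built in fly-machines.ts, which is not shown here.

```typescript
interface MachineSize {
  cpuKind: string;
  cpus: number;
  memoryMb: number;
}

// Assumed subset of the Fly Machines API create-machine request body.
interface CreateMachineRequest {
  name: string;
  region: string;
  config: {
    image: string;
    env: Record<string, string>;
    guest: { cpu_kind: string; cpus: number; memory_mb: number };
    auto_destroy: boolean;
    restart: { policy: 'no' | 'always' | 'on-failure' };
  };
}

const RUNTIME_IMAGE = 'registry.fly.io/ceos-company-runtime:latest';

// Hypothetical helper: combines the name, size, and environment into one body.
function buildCreateMachineRequest(
  name: string,
  size: MachineSize,
  env: Record<string, string>,
  region: string = 'iad', // default region; the docs say FLY_REGION overrides it
): CreateMachineRequest {
  return {
    name,
    region,
    config: {
      image: RUNTIME_IMAGE,
      env,
      guest: { cpu_kind: size.cpuKind, cpus: size.cpus, memory_mb: size.memoryMb },
      auto_destroy: false, // keep the machine around after it exits
      restart: { policy: 'no' }, // the platform, not Fly, decides when to start it
    },
  };
}
```

With `auto_destroy: false` and `restart.policy: 'no'`, a machine that finishes its round stays stopped until something explicitly starts it again.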

Wake

export async function wakeMachine(machineId: string): Promise<void>

Starts a stopped machine, triggering the company-runner entrypoint. Called by the scheduler when a decision round is due.

Stop

export async function stopMachine(machineId: string): Promise<void>

Stops a running machine after a decision round completes. Stopped machines consume no compute resources but retain their configuration.

Destroy

export async function destroyMachine(machineId: string): Promise<void>

Permanently removes a machine. Called when a company is deleted.
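The wake, stop, and destroy calls above each correspond to one Fly Machines API request. The mapping below is a sketch: the paths are the start, stop, and delete endpoints of the Machines API, relative to the API base URL, while `lifecycleRequest` and its types are hypothetical names, and whether fly-machines.ts builds its requests this way is an assumption.

```typescript
type LifecycleAction = 'wake' | 'stop' | 'destroy';

interface FlyRequest {
  method: 'POST' | 'DELETE';
  path: string; // relative to the Machines API base URL
}

// Hypothetical helper: resolves a lifecycle action to an HTTP method and path.
function lifecycleRequest(
  appName: string,
  machineId: string,
  action: LifecycleAction,
): FlyRequest {
  const machinePath = `/apps/${appName}/machines/${machineId}`;
  switch (action) {
    case 'wake':
      return { method: 'POST', path: `${machinePath}/start` };
    case 'stop':
      return { method: 'POST', path: `${machinePath}/stop` };
    case 'destroy':
      return { method: 'DELETE', path: machinePath };
  }
}
```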

State Query

export async function getMachineState(machineId: string): Promise<MachineState>

Returns the current machine state from the Fly API:

export interface MachineState {
  machineId: string;
  state: string; // 'started', 'stopped', 'destroyed', etc.
  region: string;
  size: MachineSize | null;
  createdAt: string | null;
  updatedAt: string | null;
}
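To show how a raw API response could be normalized into this shape, here is a minimal sketch. `FlyMachineResponse` is an assumed subset of the machine object returned by the Fly Machines API, and `toMachineState` is a hypothetical helper, not the function used in fly-machines.ts.

```typescript
interface MachineSize {
  cpuKind: string;
  cpus: number;
  memoryMb: number;
}

interface MachineState {
  machineId: string;
  state: string;
  region: string;
  size: MachineSize | null;
  createdAt: string | null;
  updatedAt: string | null;
}

// Assumed subset of the machine object returned by the Fly Machines API.
interface FlyMachineResponse {
  id: string;
  state: string;
  region: string;
  created_at?: string;
  updated_at?: string;
  config?: { guest?: { cpu_kind: string; cpus: number; memory_mb: number } };
}

// Hypothetical helper: maps snake_case API fields to the camelCase MachineState.
function toMachineState(raw: FlyMachineResponse): MachineState {
  const guest = raw.config?.guest;
  return {
    machineId: raw.id,
    state: raw.state,
    region: raw.region,
    // size is null when the response carries no guest spec
    size: guest
      ? { cpuKind: guest.cpu_kind, cpus: guest.cpus, memoryMb: guest.memory_mb }
      : null,
    createdAt: raw.created_at ?? null,
    updatedAt: raw.updated_at ?? null,
  };
}
```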

Runtime Updates

Environment Variable Updates

export async function updateMachineEnv(
  machineId: string,
  envUpdates: Record<string, string>,
): Promise<void>

Fetches the current machine config, merges in the new environment variables, and patches the machine. Used when company configuration changes (new master directive, strategy update, etc.).
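The merge step in the middle of that fetch-merge-patch sequence can be sketched as a pure function. `mergeMachineEnv` and `MachineConfig` are hypothetical names used for illustration; the real function presumably wraps something similar between its two API calls.

```typescript
// Assumed minimal view of a machine config: an optional env map plus
// whatever other fields the Fly API returns.
interface MachineConfig {
  env?: Record<string, string>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a new config whose env contains the existing
// variables overlaid with the updates. All other config fields are preserved.
function mergeMachineEnv(
  config: MachineConfig,
  envUpdates: Record<string, string>,
): MachineConfig {
  return {
    ...config,
    env: { ...(config.env ?? {}), ...envUpdates },
  };
}
```

Because the existing config is spread first, updating one variable does not drop the others, and fields such as the image or guest spec are sent back unchanged.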

Resizing

export async function resizeMachine(
  machineId: string,
  enginePreset: string,
): Promise<void>

Changes the CPU/memory allocation for a machine. Fetches current config, applies the new guest spec, and patches. Requires a machine restart to take effect.
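A sketch of the "apply the new guest spec" step, reusing the preset mapping from Machine Sizing. `applyPreset`, `GuestSpec`, and `MachineConfig` are hypothetical names, and the snake_case guest fields are assumed from the Fly Machines API config format.

```typescript
interface MachineSize {
  cpuKind: string;
  cpus: number;
  memoryMb: number;
}

interface GuestSpec {
  cpu_kind: string;
  cpus: number;
  memory_mb: number;
}

// Assumed minimal view of a machine config.
interface MachineConfig {
  guest?: GuestSpec;
  [key: string]: unknown;
}

// Same preset mapping as in the Machine Sizing section.
function getMachineSize(preset: string): MachineSize {
  switch (preset) {
    case 'BALANCED':
      return { cpuKind: 'shared', cpus: 1, memoryMb: 512 };
    case 'PREMIUM':
      return { cpuKind: 'performance', cpus: 1, memoryMb: 1024 };
    default: // BUDGET, EFFICIENT, and unknown presets
      return { cpuKind: 'shared', cpus: 1, memoryMb: 256 };
  }
}

// Hypothetical helper: returns a copy of the config with a new guest spec.
function applyPreset(config: MachineConfig, enginePreset: string): MachineConfig {
  const size = getMachineSize(enginePreset);
  return {
    ...config,
    guest: { cpu_kind: size.cpuKind, cpus: size.cpus, memory_mb: size.memoryMb },
  };
}
```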

Auto-Stop Pattern

Machines follow a wake-run-stop cycle:

  1. Scheduler detects a company’s decision round is due
  2. Scheduler calls wakeMachine(machineId) to start the microVM
  3. Machine boots the company-runner entrypoint and runs the decision pipeline
  4. Machine signals completion and the scheduler calls stopMachine(machineId)

This ensures companies consume compute only during active decision rounds. Idle companies incur no compute cost.
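The cycle can be sketched from the scheduler's side as follows. This is an illustration under assumptions: `runDecisionCycle` and `waitForCompletion` are hypothetical, since the page does not say how the machine signals completion, and stopping the machine in a `finally` block is a design choice of this sketch rather than documented behavior.

```typescript
// Dependencies are injected so the sketch stays independent of the real
// fly-machines.ts module. waitForCompletion is a hypothetical stand-in for
// whatever completion signal the runner actually sends.
interface MachineControl {
  wakeMachine(machineId: string): Promise<void>;
  waitForCompletion(machineId: string): Promise<void>;
  stopMachine(machineId: string): Promise<void>;
}

// Runs one wake-run-stop cycle for a single company machine.
async function runDecisionCycle(
  machineId: string,
  control: MachineControl,
): Promise<void> {
  await control.wakeMachine(machineId);
  try {
    await control.waitForCompletion(machineId);
  } finally {
    // Stop even if the round fails, so a broken round does not leave
    // the machine running and accruing compute cost.
    await control.stopMachine(machineId);
  }
}
```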

BullMQ Backward Compatibility

Companies can also run via BullMQ workers on Railway for cases where Fly.io machines are not configured. The isFlyConfigured() check determines the execution path:

export function isFlyConfigured(): boolean {
  return !!FLY_API_TOKEN && !!FLY_APP_NAME;
}

  • Fly configured: Per-company isolated microVMs
  • Fly not configured: Shared BullMQ workers with WORKER_TYPE env var (decision-worker, social-worker, financial-worker, maintenance-worker)
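The branch described above might look like this at a call site. `chooseExecutionPath` and the `ExecutionPath` labels are hypothetical; the values are passed as parameters here only so the sketch is self-contained, whereas the real check reads module-level constants.

```typescript
type ExecutionPath = 'fly-machine' | 'bullmq-worker';

// Hypothetical helper mirroring isFlyConfigured(): both values must be
// non-empty for the per-company microVM path to be used.
function chooseExecutionPath(
  flyApiToken: string | undefined,
  flyAppName: string | undefined,
): ExecutionPath {
  const flyConfigured = !!flyApiToken && !!flyAppName;
  return flyConfigured ? 'fly-machine' : 'bullmq-worker';
}
```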

API Communication

All Fly API calls go through a central flyFetch() helper that prepends the base URL, injects authentication, and handles errors:

const FLY_API_BASE = 'https://api.machines.dev/v1';

The FLY_API_TOKEN is a Fly.io API token with Machine management permissions. The FLY_APP_NAME (default: ceos-company-runtime) identifies the Fly app that hosts all company machines.
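A helper with the three responsibilities described above (base URL, authentication, error handling) could look roughly like the sketch below. The signature is an assumption: the real `flyFetch()` is not shown on this page, and the `fetchImpl` parameter, `FetchLike`, and `HttpResponse` types are introduced here only to keep the example self-contained and testable. Bearer-token authentication is how the Fly Machines API accepts API tokens.

```typescript
const FLY_API_BASE = 'https://api.machines.dev/v1';

// Minimal response and fetch shapes, declared locally (assumptions for this sketch).
interface HttpResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string | undefined },
) => Promise<HttpResponse>;

// Sketch of a central helper: prepends the base URL, injects the token,
// and turns non-2xx responses into thrown errors.
async function flyFetch(
  fetchImpl: FetchLike,
  token: string,
  path: string,
  method: string = 'GET',
  body?: unknown,
): Promise<unknown> {
  const res = await fetchImpl(`${FLY_API_BASE}${path}`, {
    method,
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  const text = await res.text();
  if (!res.ok) {
    throw new Error(`Fly API ${method} ${path} failed (${res.status}): ${text}`);
  }
  // Some endpoints return an empty body on success.
  return text ? JSON.parse(text) : null;
}
```

Routing every call through one function keeps the token handling and error format in a single place, so lifecycle functions only need to supply a path, a method, and an optional body.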

Monitoring

The machine ID is stored in the Company.flyMachineId field in Prisma. The dashboard War Room page passes it to getMachineState() to display live container status (state such as started or stopped, region, and allocated memory) alongside decision round metrics.