Operational State Machine

How we prevent command conflicts and ensure safe concurrent operations across multiple administrators.

Version 1.0

Problem Statement

EyeLog agents receive commands from multiple administrators through the collector. Without proper state management:

  • Conflicting operations - Admin A deploys an app, Admin B sends a restart → corrupted deployment
  • Race conditions - Two deploys sent simultaneously → undefined behavior
  • No visibility - The collector doesn't know what the agent is doing → can't prevent conflicts
  • No accountability - No record of who requested which operation

Solution

A single source of truth for agent operational state that:

  • Tracks what the agent is currently doing
  • Enforces which commands can be accepted in each state
  • Reports state to the collector on every change and in periodic heartbeats
  • Provides an audit trail of who requested what

State Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                      AGENT OPERATIONAL STATE MACHINE                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                              ┌─────────┐                                     │
│              ┌──────────────►│  READY  │◄──────────────────┐                 │
│              │               └────┬────┘                   │                 │
│              │                    │                        │                 │
│              │         ┌──────────┼──────────┬─────────────┤                 │
│              │         │          │          │             │                 │
│              │         ▼          ▼          ▼             ▼                 │
│              │    ┌─────────┐ ┌────────┐ ┌────────┐ ┌────────────┐           │
│              │    │DEPLOYING│ │UPDATING│ │EXEC_EX │ │MAINTENANCE │           │
│              │    │         │ │        │ │        │ │            │           │
│              │    └────┬────┘ └───┬────┘ └───┬────┘ └─────┬──────┘           │
│              │         │          │          │            │                  │
│              │         │ complete │ complete │ complete   │ complete         │
│              │         │          │          │            │                  │
│              └─────────┴──────────┴──────────┴────────────┘                  │
│                                                                              │
│                                    │                                         │
│                                    │ restart command                         │
│                                    ▼                                         │
│                              ┌───────────┐                                   │
│                              │RESTARTING │ ─────► Process exits & restarts   │
│                              └───────────┘                                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

State Definitions

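┌────────────────┬────────────────────────────────────┬───────────────────────┐
│ State          │ Description                        │ Can Accept            │
├────────────────┼────────────────────────────────────┼───────────────────────┤
│ READY          │ Agent idle, can accept any command │ All commands          │
│ DEPLOYING      │ Installing/deploying software      │ Non-conflicting only  │
│ UPDATING       │ Agent self-update in progress      │ Cancel only           │
│ EXEC_EXCLUSIVE │ Long-running exclusive command     │ Cancel only           │
│ MAINTENANCE    │ System maintenance mode            │ Exit maintenance only │
│ RESTARTING     │ Agent restarting                   │ None (terminal)       │
└────────────────┴────────────────────────────────────┴───────────────────────┘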
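
The states and the transitions drawn in the diagram can be written down as a small lookup table. The following is a minimal Python sketch, not the agent's actual implementation. `TRANSITIONS`, `BUSY_STATES`, and `can_transition` are illustrative names, and the READY → RESTARTING edge is an assumption taken from the conflict matrix, which accepts Restart only in READY.

```python
from enum import Enum


class OpState(Enum):
    READY = "READY"
    DEPLOYING = "DEPLOYING"
    UPDATING = "UPDATING"
    EXEC_EXCLUSIVE = "EXEC_EXCLUSIVE"
    MAINTENANCE = "MAINTENANCE"
    RESTARTING = "RESTARTING"


# Busy states return to READY when their operation completes.
BUSY_STATES = {
    OpState.DEPLOYING,
    OpState.UPDATING,
    OpState.EXEC_EXCLUSIVE,
    OpState.MAINTENANCE,
}

# Allowed transitions, read off the state diagram.
# Assumption: RESTARTING is entered from READY only.
TRANSITIONS = {
    OpState.READY: BUSY_STATES | {OpState.RESTARTING},
    **{state: {OpState.READY} for state in BUSY_STATES},
    OpState.RESTARTING: set(),  # terminal: the process exits and restarts
}


def can_transition(current: OpState, target: OpState) -> bool:
    """Return True if the diagram has an edge from current to target."""
    return target in TRANSITIONS[current]
```

Keeping the edges in one table means a transition that is not drawn in the diagram, such as DEPLOYING → RESTARTING, is refused by default.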

Command Conflict Matrix

When a command arrives, the agent checks whether it conflicts with the current state:

                          Current State
                 ┌───────┬──────┬──────┬──────┬──────┬───────┐
                 │ READY │ DEPL │ UPDT │ EXEC │ MAINT│ RSTRT │
    ┌────────────┼───────┼──────┼──────┼──────┼──────┼───────┤
    │ Deploy     │   ✓   │  ✗   │  ✗   │  ✗   │  ✗   │   ✗   │
    │ Update     │   ✓   │  ✗   │  ✗   │  ✗   │  ✗   │   ✗   │
Cmd │ Exec       │   ✓   │  ✗   │  ✗   │  ✗   │  ✗   │   ✗   │
    │ Restart    │   ✓   │  ✗   │  ✗   │  ✗   │  ✗   │   ✗   │
    │ Config     │   ✓   │  ✓   │  ✓   │  ✓   │  ✓   │   ✗   │
    │ Query      │   ✓   │  ✓   │  ✓   │  ✓   │  ✓   │   ✗   │
    │ Cancel     │   -   │  ✓   │  ✓   │  ✓   │  ✓   │   ✗   │
    └────────────┴───────┴──────┴──────┴──────┴──────┴───────┘
    
    ✓ = Allowed    ✗ = Rejected    - = N/A
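
The matrix reduces to three rules: nothing is accepted while RESTARTING, operation-starting commands need READY, and Cancel is not applicable when nothing is running. A minimal Python sketch of that lookup follows. The `Command` and `Verdict` enums and the `check` function are illustrative names, not part of the agent's API.

```python
from enum import Enum


class OpState(Enum):
    READY = "READY"
    DEPLOYING = "DEPLOYING"
    UPDATING = "UPDATING"
    EXEC_EXCLUSIVE = "EXEC_EXCLUSIVE"
    MAINTENANCE = "MAINTENANCE"
    RESTARTING = "RESTARTING"


class Command(Enum):
    DEPLOY = "Deploy"
    UPDATE = "Update"
    EXEC = "Exec"
    RESTART = "Restart"
    CONFIG = "Config"
    QUERY = "Query"
    CANCEL = "Cancel"


class Verdict(Enum):
    ALLOWED = "allowed"
    REJECTED = "rejected"
    NOT_APPLICABLE = "n/a"


# Commands that start an operation; the matrix allows them only in READY.
EXCLUSIVE = {Command.DEPLOY, Command.UPDATE, Command.EXEC, Command.RESTART}


def check(command: Command, state: OpState) -> Verdict:
    """Look up one cell of the conflict matrix."""
    if state is OpState.RESTARTING:
        return Verdict.REJECTED  # nothing is accepted while restarting
    if command in EXCLUSIVE:
        return Verdict.ALLOWED if state is OpState.READY else Verdict.REJECTED
    if command is Command.CANCEL and state is OpState.READY:
        return Verdict.NOT_APPLICABLE  # nothing to cancel
    return Verdict.ALLOWED  # Config, Query, and Cancel during an operation
```

For example, `check(Command.RESTART, OpState.DEPLOYING)` returns `Verdict.REJECTED`, matching the Restart row of the matrix.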

Two-Layer Enforcement

Conflicts are checked at two layers:

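┌───────────────────────┬────────────────┬────────────────────────────────────────────┐
│ Layer                 │ Location       │ Purpose                                    │
├───────────────────────┼────────────────┼────────────────────────────────────────────┤
│ Collector (Advisory)  │ Before sending │ Better UX - disable buttons, show warnings │
│ Agent (Authoritative) │ On receipt     │ Final enforcement - always correct         │
└───────────────────────┴────────────────┴────────────────────────────────────────────┘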

Key Design: The agent is always authoritative. Even if the collector has stale state information, the agent rejects conflicting commands.
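
The following Python sketch shows why the agent-side check is needed even when the collector checks first. It is an illustration under simplifying assumptions: `Agent`, `Collector`, `allowed`, and the `BLOCKED_BY_COLLECTOR` result are hypothetical names, states are plain strings, and only the "exclusive commands need READY" rule is modeled.

```python
# Simplified rule: these commands start an operation and need READY.
# Config and Query are treated as always allowed, except while RESTARTING.
EXCLUSIVE_COMMANDS = {"deploy", "update", "exec", "restart"}


def allowed(command: str, state: str) -> bool:
    if state == "RESTARTING":
        return False
    if command in EXCLUSIVE_COMMANDS:
        return state == "READY"
    return True


class Agent:
    """Authoritative layer: checks against its own, always current, state."""

    def __init__(self) -> None:
        self.state = "READY"

    def handle(self, command: str) -> str:
        if not allowed(command, self.state):
            return "REJECTED"
        if command == "deploy":
            self.state = "DEPLOYING"
        return "ACCEPTED"


class Collector:
    """Advisory layer: checks against a cached copy that may be stale."""

    def __init__(self, agent: Agent) -> None:
        self.agent = agent
        self.cached_state = "READY"

    def send(self, command: str) -> str:
        if not allowed(command, self.cached_state):
            return "BLOCKED_BY_COLLECTOR"  # e.g. a disabled button in the UI
        return self.agent.handle(command)


agent = Agent()
collector = Collector(agent)

first = collector.send("deploy")      # agent moves to DEPLOYING
stale = collector.send("restart")     # cache still says READY, agent rejects
collector.cached_state = agent.state  # a state report arrives
fresh = collector.send("restart")     # now stopped before sending

print(first, stale, fresh)  # ACCEPTED REJECTED BLOCKED_BY_COLLECTOR
```

The second call passes the advisory check because the cache is stale, and the agent still rejects it. The collector-side check only saves a round trip and gives earlier feedback.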

State Reporting

Agent state is reported to the collector through two mechanisms:

  • Heartbeat - Sent every 30 seconds, includes the current state
  • State Change Event - Sent immediately when the state changes

message Heartbeat {
    string agent_id = 1;
    Timestamp timestamp = 2;
    OpState operational_state = 3;  // Current state
    // ... other fields
}

message StateChangeEvent {
    string agent_id = 1;
    OpState previous_state = 2;
    OpState new_state = 3;
    string reason = 4;
    string triggered_by = 5;  // Command ID or "internal"
}
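
On the agent side, the two mechanisms could be driven by one small reporter object. The Python sketch below mirrors the message fields above as dataclasses. It is a sketch under assumptions: `StateReporter`, its `send` callback (standing in for the real transport to the collector), and the float timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

HEARTBEAT_INTERVAL_SECONDS = 30.0


@dataclass
class Heartbeat:
    agent_id: str
    timestamp: float
    operational_state: str


@dataclass
class StateChangeEvent:
    agent_id: str
    previous_state: str
    new_state: str
    reason: str
    triggered_by: str  # command ID or "internal"


class StateReporter:
    def __init__(self, agent_id: str, send: Callable[[Any], None]) -> None:
        self.agent_id = agent_id
        self.state = "READY"
        self._send = send  # hypothetical transport to the collector
        self._last_heartbeat: Optional[float] = None

    def change_state(self, new_state: str, reason: str,
                     triggered_by: str = "internal") -> None:
        """Switch state and report the change immediately."""
        if new_state == self.state:
            return
        event = StateChangeEvent(
            self.agent_id, self.state, new_state, reason, triggered_by)
        self.state = new_state
        self._send(event)

    def tick(self, now: float) -> None:
        """Send a heartbeat if the interval has elapsed since the last one."""
        due = (self._last_heartbeat is None
               or now - self._last_heartbeat >= HEARTBEAT_INTERVAL_SECONDS)
        if due:
            self._last_heartbeat = now
            self._send(Heartbeat(self.agent_id, now, self.state))


sent = []
reporter = StateReporter("agent-1", sent.append)
reporter.tick(now=0.0)    # first heartbeat
reporter.change_state("DEPLOYING", "deploy accepted", "cmd-42")
reporter.tick(now=10.0)   # interval not elapsed, nothing sent
reporter.tick(now=30.0)   # second heartbeat, carries DEPLOYING
```

The state change event reaches the collector right away, so the heartbeat only has to bound how stale the collector's view can get if an event is lost.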

Rejection Response

When a command is rejected due to a state conflict, the agent responds with:

message CommandRejection {
    string command_id = 1;
    string reason = 2;
    OpState current_state = 3;
    ActiveTask blocking_task = 4;  // What's blocking
}
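
A rejection is most useful when it tells the administrator what is blocking the command and who started it. The Python sketch below builds such a response. It makes several assumptions: the source does not define `ActiveTask`, so its fields here are hypothetical, as are `build_rejection` and every reason string other than "Deployment in progress".

```python
from dataclasses import dataclass


@dataclass
class ActiveTask:
    # Hypothetical fields; the source does not define ActiveTask.
    task_id: str
    description: str
    requested_by: str


@dataclass
class CommandRejection:
    command_id: str
    reason: str
    current_state: str
    blocking_task: ActiveTask  # what's blocking


# Hypothetical human-readable reasons, one per non-READY state.
REASONS = {
    "DEPLOYING": "Deployment in progress",
    "UPDATING": "Agent update in progress",
    "EXEC_EXCLUSIVE": "Exclusive command in progress",
    "MAINTENANCE": "Agent in maintenance mode",
    "RESTARTING": "Agent is restarting",
}


def build_rejection(command_id: str, current_state: str,
                    blocking_task: ActiveTask) -> CommandRejection:
    """Assemble the rejection sent back for a conflicting command."""
    reason = REASONS.get(current_state, "Command conflicts with current state")
    return CommandRejection(command_id, reason, current_state, blocking_task)


task = ActiveTask("task-7", "Deploy app v2.0", "Admin A")
rejection = build_rejection("cmd-43", "DEPLOYING", task)
print(rejection.reason)  # Deployment in progress
```

Carrying the blocking task in the response lets the collector show who requested the running operation, which supports the audit trail described under Solution.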

Example Scenario

Timeline:
─────────────────────────────────────────────────────────────────────────

T0: Agent in READY state
    ├── Admin A: "Deploy app v2.0" → ACCEPTED
    └── Agent → DEPLOYING state

T1: Deploying... (30% complete)
    ├── Admin B: "Restart agent" → REJECTED
    │   └── Reason: "Deployment in progress"
    └── Collector shows: "Cannot restart during deployment"

T2: Deploying... (100% complete)
    └── Agent → READY state

T3: Agent in READY state
    └── Admin B: "Restart agent" → ACCEPTED

─────────────────────────────────────────────────────────────────────────
Result: No corrupted deployment. Clear feedback to Admin B.