Problem Statement
EyeLog agents receive commands from multiple administrators through the collector. Without proper state management:
- Conflicting operations: Admin A deploys an app while Admin B sends a restart → corrupted deployment
- Race conditions: two deploys sent simultaneously → undefined behavior
- No visibility: the collector doesn't know what the agent is doing → it can't prevent conflicts
- No accountability: who requested which operation?
Solution
A single source of truth for agent operational state that:
- Tracks what the agent is currently doing
- Enforces which commands can be accepted in each state
- Reports state to the collector in real time
- Provides an audit trail of who requested what
State Diagram
```
                AGENT OPERATIONAL STATE MACHINE

                      ┌─────────┐  restart command   ┌────────────┐
        ┌────────────►│  READY  │───────────────────►│ RESTARTING │───► Process exits & restarts
        │             └────┬────┘                    └────────────┘
        │                  │
        │      ┌───────────┼────────────┬───────────────┐
        │      ▼           ▼            ▼               ▼
        │ ┌─────────┐ ┌────────┐ ┌──────────────┐ ┌───────────┐
        │ │DEPLOYING│ │UPDATING│ │EXEC_EXCLUSIVE│ │MAINTENANCE│
        │ └────┬────┘ └────┬───┘ └──────┬───────┘ └─────┬─────┘
        │      │ complete  │ complete   │ complete      │ complete
        └──────┴───────────┴────────────┴───────────────┘
```
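The diagram's edges can be kept as an adjacency table that the agent consults before every state change. A minimal Go sketch, assuming integer-backed state constants; `OpState`, `validTransitions`, and `CanTransition` are hypothetical names, not an existing EyeLog API.

```go
package main

import "fmt"

// OpState mirrors the operational states in the diagram (Go names are illustrative).
type OpState int

const (
	Ready OpState = iota
	Deploying
	Updating
	ExecExclusive
	Maintenance
	Restarting
)

var stateNames = map[OpState]string{
	Ready:         "READY",
	Deploying:     "DEPLOYING",
	Updating:      "UPDATING",
	ExecExclusive: "EXEC_EXCLUSIVE",
	Maintenance:   "MAINTENANCE",
	Restarting:    "RESTARTING",
}

func (s OpState) String() string { return stateNames[s] }

// validTransitions lists the edges drawn in the diagram. RESTARTING has no
// outgoing edges: the process exits and restarts.
var validTransitions = map[OpState][]OpState{
	Ready:         {Deploying, Updating, ExecExclusive, Maintenance, Restarting},
	Deploying:     {Ready}, // "complete"
	Updating:      {Ready}, // "complete"
	ExecExclusive: {Ready}, // "complete"
	Maintenance:   {Ready}, // "complete"
}

// CanTransition reports whether the diagram has an edge from -> to.
func CanTransition(from, to OpState) bool {
	for _, s := range validTransitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Ready, Deploying))      // true
	fmt.Println(CanTransition(Deploying, Restarting)) // false: must complete first
	fmt.Println(CanTransition(Restarting, Ready))     // false: no outgoing edges
}
```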
State Definitions
| State | Description | Can Accept |
|---|---|---|
| READY | Agent idle, can accept any command | All commands |
| DEPLOYING | Installing/deploying software | Non-conflicting only |
| UPDATING | Agent self-update in progress | Cancel only |
| EXEC_EXCLUSIVE | Long-running exclusive command | Cancel only |
| MAINTENANCE | System maintenance mode | Exit maintenance only |
| RESTARTING | Agent restarting | None (terminal) |
Command Conflict Matrix
When a command arrives, the agent checks if it conflicts with the current state:
| Command / Current state | READY | DEPLOYING | UPDATING | EXEC_EXCLUSIVE | MAINTENANCE | RESTARTING |
|---|---|---|---|---|---|---|
| Deploy | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Update | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Exec | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Restart | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Config | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Query | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Cancel | - | ✓ | ✓ | ✓ | ✓ | ✗ |

✓ = allowed, ✗ = rejected, - = not applicable
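Because every cell is fixed, the matrix can be transcribed directly into a lookup table. The Go sketch below is illustrative: `Command`, `Decision`, and `Check` are hypothetical names, and rejecting out-of-range values (failing closed) is an assumption the matrix does not specify.

```go
package main

import "fmt"

// Operational states, in the matrix's column order (names are illustrative).
type OpState int

const (
	Ready OpState = iota
	Deploying
	Updating
	ExecExclusive
	Maintenance
	Restarting
	numStates
)

// Commands, in the matrix's row order.
type Command int

const (
	Deploy Command = iota
	Update
	Exec
	Restart
	Config
	Query
	Cancel
	numCommands
)

// Decision is one cell of the matrix.
type Decision int

const (
	Allowed       Decision = iota // ✓
	Rejected                      // ✗
	NotApplicable                 // - (Cancel with nothing running)
)

// Short aliases so the table below reads like the matrix.
const (
	ok = Allowed
	no = Rejected
	na = NotApplicable
)

// conflictMatrix[cmd][state], transcribed row by row.
var conflictMatrix = [numCommands][numStates]Decision{
	//        READY DEPL UPDT EXEC MAINT RSTRT
	Deploy:  {ok, no, no, no, no, no},
	Update:  {ok, no, no, no, no, no},
	Exec:    {ok, no, no, no, no, no},
	Restart: {ok, no, no, no, no, no},
	Config:  {ok, ok, ok, ok, ok, no},
	Query:   {ok, ok, ok, ok, ok, no},
	Cancel:  {na, ok, ok, ok, ok, no},
}

// Check looks up whether cmd may run in state; unknown values fail closed.
func Check(cmd Command, state OpState) Decision {
	if cmd < 0 || cmd >= numCommands || state < 0 || state >= numStates {
		return Rejected
	}
	return conflictMatrix[cmd][state]
}

func main() {
	fmt.Println(Check(Restart, Deploying) == Rejected) // true: no restart mid-deploy
	fmt.Println(Check(Query, Updating) == Allowed)     // true
	fmt.Println(Check(Cancel, Ready) == NotApplicable) // true
}
```

Keeping the table as data rather than nested `if` statements means the code can be diffed against the matrix cell by cell when a new command or state is added.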
Two-Layer Enforcement
Conflicts are checked at two layers:
| Layer | Location | Purpose |
|---|---|---|
| Collector (Advisory) | Before sending | Better UX - disable buttons, show warnings |
| Agent (Authoritative) | On receipt | Final enforcement - always correct |
Key design: the agent is always authoritative. Even if the collector's view of the state is stale, the agent rejects conflicting commands.
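For the agent's check to be authoritative, checking the state and entering the new state must happen as one atomic step; otherwise two deploys arriving together could both see READY, which is exactly the race listed in the problem statement. A minimal Go sketch, assuming a mutex-guarded agent and modeling only commands that require READY; `Agent`, `TryStart`, `Finish`, and `raceDeploys` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

type OpState string

const (
	Ready     OpState = "READY"
	Deploying OpState = "DEPLOYING"
)

// Agent holds the single source of truth for operational state.
type Agent struct {
	mu    sync.Mutex
	state OpState
}

// TryStart atomically checks that the agent is READY and moves it to next.
// Holding the lock across both steps means concurrent callers cannot both
// observe READY.
func (a *Agent) TryStart(next OpState) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.state != Ready {
		return false
	}
	a.state = next
	return true
}

// Finish returns the agent to READY when the operation completes.
func (a *Agent) Finish() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.state = Ready
}

// raceDeploys fires n deploy requests concurrently and counts acceptances.
func raceDeploys(a *Agent, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	accepted := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if a.TryStart(Deploying) {
				mu.Lock()
				accepted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return accepted
}

func main() {
	a := &Agent{state: Ready}
	fmt.Println(raceDeploys(a, 10)) // 1: only one deploy wins
	a.Finish()
	fmt.Println(a.TryStart(Deploying)) // true: READY again
}
```

Ten concurrent deploy requests yield exactly one acceptance; in a real agent the other nine would receive rejections. A single lock is the simplest way to serialize the check; the collector's advisory check needs no such guarantee because the agent re-checks on receipt.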
State Reporting
Agent state is reported to the collector through two mechanisms:
- Heartbeat: carries the current state; sent every 30 seconds
- State change event: sent immediately whenever the state changes
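```protobuf
syntax = "proto3";

import "google/protobuf/timestamp.proto";

// OpState: enum of the states listed under State Definitions (not shown).
message Heartbeat {
  string agent_id = 1;
  google.protobuf.Timestamp timestamp = 2;
  OpState operational_state = 3;  // Current state
  // ... other fields
}

message StateChangeEvent {
  string agent_id = 1;
  OpState previous_state = 2;
  OpState new_state = 3;
  string reason = 4;
  string triggered_by = 5;  // Command ID or "internal"
}
```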
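The two reporting paths can share one owner of the state: every transition emits a state change event synchronously, and a ticker samples the same state for heartbeats. A Go sketch, assuming an injected `emit` callback in place of the real transport to the collector; `Reporter`, `SetState`, `Snapshot`, `RunHeartbeats`, and the IDs `agent-1` and `cmd-42` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// OpState values are the state names from the State Definitions table.
type OpState string

// StateChangeEvent and Heartbeat mirror the proto messages (field subset).
type StateChangeEvent struct {
	AgentID       string
	PreviousState OpState
	NewState      OpState
	Reason        string
	TriggeredBy   string // command ID or "internal"
}

type Heartbeat struct {
	AgentID          string
	Timestamp        time.Time
	OperationalState OpState
}

// Reporter owns the current state and pushes every change to emit.
type Reporter struct {
	mu      sync.Mutex
	agentID string
	state   OpState
	emit    func(StateChangeEvent) // hypothetical hook, e.g. a send to the collector
}

// SetState records the transition and emits the event before returning.
// Emitting under the lock keeps events in transition order; a production
// sender would likely enqueue rather than block here.
func (r *Reporter) SetState(next OpState, reason, triggeredBy string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	prev := r.state
	r.state = next
	r.emit(StateChangeEvent{
		AgentID:       r.agentID,
		PreviousState: prev,
		NewState:      next,
		Reason:        reason,
		TriggeredBy:   triggeredBy,
	})
}

// Snapshot builds a heartbeat carrying the current state.
func (r *Reporter) Snapshot(now time.Time) Heartbeat {
	r.mu.Lock()
	defer r.mu.Unlock()
	return Heartbeat{AgentID: r.agentID, Timestamp: now, OperationalState: r.state}
}

// RunHeartbeats sends a snapshot every interval (30s in the text) until ctx is done.
func (r *Reporter) RunHeartbeats(ctx context.Context, interval time.Duration, send func(Heartbeat)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-ticker.C:
			send(r.Snapshot(now))
		}
	}
}

func main() {
	var events []StateChangeEvent
	r := &Reporter{agentID: "agent-1", state: "READY",
		emit: func(e StateChangeEvent) { events = append(events, e) }}
	r.SetState("DEPLOYING", "deploy accepted", "cmd-42")
	r.SetState("READY", "deploy complete", "internal")
	for _, e := range events {
		fmt.Printf("%s -> %s (triggered by %s)\n", e.PreviousState, e.NewState, e.TriggeredBy)
	}
	fmt.Println(r.Snapshot(time.Now()).OperationalState)
}
```

Because both paths read the same guarded field, a heartbeat can never report a state that was not first announced by an event; the `triggered_by` field doubles as the audit trail of who caused each transition.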
Rejection Response
When a command is rejected because of a state conflict, the agent responds with:
```protobuf
message CommandRejection {
  string command_id = 1;
  string reason = 2;
  OpState current_state = 3;
  ActiveTask blocking_task = 4;  // The task that is blocking this command
}
```
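A rejection is most useful when it says what is in the way. The Go sketch below replays the example scenario: a restart submitted during a deploy is rejected with the current state and the blocking task. `Agent`, `Submit`, `Complete`, and every `ActiveTask` field are hypothetical (the source does not define `ActiveTask`), only two command kinds are modeled, and locking is omitted for brevity.

```go
package main

import "fmt"

type OpState string

const (
	Ready      OpState = "READY"
	Deploying  OpState = "DEPLOYING"
	Restarting OpState = "RESTARTING"
)

// ActiveTask's fields are hypothetical; the source does not define them.
type ActiveTask struct {
	CommandID   string
	Kind        string
	RequestedBy string // who asked for it (audit trail)
}

// CommandRejection mirrors the proto message.
type CommandRejection struct {
	CommandID    string
	Reason       string
	CurrentState OpState
	BlockingTask *ActiveTask
}

// Commands that require an idle agent, and the state each one enters.
var nextState = map[string]OpState{"deploy": Deploying, "restart": Restarting}

// Agent is a single-threaded sketch.
type Agent struct {
	state  OpState
	active *ActiveTask
}

// Submit accepts the command (returning nil) or explains why it was rejected.
func (a *Agent) Submit(cmdID, kind, requestedBy string) *CommandRejection {
	next, known := nextState[kind]
	if !known {
		return &CommandRejection{CommandID: cmdID, Reason: "unknown command", CurrentState: a.state}
	}
	if a.state != Ready {
		reason := fmt.Sprintf("agent is %s", a.state)
		if a.active != nil {
			reason = a.active.Kind + " in progress"
		}
		return &CommandRejection{CommandID: cmdID, Reason: reason, CurrentState: a.state, BlockingTask: a.active}
	}
	a.state = next
	a.active = &ActiveTask{CommandID: cmdID, Kind: kind, RequestedBy: requestedBy}
	return nil
}

// Complete marks the active task finished and returns to READY.
func (a *Agent) Complete() {
	a.state = Ready
	a.active = nil
}

func main() {
	a := &Agent{state: Ready}
	fmt.Println(a.Submit("cmd-1", "deploy", "admin-a") == nil) // true: accepted

	if rej := a.Submit("cmd-2", "restart", "admin-b"); rej != nil {
		fmt.Printf("%s rejected: %s (state %s, blocked by %s from %s)\n",
			rej.CommandID, rej.Reason, rej.CurrentState,
			rej.BlockingTask.CommandID, rej.BlockingTask.RequestedBy)
	}

	a.Complete()
	fmt.Println(a.Submit("cmd-3", "restart", "admin-b") == nil) // true: accepted
}
```

Including the blocking task's requester in the rejection lets the collector tell Admin B not just that the restart failed but whose deployment is running.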
Example Scenario
Timeline:
```
─────────────────────────────────────────────────────────────────────────
T0: Agent in READY state
    ├── Admin A: "Deploy app v2.0" → ACCEPTED
    └── Agent → DEPLOYING state
T1: Deploying... (30% complete)
    ├── Admin B: "Restart agent" → REJECTED
    │   └── Reason: "Deployment in progress"
    └── Collector shows: "Cannot restart during deployment"
T2: Deploying... (100% complete)
    └── Agent → READY state
T3: Agent in READY state
    └── Admin B: "Restart agent" → ACCEPTED
─────────────────────────────────────────────────────────────────────────
```
Result: No corrupted deployment. Clear feedback to Admin B.