Monitor and recover members
Members run unattended in tmux panes, so the Director needs two primitives: a cheap roster watch and an escalation ladder for a member that has stopped reacting. This guide shows how to check on a running team and recover a quiet member.
Ensure the monitor is running
The recovery ladder below is driven by a periodic supervision tick. That tick
comes from cafleet monitor, a per-fleet loop that a coding agent runs as a
background task and that wakes due agents by keystroking message poll into
their panes. Start it once as a background task, before the team gets busy.
It pings every enrolled agent on its interval, the Director and members
alike, whether or not the inbox has pending items, which is what surfaces a
quiet member in the first place (Monitoring):
cafleet --fleet-id 1 monitor start # run as a background task
cafleet --fleet-id 1 monitor status # confirm it is running + see the schedule
The monitor supplies only the heartbeat; the inspect-and-recover steps below are
the Director's job on each tick. Stop it at teardown by stopping that background
task (there is no monitor stop); fleet delete also makes the loop
self-terminate.
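In a plain shell session, the "background task" can be approximated with job control. The following is a minimal sketch, assuming that monitor start stays in the foreground until it is terminated (the text above implies this but does not state it). The helper names start_monitor and stop_monitor and the log path are hypothetical, not part of the cafleet CLI.

```shell
# Sketch only: start_monitor/stop_monitor and the log path are hypothetical
# helpers, not cafleet subcommands.
MONITOR_PID=""

start_monitor() {
  # $1 = fleet id. Run the monitor loop as a background job and keep its PID.
  cafleet --fleet-id "$1" monitor start \
    >"${MONITOR_LOG:-/tmp/cafleet-monitor.log}" 2>&1 &
  MONITOR_PID=$!
}

stop_monitor() {
  # There is no `monitor stop`, so terminate the background job instead.
  [ -n "$MONITOR_PID" ] || return 0
  kill "$MONITOR_PID" 2>/dev/null || true
  wait "$MONITOR_PID" 2>/dev/null || true
  MONITOR_PID=""
}
```

A session would call start_monitor 1 once, confirm with cafleet --fleet-id 1 monitor status, and call stop_monitor at teardown.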
Prompt
Check on my CAFleet team in fleet 1. List member activity, find any
member that has gone quiet, inspect its pane, and recover it with the
mildest intervention that works — only delete it as a last resort.
Your agent loads the cafleet skill plus cafleet-agent-team-monitoring /
cafleet-agent-team-supervision (recovery ladder and idle semantics).
What to expect
The agent lists the roster with per-member idle times, captures the pane of any member that has gone quiet, and climbs the recovery ladder from mildest to harshest: re-poke the inbox, answer a pending prompt, dispatch a shell command (Bash routing), and only as a last resort delete the member. You see each intervention land as keystrokes in the member's pane (tmux push).
Appendix: the CLI underneath
The commands the agent runs, all from the Director's pane, with literal
ids — fleet 1, members 4/5/6; your ids will differ.
Watch the team. last_sent is the member's most recent outgoing message,
last_recv its most recent delivery, last_ack the most recent delivery it
acknowledged, and idle the wall-clock time since the later of last_sent and
last_recv. A member that receives work but never sends or acks is stalled.
In the roster below, alice (4) has been quiet for 14 minutes (aggregation
rules in CLI options):
3 members:
agent_id        name      status  last_sent  last_recv  last_ack   idle
--------------  --------  ------  ---------  ---------  ---------  -----
4               alice     active  -          12:20:00   12:20:00   14m
5               bob       active  12:30:11   12:33:02   12:33:02   2m
6               carol     active  12:34:56   12:34:50   12:34:50   6s
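To pick out quiet members mechanically, the idle column can be converted to seconds and compared with a threshold. The following is a minimal sketch that reads roster text in the layout shown above from stdin. The helper name quiet_members is hypothetical, the h suffix is an assumption (the sample shows only s and m), and names are assumed to contain no spaces.

```shell
# Sketch only: quiet_members is a hypothetical helper. The idle suffixes
# s and m come from the sample roster; h is an assumption.
quiet_members() {
  # $1 = threshold in seconds; roster text on stdin.
  # Prints "agent_id name idle" for each member idle at least that long.
  awk -v limit="$1" '
    $1 ~ /^[0-9]+$/ && NF >= 7 {         # data rows start with a numeric agent_id
      idle = $NF
      n = idle + 0                        # leading number, e.g. 14 from "14m"
      unit = substr(idle, length(idle))   # trailing unit letter
      secs = (unit == "h") ? n * 3600 : (unit == "m") ? n * 60 : n
      if (secs >= limit) print $1, $2, idle
    }'
}
```

Piping the sample roster into quiet_members 600 prints `4 alice 14m`, since only alice has been idle for ten minutes or more.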
Inspect the quiet member with member capture, which prints the last 30 lines
of the pane buffer with ANSI escapes stripped (--lines N for a longer tail).
A stalled member typically shows a pending prompt.
Ladder rung 1, member ping: injects a cafleet message poll keystroke so the
member drains anything it missed. Re-poking is needed because inline previews
are best-effort keystrokes (tmux push).
Ladder rung 2, member send-input: --choice 1..3 answers an AskUserQuestion
option; --freetext "<text>" fills the "Type something" field.
Ladder rung 3, member exec: keystrokes ! git status into the pane so the
coding agent runs it natively. This is the dispatch half of the
bash-via-Director protocol (Bash routing).
Ladder rung 4, member delete (last resort): sends /exit and waits up to 15 s
for the pane to close.
cafleet member delete --member-id 4 --force skips the wait, kills the pane,
and exits 0 even if the pane was already gone. Without --force, a pane that
refuses to close makes the command exit 2 with the pane tail and a built-in
recovery hint:
Error: pane %7 did not close within 15.0s after /exit.
--- pane %7 tail (last 80 lines) ---
<captured terminal buffer>
---
Recovery: inspect with `cafleet member capture`, answer any prompt with `cafleet member send-input`, then re-run `cafleet member delete`. Or re-run with `--force` to skip the wait and kill the pane.
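The ladder can be wrapped in small shell helpers so each rung is one call. The following is a sketch under stated assumptions, not the documented invocations. Only cafleet member delete --member-id 4 --force is quoted in full in this guide. The other wrappers assume the same --member-id form, and the argument form of member exec is a guess. Whether member subcommands also need --fleet-id is not stated here. All function names are hypothetical.

```shell
# Sketch only. Wrapper names are hypothetical. Flags marked "assumed" are not
# shown in this guide for that subcommand; check CLI options before relying on them.
inspect_member() {   # $1 = member id, $2 = tail length (defaults to 30)
  cafleet member capture --member-id "$1" --lines "${2:-30}"   # --member-id assumed
}

rung1_ping() {       # $1 = member id: re-poke the inbox
  cafleet member ping --member-id "$1"                          # --member-id assumed
}

rung2_choice() {     # $1 = member id, $2 = option number 1..3
  cafleet member send-input --member-id "$1" --choice "$2"      # --member-id assumed
}

rung3_exec() {       # $1 = member id, $2 = shell command, e.g. "git status"
  cafleet member exec --member-id "$1" "$2"                     # argument form assumed
}

rung4_delete() {     # $1 = member id, $2 = "force" to allow the --force fallback
  cafleet member delete --member-id "$1"
  _rc=$?
  # Exit 2 means the pane did not close in time; force only when asked to.
  if [ "$_rc" -eq 2 ] && [ "${2:-}" = "force" ]; then
    cafleet member delete --member-id "$1" --force
    _rc=$?
  fi
  return "$_rc"
}
```

Keeping the --force fallback opt-in leaves room to follow the recovery hint first: inspect the pane, answer any prompt, and retry the graceful delete before killing the pane.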
Every flag, validation rule, and exit code for the member subcommands is documented in CLI options.