diff --git a/SPEC.md b/SPEC.md
index a7a747f..667929b 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -1,14 +1,15 @@
 # `p` — push jobs to worker
 
 A small Rust CLI utility to push command-line jobs to remote worker machines,
-with directory sync, job management, and attach/detach support.
+with directory sync, job management, and log streaming.
 
 ## Motivation
 
 The common developer workflow of "run this build/test/script on a more powerful
-remote machine" currently requires manually chaining `rsync`, `ssh`, and `tmux`.
+remote machine" currently requires manually chaining `rsync` and `ssh` with
+a way to keep the job alive in the background (e.g. `nohup`, `tmux`).
 `p` wraps that entire flow into a single ergonomic command, while adding proper
-job tracking, log capture, and attach/detach mechanics.
+job tracking and log capture.
 
 ## Core Concepts
 
@@ -26,13 +27,6 @@ directory. Each job has:
 - Start time, end time, exit code
 - Captured output log
 
-### p-agent
-A small Rust (?) binary, automatically uploaded and started by `p` on first use
-of a worker. It manages job lifecycle, log capture, and status tracking on
-the worker side. The user never manually installs or configures it.
-`p` checks the agent version on each connection and re-uploads if outdated.
-Communication happens over SSH port forwarding — no extra open ports needed.
-
 ## CLI Reference
 
 ### Running jobs
@@ -41,14 +35,19 @@ Communication happens over SSH port forwarding — no extra open ports needed.
 p --
 ```
 Sync the current directory to the default worker and run `` on it.
-Attaches to the job's tmux session immediately. `Ctrl+B D` detaches without
-killing the job. `Ctrl+C` sends SIGINT to the running process (standard behavior).
+Streams the job's output directly to the terminal (like `tail -f`). This feels
+like running the command locally.
 
-When the job finishes, the session stays open and displays:
+- `Ctrl+C` detaches from the log stream — the job keeps running on the worker.
+  Use `p logs -f ` to resume watching.
+- Use `p stop ` to kill a running job.
+- If the network connection drops, the job keeps running on the worker.
+  Use `p logs -f ` to resume watching.
+
+When the job finishes, `p` prints the exit code and exits:
 ```
---- Job done [exit 0]. Press any key to detach. ---
+[Job done: exit 0]
 ```
-This lets the user read final output before returning to their shell.
 
 ```
 p --
@@ -59,7 +58,14 @@ Same, but targets a specific named worker.
 p [-n | --no-sync] --
 ```
 Run `` on the worker without syncing the current directory first.
-Useful for commands that need no local files (e.g. `p -n -- htop`).
+Useful for commands that need no local files.
+
+```
+p [-d | --detach] --
+```
+Run `` and immediately detach — do not stream output to the terminal.
+The job starts on the worker and `p` prints the job ID. Useful for fire-and-forget
+jobs. Use `p logs -f ` to watch later.
 
 ### Job management
 
@@ -71,19 +77,12 @@ completed jobs (done, failed, stopped). Shows:
 ID (short), worker, original CWD, command, status, duration.
 Style inspired by `docker ps` / `lxc list`.
 
-```
-p attach
-```
-Re-attach to the tmux session of a running job. Supports partial IDs.
-Behaves identically to the initial attach: `Ctrl+B D` detaches, and if the job
-has already finished the "press any key" screen is shown.
-Only works on **running** jobs. For finished jobs, use `p logs`.
-
 ```
 p logs
 ```
 Print the captured output of a job (running or finished). Supports `-f` to
-follow a running job's output without attaching to its TTY.
+follow a running job's output in real-time. `Ctrl+C` detaches without stopping
+the job.
 
 ```
 p stop
@@ -148,62 +147,40 @@ Set the default worker.
 - No automatic sync-back after job completion. Use `p pull` to retrieve
   specific artifacts.
 
-## Attach / Detach Mechanics
+## Execution Model
 
-Jobs run inside a `tmux` session on the worker. `p` attaches to the session
-immediately after starting the job.
+No persistent agent daemon is needed. Jobs are launched and managed via
+ad-hoc SSH commands:
 
-### Status bar
-The tmux session has a custom status bar showing:
-```
- p- beefy make [running] 0:02:14
-```
-Fields: job short-ID, worker name, command (truncated), status, elapsed time.
+1. `p -- ` syncs the directory, then runs via SSH:
+   ```
+   nohup sh -c ' 2>&1 | tee output.log; echo $? > exitcode' & echo $! > pid
+   ```
+2. The client streams `output.log` in real-time over a separate SSH connection.
+3. `Ctrl+C` closes the SSH stream — the job keeps running.
+4. `p stop ` runs `kill $(cat pid)` over SSH.
+5. `p logs -f ` tails the log file over SSH.
+6. `p ls` reads the local job DB and SSH-polls to reconcile state when needed.
 
-### Key bindings while attached
-| Key | Effect |
-|---|---|
-| `Ctrl+B D` | Detach from session. Job keeps running. |
-| `Ctrl+C` | Sends SIGINT to the foreground process (standard terminal behavior). |
-
-### On job completion
-When the job's process exits, `run.sh` writes the exit code and then displays:
-```
---- Job done [exit 0]. Press any key to detach. ---
-```
-The tmux session stays open (`remain-on-exit on` for the window) so the user
-can scroll through final output. Pressing any key detaches the client and
-returns to the local shell. `p` then reads the exit code and prints a summary.
-
-### `p attach` on a finished job
-If the job has already finished and the tmux session is still open (user has
-not yet pressed a key), `p attach` reconnects to the "press any key" screen.
-Once the key is pressed, the session closes. For a fully-closed session, use
-`p logs` instead.
-
-> **Worker requirements:** `tmux` and `rsync` must be available on the worker
-> (standard on most Linux systems). The `p-agent` binary is auto-uploaded by `p`.
+> **Worker requirements:** `rsync` must be available on the worker.
 
 ---
 
-## Job Status & Notification
+## Job Status & Tracking
 
-The **p-agent** runs as a lightweight background process on the worker
-(started automatically, not a system service). It:
+The client maintains a local job database (`~/.local/share/p/jobs/.json`).
+`p ls` reads from this local store for fast output.
 
-- Manages job launch and tmux session creation
-- Tees output to `output.log`
-- Writes `exitcode` on completion
-- Notifies the client over the SSH reverse tunnel when a job finishes
+### State reconciliation
+When a job is running, the client periodically checks if `exitcode` exists on
+the worker. If the client was offline or the connection dropped, the next
+`p ls` SSH-polls workers to reconcile state. Jobs with unknown status are
+marked accordingly.
 
-The client maintains a local job database (`~/.local/share/p/jobs/.json`)
-mirroring job state. `p ls` reads from this local store (fast, no SSH),
-updated in real time while attached, and via agent notifications otherwise.
-
-### Degraded mode (agent unreachable / client was offline)
-If the client missed a completion notification, `p ls` marks affected jobs as
-`unknown`. The next `p ls` SSH-polls all workers with known-running jobs to
-reconcile state.
+### Connection drops during streaming
+If the SSH connection drops while `p` is streaming output, `p` exits with an
+error message showing the job ID. The job continues running on the worker.
+Resume watching with `p logs -f `.
 
 ## Worker-side Layout
 
@@ -211,8 +188,6 @@ All data lives under `~/.p/` on the worker (no root access required).
 
 ```
 ~/.p/
-  bin/
-    p-agent         # auto-uploaded by p, versioned
   jobs/
     /
       cmd           # command string
@@ -221,7 +196,8 @@ All data lives under `~/.p/` on the worker (no root access required).
       started_at    # unix timestamp
       output.log    # combined stdout+stderr, always captured
       exitcode      # written on completion; absent = still running
-      tmux_session  # tmux session name (e.g. "p-")
+      pid           # process ID of the running job
+
   workdirs/
     /               # rsync'd copy of client CWD for this job
 ```
@@ -252,23 +228,9 @@ b004f123 cloud ~/scripts ./bench.sh done [1] 0:00:47
 
 ## Open Questions
 
-- **Worker arch detection**: `p-agent` must be compiled for the worker's
-  architecture. Options: (a) ship common targets and detect via SSH, (b)
-  compile on the worker if a Rust toolchain is present, (c) require user to
-  specify arch in worker config.
-
-  Maybe we could also implement the agent core in the form of a shell script?
-  At least the entry point, which could do some detection and install or something.
-  We will see on implementation what works... Small rust binary seems nice,
-  but we want support for amd64, aarch64 and riscv64.
-
 - **Multiple jobs from the same CWD**: each gets its own `workdirs//`, so
   they're fully isolated. This may use significant disk space — `p rm` should
   prompt to clean up.
 
-- **Non-Linux workers**: tmux availability and path conventions may differ on
-  macOS workers. Out of scope for now.
-
-- **Ctrl+C → detach** (future): it would be nicer if Ctrl+C detached the
-  session instead of sending SIGINT to the job, matching the spirit of the
-  tool. This requires per-session tmux key table configuration and is deferred.
+- **Non-Linux workers**: path conventions may differ on macOS workers. Out of
+  scope for now.
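
A note on step 1 of the Execution Model: in POSIX `sh`, `$?` after a pipeline is the exit status of the last stage, so `... | tee output.log; echo $? > exitcode` would record the status of `tee` rather than that of the job. The sketch below reproduces this and shows one possible alternative that writes the status inside the left-hand side of the pipe. The scratch directory, the file names `exitcode_pipeline` and `exitcode_job`, and the stand-in command `sh -c 'exit 3'` are assumptions made for the demonstration; none of this is taken from the `p` implementation.

```sh
#!/bin/sh
# Hypothetical scratch directory standing in for a job directory on a worker.
job_dir=$(mktemp -d)
cd "$job_dir" || exit 1

# Shape used in the spec: $? is the status of `tee`, the last pipeline stage,
# so the job's own exit status (3 here) is lost.
sh -c 'exit 3' 2>&1 | tee output.log; echo $? > exitcode_pipeline

# Possible alternative: record the status inside the left-hand side of the
# pipe, right after the job command, so it reflects the job itself.
{ sh -c 'echo hello; exit 3'; echo $? > exitcode_job; } 2>&1 | tee output.log >/dev/null

echo "pipeline status: $(cat exitcode_pipeline)"   # prints "pipeline status: 0"
echo "job status: $(cat exitcode_job)"             # prints "job status: 3"
```

In bash, `${PIPESTATUS[0]}` gives the same information, but it is a bash extension and is not available in every `sh`, which is why the sketch sticks to a brace group.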