# `p` — push jobs to worker

A small Rust CLI utility to push command-line jobs to remote worker machines,
with directory sync, job management, and log streaming.

## Motivation

The common developer workflow of "run this build/test/script on a more powerful
remote machine" currently requires manually chaining `rsync` and `ssh` with
a way to keep the job alive in the background (e.g. `nohup`, `tmux`).
`p` wraps that entire flow into a single ergonomic command, while adding proper
job tracking and log capture.

## Core Concepts

### Worker
A remote machine accessible via SSH. Workers are registered locally with a
name and a connection string. One worker is designated as the **default**.

### Job
A command submitted to a worker, along with an (optionally synced) working
directory. Each job has:
- A UUID
- The command run
- The worker it ran on
- The original client CWD
- Start time, end time, exit code
- Captured output log
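
As a rough illustration, the fields above could map onto a record like the following. This is only a sketch: the struct name, field names, and the choice of Unix-second timestamps are assumptions, and the 8-character short ID is inferred from the example `p ls` table rather than specified anywhere.

```rust
use std::path::PathBuf;

/// Hypothetical in-memory record for one job; all names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    /// UUID kept as a plain string so the sketch needs no external crate.
    pub id: String,
    pub command: String,
    pub worker: String,
    /// The original client CWD.
    pub client_cwd: PathBuf,
    /// Unix timestamps in seconds (an assumption).
    pub started_at: u64,
    pub ended_at: Option<u64>,
    /// `None` while the job has not finished.
    pub exit_code: Option<i32>,
    /// Where the captured output log lives on the worker.
    pub log_path: String,
}

impl Job {
    /// Short ID for listings: the first 8 bytes of the (ASCII) UUID.
    pub fn short_id(&self) -> &str {
        let end = self.id.len().min(8);
        &self.id[..end]
    }

    /// Whole seconds the job has run, measured up to `now` if unfinished.
    pub fn duration_secs(&self, now: u64) -> u64 {
        self.ended_at.unwrap_or(now).saturating_sub(self.started_at)
    }
}
```

Keeping `ended_at` and `exit_code` optional lets the same record describe both running and finished jobs.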

## CLI Reference

### Running jobs

```
p -- <command>
```
Sync the current directory to the default worker and run `<command>` on it.
Streams the job's output directly to the terminal (like `tail -f`), so it feels
like running the command locally.

- `Ctrl+C` detaches from the log stream — the job keeps running on the worker.
  Use `p logs -f <job-id>` to resume watching.
- Use `p stop <job-id>` to kill a running job.
- If the network connection drops, the job keeps running on the worker.
  Use `p logs -f <job-id>` to resume watching.

When the job finishes, `p` prints the exit code and exits:
```
[Job done: exit 0]
```

```
p <worker> -- <command>
```
Same, but targets a specific named worker.

```
p [-n | --no-sync] -- <command>
```
Run `<command>` on the worker without syncing the current directory first.
Useful for commands that need no local files.

```
p [-d | --detach] -- <command>
```
Run `<command>` and immediately detach — do not stream output to the terminal.
The job starts on the worker and `p` prints the job ID. Useful for fire-and-forget
jobs. Use `p logs -f <job-id>` to watch later.
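
The invocation forms above share one shape: an optional worker name and flags, then `--`, then the command. A minimal hand-rolled parser for that shape might look like the sketch below. `RunArgs` and `parse_run_args` are hypothetical names, and the sketch assumes subcommands such as `ls` or `worker` are dispatched before this function is reached.

```rust
/// Parsed form of `p [<worker>] [-n] [-d] -- <command>` (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct RunArgs {
    pub worker: Option<String>,
    pub no_sync: bool,
    pub detach: bool,
    pub command: Vec<String>,
}

/// Parse the arguments that follow the program name.
pub fn parse_run_args(args: &[String]) -> Result<RunArgs, String> {
    // Everything after the first `--` is the command, taken verbatim.
    let sep = args
        .iter()
        .position(|a| a == "--")
        .ok_or_else(|| "missing `--` before the command".to_string())?;
    let (opts, rest) = args.split_at(sep);
    let command: Vec<String> = rest[1..].to_vec();
    if command.is_empty() {
        return Err("no command given after `--`".to_string());
    }

    let mut parsed = RunArgs { worker: None, no_sync: false, detach: false, command };
    for opt in opts {
        match opt.as_str() {
            "-n" | "--no-sync" => parsed.no_sync = true,
            "-d" | "--detach" => parsed.detach = true,
            flag if flag.starts_with('-') => return Err(format!("unknown flag: {flag}")),
            // The first bare word before `--` is treated as the worker name.
            name if parsed.worker.is_none() => parsed.worker = Some(name.to_string()),
            extra => return Err(format!("unexpected argument: {extra}")),
        }
    }
    Ok(parsed)
}
```

Requiring `--` keeps the command's own flags (for example `cargo test -- --nocapture`) from being mistaken for flags of `p`.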

### Job management

```
p ls
```
List **running jobs** across all workers. Pass `-a` / `--all` to also show
completed jobs (done, failed, stopped).
Shows: ID (short), worker, original CWD, command, status, duration.
Style inspired by `docker ps` / `lxc list`.

```
p logs <job-id>
```
Print the captured output of a job (running or finished). Supports `-f` to
follow a running job's output in real-time. `Ctrl+C` detaches without stopping
the job.

```
p stop <job-id>
```
Kill a running job.

```
p pull <job-id> <remote-path> [<local-dest>]
```
Copy a specific file or directory from a job's work directory back to the
client. Used to retrieve build artifacts.

```
p rm <job-id>
```
Remove a job record and its remote work directory. Refuses to remove a
running job without `--force`.

```
p prune
```
Remove all finished job records (status: done, failed, stopped) and their
remote work directories. Jobs with status `running` or `unknown` are left
untouched. Pass `--force` to also include `unknown` jobs.
Pass `--dry-run` to preview what would be removed without deleting anything.
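
The selection rule for `p prune` is small enough to state as a pure function. In this sketch the `JobStatus` enum and the `(id, status)` tuple representation are assumptions made for illustration. With `--dry-run`, the resulting list would be printed rather than acted on.

```rust
/// Job states named in this spec (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobStatus {
    Running,
    Done,
    Failed,
    Stopped,
    Unknown,
}

/// Whether `p prune` may remove a job in this state.
pub fn is_prunable(status: JobStatus, force: bool) -> bool {
    match status {
        JobStatus::Done | JobStatus::Failed | JobStatus::Stopped => true,
        // `unknown` jobs are only removed when `--force` is given.
        JobStatus::Unknown => force,
        // Running jobs are never pruned.
        JobStatus::Running => false,
    }
}

/// IDs of the jobs that would be removed, in their original order.
pub fn prune_candidates<'a>(jobs: &'a [(String, JobStatus)], force: bool) -> Vec<&'a str> {
    jobs.iter()
        .filter(|(_, status)| is_prunable(*status, force))
        .map(|(id, _)| id.as_str())
        .collect()
}
```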

### Worker management

```
p worker register <connection-string> [-n <name>]
```
Register a worker. The connection string is an SSH target (`user@host`,
`user@host:port`, or an SSH config alias). If `-n` is omitted, the hostname
is used as the name. The first registered worker becomes the default.
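
One possible way to split the three accepted connection-string forms is sketched below. `SshTarget`, `parse_connection`, and `default_worker_name` are hypothetical names. The sketch does not handle IPv6 literals, and it treats an SSH config alias simply as a host with no user or port.

```rust
/// Parsed SSH target (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct SshTarget {
    pub user: Option<String>,
    pub host: String,
    pub port: Option<u16>,
}

/// Parse `user@host`, `user@host:port`, or a bare alias/hostname.
pub fn parse_connection(s: &str) -> Result<SshTarget, String> {
    // Split off the optional `user@` prefix.
    let (user, rest) = match s.split_once('@') {
        Some((u, r)) => (Some(u.to_string()), r),
        None => (None, s),
    };
    // Split off the optional `:port` suffix.
    let (host, port) = match rest.rsplit_once(':') {
        Some((h, p)) => {
            let port = p.parse::<u16>().map_err(|_| format!("invalid port: {p}"))?;
            (h, Some(port))
        }
        None => (rest, None),
    };
    if host.is_empty() {
        return Err("empty host in connection string".to_string());
    }
    Ok(SshTarget { user, host: host.to_string(), port })
}

/// Name used when `-n` is omitted: the hostname (or alias) itself.
pub fn default_worker_name(target: &SshTarget) -> &str {
    &target.host
}
```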

```
p worker ls
```
List registered workers with their name and connection string.
Pass `--check` / `-c` to also probe reachability over SSH (slow).

```
p worker rm <name>
```
Unregister a worker. Refuses if the worker has running jobs.

```
p worker default <name>
```
Set the default worker.

---

## Directory Sync

- Uses `rsync` over SSH.
- Respects `.gitignore` by default (via `rsync --filter=':- .gitignore'`).
- `.git/` is **included** — some workflows depend on it (e.g. reading the
  current commit SHA or latest tag).
- Each job gets its own isolated work directory on the worker:
  `~/.p/workdirs/<job-uuid>/`
- No automatic sync-back after job completion. Use `p pull` to retrieve
  specific artifacts.
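
The sync rules above translate into a short `rsync` argument list. The helper below is a sketch, not the actual implementation: `rsync_push_args` is a hypothetical name, the `-a` (archive) flag is an assumption, and the connection string is assumed to carry no `:port` suffix. It also assumes `~/.p/workdirs/` already exists on the worker, because `rsync` by default creates only the final path component of the destination.

```rust
/// Build the arguments for pushing `local_dir` into a job's work directory.
/// The caller would pass these to the `rsync` binary.
pub fn rsync_push_args(local_dir: &str, connection: &str, job_uuid: &str) -> Vec<String> {
    // A trailing slash on the source copies the directory's contents
    // rather than the directory itself.
    let src = format!("{}/", local_dir.trim_end_matches('/'));
    // Remote paths without a leading slash are relative to the remote home,
    // so this resolves to ~/.p/workdirs/<job-uuid>/ on the worker.
    let dest = format!("{connection}:.p/workdirs/{job_uuid}/");
    vec![
        "-a".to_string(),
        // Per-directory merge rule: exclude whatever each .gitignore lists.
        "--filter=:- .gitignore".to_string(),
        "-e".to_string(),
        "ssh".to_string(),
        src,
        dest,
    ]
}
```

Because no rule excludes `.git/`, it is transferred like any other directory, which matches the stated intent.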

## Execution Model

No persistent agent daemon is needed. Jobs are launched and managed via
ad-hoc SSH commands:

1. `p -- <command>` syncs the directory, then runs via SSH:
   ```
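   # Redirect rather than pipe through tee, so that $? is <command>'s own exit status.
   nohup sh -c '( <command> ) > output.log 2>&1; echo $? > exitcode' > /dev/null 2>&1 < /dev/null & echo $! > pid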
   ```
2. The client streams `output.log` in real-time over a separate SSH connection.
3. `Ctrl+C` closes the SSH stream — the job keeps running.
4. `p stop <job-id>` runs `kill $(cat pid)` over SSH.
5. `p logs -f <job-id>` tails the log file over SSH.
6. `p ls` reads the local job DB and SSH-polls to reconcile state when needed.
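
To make step 1 concrete, here is one way the client could assemble the remote launch script before handing it to `ssh`. It is a sketch under several assumptions: `shell_quote` and `launch_script` are hypothetical helpers, the job UUID is assumed to contain only shell-safe characters, and the bookkeeping files are placed under `~/.p/jobs/<uuid>/`. The sketch redirects output into `output.log` instead of piping through `tee`, because after a pipeline `$?` holds the exit status of the last command in the pipeline, not of `<command>`.

```rust
/// Quote `s` as a single POSIX shell word using single quotes.
pub fn shell_quote(s: &str) -> String {
    // Inside single quotes nothing is special except the quote itself,
    // which is written as: close quote, escaped quote, reopen quote.
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Build the script that starts a job in the background on the worker.
pub fn launch_script(job_uuid: &str, command: &str) -> String {
    let job_dir = format!("$HOME/.p/jobs/{job_uuid}");
    let work_dir = format!("$HOME/.p/workdirs/{job_uuid}");
    // Runs inside `sh -c`: execute the command in the work directory,
    // capture combined output, then record the exit status.
    let inner = format!(
        "cd \"{work_dir}\" && ( {command} ) > \"{job_dir}/output.log\" 2>&1; echo $? > \"{job_dir}/exitcode\""
    );
    let quoted = shell_quote(&inner);
    // Detach stdin/stdout/stderr so the launching SSH session can return
    // immediately, then record the PID of the background wrapper shell.
    format!(
        "mkdir -p \"{job_dir}\" \"{work_dir}\" && {{ nohup sh -c {quoted} > /dev/null 2>&1 < /dev/null & echo $! > \"{job_dir}/pid\"; }}"
    )
}
```

Note that the recorded PID belongs to the wrapper shell. Whether `p stop` should also signal the command's child processes is left open here.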

> **Worker requirements:** `rsync` must be available on the worker.

---

## Job Status & Tracking

The client maintains a local job database (`~/.local/share/p/jobs/<uuid>.json`).
`p ls` reads from this local store for fast output.

### State reconciliation
While a job is running, the client periodically checks whether `exitcode` exists
on the worker. If the client was offline or the connection dropped, the next
`p ls` SSH-polls the workers to reconcile state. Jobs whose state cannot be
determined (for example, because the worker is unreachable) are marked `unknown`.
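
The reconciliation decision can be sketched as a pure function of what a single SSH poll observed. The names `Probe`, `ObservedState`, and `reconcile` are hypothetical. The liveness check on the recorded PID is an assumption added for illustration, as is treating "no `exitcode` and no live process" as `unknown`.

```rust
/// State derived from one poll of the worker (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ObservedState {
    Running,
    /// The job finished with this exit code.
    Finished(i32),
    Unknown,
}

/// What one SSH poll learned about a job (hypothetical type).
pub enum Probe {
    /// The worker could not be reached at all.
    Unreachable,
    Reachable {
        /// Contents of the `exitcode` file, if it exists.
        exitcode_file: Option<String>,
        /// Whether the recorded PID is still alive (assumed extra check).
        pid_alive: bool,
    },
}

pub fn reconcile(probe: &Probe) -> ObservedState {
    match probe {
        Probe::Unreachable => ObservedState::Unknown,
        // `exitcode` present: the job is over; parse the recorded status.
        Probe::Reachable { exitcode_file: Some(text), .. } => {
            match text.trim().parse::<i32>() {
                Ok(code) => ObservedState::Finished(code),
                Err(_) => ObservedState::Unknown,
            }
        }
        // `exitcode` absent and the process is alive: still running.
        Probe::Reachable { exitcode_file: None, pid_alive: true } => ObservedState::Running,
        // `exitcode` absent but nothing is running: state cannot be determined.
        Probe::Reachable { exitcode_file: None, pid_alive: false } => ObservedState::Unknown,
    }
}
```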

### Connection drops during streaming
If the SSH connection drops while `p` is streaming output, `p` exits with an
error message showing the job ID. The job continues running on the worker.
Resume watching with `p logs -f <job-id>`.

## Worker-side Layout

All data lives under `~/.p/` on the worker (no root access required).

```
~/.p/
  jobs/
    <uuid>/
      cmd             # command string
      cwd             # original client CWD (display only)
      worker          # worker name (display only)
      started_at      # unix timestamp
      output.log      # combined stdout+stderr, always captured
      exitcode        # written on completion; absent = still running
      pid             # process ID of the running job

  workdirs/
    <uuid>/           # rsync'd copy of client CWD for this job
```


## Configuration

File: `~/.config/p/config.yaml`

```yaml
default_worker: beefy
workers:
  - name: beefy
    connection: user@192.168.1.50
  - name: cloud
    connection: user@cloud-host.example.com
```
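
In memory, the configuration could be modelled roughly as follows. The sketch covers only worker lookup and the "first registered worker becomes the default" rule. Reading and writing the YAML file is left out, since it would need a parsing crate. The type and method names are assumptions.

```rust
/// One entry of the `workers` list (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Worker {
    pub name: String,
    pub connection: String,
}

/// In-memory form of `config.yaml` (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub default_worker: Option<String>,
    pub workers: Vec<Worker>,
}

impl Config {
    /// Pick the worker for a run: the requested name, else the default.
    pub fn resolve(&self, requested: Option<&str>) -> Result<&Worker, String> {
        let name = requested
            .or(self.default_worker.as_deref())
            .ok_or_else(|| "no worker given and no default set".to_string())?;
        self.workers
            .iter()
            .find(|w| w.name == name)
            .ok_or_else(|| format!("unknown worker: {name}"))
    }

    /// Add a worker; the first one registered becomes the default.
    pub fn register(&mut self, worker: Worker) -> Result<(), String> {
        if self.workers.iter().any(|w| w.name == worker.name) {
            return Err(format!("worker already registered: {}", worker.name));
        }
        if self.default_worker.is_none() {
            self.default_worker = Some(worker.name.clone());
        }
        self.workers.push(worker);
        Ok(())
    }
}
```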

## `p ls` Output (example)

```
ID        WORKER   CWD              COMMAND          STATUS     DURATION
--------  ------   ---------------  ---------------  ---------  --------
a3f2b091  beefy    ~/projects/foo   make             running    0:02:14
7c91d302  beefy    ~/projects/bar   cargo test       done [0]   0:01:03
b004f123  cloud    ~/scripts        ./bench.sh       done [1]   0:00:47
```
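
A formatting sketch that reproduces the column layout of the example table is shown below. The fixed column widths are read off the example rather than specified, the function names are hypothetical, and the sketch does not truncate values that are wider than their column.

```rust
/// Render a duration in seconds as `H:MM:SS`.
pub fn format_duration(total_secs: u64) -> String {
    let (h, m, s) = (total_secs / 3600, (total_secs % 3600) / 60, total_secs % 60);
    format!("{h}:{m:02}:{s:02}")
}

/// Render one table row with left-aligned, space-padded columns.
pub fn format_row(
    id: &str,
    worker: &str,
    cwd: &str,
    command: &str,
    status: &str,
    secs: u64,
) -> String {
    format!(
        "{:<8}  {:<6}   {:<15}  {:<15}  {:<9}  {}",
        id,
        worker,
        cwd,
        command,
        status,
        format_duration(secs)
    )
}
```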

## Open Questions

- **Multiple jobs from the same CWD**: each gets its own `workdirs/<uuid>/`,
  so they're fully isolated. This may use significant disk space — `p rm`
  should prompt to clean up.

- **Non-Linux workers**: path conventions may differ on macOS workers. Out of
  scope for now.