# Fully-Async Rollout Example

End-to-end demo of vime's fully-async rollout path. A background asyncio
worker keeps a fixed pool of in-flight generations across rollout boundaries,
so the next training step doesn't wait for the slowest in-flight sample.
The worker itself lives in `vime.rollout.fully_async_rollout`; this
directory only holds the launch script used for the demo and the CI smoke test.

## Files

* `run-qwen2.5-0.5B-fully_async.sh` — single-node, 4-GPU, three-rollout demo
  with Qwen2.5-0.5B-Instruct on dapo-math-17k. Fast enough to be the CI
  smoke test for the fully-async path.

In CI, the same run is driven by `tests/test_qwen2.5_0.5B_fully_async_short.py`.

## Prerequisites

The launch script expects these paths to exist:

```
/root/models/Qwen2.5-0.5B-Instruct/              # HF checkpoint
/root/models/Qwen2.5-0.5B-Instruct_torch_dist/   # output of tools/convert_hf_to_torch_dist.py
/root/datasets/dapo-math-17k/dapo-math-17k.jsonl # training data
```
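Before launching, it can help to confirm those paths are in place. A minimal sketch in Python, not part of vime; `missing_paths` is a hypothetical helper and the path list is copied from above:

```python
from pathlib import Path

# Paths listed under Prerequisites; adjust if your layout differs.
REQUIRED = [
    "/root/models/Qwen2.5-0.5B-Instruct",
    "/root/models/Qwen2.5-0.5B-Instruct_torch_dist",
    "/root/datasets/dapo-math-17k/dapo-math-17k.jsonl",
]


def missing_paths(paths: list[str]) -> list[str]:
    """Return the subset of `paths` that does not exist on disk (hypothetical helper)."""
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    missing = missing_paths(REQUIRED)
    for p in missing:
        print(f"missing: {p}")
    raise SystemExit(1 if missing else 0)
```

Run it as a standalone script; it exits non-zero if anything is missing.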

## Run

```bash
cd vime
bash examples/fully_async/run-qwen2.5-0.5B-fully_async.sh
```

You should see log lines like these (timings and queue sizes will vary):

```
fully-async rollout 0: target=8 queue_warm=0
fully-async rollout 0: done in ...s, queue_left=...
```

## Plugging In Your Own Generate Function

Two changes switch the standard pipeline to fully-async:

1. Use the async training driver: `python3 train_async.py` (not `train.py`).
2. Set the rollout function path:
   ```
   --rollout-function-path vime.rollout.fully_async_rollout.generate_rollout_fully_async
   ```
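Flags such as `--rollout-function-path` take a dotted `module.attribute` path. This README does not show how vime resolves it; the usual pattern, sketched below under that assumption with a hypothetical `load_function` helper, splits off the last component and imports the rest with `importlib`:

```python
import importlib
from typing import Any, Callable


def load_function(path: str) -> Callable[..., Any]:
    """Resolve 'pkg.module.func' to the callable it names (hypothetical helper)."""
    module_path, _, attr = path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted 'module.attr' path, got {path!r}")
    module = importlib.import_module(module_path)  # imports pkg.module
    fn = getattr(module, attr)                     # looks up func on it
    if not callable(fn):
        raise TypeError(f"{path} is not callable")
    return fn


# Demo with a stdlib function; in vime the path would be e.g.
# "vime.rollout.fully_async_rollout.generate_rollout_fully_async".
sqrt = load_function("math.sqrt")
print(sqrt(9.0))  # 3.0
```

If vime follows this pattern, any module you pass by path must be importable (e.g. on `PYTHONPATH`) from the training process.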

For custom per-sample logic, use vime's standard plug-in points, which
work unchanged under fully-async:

```
--custom-generate-function-path your.module.generate     # (args, sample, sampling_params) -> Sample | list[Sample]
--custom-rm-path                your.module.reward      # (args, sample | list[Sample]) -> float | list[float]
```
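A skeleton for the two hooks, shaped for a math task like dapo-math-17k. It assumes vime awaits both hooks as coroutines (the worker runs on an asyncio loop; check `vime.rollout` before relying on this), and it uses a stand-in `Sample` dataclass with hypothetical fields `index`, `prompt`, `label`, `response`. `call_model` is a hypothetical placeholder for your server request. The reward is a simple exact match on the last `\boxed{...}` answer:

```python
from __future__ import annotations

import asyncio
import re
from dataclasses import dataclass


@dataclass
class Sample:  # hypothetical stand-in; vime's real Sample differs
    index: int
    prompt: str
    label: str
    response: str = ""


async def call_model(prompt: str, sampling_params: dict) -> str:
    # Hypothetical placeholder: send `prompt` to your inference server here.
    await asyncio.sleep(0)
    return "2 + 2 = 4, so the answer is \\boxed{4}."


async def generate(args, sample: Sample, sampling_params: dict) -> Sample:
    """Per-sample generation hook: fill in `sample.response`."""
    sample.response = await call_model(sample.prompt, sampling_params)
    return sample


_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def _score(sample: Sample) -> float:
    # 1.0 if the last \boxed{...} in the response matches the label exactly.
    answers = _BOXED.findall(sample.response)
    return 1.0 if answers and answers[-1].strip() == sample.label.strip() else 0.0


async def reward(args, sample: Sample | list[Sample]) -> float | list[float]:
    """Reward hook: score one sample, or a whole group when given a list."""
    if isinstance(sample, list):
        return [_score(s) for s in sample]
    return _score(sample)


if __name__ == "__main__":
    s = Sample(index=0, prompt="What is 2 + 2?", label="4")
    s = asyncio.run(generate(None, s, {}))
    print(asyncio.run(reward(None, s)))  # 1.0
```

Accepting both a single sample and a list mirrors the `sample | list[Sample]` signature above, so the same function can score per sample or per group.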

See `examples/coding_agent_rl/` for a non-trivial example that plugs in a
multi-turn agent this way.

## Worker Internals (Very Short)

* On the first call, a process-wide `AsyncRolloutWorker` (a thread running
  its own asyncio event loop) is created. It is shared across all subsequent
  `generate_rollout` calls, so its queue stays warm.
* The loop keeps up to `args.vllm_server_concurrency` tasks in flight, each
  running `generate_and_rm_group`.
* Completed groups land on an output queue; each `generate_rollout` call
  drains the queue until it has `rollout_batch_size` groups, then returns
  them sorted by `sample.index`.
* Groups containing an `ABORTED` sample are returned to the data buffer via
  `data_buffer.add_samples` instead of being sent to training.
* The worker is stopped automatically at process exit via `atexit`.
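The bullets above describe a producer/consumer loop. The sketch below is a toy re-implementation under stated assumptions, not vime's code: `Sample`, `DataBuffer`, `get_worker` and `fake_generate_and_rm_group` are hypothetical stand-ins, the status strings are made up, `concurrency` plays the role of `args.vllm_server_concurrency`, and the real worker calls `generate_and_rm_group` against the vLLM server. It shows a daemon thread running an asyncio loop that refills a fixed pool of tasks, a thread-safe output queue, and a `generate_rollout` that drains it, re-queues aborted groups and sorts by index:

```python
from __future__ import annotations

import asyncio
import atexit
import collections
import queue
import threading
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Sample:  # toy stand-in; vime's Sample carries far more state
    index: int
    status: str = "PENDING"  # hypothetical statuses: PENDING / COMPLETED / ABORTED
    attempts: int = 0


class DataBuffer:  # toy thread-safe prompt buffer, not vime's data_buffer
    def __init__(self, groups: list[list[Sample]]):
        self._groups, self._lock = collections.deque(groups), threading.Lock()

    def get_group(self) -> list[Sample] | None:
        with self._lock:
            return self._groups.popleft() if self._groups else None

    def add_samples(self, group: list[Sample]) -> None:
        with self._lock:
            self._groups.appendleft(group)  # retried groups are picked up first


class AsyncRolloutWorker:  # daemon thread running its own asyncio event loop
    def __init__(self, buffer: DataBuffer,
                 generate_group: Callable[..., Awaitable[list[Sample]]], concurrency: int):
        self.buffer, self.generate_group, self.concurrency = buffer, generate_group, concurrency
        self.output: queue.Queue = queue.Queue()  # finished groups, consumed by generate_rollout
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=lambda: asyncio.run(self._run()), daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stopped.set()
        self._thread.join()

    async def _run(self) -> None:
        in_flight: set[asyncio.Task] = set()
        while not self._stopped.is_set():
            # Refill the pool up to `concurrency` tasks from the buffer.
            while len(in_flight) < self.concurrency and (group := self.buffer.get_group()):
                in_flight.add(asyncio.create_task(self.generate_group(group)))
            if not in_flight:
                await asyncio.sleep(0.01)  # buffer empty: poll again shortly
                continue
            done, in_flight = await asyncio.wait(
                in_flight, timeout=0.1, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                self.output.put(task.result())  # a real worker would handle errors here
        for task in in_flight:
            task.cancel()


_WORKER: AsyncRolloutWorker | None = None


def get_worker(buffer: DataBuffer, generate_group, concurrency: int) -> AsyncRolloutWorker:
    """Create the process-wide worker on the first call and reuse it afterwards."""
    global _WORKER
    if _WORKER is None:
        _WORKER = AsyncRolloutWorker(buffer, generate_group, concurrency)
        _WORKER.start()
        atexit.register(_WORKER.stop)  # stop the thread at interpreter exit
    return _WORKER


def generate_rollout(worker: AsyncRolloutWorker, batch_size: int) -> list[list[Sample]]:
    """Drain finished groups until `batch_size` usable ones are collected."""
    groups = []
    while len(groups) < batch_size:
        group = worker.output.get(timeout=30)  # raises queue.Empty if nothing arrives
        if any(s.status == "ABORTED" for s in group):
            worker.buffer.add_samples(group)  # re-queued; generation restarts from scratch
        else:
            groups.append(group)
    return sorted(groups, key=lambda g: g[0].index)


async def fake_generate_and_rm_group(group: list[Sample]) -> list[Sample]:
    # Stand-in for generate_and_rm_group: index-dependent latency, and sample 2
    # is aborted on its first attempt so the re-queue path gets exercised.
    await asyncio.sleep(0.001 * (group[0].index % 5))
    for s in group:
        s.attempts += 1
        s.status = "ABORTED" if (s.index == 2 and s.attempts == 1) else "COMPLETED"
    return group
```

In use, a training loop would call `get_worker(...)` once and then `generate_rollout(worker, batch_size)` once per step. Because the pool is never drained between calls, a slow group started during one rollout simply lands in a later one, which is also why ordering across rollouts is only best-effort.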

## Limitations

* No evaluation mode (it would conflict with the continuously running execution model).
* Ordering across rollouts is best-effort — within a rollout, groups are
  sorted by index before being handed to training.
* TODO: partial-rollout-style resume for `ABORTED` trajectories is not
  yet wired; for now the trajectory is re-queued and starts over.