Diffusion-forcing world model rollouts. Each sample is ~2.7 s (16 raw frames @ 6 fps).
Side-by-side videos show ground truth on the left, model prediction on the right.
First half of each clip = history (GT), second half = autoregressively sampled future.
✓ = feature on, ✗ = off. Lower val_loss is better.
Latent-space MSE of predicted future vs ground truth. Contact/no-contact split uses tactile latent-to-reference energy (threshold 0.05).

| Run | Config | Best val_loss | Note | view MSE | TL MSE | TR MSE | tactile contact MSE | tactile no-contact MSE |
|--------------|----------------------------|-----------------------|-------------|----------|----------|----------|----------|-----|
| vo_v0 | legacy 1v0t | val 0.0171 | | 0.037249 | 2.935971 | 2.813473 | 2.874722 | n/a |
| mm_v0 | legacy 1v2t | vis 0.0162 | | 0.034478 | 0.007900 | 0.004601 | 0.006251 | n/a |
| vo_left | | vis 0.0096 | | 0.024259 | 2.977134 | 2.804831 | 2.890982 | n/a |
| vo_middle | | vis 0.0128 | | 0.034927 | 2.977134 | 2.804831 | 2.890982 | n/a |
| vo_right | | vis 0.0093 | BEST single | 0.024071 | 2.977134 | 2.804831 | 2.890982 | n/a |
| p3_mv2t | 3v+2t shared | vis 0.0100, tac 0.0040 | | 0.022032 | 0.004500 | 0.008115 | 0.006308 | n/a |
| p4_mv2t_extr | no-gate | vis 0.0102, tac 0.32 | BLOWUP | 0.102037 | 0.013664 | 0.013254 | 0.013459 | n/a |
| p4_gate | cam-pose + gate | vis 0.0096, tac 0.262 | | 0.070823 | 0.004860 | 0.008164 | 0.006512 | n/a |
| p5_gate | vision-only + cam-pose | vis 0.0095 | | 0.039518 | 0.004821 | 0.008329 | 0.006575 | n/a |
| MV1 | 3v2t + delta-ref + shift16 | val 0.0132 | NEW | 0.018080 | 0.005240 | 0.022181 | 0.013711 | n/a |
| MV2 | MV1 + cam-pose | val 0.0133 | NEW | 0.041976 | 0.006076 | 0.013082 | 0.009579 | n/a |

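
As a rough illustration of the contact/no-contact split described above, the sketch below computes a per-frame latent MSE and buckets frames by an energy threshold. The page does not define the energy, so this assumes it is the mean squared deviation of the ground-truth tactile latent from a no-contact reference latent. The function name, the `(T, D)` array layout, and the `ref` argument are hypothetical.

```python
import numpy as np

def contact_split_mse(pred, gt, ref, threshold=0.05):
    """Latent MSE over future frames, split by a contact mask.

    pred, gt: (T, D) predicted / ground-truth tactile latents (assumed layout).
    ref:      (D,) no-contact reference latent (assumed to exist).
    """
    pred, gt, ref = (np.asarray(a, dtype=np.float64) for a in (pred, gt, ref))
    per_frame_mse = ((pred - gt) ** 2).mean(axis=1)
    # Assumed energy definition: mean squared deviation of GT from the reference.
    energy = ((gt - ref) ** 2).mean(axis=1)
    contact = energy > threshold

    def masked_mean(mask):
        # An empty bucket has no defined mean; report None instead of NaN.
        return float(per_frame_mse[mask].mean()) if mask.any() else None

    return {
        "all": float(per_frame_mse.mean()),
        "contact": masked_mean(contact),
        "no_contact": masked_mean(~contact),
    }

# Toy example: frame 0 sits at the reference, frame 1 is far from it.
gt = np.array([[0.0, 0.0], [1.0, 1.0]])
pred = gt + np.array([[0.1], [0.2]])
print(contact_split_mse(pred, gt, ref=np.zeros(2)))
```

Whether the real evaluation thresholds the ground-truth or the predicted latent is not stated on this page; the sketch uses ground truth so that the mask does not depend on the model being scored.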
# Loss curve analysis: multi-view + cam-pose ablations
| Run | Type | Best val_loss_visual | Best val_loss_tactile | Final val_loss |
|----------------------|-------------------------------------------------|----------------------|-----------------------|----------------|
| vo_left | single view (left) | 0.0096 @ ep84 | — | 0.0097 |
| vo_middle | single view (middle/top) | 0.0128 @ ep97 | — | 0.0128 |
| **vo_right** | single view (right) | **0.0093 @ ep97** | — | 0.0093 |
| p2_mv | 3-view shared weights, no tactile | 0.0099 @ ep96 | — | 0.0099 |
| p3_mv2t | 3-view + 2-tactile, shared weights | 0.0100 @ ep97 | 0.0040 | 0.0139 |
| p4_mv2t_extr (no gate) | p3 + cam-pose extrinsics | 0.0102 @ ep98 | **0.3244 (blow-up)** | 0.3346 |
| **p4_gate** | p4 + scalar gate (this fix) | **0.0096 @ ep134** | 0.2621 (still bad) | 0.2751 |
| **p5_gate** | vision-only p2 + cam-pose + gate | **0.0095 @ ep149** | — | 0.0095 |
| vo_v0 (legacy) | single view (old 1v0t pipeline) | 0.0167 @ ep53 | — | 0.0171 |
| mm_v0 (legacy) | 1-view + 2-tactile channel-stack (old mm_v0) | 0.0162 @ ep56 | 0.0032 | 0.0198 |
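
The visual ranking and the tactile degradation ratios can be recomputed directly from the table above. The short script below is only a convenience check: the numbers are copied from the "Best val_loss_visual" and "Best val_loss_tactile" columns, and the dictionary names are illustrative.

```python
# Best val_loss values copied from the table above.
best_visual = {
    "vo_left": 0.0096, "vo_middle": 0.0128, "vo_right": 0.0093,
    "p2_mv": 0.0099, "p3_mv2t": 0.0100, "p4_mv2t_extr": 0.0102,
    "p4_gate": 0.0096, "p5_gate": 0.0095,
    "vo_v0": 0.0167, "mm_v0": 0.0162,
}
best_tactile = {
    "p3_mv2t": 0.0040, "p4_mv2t_extr": 0.3244,
    "p4_gate": 0.2621, "mm_v0": 0.0032,
}

# Rank runs by visual loss (lower is better); ties keep table order.
ranking = sorted(best_visual.items(), key=lambda kv: kv[1])
best_loss = ranking[0][1]
for name, loss in ranking:
    print(f"{name:<14} {loss:.4f}  (+{loss - best_loss:.4f} vs best)")

# Tactile loss relative to the p3_mv2t baseline.
baseline = best_tactile["p3_mv2t"]
tactile_ratio = {name: loss / baseline for name, loss in best_tactile.items()}
for name, ratio in tactile_ratio.items():
    print(f"{name:<14} {ratio:6.1f}x the p3_mv2t tactile loss")
```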
## Conclusions
1. **The gate fix worked for the VISUAL branch.** Both `p4_gate` (0.0096) and `p5_gate` (0.0095) now come within 0.0003 of **the best single-view baseline `vo_right` (0.0093)**. This is the first time multi-view + cam-pose conditioning has essentially matched the best single-view model on visual val_loss, although it does not yet beat it. Without the gate (`p4_mv2t_extr`) the visual stream was already OK (0.0102), but tactile blew up.
2. **The gate did NOT save the tactile branch.** `p4_gate` val_loss_tactile = 0.262 (vs `p3_mv2t` baseline 0.0040, ~65× worse). The single scalar gate delays the cam-pose perturbation by one step but does not prevent it from injecting noise into the tactile token stream once it opens. Likely fix for a future run: route cam-pose ONLY through view tokens (gate visual cam-pose embedding, but never add it to tactile-stream AdaLN).
3. **Plateau.** `p4_gate` and `p5_gate` reached their best val_loss around ep122–150 (best visual loss at ep134 and ep149 respectively) and have been flat for more than 1 h, so it is safe to run inference now even though the target was 200 epochs.
4. **Multi-view rank** (visual val_loss, lower=better):
`vo_right (0.0093) < p5_gate (0.0095) < p4_gate (0.0096) = vo_left (0.0096) < p2_mv (0.0099) ≈ p3_mv2t (0.0100) < vo_middle (0.0128)`
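5. **`vo_middle` is the worst single view** by a clear margin (0.0128 vs 0.0096 for `vo_left` and 0.0093 for `vo_right`, roughly 33–38% higher). A likely explanation is that the top-down camera has the least useful geometry for predicting future frames.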
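
The fix proposed in conclusion 2 (route cam-pose only through view tokens) could look roughly like the sketch below. This is not the code used in these runs: the tanh-squashed scalar gate, the `[shift, scale]` layout of the conditioning vectors, and every function and argument name are assumptions made for illustration. The point is only that the cam-pose term is masked per token, so tactile tokens get the same modulation whether the gate is open or closed.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Parameter-free layer norm over the feature axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def modulate_with_cam_pose(tokens, is_view, base_cond, pose_cond, gate):
    """AdaLN-style modulation where cam-pose reaches view tokens only.

    tokens:    (N, D) token features
    is_view:   (N,) bool, True for view tokens, False for tactile tokens
    base_cond: (2*D,) shared [shift, scale], e.g. from a noise-level embedding
    pose_cond: (N, 2*D) per-token cam-pose [shift, scale]
    gate:      scalar; tanh(gate) = 0 at init leaves the model unchanged
    """
    d = tokens.shape[-1]
    # Mask the pose term so tactile tokens never receive it, whatever the gate is.
    pose_term = np.tanh(gate) * pose_cond * is_view[:, None]
    cond = base_cond[None, :] + pose_term
    shift, scale = cond[:, :d], cond[:, d:]
    return layer_norm(tokens) * (1.0 + scale) + shift

# Toy check: opening the gate changes view tokens but not tactile tokens.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 4))
is_view = np.array([True, True, True, False, False])
base = rng.normal(size=8)
pose = rng.normal(size=(5, 8))
closed = modulate_with_cam_pose(tokens, is_view, base, pose, gate=0.0)
opened = modulate_with_cam_pose(tokens, is_view, base, pose, gate=2.0)
print(np.allclose(closed[~is_view], opened[~is_view]))
```

Note that masking the modulation alone does not stop cam-pose information from reaching tactile tokens indirectly through attention over view tokens; whether that indirect path matters here would have to be tested in a new run.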
[Rollout videos, single-view layout: 7 groups of 10 samples each (sample_000 to sample_009). Panels per sample: view, tactile_left, tactile_right, each shown as GT | rollout.]
[Rollout videos, multi-view layout: 4 groups of 10 samples each (sample_000 to sample_009). Panels per sample: view_left, view_middle (top-down), view_right, tactile_left, tactile_right, each shown as GT | rollout.]