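Meta-World Results. Mean success rates of the baselines and the VideoAgent variants on 11 simulated robot manipulation environments from Meta-World. With replanning, VideoAgent-Online-Replan matches or exceeds AVDC-Replan on every task (50.0% vs. 43.1% overall). Without replanning, VideoAgent improves the overall success rate over AVDC (22.3% vs. 19.6%) but trails it on Basketball, Faucet Open, Handle Press, and Assembly.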
| Task | AVDC | AVDC-Replan | VideoAgent | VideoAgent-Online (Iter1) | VideoAgent-Online (Iter2) | VideoAgent-Online-Replan | 
|---|---|---|---|---|---|---|
| Door Open | 30.7% | 72.0% | 40.0% | 41.3% | 44.0% | 80.0% | 
| Door Close | 28.0% | 89.3% | 29.3% | 32.0% | 29.3% | 97.3% | 
| Basketball | 21.3% | 37.3% | 13.3% | 17.3% | 18.7% | 40.0% | 
| Shelf Place | 8.0% | 18.7% | 9.3% | 12.0% | 18.7% | 22.7% | 
| Button Press | 34.7% | 60.0% | 38.7% | 45.3% | 46.7% | 72.0% | 
| Button Press Topdown | 17.3% | 24.0% | 18.7% | 14.7% | 16.0% | 40.0% | 
| Faucet Close | 12.0% | 53.3% | 46.7% | 38.7% | 49.3% | 58.7% | 
| Faucet Open | 17.3% | 24.0% | 12.0% | 13.3% | 21.3% | 36.0% | 
| Handle Press | 41.3% | 81.3% | 36.0% | 36.0% | 44.0% | 85.3% | 
| Hammer | 0.0% | 8.0% | 0.0% | 0.0% | 1.3% | 8.0% | 
| Assembly | 5.3% | 6.7% | 1.3% | 4.0% | 1.3% | 10.7% | 
| Overall | 19.6% | 43.1% | 22.3% | 23.2% | 26.4% | 50.0% |
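
The per-task comparison and the Overall row can be checked with a short script. The following is a minimal sketch, not the authors' evaluation code: it assumes the Overall row is the unweighted mean over the 11 tasks, and the helper names (`RESULTS`, `macro_average`, `win_tie_loss`) are hypothetical. Only four of the six columns are copied here. Because the inputs are the rounded percentages from the table, a recomputed mean may differ from the reported Overall value in the last digit.

```python
from statistics import mean

# Per-task success rates (%) copied from the table above.
# Column order follows COLUMNS; names are illustrative, not from the paper's code.
COLUMNS = ("AVDC", "AVDC-Replan", "VideoAgent", "VideoAgent-Online-Replan")
RESULTS = {
    "Door Open":            (30.7, 72.0, 40.0, 80.0),
    "Door Close":           (28.0, 89.3, 29.3, 97.3),
    "Basketball":           (21.3, 37.3, 13.3, 40.0),
    "Shelf Place":          (8.0, 18.7, 9.3, 22.7),
    "Button Press":         (34.7, 60.0, 38.7, 72.0),
    "Button Press Topdown": (17.3, 24.0, 18.7, 40.0),
    "Faucet Close":         (12.0, 53.3, 46.7, 58.7),
    "Faucet Open":          (17.3, 24.0, 12.0, 36.0),
    "Handle Press":         (41.3, 81.3, 36.0, 85.3),
    "Hammer":               (0.0, 8.0, 0.0, 8.0),
    "Assembly":             (5.3, 6.7, 1.3, 10.7),
}


def column(name):
    """Return the per-task success rates of one method, in table order."""
    idx = COLUMNS.index(name)
    return [row[idx] for row in RESULTS.values()]


def macro_average(name):
    """Unweighted mean over tasks (assumed definition of the Overall row)."""
    return mean(column(name))


def win_tie_loss(method, baseline):
    """Count tasks where `method` is above, equal to, or below `baseline`."""
    wins = ties = losses = 0
    for m, b in zip(column(method), column(baseline)):
        if m > b:
            wins += 1
        elif m == b:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses


for name in COLUMNS:
    print(f"{name}: {macro_average(name):.1f}%")
print("VideoAgent vs AVDC (win, tie, loss):",
      win_tie_loss("VideoAgent", "AVDC"))
print("VideoAgent-Online-Replan vs AVDC-Replan (win, tie, loss):",
      win_tie_loss("VideoAgent-Online-Replan", "AVDC-Replan"))
```

Counting wins, ties, and losses per task alongside the mean makes it visible when an overall improvement is not uniform across tasks.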