Meta x OpenEnv Hackathon (India 2026)

SafeGen Arena

An OpenEnv-compliant RL environment that trains a small Defender LM to safely rewrite image-generation prompts: for each prompt it decides to allow, transform, or reject, while preserving the prompt's intent.

Read first

Blog post (full story)

Why this loop works — for judges

README

Architecture, results, repro steps

Code & runs

GitHub repo

Somin-Aggarwal/SafeGen-Arena

Colab training notebook

Judge-runnable end-to-end

WandB run (v4 shipped)

1300 GRPO steps, +0.33 plateau

Shipped LoRA adapter (v4)

17 MB, GRPO-trained

OpenEnv API (live)

GET /health, GET /state, POST /reset, POST /step, GET /docs (OpenAPI)
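As a rough illustration of how the endpoints listed above might be called, here is a minimal Python sketch using only the standard library. The route names and HTTP methods come from this page. Everything else is an assumption: the base URL is a placeholder, the helper names `build_request` and `call` are hypothetical, and the request bodies for /reset and /step are assumed to be JSON. The authoritative schema is whatever GET /docs reports.

```python
import json
from urllib import request

# Routes exactly as listed on this page: name -> (HTTP method, path).
ROUTES = {
    "health": ("GET", "/health"),
    "state": ("GET", "/state"),
    "reset": ("POST", "/reset"),
    "step": ("POST", "/step"),
}


def build_request(base_url, name, payload=None):
    """Build (but do not send) a urllib Request for one of the listed routes."""
    method, path = ROUTES[name]
    data = None
    headers = {}
    if method == "POST":
        # Assumption: POST routes accept a JSON body. Check GET /docs for the
        # real schema before relying on any field names.
        data = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return request.Request(url, data=data, headers=headers, method=method)


def call(base_url, name, payload=None, timeout=30):
    """Send the request and decode the response body as JSON (assumed format)."""
    req = build_request(base_url, name, payload)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server reachable at some base URL, an episode would presumably start with `call(base_url, "reset")` and continue with `call(base_url, "step", payload)`, where `payload` carries the Defender's decision in whatever shape the OpenAPI page specifies.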