The kickoff · UW Seattle · May 29 – June 1
Benchmarks · Sims · Build weekend
WELCOME
This is the kickoff. Over the weekend you'll design an environment, run agents against it on mesocosm, and show what you found.
Team structure is flexible — and the scope is wide open.
TODAY · ~2 HOURS
After today's session, the clock starts and you build for the rest of the weekend.
THREE TRACKS
Any game — an adapter that plays Steam titles, Minesweeper, anything. The best ones make people go "I didn't know a model could play this."
Test long-term planning, reasoning and real expertise — science, chemistry & physics sims; the high-value reasoning data frontier labs pay for.
Build a hyper-realistic version of a real human job — something a company would pay to automate.
Judged across the board on one thing: the wow factor.
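For teams wondering what "an environment" can look like in code, here is a minimal sketch for the Minesweeper idea mentioned above. It is an illustration only: the class name `MinesweeperEnv`, the `reset`/`step` interface, and the reward values are assumptions made for this example, not mesocosm's actual API, which this deck does not describe.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MinesweeperEnv:
    """Tiny text-based Minesweeper environment (hypothetical interface)."""

    width: int = 5
    height: int = 5
    n_mines: int = 4
    seed: int = 0
    mines: set = field(default_factory=set, init=False)
    revealed: dict = field(default_factory=dict, init=False)

    def reset(self) -> str:
        """Place mines deterministically from the seed and hide every cell."""
        rng = random.Random(self.seed)
        cells = [(r, c) for r in range(self.height) for c in range(self.width)]
        self.mines = set(rng.sample(cells, self.n_mines))
        self.revealed = {}
        return self.render()

    def _adjacent_mines(self, r: int, c: int) -> int:
        """Count mines in the eight neighbouring cells."""
        return sum(
            (r + dr, c + dc) in self.mines
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )

    def step(self, r: int, c: int):
        """Reveal one cell. Returns (observation, reward, done)."""
        if not (0 <= r < self.height and 0 <= c < self.width):
            return self.render(), -0.1, False  # assumed penalty for an invalid move
        if (r, c) in self.mines:
            return self.render(), -1.0, True  # hit a mine: episode ends
        self.revealed[(r, c)] = self._adjacent_mines(r, c)
        safe_cells = self.width * self.height - self.n_mines
        done = len(self.revealed) == safe_cells
        return self.render(), (1.0 if done else 0.0), done

    def render(self) -> str:
        """Text board: '#' for hidden cells, a digit for revealed ones."""
        return "\n".join(
            "".join(str(self.revealed.get((r, c), "#")) for c in range(self.width))
            for r in range(self.height)
        )
```

An agent loop would call `reset()` once, then feed each text observation to a model and pass the chosen cell to `step(r, c)` until `done` is true. Logging every (observation, action, reward) triple gives you the traces for your demo.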
JUDGING
Execution within your track carries the most weight. The remaining criteria reward practical, reusable, well-explained work.
SUBMISSION
Everyone submits through Devpost, as a team.
A demo site is trivial once you have traces — point an LLM at your run output and it'll build the visualization for you. Attach the video as an unlisted YouTube link.

EXAMPLE BENCHMARKS
Straight from the Claude Opus 4.8 system card — coding, computer-use, science, real work. Your environment could be the next one on a list like this.
API CREDITS
Cursor generously gave us credits. We'll distribute them by QR code — about one per team.
Already on Claude Code or a paid AI plan? Please skip a code so others can grab one. Out of credits? Come find us — if we have extra, we'll happily hand more out.
PRIZES & SWAG

Plus food & drinks on us — Red Bull variety packs and snacks at the table.
Track prizes are still being finalized. Stay tuned on Discord.
LOGISTICS · FAQ
DEADLINE
Submissions are open. Register your team on Devpost and start hacking today.
The top 5 teams on Devpost each present for 3–5 minutes at the closing ceremony. Judges pick three track winners, one per track, plus one overall winner.
WHO WE ARE
Stuck? Jump in the SWECC Discord.
SCAN IN


