swe<<

SWECCATHON 2026

The kickoff · UW Seattle · May 29 – June 1

Benchmarks · Sims · Build weekend


WELCOME

Build a benchmark for
literally anything.

This is the kickoff. Over the weekend you'll design an environment, run agents against it on mesocosm, and show what you found.
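As a rough illustration of that loop (design an environment, run an agent, record what happened), here is a minimal sketch in plain Python. It does not use mesocosm or its CLI; `MiniSweeper`, `random_agent`, and `run_episode` are hypothetical names made up for this example, and the one-row Minesweeper rules are a simplification chosen to keep it short.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MiniSweeper:
    """Hypothetical toy environment: Minesweeper on a single row of cells."""
    size: int = 8
    mines: int = 2
    seed: int = 0
    mine_cells: set = field(default_factory=set)
    revealed: set = field(default_factory=set)

    def reset(self):
        # Seeded so every agent faces the same board.
        rng = random.Random(self.seed)
        self.mine_cells = set(rng.sample(range(self.size), self.mines))
        self.revealed = set()
        return self.observe()

    def observe(self):
        # Revealed cells show their adjacent-mine count; hidden cells are None.
        return [
            sum(j in self.mine_cells for j in (i - 1, i + 1)) if i in self.revealed else None
            for i in range(self.size)
        ]

    def step(self, cell):
        """Reveal one cell; returns (observation, done)."""
        if cell in self.mine_cells:
            return self.observe(), True  # hit a mine: episode over
        self.revealed.add(cell)
        return self.observe(), len(self.revealed) == self.size - self.mines

    def score(self):
        # Grading logic: fraction of safe cells revealed.
        return len(self.revealed) / (self.size - self.mines)


def random_agent(observation, rng):
    # Baseline agent: pick any hidden cell at random.
    hidden = [i for i, value in enumerate(observation) if value is None]
    return rng.choice(hidden)


def run_episode(env, agent, seed=0):
    """Run one episode and return (trace, score). The trace is a list of steps."""
    rng = random.Random(seed)
    observation = env.reset()
    trace, done = [], False
    while not done:
        action = agent(observation, rng)
        observation, done = env.step(action)
        trace.append({"action": action, "observation": observation, "done": done})
    return trace, env.score()


if __name__ == "__main__":
    trace, score = run_episode(MiniSweeper(), random_agent)
    print(f"{len(trace)} steps, score {score:.2f}")
```

Swapping `random_agent` for a function that calls a model is the only change needed to benchmark something more interesting; the environment and the score stay the same.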

Team structure is flexible — and the scope is wide open.

Solo – 4 people · CS + non-CS · Undergrad · Anyone can join

TODAY · ~2 HOURS

The plan for
this session.

Then the clock starts and you build for the rest of the weekend.

01 · Kickoff: how to structure a benchmark app
02 · Quick start: from zero to a running env
03 · Live demos: two built apps, end to end
04 · Designs & samples: example benchmarks we've made
05 · The tracks: pick where you compete

THREE TRACKS

Pick your arena.

i.

Games

Any game — an adapter that plays Steam titles, Minesweeper, anything. The best ones make people go "I didn't know a model could play this."

e.g. Steam · Minesweeper · board games
ii.

AGI & real-world modeling

Test long-term planning, reasoning, and real expertise with science, chemistry, and physics sims: the kind of high-value reasoning data frontier labs pay for.

e.g. simulations · long-horizon reasoning
iii.

Future of work

Build a hyper-realistic version of a real human job — something a company would pay to automate.

e.g. secretary-bench · PM-bench

Judged across the board on one thing: the wow factor.


JUDGING

How we score it.

Per-track execution leads. The rest rewards practical, reusable, well-explained work.

Track execution & wow · ~40% · how well you nail the track's concept
Usefulness & gap · 30% · how practical: test cases & grading logic in your code
Data & reusability · 20% · can others reuse your benchmark & traces?
Presentation clarity · 10% · how clearly you explain it to us
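To make the weighting concrete, here is a small sketch of how a combined score could be computed from these four criteria. The slide only gives the percentages (and marks the first as approximate); the 0–10 rating scale, the dictionary keys, and the `weighted_score` helper are assumptions for illustration, not the judges' actual rubric.

```python
# Weights taken from the slide; "~40%" is treated as exactly 0.40 here.
WEIGHTS = {
    "track_execution_wow": 0.40,
    "usefulness_gap": 0.30,
    "data_reusability": 0.20,
    "presentation_clarity": 0.10,
}


def weighted_score(ratings):
    """Combine per-criterion ratings (assumed 0-10 scale) into one number."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {
        "track_execution_wow": 9,
        "usefulness_gap": 7,
        "data_reusability": 6,
        "presentation_clarity": 8,
    }
    # 0.4*9 + 0.3*7 + 0.2*6 + 0.1*8
    print(round(weighted_score(example), 2))  # 7.7
```

Under these assumed ratings, a two-point gain in track execution moves the total by 0.8, while the same gain in presentation moves it by only 0.2.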

SUBMISSION

What you ship.

  • i. A GitHub repo with your benchmark
  • ii. A UI that visualizes your traces (or the thing you simulate)
  • iii. A 1–2 min demo video — voiceover, no face needed

Everyone submits through Devpost, as a team.

A demo site is trivial once you have traces — point an LLM at your run output and it'll build the visualization for you. Attach the video as an unlisted YouTube link.
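For a sense of what "run output" might look like, here is a sketch that stores one episode per line as JSON Lines and computes a per-model average. The file layout and the field names (`model`, `score`, `steps`) are assumptions for this example; mesocosm's real trace format may differ, so check the wiki before relying on it.

```python
import json
from pathlib import Path


def write_traces(path, episodes):
    """Write one JSON object per line (JSON Lines)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for episode in episodes:
            f.write(json.dumps(episode) + "\n")


def load_traces(path):
    """Read the file back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def mean_score_by_model(episodes):
    """Average the 'score' field per model: the kind of number a UI would chart."""
    scores = {}
    for episode in episodes:
        scores.setdefault(episode["model"], []).append(episode["score"])
    return {model: sum(values) / len(values) for model, values in scores.items()}


if __name__ == "__main__":
    # Hypothetical episodes; the model names are placeholders.
    episodes = [
        {"model": "model-a", "score": 0.5, "steps": [{"action": 3}]},
        {"model": "model-a", "score": 1.0, "steps": [{"action": 1}]},
        {"model": "model-b", "score": 0.25, "steps": [{"action": 0}]},
    ]
    write_traces("traces.jsonl", episodes)
    print(mean_score_by_model(load_traces("traces.jsonl")))
```

A flat, line-oriented file like this is easy to hand to an LLM or load in a browser, which is what makes the visualization step quick.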

sweccathon-2026.devpost.com
register your team & submit here
On Devpost:
  • 1. create your team
  • 2. add your GitHub repo
  • 3. add your demo site
  • 4. attach your video

EXAMPLE BENCHMARKS

What the frontier measures.

Straight from the Claude Opus 4.8 system card — coding, computer-use, science, real work. Your environment could be the next one on a list like this.

SWE-bench Verified · 88.6% · Resolve real GitHub issues, end to end.
Terminal-Bench 2.1 · 74.6% · Drive a real terminal to finish tasks.
GPQA Diamond · 93.6% · PhD-level science reasoning.
Online-Mind2Web · 84% · Complete real tasks across live websites.
GDPval-AA · 1890 Elo · Economically valuable, real-world knowledge work.
MCP-Atlas · 82.2% · Use external tools through MCP servers.

API CREDITS

Cursor

Powered by Cursor.

Cursor generously gave us credits. We'll distribute them by QR code — about one per team.

60
credits to give out

Already on Claude Code or a paid AI plan? Please skip a code so others can grab one. Out of credits? Come find us — if we have extra, we'll happily hand more out.


PRIZES & SWAG

Google

Sponsored by Google.

Plus food & drinks on us — Red Bull variety packs and snacks at the table.

  • 15 water bottles
  • 15 tumblers
  • 15 hats
  • Stickers — grab them now

Track prizes: TODO — being finalized. Stay tuned on Discord.


LOGISTICS · FAQ

Open questions.

Who pays for platform API credits?
Mesocosm covers benchmark runs, but each run costs us. Start with cheaper or local models (DeepSeek, Ollama, anything locally hosted) before spending on the expensive ones.
Where are the docs?
Everything's in the SWECC wiki and the mesocosm CLI — more than enough to get started. wiki.swecc.org/Sweccathon
How do teams form?
Solo or up to four. Use the #team-formation channel in the SWECC Discord.

DEADLINE

You can start now.

Submissions are open. Register your team on Devpost and start hacking today.

4:00 PM
submit by — Monday, June 1
5:00 PM
judging begins — in person

The top 5 teams on Devpost each present for 3–5 minutes at the closing ceremony. Judges pick three track winners (one per track) plus one overall winner.


WHO WE ARE

Your crew.

Simon
Lead & organizer · builds mesocosm
Navneeth
Engineer · mesocosm & food
Derek
Engineer · tech support
Advay
SWECC President · event & closing

Stuck? Jump in the SWECC Discord.

  • #team-formation — find teammates
  • #announcements — event updates
  • #event-questions — ask anything
  • #tech-support — Simon, Derek & Navneeth are watching

SCAN IN

Let's build.

Devpost
sweccathon-2026.devpost.com
Wiki & docs
wiki.swecc.org/Sweccathon
Discord
discord.gg/VjWAHf9U7