Personal project

PolyDebate AI Debate Platform

Next.js, Flask, Postgres, Railway, Vercel, OpenRouter, ElevenLabs, Polymarket

Figure 1 | A live PolyDebate session on a Polymarket prediction market. Four large language models argue concurrently, each producing a probability estimate over the market's outcomes. The orbs animate as the models stream their responses; arguments and per-bucket predictions are rendered as each model's text arrives over Server-Sent Events.

PolyDebate is an AI-driven debate platform where multiple large language models (Claude, GPT-4, Gemini, and around 100 others through OpenRouter) argue the bull, bear, and adjacent positions on a live Polymarket prediction market in real time. Each model produces its own probability estimate over the market's outcomes, and each argument is narrated through ElevenLabs text-to-speech and streamed to the browser as the model responds. The user picks the market, picks how many models join, and watches the debate unfold round by round.

I built it with one teammate over the QuackHacks 2.0 weekend (University of Oregon, November 2025), where it placed third in the Polymarket track. My teammate owned most of the application code: the Next.js front end, the Flask debate orchestrator, the OpenRouter, ElevenLabs, and Polymarket service clients, plus authentication and email-code verification. My half was the wiring between the two tiers, the deployment plumbing on both ends, the Postgres support, and a series of feature fixes that touched both the frontend and the backend.

Why this is interesting

A prediction market is usually shown as a single price: 53 percent yes, 47 percent no. The price is a useful summary, but it hides the reasoning. PolyDebate has several models, drawn from the hundred-plus available through OpenRouter, argue the cases that produced that price, with the bull and bear positions split across different models so the debate is not just one assistant talking to itself. Each model stakes its own probability, the system aggregates them at the end, and the user sees both the arguments and the disagreement.
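To make the "stakes its own probability" part concrete, here is a minimal sketch of the aggregation step, assuming a plain average over per-model distributions followed by renormalization. The function name, the data shape, and the averaging rule are illustrative, not PolyDebate's actual code.

```python
from collections import defaultdict

def aggregate_predictions(per_model: dict[str, dict[str, float]]) -> dict[str, float]:
    """Combine per-model probability estimates into one final distribution.

    `per_model` maps model name -> {outcome: probability}. A plain average
    followed by renormalization is assumed here; the real aggregation rule
    may weight models differently.
    """
    totals: dict[str, float] = defaultdict(float)
    for estimates in per_model.values():
        for outcome, prob in estimates.items():
            totals[outcome] += prob

    n_models = len(per_model) or 1
    averaged = {outcome: total / n_models for outcome, total in totals.items()}

    # Renormalize so the final distribution sums to 1 even if individual
    # models returned estimates that do not.
    mass = sum(averaged.values()) or 1.0
    return {outcome: p / mass for outcome, p in averaged.items()}


# Example: three models on a yes/no market.
print(aggregate_predictions({
    "claude": {"yes": 0.55, "no": 0.45},
    "gpt-4":  {"yes": 0.60, "no": 0.40},
    "gemini": {"yes": 0.48, "no": 0.52},
}))
```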

The technical interest, from my side, is what it takes to ship that across a hackathon weekend with two developers. The application code is one half of the story; the other half is two cloud platforms, two language runtimes, a managed database, and a deployment that does not fall over the moment the demo starts.

The cross-tier integration

Figure 2 | The post-debate results page. Once all rounds finish, the backend aggregates per-model predictions into a final distribution, computes how each model shifted between rounds, and writes a consensus summary. The frontend re-fetches the debate row through the regular REST endpoint to render this view, after the SSE stream has closed.

The application is split across two hosts on purpose. The Next.js front end runs on Vercel and the Flask backend runs on Railway with a managed Postgres alongside it. Two hosts means two deployment paths, two sets of build settings, two sets of environment variables, and two sources of truth for what the production URLs are. The integration layer is what makes that split feel like one application from the browser's perspective.

The seam is a single environment variable on the front end, NEXT_PUBLIC_API_URL, that points at the Railway backend, plus matching CORS allow-list entries on the Flask side. Locally that variable points at http://localhost:5000; in production it points at the Railway-issued domain. The Server-Sent Events stream for live debates lives on the same backend (so the frontend opens an EventSource against the same base URL) and gunicorn's gevent worker class is what holds the long-lived SSE connection open per debate. After the stream closes, the results page re-fetches the same debate row through the regular REST endpoint to render the post-debate view shown above.
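A minimal sketch of the backend half of that seam, assuming flask-cors for the allow-list; the route, the payload shape, and the environment-variable names are illustrative rather than PolyDebate's real API.

```python
# Sketch of the backend side of the seam: CORS allow-list plus an SSE
# endpoint. Route and payloads are illustrative; flask-cors is assumed.
import json
import os

from flask import Flask, Response
from flask_cors import CORS

app = Flask(__name__)

# Mirror of NEXT_PUBLIC_API_URL on the other side: the frontend origins
# allowed to call this backend, localhost in dev and the Vercel domain in prod.
CORS(app, origins=os.environ.get("ALLOWED_ORIGINS", "http://localhost:3000").split(","))

@app.route("/api/debates/<debate_id>/stream")
def stream_debate(debate_id):
    def events():
        # In the real app each chunk comes from the debate orchestrator as
        # models stream their arguments; here we just emit a few stubs.
        for i in range(3):
            payload = {"debate_id": debate_id, "round": i, "text": f"argument chunk {i}"}
            yield f"data: {json.dumps(payload)}\n\n"
        yield "event: done\ndata: {}\n\n"

    # text/event-stream keeps the connection open for the whole debate; this
    # is why the gevent worker class matters, so one long-lived stream does
    # not pin a synchronous gunicorn worker.
    return Response(events(), mimetype="text/event-stream")
```

The frontend half is just an EventSource opened against NEXT_PUBLIC_API_URL plus the stream path, which is what keeps the two hosts feeling like one application from the browser's perspective.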

The production rebuild pipeline turned out to be the hard part. Railway's nixpacks builder auto-detects Python projects when it sees a requirements.txt at the repo root, but the canonical dependencies live in backend/requirements.txt. I wrote a sync-requirements.sh script and wired it into the build so the root copy stays mirrored on every commit; the backend folder remains the source of truth, the root copy is generated, and Railway picks up the right Python version through runtime.txt. The Vercel side is much simpler (vercel.json is just {"framework": "nextjs"}), but pinning the framework explicitly avoided a stretch of mistaken framework guesses during the early deployment days.
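The actual sync script is a small piece of shell; the sketch below expresses the same mirroring logic in Python for illustration, with the file paths taken from the description above and the generated-file warning header added as an assumption.

```python
# Sketch of the mirroring logic in sync-requirements.sh, expressed in Python
# for illustration. backend/ stays the source of truth; the root copy is
# generated so Railway's nixpacks auto-detection sees a requirements.txt
# where it expects one.
from pathlib import Path

SOURCE = Path("backend/requirements.txt")
MIRROR = Path("requirements.txt")

header = "# GENERATED from backend/requirements.txt -- edit that file instead.\n"
MIRROR.write_text(header + SOURCE.read_text())
```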

The Railway deployment marathon

The end-of-December push (around 25 commits in a single day) was the marathon. The starting point was a combined Docker image: a multi-stage Dockerfile that built the Next.js bundle in one stage and ran nginx, supervisord, and gunicorn together inside one Python image, with the Next.js standalone server alongside. It worked locally but Railway's cold starts on the combined image were slow enough that demos sometimes stalled at the introduction screen. I split it into the Vercel-plus-Railway architecture above, which meant porting the Flask backend out of the combined image and into a stand-alone Railway service.

That uncovered a chain of build issues. Postgres support needed adding to the database layer (the original code assumed SQLite); the DATABASE_URL that Railway provides needed parsing through SQLAlchemy correctly; the Procfile had to use the gevent worker class so the SSE endpoint would not block other requests; runtime.txt had to pin Python 3.11 because some upstream wheels were not yet available for 3.12 on Railway's nixpacks; nixpacks.toml had to be added to give the builder explicit hints when auto-detection guessed wrong; and the requirements-sync script described above had to be wired into the build. Each of those was a small change individually, and a stack of them landed on top of each other through one demo-week night.
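A sketch of one common shape of the DATABASE_URL fix, assuming the Heroku-style postgres:// scheme that SQLAlchemy 1.4+ no longer accepts; the helper name and the pool setting are illustrative, and the actual change may have looked different.

```python
# General pattern for consuming a platform-provided DATABASE_URL with
# SQLAlchemy, falling back to SQLite for local development. The postgres://
# rewrite and pool_pre_ping are illustrative assumptions, not the exact diff.
import os

from sqlalchemy import create_engine

def make_engine():
    url = os.environ.get("DATABASE_URL", "sqlite:///polydebate.db")
    if url.startswith("postgres://"):
        # SQLAlchemy 1.4+ dropped the "postgres" dialect alias.
        url = url.replace("postgres://", "postgresql://", 1)
    # pool_pre_ping avoids stale connections after the managed Postgres
    # recycles idle sessions.
    return create_engine(url, pool_pre_ping=True)
```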

Cross-tier feature fixes

Once production was stable, the rest of the work was on both sides of the wire. The admin endpoints (a small backend blueprint plus the matching React route, with a working mobile burger-menu layout) shipped together so the dashboard was usable from a phone. The audio-playback timing fix was a frontend correction, traced back to a race between the SSE-emitted audio URL and the player's own readiness check; sequencing them through the debate-stream context made the player consume the URL in the right order. The Gmail SMTP path needed step-by-step debug logging once we discovered that the outbound TLS handshake to Google's relay was sometimes silently dropping connections; the structured logs are what made that a five-minute fix instead of a guessing game. A separate frontend pass picked up two CVE bumps on transitive dependencies, including a Vercel-flagged React Server Components vulnerability that landed through PR-13.
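As an illustration of the SMTP logging, the sketch below shows the same idea in miniature using smtplib's built-in debug output plus a log line per stage; the real code's structured logging, helper names, and credentials handling differ.

```python
# Sketch of step-by-step SMTP instrumentation: one log line per stage plus
# smtplib's raw protocol dump, so a silently dropped TLS handshake shows up
# as the last logged step rather than a generic timeout. Names are illustrative.
import logging
import os
import smtplib
from email.message import EmailMessage

log = logging.getLogger("polydebate.email")

def send_verification_code(to_addr: str, code: str) -> None:
    msg = EmailMessage()
    msg["From"] = os.environ["SMTP_USER"]
    msg["To"] = to_addr
    msg["Subject"] = "Your PolyDebate verification code"
    msg.set_content(f"Your code is {code}")

    log.info("connecting to smtp.gmail.com:587")
    with smtplib.SMTP("smtp.gmail.com", 587, timeout=10) as smtp:
        smtp.set_debuglevel(1)      # dump the raw SMTP conversation to stderr
        log.info("starting TLS")    # the stage that was silently dropping
        smtp.starttls()
        log.info("authenticating")
        smtp.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
        log.info("sending message")
        smtp.send_message(msg)
    log.info("sent verification code to %s", to_addr)
```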

Reading list