Zetto
AI Language Learning Japanese Voice

2026

// Overview

Zetto is a voice-first Japanese learning app built on Next.js 15. It talks to Gemini 2.5 Flash over a persistent WebSocket connection, which keeps conversational responses under a second by avoiding REST round-trips. The learning loop runs three tracks in parallel: fill-in-the-blank exercises, semantic prompts, and open roleplay, with per-word mastery scores stored in Cloudflare KV. The system measures response latency (how quickly the learner replies to each AI prompt): quick replies escalate complexity, while slow replies trigger a soft hint without breaking the flow of conversation. The AI also deliberately plants grammatical errors 5% of the time to test whether the learner catches them. The UI shows one sentence at a time; tapping any word logs it as a struggle point and shows a translation. Hiragana reading aids (furigana) are hidden automatically the second time a kanji appears in a session, forcing active recall rather than passive recognition. Every Sunday, a 3-minute calibration session adjusts the ratio of N4 to N5 material based on the previous week's performance data.
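
As a rough illustration of the latency-based pacing and the planted-error rate described above, here is a minimal TypeScript sketch. The threshold values, the "hold" state between fast and slow, and all function names are assumptions made for this example; only the 5% rate comes from the description.

```typescript
// Hypothetical thresholds: the description gives no concrete numbers.
const FAST_REPLY_MS = 1500;
const SLOW_REPLY_MS = 6000;
// The 5% planted-error rate stated in the overview.
const PLANTED_ERROR_RATE = 0.05;

type PacingAction = "escalate" | "hold" | "hint";

// Map one measured reply latency to the next pacing decision.
function nextPacingAction(replyLatencyMs: number): PacingAction {
  if (replyLatencyMs <= FAST_REPLY_MS) return "escalate"; // quick reply: raise complexity
  if (replyLatencyMs >= SLOW_REPLY_MS) return "hint"; // slow reply: offer a soft hint
  return "hold"; // in between: keep the current level
}

// Decide whether the next AI turn should contain a deliberate grammar error.
// The random source is injectable so the decision can be tested deterministically.
function shouldPlantError(random: () => number = Math.random): boolean {
  return random() < PLANTED_ERROR_RATE;
}
```

Keeping both decisions as pure functions means they can run on the client next to the latency measurement, with no network call involved.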

// Challenges

Sub-second latency requires streaming audio directly over a persistent WebSocket to gemini-2.5-flash, because REST adds ~300 ms of overhead that breaks the conversational rhythm. Measuring how fast a learner responds has to run client-side via AudioWorklet, so that network round-trip time does not leak into the measurement itself. The furigana auto-hide feature requires a per-session kanji registry so the UI can check whether reading aids have already been shown for a character without triggering a re-render. Mastery tracking reads from and writes to Cloudflare KV on every interaction, and those operations must not stall the audio thread.
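
The per-session kanji registry could be sketched as below. This is an illustrative version, not the app's actual code: the class and function names are hypothetical, and kanji detection is simplified to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). Holding the registry in a plain object outside UI state (for example in a React ref) is one way to make lookups that do not themselves cause a re-render.

```typescript
// Hypothetical per-session registry; create one instance when a session starts.
class KanjiRegistry {
  private readonly seen = new Set<string>();

  // Returns true the first time a kanji appears in the session (show furigana)
  // and false on every later appearance (hide it to force recall).
  shouldShowFurigana(kanji: string): boolean {
    if (this.seen.has(kanji)) return false;
    this.seen.add(kanji);
    return true;
  }
}

// Simplified kanji check: basic CJK Unified Ideographs block only.
const KANJI = /[\u4e00-\u9fff]/;

// Annotate each character of a sentence with whether its reading aid is shown.
// Kana and punctuation never get furigana and are not recorded in the registry.
function annotate(sentence: string, registry: KanjiRegistry) {
  return Array.from(sentence).map((char) => ({
    char,
    showFurigana: KANJI.test(char) && registry.shouldShowFurigana(char),
  }));
}
```

Calling `annotate` once per displayed sentence with the same registry gives the described behavior: a kanji gets its reading aid on first appearance and none afterwards, until a new session creates a fresh registry.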

// Outcomes

Zetto is currently in active development as a personal study accelerator targeting JLPT N4. The architecture demonstrates that a web app can deliver sub-second AI voice conversation without native infrastructure. The weekly calibration loop means the difficulty curve responds to actual effort, not just raw answer scores.