
I Built a Voice‑First AI App in One Weekend — Here’s Everything I Got Right (and Wrong)

TL;DR: I built Learnflow AI, a voice‑first GPT‑4 learning companion, in just one weekend using:

  • Vapi for the full voice loop (speech-to-text → GPT → text-to-speech)
  • Convex for backend logic and credit tracking
  • Kinde for auth, role-based access, and hosted billing onboarding

This post breaks down what worked, what broke, and the lessons learned — with diagrams, code, and reliability tips.

I gave myself one weekend.

One weekend to build, design, and deploy a real AI product.

The result? Learnflow AI — a voice-first learning tool where users can create tutors, start sessions, and learn by speaking with GPT-backed companions.

Think: “Duolingo meets ChatGPT meets voice notes.”

But it wasn’t all smooth. I rebuilt the onboarding three times. Scrapped one entire flow. Learned the hard way what breaks user trust.

This post is the full breakdown — how I did it, what I got wrong, and what I’d do differently.

The Goal: Build an End-to-End Voice Learning Tool

I wanted to build something useful, impressive, and real — not just another GPT wrapper.

Learnflow AI had a simple premise:

  • Let users create a custom tutor (subject, tone, voice)
  • Let them talk to that tutor in real time
  • Use voice in, voice out — powered end-to-end by Vapi

It had to:

  • Be production-ready
  • Handle real-time speech
  • Track usage and offer plans

And it had to launch in one weekend.

The Tech Stack (What Actually Worked)

| Problem | Tool | Why I Chose It |
|---|---|---|
| Auth + feature gating + billing | Kinde | Easy social login + billing integration |
| Database + backend | Convex | Realtime reactive backend + clean TypeScript logic |
| Voice AI | Vapi.ai | Built for multi-turn GPT conversations |
| Frontend framework | Next.js App Router | Great for routing, loading states, SSR |
| Styling & components | Shadcn | Fast UI dev |

Vapi handled everything voice-related — transcription, GPT calls, and TTS. I didn’t need to wire together OpenAI, Whisper, or ElevenLabs separately. One endpoint, one agent.

That saved days.
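Client-side, the whole loop hangs off a single SDK instance. A two-line sketch — the file path and env var name are my own, not from the original code:

// lib/vapi.sdk.ts — hypothetical init; the token env var name is illustrative
import Vapi from "@vapi-ai/web";

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN!);

This is the vapi instance that the later snippets call vapi.start() and vapi.on() against.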

Flow Overview: What the User Experiences

Full User Flow:

[Diagram: full user flow]

The Frontend: What I Shipped First

Version one was minimal to a fault:

  • The dashboard loaded with a blank UI and a “Create Tutor” button
  • A list of public tutors was visible, but there was no onboarding flow
  • No explanation of credits, plans, or what to do

What Happened:

Users froze. Some clicked around. Most left.

“What is this?”
“Where do I start?”

Minimalism without guidance is abandonment. Lesson learned.

Fix:

  • Added a first-time user check in Convex
  • Triggered a guided builder flow (subject, style, voice)
  • Added a persistent “Start Session” CTA
  • Displayed credits in top-right

if (!user.hasSeenOnboarding) {
  // Show builder modal + progress
  await ctx.db.patch(user._id, { hasSeenOnboarding: true });
}
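For context, that check sits inside a Convex mutation along these lines — a minimal sketch, where markOnboardingSeen is my name for it, not the actual code:

// convex/users.ts — hypothetical wrapper around the check above
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const markOnboardingSeen = mutation({
  args: { userId: v.id("users") },
  handler: async (ctx, { userId }) => {
    const user = await ctx.db.get(userId);
    if (user && !user.hasSeenOnboarding) {
      // First visit: flag it and tell the client to show the builder modal
      await ctx.db.patch(userId, { hasSeenOnboarding: true });
      return { showOnboarding: true };
    }
    return { showOnboarding: false };
  },
});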

The Backend: Storing Tutors, Sessions, and Plans

Convex let me move fast.

Convex Schema: Tutors + Users + Sessions

// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  users: defineTable({
    email: v.string(),
    credits: v.optional(v.number()),
    plan: v.optional(v.string()),
    hasSeenOnboarding: v.optional(v.boolean()),
  }),

  companions: defineTable({ // the schema for tutors
    userId: v.id("users"),
    subject: v.string(),
    style: v.string(),
    voice: v.string(),
  }),

  sessions: defineTable({
    userId: v.id("users"),
    companionId: v.id("companions"),
  }),
});

Usage Deduction:

// Pro sessions are free; free-plan sessions cost 1 credit
const creditCost = user.plan === "pro" ? 0 : 1;
const credits = user.credits ?? 0; // credits is optional in the schema

if (credits < creditCost) {
  throw new Error("Out of credits. Upgrade to continue.");
}
await ctx.db.patch(user._id, {
  credits: credits - creditCost,
});

This powered all limits and nudges.
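On the read side, those nudges can hang off a reactive Convex query. A minimal sketch — the getCredits name is mine, not necessarily the app's:

// convex/users.ts — hypothetical query feeding the credit banner
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getCredits = query({
  args: { userId: v.id("users") },
  handler: async (ctx, { userId }) => {
    const user = await ctx.db.get(userId);
    return {
      plan: user?.plan ?? "free",
      credits: user?.credits ?? 0, // optional in the schema, default to 0
    };
  },
});

Because Convex queries are reactive, any component subscribed with useQuery re-renders the moment a deduction lands — which is what makes the credit banner below update in real time.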

Kinde: Auth + Plan Sync

Kinde handled auth and billing. It was quick to set up.

Key Flow:

  • On sign-up, users see Kinde’s hosted pricing table
  • They pick between a free or pro plan
  • This plan is accessible in the session:

const { getUser } = getKindeServerSession();
const user = await getUser();
const plan = user?.user_metadata?.plan || "free";

No need to manage Stripe logic — Kinde abstracts it.
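That plan value can also gate whole routes server-side. A hedged sketch of an App Router page guard, reusing the user_metadata shape from the snippet above — the route and redirect targets are illustrative:

// app/dashboard/pro/page.tsx — hypothetical plan-gated page
import { getKindeServerSession } from "@kinde-oss/kinde-auth-nextjs/server";
import { redirect } from "next/navigation";

export default async function ProPage() {
  const { getUser, isAuthenticated } = getKindeServerSession();

  // Unauthenticated users go through Kinde's hosted login
  if (!(await isAuthenticated())) redirect("/api/auth/login");

  const user = await getUser();
  const plan = user?.user_metadata?.plan || "free";

  // Free users land on the upgrade page instead of pro features
  if (plan !== "pro") redirect("/dashboard/upgrade");

  return <div>Pro-only content</div>;
}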

Sticky Upgrade Logic

If a user hits a limit or nears 0 credits, they see:

{user.plan === "free" && user.credits <= 2 && (
  <div className="p-4 bg-yellow-100 text-sm">
    You have {user.credits} sessions left. Upgrade now?
    <Link href="/dashboard/upgrade" className="underline ml-2">Upgrade →</Link>
  </div>
)}

This contextual upgrade performed better than static CTAs.

Vapi: Real-Time Voice AI in a Single Agent

When I started building Learnflow AI, I knew I didn’t want to manage transcription, audio streaming, GPT prompting, or TTS pipelines manually.

That’s exactly where Vapi came in.

Instead of stitching together multiple services, I defined a single agent — and Vapi handled the rest:

voice in → transcription → GPT reasoning → voice out.

The Full Flow

With one REST call, I could start a session. Behind the scenes, Vapi:

  1. Captured live audio from the browser
  2. Transcribed it in real time
  3. Passed transcripts to GPT-4 using my defined agent prompt
  4. Streamed back audio responses
  5. Managed call events (start, end, error, speaking, etc.)

It felt like magic — but it was just solid engineering and a well-designed SDK.

How I Integrated It

I wrapped the Vapi SDK in a CompanionComponent that handled all live session logic:

Key Features:

  • Live transcript display
  • Speaking animation via Lottie
  • Session tracking via Convex
  • Mic mute/unmute toggle
  • Accurate state handling (connecting, active, finished)

Vapi Session Lifecycle

Let’s break it down into real steps and show you how it flows:

[Diagram: Vapi session lifecycle]

Example Integration: Start Session

Here’s how I kicked off a session with assistant configuration:

const handleCall = async () => {
  setCallStatus(CallStatus.CONNECTING);

  const assistantOverrides = {
    variableValues: { subject, topic, style },
    clientMessages: ["transcript"],
    serverMessages: [],
  };

  vapi.start(configureAssistant(voice, style), assistantOverrides);
};

That configureAssistant() function generates a prompt and voice configuration for the companion.

No need to manage tokens, audio streams, or AI responses — just define the personality, and Vapi handles the loop.
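I haven't pasted the whole helper, but a configureAssistant() along these lines matches Vapi's assistant config shape — treat the specific providers (Deepgram for transcription, 11labs for TTS) as assumptions rather than the exact setup:

// lib/vapi.config.ts — hypothetical sketch of the assistant factory
export const configureAssistant = (voice: string, style: string) => ({
  name: "Companion",
  firstMessage: "Hello! Ready to dive into {{topic}}?",
  transcriber: { provider: "deepgram", model: "nova-2", language: "en" },
  voice: { provider: "11labs", voiceId: voice },
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content:
          "You are a {{style}} tutor teaching {{subject}}. " +
          "Keep replies short and conversational — they will be spoken aloud.",
      },
    ],
  },
});

The {{subject}}-style placeholders are filled in by Vapi from the variableValues passed via assistantOverrides in handleCall above.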

Live Transcript from Vapi Events

vapi.on('message', (message) => {
  if (message.type === 'transcript' && message.transcriptType === 'final') {
    const newMessage = { role: message.role, content: message.transcript };
    setMessages((prev) => [newMessage, ...prev]);
  }
});

Backend Tracking with Convex

Each session is saved to Convex for analytics and credit tracking:

addSession({
  userId: profile?._id as Id<"users">,
  companionId: companionId as Id<"companions">,
});

This makes every session persistent, linkable, and easy to manage from the dashboard or user history.
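Behind that call is a small Convex mutation — roughly this, sketched directly from the schema above:

// convex/sessions.ts — hypothetical mutation behind addSession
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const addSession = mutation({
  args: { userId: v.id("users"), companionId: v.id("companions") },
  handler: async (ctx, { userId, companionId }) => {
    // Persist the session so history, analytics, and credits can reference it
    return await ctx.db.insert("sessions", { userId, companionId });
  },
});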

Session State Management

enum CallStatus {
  INACTIVE = 'INACTIVE',
  CONNECTING = 'CONNECTING',
  ACTIVE = 'ACTIVE',
  FINISHED = 'FINISHED',
}

These states controlled:

  • Button text (Start Session, Connecting, End Session)
  • Mic toggle behavior
  • Speaking animation visibility

UI Touch: Real-Time Visual Feedback

Using Lottie animations, I showed speaking activity when the assistant responded:

<Lottie
  lottieRef={lottieRef}
  animationData={soundwaves}
  autoplay={false}
  className="companion-lottie"
/>

Combined with a live transcript feed and personalized UI, it felt like a real tutor experience — not just a chatbot.

Summary: What Vapi Did For Me

| Problem | Vapi Solution |
|---|---|
| Audio capture | Handled via SDK |
| Transcription | Real-time, no setup |
| GPT integration | Abstracted in agent config |
| TTS response | Instant playback |
| Call lifecycle | Built-in events |
| Frontend UX | Easily wrapped in React |

This allowed me to focus 100% on the product experience.

Other UX Fixes That Helped

  • Made the “Create Tutor” button immediately visible
  • Tutor builder used a 3-step flow with progress
  • Sticky credit banner with real-time updates

Metrics + Learnings (Pre-Launch)

| Area | What Went Right | What Went Wrong |
|---|---|---|
| Voice AI | Vapi.ai worked fast | Concurrency capped at 10 simultaneous sessions (calls) |
| Auth | Kinde's pricing table made onboarding clean | Some users skipped plan selection |
| Usage | Convex credit tracking was reliable | No reminder on low credits |
| Onboarding | Tutor builder clarified usage | Missing welcome message caused confusion |

What I Got Right

  • Used Vapi to avoid infra headaches
  • Kept onboarding tight to a single goal
  • Synced plan + limits into backend
  • Used Convex mutations to gate usage
  • Let users create, explore, and start quickly

What I Got Wrong

  • No onboarding on first launch (fixed)
  • No credit visibility (fixed)
  • Confusing builder UX at first (iterated)
  • Upgrade path too hidden (added sticky CTA)

Future Upgrades

  1. Session Resume: let users return to unfinished conversations
  2. Credit Refill Logic: monthly resets + webhook for Pro plans (see the cron sketch after this list)
  3. Stats Dashboard: show user time spent, sessions run, etc.
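For upgrade #2, Convex's built-in cron jobs are a natural fit. A sketch, where internal.users.resetFreeCredits is a hypothetical internal mutation that tops free users back up:

// convex/crons.ts — hypothetical monthly credit reset
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// 1st of every month at 00:00 UTC
crons.monthly(
  "reset free credits",
  { day: 1, hourUTC: 0, minuteUTC: 0 },
  internal.users.resetFreeCredits
);

export default crons;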

Lessons Learned

  1. Don’t overbuild early. My first goal was just working audio input/output.
  2. Set up real billing early. Kinde Billing + Convex saved me two weeks of Stripe integration.
  3. Guided onboarding > tooltips. Let users succeed once, then ask for money.
  4. Measure frustration points. I debugged five drop-off points by watching user flows.
  5. AI doesn’t mean magic. If users don’t understand the flow, they leave.

Final Thoughts

You don’t need a five-person team or four weeks to build something powerful.

In one weekend, I shipped:

  • Auth + roles
  • Voice-first sessions
  • Custom tutor creation
  • Usage limits and billing

The trick?
Leaning on tools that do the hard stuff — so I could focus on product.

Want to build AI-first products fast?

Use tools like:

  • Next.js App Router
  • Convex (backend + DB)
  • Kinde (auth + billing)
  • Vapi.ai (AI voice sessions)
  • Shadcn

If you’re building an AI tool, my advice:

Pick one input, one output, one use case.

Build it fast. Make someone say, “Oh damn, this works.”

That’s what Learnflow AI tries to be.

Want early access? DM me on X/Twitter
