
I Built a Voice‑First AI App in One Weekend — Here’s Everything I Got Right (and Wrong)

TL;DR: I built Learnflow AI, a voice‑first GPT‑4 learning companion, in just one weekend using:

  • Vapi for the full voice loop (speech-to-text → GPT → text-to-speech)
  • Convex for backend logic and credit tracking
  • Kinde for auth, role-based access, and hosted billing onboarding

This post breaks down what worked, what broke, and the lessons learned — with diagrams, code, and reliability tips.

I gave myself one weekend.

One weekend to build, design, and deploy a real AI product.

The result? Learnflow AI — a voice-first learning tool where users can create tutors, start sessions, and learn by speaking with GPT-backed companions.

Think: “Duolingo meets ChatGPT meets voice notes.”

But it wasn’t all smooth. I rebuilt the onboarding three times. Scrapped one entire flow. Learned the hard way what breaks user trust.

This post is the full breakdown — how I did it, what I got wrong, and what I’d do differently.

The Goal: Build an End-to-End Voice Learning Tool

I wanted to build something useful, impressive, and real — not just another GPT wrapper.

Learnflow AI had a simple premise:

  • Let users create a custom tutor (subject, tone, voice)
  • Let them talk to that tutor in real time
  • Use voice in, voice out — powered end-to-end by Vapi

It had to:

  • Be production-ready
  • Handle real-time speech
  • Track usage and offer plans

And it had to launch in one weekend.

The Tech Stack (What Actually Worked)

| Problem | Tool | Why I Chose It |
|---|---|---|
| Auth + feature gating + billing | Kinde | Easy social login + billing integration |
| Database + backend | Convex | Realtime reactive backend + clean TypeScript logic |
| Voice AI | Vapi.ai | Built for multi-turn GPT conversations |
| Frontend framework | Next.js App Router | Great for routing, loading states, SSR |
| Styling & components | Shadcn | Fast UI dev |

Vapi handled everything voice-related — transcription, GPT calls, and TTS. I didn’t need to wire together OpenAI, Whisper, or ElevenLabs separately. One endpoint, one agent.

That saved days.
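Client-side, the whole loop hangs off a single SDK instance. A two-line sketch — the file path and env var name are my own, not from the original code:

// lib/vapi.sdk.ts — hypothetical init; the token env var name is illustrative
import Vapi from "@vapi-ai/web";

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN!);

This is the vapi instance that the later snippets call vapi.start() and vapi.on() against.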

Flow Overview: What the User Experiences

Full User Flow:

[Diagram: full user flow]

The Frontend: What I Shipped First

Version one was minimal to a fault:

  • The dashboard loaded with a blank UI and a “Create Tutor” button
  • A list of public tutors was visible, but there was no onboarding flow
  • No explanation of credits, plans, or what to do

What Happened:

Users froze. Some clicked around. Most left.

“What is this?”
“Where do I start?”

Minimalism without guidance is abandonment. Lesson learned.

Fix:

  • Added a first-time user check in Convex
  • Triggered a guided builder flow (subject, style, voice)
  • Added a persistent “Start Session” CTA
  • Displayed credits in top-right

if (!user.hasSeenOnboarding) {
  // Show builder modal + progress
  await ctx.db.patch(user._id, { hasSeenOnboarding: true });
}
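For context, that check sits inside a Convex mutation along these lines — a minimal sketch, where markOnboardingSeen is my name for it, not the actual code:

// convex/users.ts — hypothetical wrapper around the check above
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const markOnboardingSeen = mutation({
  args: { userId: v.id("users") },
  handler: async (ctx, { userId }) => {
    const user = await ctx.db.get(userId);
    if (user && !user.hasSeenOnboarding) {
      // First visit: flag it and tell the client to show the builder modal
      await ctx.db.patch(userId, { hasSeenOnboarding: true });
      return { showOnboarding: true };
    }
    return { showOnboarding: false };
  },
});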

The Backend: Storing Tutors, Sessions, and Plans

Convex let me move fast.

Convex Schema: Tutors + Users + Sessions

// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  users: defineTable({
    email: v.string(),
    credits: v.optional(v.number()),
    plan: v.optional(v.string()),
    hasSeenOnboarding: v.optional(v.boolean()),
  }),

  companions: defineTable({ // the schema for tutors
    userId: v.id("users"),
    subject: v.string(),
    style: v.string(),
    voice: v.string(),
  }),

  sessions: defineTable({
    userId: v.id("users"),
    companionId: v.id("companions"),
  }),
});

Usage Deduction:

// Pro sessions are free; free-plan sessions cost 1 credit
const creditCost = user.plan === "pro" ? 0 : 1;
const credits = user.credits ?? 0; // credits is optional in the schema

if (credits < creditCost) {
  throw new Error("Out of credits. Upgrade to continue.");
}
await ctx.db.patch(user._id, {
  credits: credits - creditCost,
});

This powered all limits and nudges.
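On the read side, those nudges can hang off a reactive Convex query. A minimal sketch — the getCredits name is mine, not necessarily the app's:

// convex/users.ts — hypothetical query feeding the credit banner
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getCredits = query({
  args: { userId: v.id("users") },
  handler: async (ctx, { userId }) => {
    const user = await ctx.db.get(userId);
    return {
      plan: user?.plan ?? "free",
      credits: user?.credits ?? 0, // optional in the schema, default to 0
    };
  },
});

Because Convex queries are reactive, any component subscribed with useQuery re-renders the moment a deduction lands — which is what makes the credit banner below update in real time.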

Kinde: Auth + Plan Sync

Kinde handled auth and billing. It was quick to set up.

Key Flow:

  • On sign-up, users see Kinde’s hosted pricing table
  • They pick between a free or pro plan
  • This plan is accessible in the session:

const { getUser } = getKindeServerSession();
const user = await getUser();
const plan = user?.user_metadata?.plan || "free";

No need to manage Stripe logic — Kinde abstracts it.
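That plan value can also gate whole routes server-side. A hedged sketch of an App Router page guard, reusing the user_metadata shape from the snippet above — the route and redirect targets are illustrative:

// app/dashboard/pro/page.tsx — hypothetical plan-gated page
import { getKindeServerSession } from "@kinde-oss/kinde-auth-nextjs/server";
import { redirect } from "next/navigation";

export default async function ProPage() {
  const { getUser, isAuthenticated } = getKindeServerSession();

  // Unauthenticated users go through Kinde's hosted login
  if (!(await isAuthenticated())) redirect("/api/auth/login");

  const user = await getUser();
  const plan = user?.user_metadata?.plan || "free";

  // Free users land on the upgrade page instead of pro features
  if (plan !== "pro") redirect("/dashboard/upgrade");

  return <div>Pro-only content</div>;
}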

Sticky Upgrade Logic

If a user hits a limit or nears 0 credits, they see:

{user.plan === "free" && user.credits <= 2 && (
  <div className="p-4 bg-yellow-100 text-sm">
    You have {user.credits} sessions left. Upgrade now?
    <Link href="/dashboard/upgrade" className="underline ml-2">Upgrade →</Link>
  </div>
)}

This contextual upgrade performed better than static CTAs.

Vapi: Real-Time Voice AI in a Single Agent

When I started building Learnflow AI, I knew I didn’t want to manage transcription, audio streaming, GPT prompting, or TTS pipelines manually.

That’s exactly where Vapi came in.

Instead of stitching together multiple services, I defined a single agent — and Vapi handled the rest:

voice in → transcription → GPT reasoning → voice out.

The Full Flow

With one REST call, I could start a session. Behind the scenes, Vapi:

  1. Captured live audio from the browser
  2. Transcribed it in real time
  3. Passed transcripts to GPT-4 using my defined agent prompt
  4. Streamed back audio responses
  5. Managed call events (start, end, error, speaking, etc.)

It felt like magic — but it was just solid engineering and a well-designed SDK.

How I Integrated It

I wrapped the Vapi SDK in a CompanionComponent that handled all live session logic:

Key Features:

  • Live transcript display
  • Speaking animation via Lottie
  • Session tracking via Convex
  • Mic mute/unmute toggle
  • Accurate state handling (connecting, active, finished)

Vapi Session Lifecycle

Let’s break it down into real steps and show you how it flows:

[Diagram: Vapi session lifecycle]

Example Integration: Start Session

Here’s how I kicked off a session with assistant configuration:

const handleCall = async () => {
  setCallStatus(CallStatus.CONNECTING);

  const assistantOverrides = {
    variableValues: { subject, topic, style },
    clientMessages: ["transcript"],
    serverMessages: [],
  };

  vapi.start(configureAssistant(voice, style), assistantOverrides);
};

That configureAssistant() function generates a prompt and voice configuration for the companion.

No need to manage tokens, audio streams, or AI responses — just define the personality, and Vapi handles the loop.
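I haven't pasted the whole helper, but a configureAssistant() along these lines matches Vapi's assistant config shape — treat the specific providers (Deepgram for transcription, 11labs for TTS) as assumptions rather than the exact setup:

// lib/vapi.config.ts — hypothetical sketch of the assistant factory
export const configureAssistant = (voice: string, style: string) => ({
  name: "Companion",
  firstMessage: "Hello! Ready to dive into {{topic}}?",
  transcriber: { provider: "deepgram", model: "nova-2", language: "en" },
  voice: { provider: "11labs", voiceId: voice },
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content:
          "You are a {{style}} tutor teaching {{subject}}. " +
          "Keep replies short and conversational — they will be spoken aloud.",
      },
    ],
  },
});

The {{subject}}-style placeholders are filled in by Vapi from the variableValues passed via assistantOverrides in handleCall above.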

Live Transcript from Vapi Events

vapi.on('message', (message) => {
  if (message.type === 'transcript' && message.transcriptType === 'final') {
    const newMessage = { role: message.role, content: message.transcript };
    setMessages((prev) => [newMessage, ...prev]);
  }
});

Backend Tracking with Convex

Each session is saved to Convex for analytics and credit tracking:

addSession({
  userId: profile?._id as Id<"users">,
  companionId: companionId as Id<"companions">,
});

This makes every session persistent, linkable, and easy to manage from the dashboard or user history.
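Behind that call is a small Convex mutation — roughly this, sketched directly from the schema above:

// convex/sessions.ts — hypothetical mutation behind addSession
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const addSession = mutation({
  args: { userId: v.id("users"), companionId: v.id("companions") },
  handler: async (ctx, { userId, companionId }) => {
    // Persist the session so history, analytics, and credits can reference it
    return await ctx.db.insert("sessions", { userId, companionId });
  },
});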

Session State Management

enum CallStatus {
  INACTIVE = 'INACTIVE',
  CONNECTING = 'CONNECTING',
  ACTIVE = 'ACTIVE',
  FINISHED = 'FINISHED',
}

These states controlled:

  • Button text (Start Session, Connecting, End Session)
  • Mic toggle behavior
  • Speaking animation visibility

UI Touch: Real-Time Visual Feedback

Using Lottie animations, I showed speaking activity when the assistant responded:

<Lottie
  lottieRef={lottieRef}
  animationData={soundwaves}
  autoplay={false}
  className="companion-lottie"
/>

Combined with a live transcript feed and personalized UI, it felt like a real tutor experience — not just a chatbot.

Summary: What Vapi Did For Me

| Problem | Vapi Solution |
|---|---|
| Audio capture | Handled via SDK |
| Transcription | Real-time, no setup |
| GPT integration | Abstracted in agent config |
| TTS response | Instant playback |
| Call lifecycle | Built-in events |
| Frontend UX | Easily wrapped in React |

This allowed me to focus 100% on the product experience.

Other UX Fixes That Helped

  • Made the “Create Tutor” button immediately visible
  • Tutor builder used a 3-step flow with progress
  • Sticky credit banner with real-time updates

Metrics + Learnings (Pre-Launch)

| Area | What Went Right | What Went Wrong |
|---|---|---|
| Voice AI | Vapi.ai worked fast | Concurrency capped at 10 simultaneous sessions (calls) |
| Auth | Kinde's pricing table made onboarding clean | Some users skipped plan selection |
| Usage | Convex credit tracking was reliable | No reminder on low credits |
| Onboarding | Tutor builder clarified usage | Missing welcome message caused confusion |

What I Got Right

  • Used Vapi to avoid infra headaches
  • Kept onboarding tight to a single goal
  • Synced plan + limits into backend
  • Used Convex mutations to gate usage
  • Let users create, explore, and start quickly

What I Got Wrong

  • No onboarding on first launch (fixed)
  • No credit visibility (fixed)
  • Confusing builder UX at first (iterated)
  • Upgrade path too hidden (added sticky CTA)

Future Upgrades

  1. Session Resume: let users return to unfinished conversations
  2. Credit Refill Logic: monthly resets + webhook for Pro plans (see the cron sketch after this list)
  3. Stats Dashboard: show user time spent, sessions run, etc.
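For upgrade #2, Convex's built-in cron jobs are a natural fit. A sketch, where internal.users.resetFreeCredits is a hypothetical internal mutation that tops free users back up:

// convex/crons.ts — hypothetical monthly credit reset
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// 1st of every month at 00:00 UTC
crons.monthly(
  "reset free credits",
  { day: 1, hourUTC: 0, minuteUTC: 0 },
  internal.users.resetFreeCredits
);

export default crons;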

Lessons Learned

  1. Don’t overbuild early. My first goal was just working audio input/output.
  2. Set up real billing early. Kinde Billing + Convex saved me two weeks of Stripe integration.
  3. Guided onboarding > tooltips. Let users succeed once, then ask for money.
  4. Measure frustration points. I debugged five drop-off points by watching user flows.
  5. AI doesn’t mean magic. If users don’t understand the flow, they leave.

Final Thoughts

You don’t need a five-person team or four weeks to build something powerful.

In one weekend, I shipped:

  • Auth + roles
  • Voice-first sessions
  • Custom tutor creation
  • Usage limits and billing

The trick?
Leaning on tools that do the hard stuff — so I could focus on product.

Want to build AI-first products fast?

Use tools like:

  • Next.js App Router
  • Convex (backend + DB)
  • Kinde (auth + billing)
  • Vapi.ai (AI voice sessions)
  • Shadcn

If you’re building an AI tool, my advice:

Pick one input, one output, one use case.

Build it fast. Make someone say, “Oh damn, this works.”

That’s what Learnflow AI tries to be.

Want early access? DM me on X/Twitter
