TL;DR: I built Learnflow AI, a voice‑first GPT‑4 learning companion, in just one weekend using:
- Vapi for the full voice loop (speech-to-text → GPT → text-to-speech)
- Convex for backend logic and credit tracking
- Kinde for auth, role-based access, and hosted billing onboarding
This post breaks down what worked, what broke, and the lessons learned — with diagrams, code, and reliability tips.
I gave myself one weekend.
One weekend to build, design, and deploy a real AI product.
The result? Learnflow AI — a voice-first learning tool where users can create tutors, start sessions, and learn by speaking with GPT-backed companions.
Think: “Duolingo meets ChatGPT meets voice notes.”
But it wasn’t all smooth. I rebuilt the onboarding three times. Scrapped one entire flow. Learned the hard way what breaks user trust.
This post is the full breakdown — how I did it, what I got wrong, and what I’d do differently.
The Goal: Build an End-to-End Voice Learning Tool
I wanted to build something useful, impressive, and real — not just another GPT wrapper.
Learnflow AI had a simple premise:
- Let users create a custom tutor (subject, tone, voice)
- Let them talk to that tutor in real time
- Use voice in and voice out, powered by a real-time voice AI (Vapi)
It had to:
- Be production-ready
- Handle real-time speech
- Track usage and offer plans
And it had to launch in one weekend.
The Tech Stack (What Actually Worked)
| Problem | Tool | Why I Chose It |
|---|---|---|
| Auth + Feature Gating + Billing | Kinde | Easy social login + billing integration |
| Database + Backend | Convex | Realtime reactive backend + clean TypeScript logic |
| Voice AI | Vapi.ai | Built for multi-turn GPT conversations |
| Frontend Framework | Next.js App Router | Great for routing, loading states, SSR |
| Styling & Components | Shadcn | Fast UI dev |
Vapi handled everything voice-related — transcription, GPT calls, and TTS. I didn’t need to wire together OpenAI, Whisper, or ElevenLabs separately. One endpoint, one agent.
That saved days.
Flow Overview: What the User Experiences
Full user flow: sign up with Kinde → pick a free or pro plan from the hosted pricing table → build a tutor (subject, style, voice) → start a real-time voice session powered by Vapi → credits tracked and deducted in Convex.
The Frontend: What I Shipped
The first version was minimal:
- A dashboard that loaded with a blank UI and a “Create Tutor” button
- A list of public tutors, but no onboarding flow
- No explanation of credits, plans, or what to do
What Happened:
Users froze. Some clicked around. Most left.
“What is this?”
“Where do I start?”
Minimalism without guidance is abandonment. Lesson learned.
Fix:
- Added a first-time user check in Convex
- Triggered a guided builder flow (subject, style, voice)
- Added a persistent “Start Session” CTA
- Displayed remaining credits in the top-right
Here's the core of that first-time check in a Convex mutation:
if (!user.hasSeenOnboarding) {
  // First visit: flag it and let the client open the guided builder
  await ctx.db.patch(user._id, { hasSeenOnboarding: true });
}
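A fuller sketch of how that check might live as a Convex mutation (the name ensureOnboarding and the return shape are my assumptions, not the exact source):

// convex/users.ts (sketch)
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const ensureOnboarding = mutation({
  args: { userId: v.id("users") },
  handler: async (ctx, { userId }) => {
    const user = await ctx.db.get(userId);
    if (user && !user.hasSeenOnboarding) {
      await ctx.db.patch(userId, { hasSeenOnboarding: true });
      return { showBuilder: true }; // client opens the builder modal + progress UI
    }
    return { showBuilder: false };
  },
});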
The Backend: Storing Tutors, Sessions, and Plans
Convex let me move fast.
Convex Schema: Tutors + Users + Sessions
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  users: defineTable({
    email: v.string(),
    credits: v.optional(v.number()),
    plan: v.optional(v.string()),
    hasSeenOnboarding: v.optional(v.boolean()),
  }),
  companions: defineTable({ // the schema for tutors
    userId: v.id("users"),
    subject: v.string(),
    style: v.string(),
    voice: v.string(),
  }),
  sessions: defineTable({
    userId: v.id("users"),
    companionId: v.id("companions"),
  }),
});
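With that schema in place, reading a user's tutors back out is a small query. A sketch (the listMine name is hypothetical):

// convex/companions.ts (sketch)
import { query } from "./_generated/server";
import { v } from "convex/values";

export const listMine = query({
  args: { userId: v.id("users") },
  handler: async (ctx, { userId }) => {
    return await ctx.db
      .query("companions")
      .filter((q) => q.eq(q.field("userId"), userId))
      .collect();
  },
});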
Usage Deduction:
const creditCost = user.plan === "pro" ? 0 : 1;
const credits = user.credits ?? 0; // credits is optional in the schema
if (credits < creditCost) {
  throw new Error("Out of credits. Upgrade to continue.");
}
await ctx.db.patch(user._id, {
  credits: credits - creditCost,
});
This powered all limits and nudges.
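Put together, the whole gate fits in one mutation. A sketch, assuming a startSession mutation that both charges the credit and records the session (the names are mine, not the exact source):

// convex/sessions.ts (sketch)
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const startSession = mutation({
  args: { userId: v.id("users"), companionId: v.id("companions") },
  handler: async (ctx, { userId, companionId }) => {
    const user = await ctx.db.get(userId);
    if (!user) throw new Error("User not found");

    const creditCost = user.plan === "pro" ? 0 : 1;
    const credits = user.credits ?? 0;
    if (credits < creditCost) {
      throw new Error("Out of credits. Upgrade to continue.");
    }

    // Deduct first, then record the session
    await ctx.db.patch(userId, { credits: credits - creditCost });
    return await ctx.db.insert("sessions", { userId, companionId });
  },
});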
Kinde: Auth + Plan Sync
Kinde handled auth and billing. It was quick to set up.
Key Flow:
- On sign-up, users see Kinde’s hosted pricing table
- They pick between a `free` or `pro` plan
- The chosen plan is accessible in the session:
const { getUser } = getKindeServerSession();
const user = await getUser();
const plan = user?.user_metadata?.plan || "free";
No need to manage Stripe logic — Kinde abstracts it.
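That made plan gating on the server straightforward. A sketch of a pro-only gate (the user_metadata location mirrors the snippet above; adjust to however your Kinde project actually stores the plan):

import type { ReactNode } from "react";
import { redirect } from "next/navigation";
import { getKindeServerSession } from "@kinde-oss/kinde-auth-nextjs/server";

// Wraps pro-only pages; assumes the plan lives in user metadata as above
export default async function ProGate({ children }: { children: ReactNode }) {
  const { getUser, isAuthenticated } = getKindeServerSession();
  if (!(await isAuthenticated())) redirect("/api/auth/login");

  const user = await getUser();
  const plan = (user as any)?.user_metadata?.plan ?? "free";
  if (plan !== "pro") redirect("/dashboard/upgrade");

  return <>{children}</>;
}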
Sticky Upgrade Logic
If a user hits a limit or nears 0 credits, they see:
{user.plan === "free" && user.credits <= 2 && (
  <div className="p-4 bg-yellow-100 text-sm">
    You have {user.credits} sessions left. Upgrade now?
    <Link href="/dashboard/upgrade" className="underline ml-2">Upgrade →</Link>
  </div>
)}
This contextual upgrade performed better than static CTAs.
Vapi: Real-Time Voice AI in a Single Agent
When I started building Learnflow AI, I knew I didn’t want to manage transcription, audio streaming, GPT prompting, or TTS pipelines manually.
That’s exactly where Vapi came in.
Instead of stitching together multiple services, I defined a single agent — and Vapi handled the rest:
voice in → transcription → GPT reasoning → voice out.
The Full Flow
With one REST call, I could start a session. Behind the scenes, Vapi:
- Captured live audio from the browser
- Transcribed in real-time
- Passed transcripts to GPT-4 using my defined agent prompt
- Streamed back audio responses
- Managed call events (start, end, error, speaking, etc.)
It felt like magic — but it was just solid engineering and a well-designed SDK.
How I Integrated It
I wrapped the Vapi SDK in a CompanionComponent that handled all live session logic:
Key Features:
- Live transcript display
- Speaking animation via Lottie
- Session tracking via Convex
- Mic mute/unmute toggle
- Accurate state handling (`connecting`, `active`, `finished`)
Vapi Session Lifecycle
Let’s break it down into real steps and show you how it flows:
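First, the client connects to Vapi and subscribes to lifecycle events. A sketch, assuming the state setters (setCallStatus, setIsSpeaking) from CompanionComponent, the CallStatus enum shown later, and an env var name of my choosing:

import Vapi from "@vapi-ai/web";

// Public web token, safe to expose client-side (env var name assumed)
const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN!);

// Map SDK lifecycle events onto UI state
vapi.on("call-start", () => setCallStatus(CallStatus.ACTIVE));
vapi.on("call-end", () => setCallStatus(CallStatus.FINISHED));
vapi.on("speech-start", () => setIsSpeaking(true));
vapi.on("speech-end", () => setIsSpeaking(false));
vapi.on("error", (err) => {
  console.error(err);
  setCallStatus(CallStatus.INACTIVE);
});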
Example Integration: Start Session
Here’s how I kicked off a session with assistant configuration:
const handleCall = async () => {
  setCallStatus(CallStatus.CONNECTING);
  const assistantOverrides = {
    variableValues: { subject, topic, style },
    clientMessages: ["transcript"],
    serverMessages: [],
  };
  vapi.start(configureAssistant(voice, style), assistantOverrides);
};
That `configureAssistant()` function generates a prompt and voice configuration for the companion.
No need to manage tokens, audio streams, or AI responses — just define the personality, and Vapi handles the loop.
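For context, here's roughly what configureAssistant() might return. This is a sketch based on Vapi's assistant options (the providers, models, and firstMessage are placeholders, not the project's actual config); the {{subject}} and {{topic}} placeholders get filled from variableValues at call start:

const configureAssistant = (voice: string, style: string) => ({
  name: "Tutor",
  firstMessage: "Hi! What would you like to learn today?",
  transcriber: { provider: "deepgram", model: "nova-2", language: "en" },
  voice: { provider: "11labs", voiceId: voice },
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: `You are a ${style} tutor. Teach {{subject}}, focusing on {{topic}}, step by step and conversationally.`,
      },
    ],
  },
});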
Live Transcript from Vapi Events
vapi.on('message', (message) => {
  if (message.type === 'transcript' && message.transcriptType === 'final') {
    const newMessage = { role: message.role, content: message.transcript };
    setMessages((prev) => [newMessage, ...prev]);
  }
});
Backend Tracking with Convex
Each session is saved to Convex for analytics and credit tracking:
addSession({
  userId: profile?._id as Id<"users">,
  companionId: companionId as Id<"companions">,
});
This makes every session persistent, linkable, and easy to manage from the dashboard or user history.
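The addSession call above comes from Convex's React hook binding; a short sketch (the module path is assumed):

import { useMutation } from "convex/react";
import { api } from "@/convex/_generated/api";

// Inside CompanionComponent: bind the client to the server mutation
const addSession = useMutation(api.sessions.addSession);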
Session State Management
enum CallStatus {
  INACTIVE = 'INACTIVE',
  CONNECTING = 'CONNECTING',
  ACTIVE = 'ACTIVE',
  FINISHED = 'FINISHED',
}
These states controlled:
- Button text: `Start Session`, `Connecting`, `End Session` (see the mapping sketch below)
- Mic toggle behavior
- Speaking animation visibility
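A minimal sketch of that button-text mapping:

// Derive the CTA label from the current call status
const buttonLabel: Record<CallStatus, string> = {
  [CallStatus.INACTIVE]: "Start Session",
  [CallStatus.CONNECTING]: "Connecting...",
  [CallStatus.ACTIVE]: "End Session",
  [CallStatus.FINISHED]: "Start Session",
};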
UI Touch: Real-Time Visual Feedback
Using Lottie animations, I showed speaking activity when the assistant responded:
<Lottie
  lottieRef={lottieRef}
  animationData={soundwaves}
  autoplay={false}
  className="companion-lottie"
/>
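With autoplay off, the animation only runs while the assistant is actually talking. A sketch of the wiring, assuming a lottie-react ref:

import { useRef } from "react";
import type { LottieRefCurrentProps } from "lottie-react";

const lottieRef = useRef<LottieRefCurrentProps>(null);

// Play the soundwave only while Vapi reports assistant speech
vapi.on("speech-start", () => lottieRef.current?.play());
vapi.on("speech-end", () => lottieRef.current?.stop());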
Combined with a live transcript feed and personalized UI, it felt like a real tutor experience — not just a chatbot.
Summary: What Vapi Did For Me
| Problem | Vapi Solution |
|---|---|
| Audio capture | Handled via SDK |
| Transcription | Real-time, no setup |
| GPT integration | Abstracted in agent config |
| TTS response | Instant playback |
| Call lifecycle | Built-in events |
| Frontend UX | Easily wrapped in React |
This allowed me to focus 100% on the product experience.
Other UX Fixes That Helped
- Made the “Create Tutor” button immediately visible
- Tutor builder used a 3-step flow with progress
- Sticky credit banner with real-time updates
Metrics + Learnings (Pre-Launch)
| Area | What Went Right | What Went Wrong |
|---|---|---|
| Voice AI | Vapi.ai worked fast | Concurrency limit of 10 sessions (calls) at a time |
| Auth | Kinde pricing table made onboarding clean | Some users skipped plan selection |
| Usage | Convex credit tracking was reliable | No reminder on low credits |
| Onboarding | Tutor builder clarified usage | No welcome message caused confusion |
What I Got Right
- Used Vapi to avoid infra headaches
- Kept onboarding tight to a single goal
- Synced plan + limits into backend
- Used Convex mutations to gate usage
- Let users create, explore, and start quickly
What I Got Wrong
- No onboarding on first launch (fixed)
- No credit visibility (fixed)
- Confusing builder UX at first (iterated)
- Upgrade path too hidden (added sticky CTA)
Future Upgrades
- Session Resume: let users return to unfinished conversations
- Credit Refill Logic: monthly resets + webhook for Pro plans (a cron sketch follows this list)
- Stats Dashboard: show user time spent, sessions run, etc.
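The monthly reset piece maps naturally onto Convex's cron jobs. A sketch of where I'd start (internal.users.resetCredits is a hypothetical internal mutation):

// convex/crons.ts (sketch)
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Refill free-plan credits on the 1st of each month
crons.monthly(
  "reset free-plan credits",
  { day: 1, hourUTC: 0, minuteUTC: 0 },
  internal.users.resetCredits
);

export default crons;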
Lessons Learned
- Don’t overbuild early. My first goal was working audio input/output.
- Real billing early. Kinde Billing + Convex saved me 2 weeks of Stripe integration.
- Guided onboarding > tooltips. Let users succeed once, then ask for money.
- Measure frustration points. I debugged 5 drop-off points by watching user flows.
- AI doesn’t mean magic. If users don’t understand the flow, they leave.
Final Thoughts
You don’t need a team of five, or four weeks, to build something powerful.
In one weekend, I shipped:
- Auth + roles
- Voice-first sessions
- Custom tutor creation
- Usage limits and billing
The trick?
Leaning on tools that do the hard stuff — so I could focus on product.
Want to build AI-first products fast?
Use tools like:
- Next.js App Router
- Convex (backend + DB)
- Kinde (auth + billing)
- Vapi.ai (AI voice sessions)
- Shadcn
If you’re building an AI tool, my advice:
Pick one input, one output, one use case.
Build it fast. Make someone say, “Oh damn, this works.”
That’s what Learnflow AI tries to be.
Want early access? DM me on X/Twitter