
Build a Semantic Search-Powered FAQ Assistant with TiDB and AWS Bedrock

Semantic search is rapidly transforming how apps deliver relevant content to users. Think of chatbots that can really understand your questions, or help centers that instantly find the answer you meant rather than the exact words you typed. But many semantic search examples use overly complex architectures (e.g., multiple microservices, sprawling pipelines, labyrinthine config). To cut through the noise, I’ve bundled a lean, end-to-end demo you can clone from my GitHub repository and spin up in minutes. By the end of this tutorial, you’ll have a CLI that ingests FAQs and a React & FastAPI web UI for a more interactive demo.

In this tutorial, you’ll configure TiDB Cloud’s serverless vector columns with AWS Bedrock’s Titan-V2 embeddings, set up your .env, and build a CLI that ingests FAQs, generates and stores embeddings, and answers queries in your terminal.


By the end, you’ll have:

  • A CLI that ingests FAQ data and answers questions semantically.
  • A React + FastAPI web UI for interactive demos.

Step 1: Prerequisites

macOS

Please make sure you’ve installed the following software on your machine:

  • macOS with Python 3.8+
  • AWS CLI v2 configured with an IAM user that has AWS Bedrock access (aws configure)
  • A TiDB Cloud Serverless cluster (free tier – no credit card required)
  • System CA bundle at /etc/ssl/cert.pem
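
Before going further, you can optionally confirm that your CLI credentials actually reach Bedrock. This is a quick sanity check; it assumes your IAM user is allowed to call sts:GetCallerIdentity and bedrock:ListFoundationModels:

aws sts get-caller-identity
aws bedrock list-foundation-models --region us-east-1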

Windows

If you’re on Windows instead of macOS, you’ll need to tweak a couple of steps:
Install Git, Python & AWS CLI
– Git for Windows: https://git-scm.com/download/win
– Python 3.8+ from the Microsoft Store or https://python.org/downloads/windows/
– AWS CLI v2 MSI installer: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
Create & activate the virtualenv

cd semantic-qna
python -m venv .venv
# In PowerShell
.venv\Scripts\Activate.ps1
# Or in Command Prompt
.venv\Scripts\activate.bat

You should now see (.venv) at your prompt.
System CA bundle
– Windows doesn’t have /etc/ssl/cert.pem.
– Either omit the ssl_ca parameter (boto3/urllib will use the OS cert store),
– Or download a PEM bundle (e.g. from https://curl.se/ca/cacert.pem) and point to it:
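
For example, assuming you saved the bundle to C:/certs/cacert.pem (a hypothetical path; use wherever you downloaded it):

DATABASE_URL=mysql+pymysql://<TIDB_USER>:<TIDB_PASSWORD>@<your-tidb-host>:4000/test?ssl_ca=C:/certs/cacert.pem&ssl_verify_cert=true&ssl_verify_identity=true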

Set env-vars in PowerShell or CMD

# PowerShell
$Env:AWS_REGION="us-east-1"
$Env:AWS_ACCESS_KEY_ID="YOUR_AWS_KEY"
$Env:AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET"
$Env:DATABASE_URL="mysql+pymysql://<user>:<pass>@…"

REM Command Prompt
set AWS_REGION=us-east-1
set AWS_ACCESS_KEY_ID=YOUR_AWS_KEY
set AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET
set DATABASE_URL=mysql+pymysql://<user>:<pass>@...

After that you can pip install -r requirements.txt and run everything exactly the same as on macOS.

Step 2: Clone & Bootstrap

Next, let’s clone the demo project and install dependencies:

# 1. Clone the repo
git clone https://github.com/RealChrisSean/semantic-qna.git
cd semantic-qna

# 2. Create a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate   # (.venv) will appear in your shell prompt

# 3. Install the necessary packages
pip install --upgrade pip
pip install -r requirements.txt

Step 3: Configure Credentials with .env

Once you’ve cloned the repo, create your .env in the project root (same folder as app.py):

AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=YOUR_AWS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET

# Note: the TiDB Cloud hostname varies by region. Make sure to copy yours from the TiDB Cloud console.
DATABASE_URL=mysql+pymysql://<TIDB_USER>:<TIDB_PASSWORD>@gateway01.us-west-2.prod.aws.tidbcloud.com:4000/test?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true

Make sure to replace <TIDB_USER> and <TIDB_PASSWORD> with your actual credentials from TiDB Cloud.

  1. Find your TiDB connection string
    • In TiDB Cloud, go to Connect → Public → Python. Copy the SQLAlchemy URL.
    • Paste it into DATABASE_URL above, ensuring your user and password are correct.
  2. Get your AWS keys that can talk to Bedrock
    • Log into your AWS Console
    • In the AWS console, click your username → Security Credentials → Create access key.
    • Copy the Access key ID and Secret access key (you’ll only see the secret once).
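
With the keys and connection string in your .env, a quick optional check confirms everything loads and TiDB answers. This is a minimal sketch that relies on python-dotenv and SQLAlchemy, both of which requirements.txt already pulls in:

import os
from dotenv import load_dotenv
from sqlalchemy import create_engine, text

load_dotenv()                                        # read .env from the current directory
engine = create_engine(os.environ["DATABASE_URL"])   # same URL the app will use
with engine.connect() as conn:
    print(conn.execute(text("SELECT VERSION()")).scalar())  # prints the server version string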

Step 4: Sample Data: faqs.json

With your credentials in place, prepare some FAQ data in a JSON file so it’s easy to swap in your own later:

[
  {
    "id": "1",
    "question": "What is your return policy?",
    "answer": "You can return items within 30 days."
  },
  {
    "id": "2",
    "question": "How long does shipping take?",
    "answer": "Standard shipping takes 3–5 business days."
  },
  {
    "id": "3",
    "question": "Do you ship internationally?",
    "answer": "Yes, we ship to over 35 countries."
  },
  {
    "id": "4",
    "question": "How do I track my order?",
    "answer": "Check the tracking link in your confirmation email."
  }
]
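
When you later swap in your own data, a two-line check (a hypothetical helper, not part of the repo) catches missing fields before they turn into confusing ingest errors:

import json
from pathlib import Path

faqs = json.loads(Path("faqs.json").read_text(encoding="utf-8"))
# Every record must carry the three keys the ingest step expects
assert all({"id", "question", "answer"} <= row.keys() for row in faqs)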

Step 5: Command-Line App: app.py

Now that your data is ready, let’s build the CLI so you can see semantic search in your terminal. We’ll break it down section by section:

Imports and Configuration

import os, json
from pathlib import Path
from typing import List

from dotenv import load_dotenv               # loads our .env file
import boto3                                 # AWS SDK for Python
from sqlalchemy import create_engine, text   # used for the row-exists check in ingest_faqs
from tidb_vector.integrations import TiDBVectorClient

# Load all variables from .env
load_dotenv()

# Read environment variables, with some defaults
AWS_REGION    = os.getenv("AWS_REGION", "us-east-1")
TIDB_CONN_STR = os.getenv("DATABASE_URL")
VECTOR_DIM    = 1024          # Titan-V2 embeddings are always 1024 elements
TABLE_NAME    = "faqs"        # name of the vector table we'll create
FAQ_FILE      = "faqs.json"   # default data file read by ingest_faqs

What’s happening here?

  • load_dotenv() loads .env so os.environ picks up our AWS keys, DB URL, etc.
  • TiDBVectorClient simplifies vector operations (insert, query) in TiDB, so there’s no need to craft raw SQL for vector math.
  • We store constants (VECTOR_DIM, TABLE_NAME) in variables for clarity.

The bedrock_embed Function

We need to convert text → numeric embeddings. That’s what we feed into TiDB for similarity search.

def bedrock_embed(text: str) -> List[float]:
    """
    Convert a single string into a Titan-V2 embedding (1024 floats).
    """
    # Create a 'bedrock-runtime' client in the correct region
    brt = boto3.client("bedrock-runtime", region_name=AWS_REGION)

    # Prepare the JSON payload. We just pass 'inputText': the text to embed.
    payload = {"inputText": text}

    # The 'invoke_model' call sends the payload to Amazon Titan
    resp = brt.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",   # The specific Titan embedding model
        contentType="application/json",
        accept="application/json",
        body=json.dumps(payload),
    )

    # The response body is a streaming object, so we read and JSON-parse it
    response_body = json.loads(resp["body"].read())

    # Finally, return the list of floats
    return response_body["embeddingsByType"]["float"]

Key Points

  • We’re using the “bedrock-runtime” Boto3 client, which is how you call AWS Bedrock.
  • modelId="amazon.titan-embed-text-v2:0": This is Amazon’s Titan text embedding model.
  • The model returns multiple formats (like float or quantized embeddings). We’re grabbing the float array for maximum precision.
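
If you want to spot-check the function from a Python REPL (a hypothetical one-off, not part of the app), the shape is easy to verify:

from app import bedrock_embed

# Embed one string and confirm it comes back as 1024 floats
vec = bedrock_embed("What is your return policy?")
print(len(vec))   # expect 1024 for Titan-V2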

Ingesting FAQs into TiDB

We’ll create (or recreate) a table for FAQ vectors, then load data from faqs.json, embed it, and store it in TiDB.

def batch_embed_batch(texts: List[str]) -> List[List[float]]:
    """
    Embed multiple texts in one Titan batch request.
    Returns a list of 1024-float vectors, one per input string.
    """
    brt = boto3.client("bedrock-runtime", region_name=AWS_REGION)
    resp = brt.invoke_model(
        modelId=os.getenv("BEDROCK_MODEL_ID", "amazon.titan-embed-text-v2:0"),
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputTextArray": texts}),
    )
    data = json.loads(resp["body"].read())
    try:
        return data["embeddingsByType"]["floatArray"]
    except KeyError:
        # Fallback: if the batch response doesn't include that shape, embed one text at a time
        return [bedrock_embed(t) for t in texts]

  • Why this helper exists: single-row embedding calls are fine for demos, but they slam the brakes the moment you load more than a handful of records. Every call to Bedrock is an HTTPS round-trip, so latency stacks up fast.
  • batch_embed_batch() bundles an entire list of texts into one request, turning N network hops into one. This effectively cut our cold start from 1 minute and 40 seconds for 200 FAQs down to under 4 seconds.
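
One caveat on batching: a single request can only carry so much text. If your dataset grows past a few hundred entries, a thin wrapper that chunks the list keeps each payload modest. This is a sketch; the chunk size of 64 is illustrative, not a documented Bedrock limit:

def embed_in_chunks(texts: List[str], chunk_size: int = 64) -> List[List[float]]:
    """Embed a long list of texts as a series of smaller batch requests."""
    vecs: List[List[float]] = []
    for i in range(0, len(texts), chunk_size):
        vecs.extend(batch_embed_batch(texts[i:i + chunk_size]))
    return vecs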


def ingest_faqs(file_path: str = FAQ_FILE):
    """Create the table and load FAQs from a JSON file."""
    client = TiDBVectorClient(
        table_name          = TABLE_NAME,
        connection_string   = TIDB_CONN_STR,
        vector_dimension    = VECTOR_DIM,
        drop_existing_table = False,
    )

    # Check if the table already has data using a standalone engine
    engine = create_engine(TIDB_CONN_STR)
    with engine.connect() as conn:
        row = conn.execute(text(f"SELECT 1 FROM {TABLE_NAME} LIMIT 1")).first()
    if row is not None:
        return client

    faq_path = Path(file_path)
    with faq_path.open("r", encoding="utf-8") as f:
        faqs = json.load(f)

    ids   = [row["id"] for row in faqs]
    texts = [row["question"] for row in faqs]
    metas = [{"answer": row["answer"]} for row in faqs]

    # Batch embed all questions at once
    embs = batch_embed_batch(texts)

    client.insert(ids=ids, texts=texts, embeddings=embs, metadatas=metas)
    return client

Key points to verify:

  • drop_existing_table=False – so you don’t recreate the table on each run.
  • The row-exists check uses a standalone SQLAlchemy engine.
  • The final client.insert(…) remains the same but uses the new ids, texts, embs, and metas arrays.
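
If you edit faqs.json and want a clean re-ingest, one way (a one-off sketch reusing the same constructor arguments) is to flip that flag once:

# Hypothetical one-off reset: drops and recreates the table so the next ingest starts fresh
client = TiDBVectorClient(
    table_name          = TABLE_NAME,
    connection_string   = TIDB_CONN_STR,
    vector_dimension    = VECTOR_DIM,
    drop_existing_table = True,
)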

Querying for the Closest FAQ

Given a user’s question, we embed it, then run a similarity search for k=1 in TiDB, returning the best match.

def query_faq(question: str, client=None):
    """
    1) Embed the user question via bedrock_embed
    2) Query TiDB for the single closest vector
    3) Return that FAQ's question and answer
    """
    # Step 1: Convert user question into a 1024-dim vector
    q_vec = bedrock_embed(question)

    # Step 2: Run a vector search for the top-1 match
    results = client.query(q_vec, k=1)

    # If no results, we return a "not found" structure
    if not results:
        return {"question": None, "answer": None}

    # Otherwise, get the best match (index 0)
    best = results[0]

    # We expect 'text' to hold the question text (some versions might call it 'payload')
    stored_question = getattr(best, "text", None) or getattr(best, "payload", f"<id {best.id}>")

    # 'metadata' is where we stored the FAQ answer
    stored_answer = (best.metadata or {}).get("answer", "")

    return {"question": stored_question, "answer": stored_answer}

Why only k=1?

  • We just want the single best match for a standard FAQ scenario. If you wanted multiple top suggestions, you’d use k=3 or k=5.
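
For example, a top-3 variant (a sketch reusing the same client and the same attribute handling as query_faq) would look like this:

def query_faq_top3(question: str, client):
    """Return the three closest FAQs instead of just the best one."""
    q_vec = bedrock_embed(question)
    results = client.query(q_vec, k=3)   # three nearest neighbors
    return [
        {
            "question": getattr(hit, "text", f"<id {hit.id}>"),
            "answer": (hit.metadata or {}).get("answer", ""),
        }
        for hit in results
    ]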

Interactive Loop & main()

Finally, we provide a simple text-based UI so you can type questions in your terminal.

def main():
    # Ingest the FAQs, storing them in TiDB. Returns a client object for queries.
    client = ingest_faqs()

    # We'll prompt until the user types 'exit' or 'quit'
    while True:
        user_q = input("\n🧐  Ask a question ('exit' to quit): ").strip()
        if user_q.lower() in {"exit", "quit"}:
            break

        result = query_faq(user_q, client)
        if not result["question"]:
            print("🤷  No semantic match found.")
        else:
            print(f"🎯  Closest Q: {result['question']}")
            print(f"💡  Answer:   {result['answer']}")

if __name__ == "__main__":
    main()

How it works

  • ingest_faqs() is called once on startup to initialize the database table and embed all your FAQ data.
  • We then enter an infinite loop to ask for user input.
  • The user’s question is embedded and matched against the database.
  • We print out the best matching FAQ’s question and answer.

Step 6: Run the CLI

Next, fire up the terminal and launch your app:

python app.py

You’ll see the ingest step run, then the interactive prompt:

🧐  Ask a question ('exit' to quit):
Type “What is your return policy?” and see if it retrieves the correct answer. Try a few variations (like “can I send something back for a refund?”) to see how it still returns the same or similar FAQ.
That’s your local semantic search in action!

Step 7: FastAPI Server: server.py

If you want a web-based Q&A experience, we’ll serve both:

  • Our React UI (via a static HTML file).
  • A /query endpoint that does the same embedding + vector search logic as the CLI.

from pathlib import Path
from pydantic import BaseModel
from app import ingest_faqs, query_faq
from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse
from contextlib import asynccontextmanager

# Shared TiDBVectorClient, populated once at startup by the lifespan handler
client = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    # We'll reuse this client across requests
    global client
    client = ingest_faqs()
    yield

app = FastAPI(lifespan=lifespan)

# Serve our React index.html at the root path
@app.get("/", include_in_schema=False)
async def read_index():
    return FileResponse(Path(__file__).parent / "index.html")


# A Pydantic model for the incoming JSON from React
class QueryReq(BaseModel):
    question: str

@app.post("/query")
def ask(req: QueryReq):
    # Use the same logic we have in app.py
    res = query_faq(req.question, client)
    if not res["question"]:
        raise HTTPException(404, "No semantic match found")
    return res

# Health-check endpoint (route path assumed as /health)
@app.get("/health")
def health():
    return {"status": "ok"}

Key Points

  • Lifespan context: Replaces older startup/shutdown events in FastAPI, ensuring we call ingest_faqs() once on startup.
  • @app.get("/"): Returns our index.html so you can open the React interface in your browser.
  • /query: Receives a JSON payload ({"question": "some text"}), calls query_faq(), and returns JSON with {"question": "…", "answer": "…"}.
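
To exercise the endpoint without the React UI, here’s a minimal client sketch using only the standard library (it assumes the server is already running on the default localhost:8000):

import json
from urllib import request

req = request.Request(
    "http://localhost:8000/query",
    data=json.dumps({"question": "How long does shipping take?"}).encode(),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:      # POST, because a body is attached
    print(json.loads(resp.read()))      # {"question": "...", "answer": "..."}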

Step 8: Launch the Web Server

Now, run Uvicorn to glue it all together through our helper script (run_with_bar.py):

python run_with_bar.py

  • You’ll see a progress bar and the total load time before the FastAPI app is live.
  • Now open http://localhost:8000 in your browser.
  • You’ll see a basic Q&A interface. Type your question, watch the magic happen!
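
If you’d rather skip the helper script, starting the app directly with Uvicorn should work too, since server.py exposes the app object (assuming uvicorn is installed via requirements.txt):

uvicorn server:app --port 8000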


You’ve now walked through every step of building a semantic FAQ assistant, from setting up your environment to querying embeddings in both a CLI and a web UI. By combining TiDB’s vector storage with AWS Bedrock’s Titan-V2 embeddings, you’ve created a solution that is simple to understand and easy to extend.

Where to go from here?

  • Replace faqs.json with real support tickets, product specs or documentation.
  • Add GitHub Actions so every pull request runs pytest.
  • Pipe the top match into ChatGPT or Claude to create richer and more conversational answers.

TiDB Cloud gives you vector search without maintenance, and Bedrock hides the heavy lifting of embeddings. Together they let you focus on what matters most: delivering answers that actually make sense.

All of the source code lives in one place. Feel free to revisit, fork, or open issues at https://github.com/RealChrisSean/semantic-qna.
