I’ve finally hopped on the GenAI (Generative AI) bandwagon. And honestly, this feels less like hype and more like a crucial level-up for backend engineers.
Whether you think AI will take over everything or is just a parroting bubble waiting to burst, it turns out the concepts and challenges it brings to backend systems are fresh and exciting.
🤔 Questions That Pulled Me In
- What exactly is GenAI? What are LLMs, Agents, and RAGs (wait, aren’t those DAGs)?
- How do these things really work?
- More importantly — how can I, as a backend engineer, contribute?
So, I dived in. I googled (yep, old school). I asked AI tools to explain AI. And I decided to summarize everything I learn in parts, right here.
If you’re also exploring, or already working in the field — let’s connect. Share your feedback, mistakes, or suggestions in the comments.
🚀 Part 1: Model Serving with FastAPI & TorchVision
In this step, I learned how a model is served behind an API. That's it. The same models from the earlier "ML" days (e.g., classification models), now exposed cleanly via an API.
Serving is about making the model available for real-time or batch predictions, efficiently, securely, and at scale.
🔹 What is a Model?
A model is the core of an ML system — a program trained on data to recognize patterns and make predictions.
🔹 What is Model Serving?
Model Serving is the process of putting that trained model behind an API (e.g., FastAPI), so it can take input and return predictions (inference).
Instead of bundling the model inside every client app, you host it once, centrally.
🔍 Real-World Examples:
- 🖼️ Image → API → Model returns "cat" or "dog"
- 💬 Chatbot message → API → LLM replies
- 📄 Transaction → Fraud model → "fraud" or "legit"
📚 Key Terms I Came Across
- Inference → Running the model on new (unseen) input data
- Model Hosting → Putting the model on a server (local or cloud) and exposing an API
📝 Read: Model Serving 101 (Paywalled…)
🔥 Key Takeaways from the Article:
- Model serving introduces a distinct set of challenges compared to a typical CRUD backend. It's as if the heavy-lifting data pipelines we used to run in the background now need to respond to client requests in real time, with strict performance and scalability requirements.
- Key factors to balance:
  - 🚀 Throughput – predictions/sec
  - ⏱️ Latency – response time
  - 💰 Cost – infra & compute
- 3 Fundamental Deployment Types (see the sketch right after this list):
  - Online Real-Time Inference
  - Asynchronous Inference
  - Offline Batch Transform
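To make those three types concrete, here's a minimal sketch of my own (not from the article) showing what each pattern can look like with FastAPI. The predict function and the in-memory jobs dict are hypothetical placeholders; a real system would use an actual model, a queue, and a result store.
📄 deployment_types.py (sketch)
import uuid
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
jobs = {}  # job_id -> result; stands in for a real queue + database

def predict(data: dict) -> dict:
    return {"label": "cat"}  # placeholder for real model inference

# 1) Online real-time inference: the client waits for the prediction
@app.post("/predict")
def predict_sync(data: dict):
    return predict(data)

# 2) Asynchronous inference: the client gets a job id and polls for the result later
@app.post("/predict-async")
def predict_async(data: dict, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    background_tasks.add_task(lambda: jobs.update({job_id: predict(data)}))
    return {"job_id": job_id}

@app.get("/results/{job_id}")
def get_result(job_id: str):
    return jobs.get(job_id, {"status": "pending"})

# 3) Offline batch transform: no request/response at all; a scheduled job scores a whole dataset
def batch_transform(rows: list) -> list:
    return [predict(row) for row in rows]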
🧰 Tools Used
⚡ FastAPI
A high-performance Python web framework.
📖 Official Tutorial: FastAPI Docs
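If you haven't used it before, the classic hello-world is only a few lines. This is my own minimal example, not taken from the official docs:
📄 hello.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    # Returned dicts are serialized to JSON automatically
    return {"message": "Hello from FastAPI"}

# Run it with: uvicorn hello:app --reload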
🖼️ TorchVision
An interesting PyTorch library that provides:
- Pretrained computer vision models (like resnet18, mobilenet) used for image or object classification
- Tools for image transformations and loading
💡 Why it’s great: You don’t need to train from scratch. You can just load and serve a powerful image model in minutes.
📖 Read: TorchVision Basics
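As a quick illustration of that "minutes, not training runs" point, here's a small sketch of my own (assuming a recent torchvision, 0.13+) that loads a pretrained model together with its matching preprocessing. Swapping in another architecture such as mobilenet_v3_small works the same way as the resnet18 used later.
📄 pretrained_demo.py (sketch)
from torchvision import models

# Each pretrained weight enum ships with the preprocessing it was trained with
weights = models.MobileNet_V3_Small_Weights.DEFAULT
model = models.mobilenet_v3_small(weights=weights)
model.eval()  # inference mode: no dropout, frozen batch-norm stats

preprocess = weights.transforms()        # resize, crop, to-tensor, normalize
categories = weights.meta["categories"]  # the 1000 ImageNet class names
print(len(categories), categories[:3])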
⚙️ Step-by-Step: Serving a Model via API
🔹 Step 1: Setup Environment
mkdir model-serving && cd model-serving
python3 -m venv venv
venv\Scripts\activate  # On Mac/Linux: source venv/bin/activate
pip install fastapi uvicorn torch torchvision pillow requests
🔹 Step 2: Create Model Loader
📄 model.py
import torch
from torchvision import models, transforms
from PIL import Image
import requests

# Load a ResNet-18 pre-trained on ImageNet and switch it to inference mode
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Preprocessing pipeline expected by the pretrained weights
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )
])

# Human-readable ImageNet class names maintained by PyTorch
LABELS_URL = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
response = requests.get(LABELS_URL)
labels = [line.strip() for line in response.text.splitlines()]

def predict(image_path):
    # Return the single most likely label for the image
    image = Image.open(image_path).convert("RGB")
    input_tensor = transform(image).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        output = model(input_tensor)
    pred_index = output.argmax().item()
    return labels[pred_index]

def predict_topk(img_path):
    # Return the top-5 labels with their confidence scores
    image = Image.open(img_path).convert("RGB")
    input_tensor = transform(image).unsqueeze(0)
    with torch.no_grad():
        output = model(input_tensor)
    probs = torch.nn.functional.softmax(output[0], dim=0)
    top_p, top_i = torch.topk(probs, 5)
    top_labels = [(labels[idx], round(prob.item(), 4)) for idx, prob in zip(top_i, top_p)]
    return top_labels
🔎 Interesting learning: models.ResNet18_Weights.DEFAULT loads a model pre-trained on 1000 categories. These labels come from a public file maintained by PyTorch (based on the ImageNet dataset). The model outputs a score for each of these categories (softmax turns them into a probability distribution), and the index with the highest score maps to the predicted label.
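A quick way to see that mapping for yourself, reusing the objects from model.py above (the image path is just a placeholder):
import torch
from PIL import Image
from model import model, transform, labels

image = Image.open("cat.jpg").convert("RGB")       # any local test image
with torch.no_grad():
    output = model(transform(image).unsqueeze(0))  # shape: [1, 1000]
probs = torch.nn.functional.softmax(output[0], dim=0)
print(labels[probs.argmax().item()], round(probs.max().item(), 4))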
🔍 I have included both predict and predict_topk methods to demonstrate how you can work with the model's output. While predict gives you just the top result, predict_topk provides the top 5 predictions along with confidence scores. This is useful when you want more insight into what the model "thinks" the image could be, especially in ambiguous cases.
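For example, calling both helpers on a local test image (the path and the outputs in the comments are illustrative, not actual results):
from model import predict, predict_topk

print(predict("samples/dog.jpg"))       # e.g. "golden retriever"
print(predict_topk("samples/dog.jpg"))  # e.g. [("golden retriever", 0.83), ("Labrador retriever", 0.09), ...]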
🔹 Step 3: Create FastAPI App
📄 app.py
from fastapi import FastAPI, UploadFile, File
from model import predict
import shutil

app = FastAPI()

@app.post("/predict")
async def classify_image(file: UploadFile = File(...)):
    # Save the uploaded file to a temporary path (use a different temp dir on Windows)
    temp_path = f"/tmp/{file.filename}"
    with open(temp_path, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)
    result = predict(temp_path)  # or predict_topk(temp_path)
    return {"prediction": result}
🔹 Step 4: Run the API
uvicorn app:app --reload
Go to: http://127.0.0.1:8000/docs
📤 Upload any image → get a prediction
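Prefer calling it from code instead of the Swagger UI? Here's a minimal client sketch (the image path is a placeholder, and the printed output is just an example):
import requests

with open("samples/dog.jpg", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/predict",
        files={"file": ("dog.jpg", f, "image/jpeg")},
    )
print(resp.json())  # e.g. {"prediction": "golden retriever"}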
✅ That’s it! You’ve served your first model. Now you can integrate this into real-world applications or scale it using cloud services.
GitHub Repo
💻 Code Repository: Coming soon – I’ll be uploading the full project and updates to a public GitHub repo.
🪜 Coming Up Next
Next, I plan to explore Apache Airflow and how it’s used for ML workflows and pipelines — one layer deeper each time 💡