Modern e-commerce platforms must handle millions of users and thousands of simultaneous transactions. Our case study involves a large retail monolith serving millions of customers (~4,000 requests/s). The monolith struggled with scalability, so we re-architected it into microservices using Apache Kafka as the core Pub/Sub backbone. Kafka was chosen for its high throughput and decoupling: it “decouple[s] data sources from data consumers” for flexible, scalable streaming. For example, Figure 1 illustrates typical retail event-streaming use cases: real-time inventory, personalized marketing, and fraud detection. Major retailers like Walmart deploy ~8,500 Kafka nodes processing ~11 billion events per day to drive omnichannel inventory and order streams, while others (e.g., AO.com) correlate historical and live data for one-on-one marketing. These examples reflect Kafka’s strengths: massive throughput (millions of events/sec) and service decoupling (Kafka can “completely decouple services”). We set a goal to replicate these capabilities in our e-commerce migration.
Figure 1: Business use-case categories enabled by Kafka event streaming in retail (source: Kai Waehner). Kafka applications span revenue-driving features (customer 360, personalization), cost savings (modernizing legacy systems, microservices), and risk mitigation (real-time fraud and compliance). In our migration, we similarly targeted these areas: for example, we replaced a monolithic order flow (lock-step API calls) with independent services that exchange OrderPlaced, InventoryUpdated, etc. events via Kafka topics. This eliminated tight coupling between services, aligning with Kafka’s role as a “dumb pipe” where only the endpoints enforce logic.
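To make the decoupled order flow concrete, the sketch below shows, in Java with the standard Kafka client, how an order service could publish OrderPlaced events to a topic and how an inventory service could consume them in its own consumer group. The topic name order-placed, the broker address, the consumer-group id, and the plain JSON string payloads are illustrative assumptions for this sketch, not the exact configuration of our migration (which may use Avro and a schema registry instead).

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderEventsSketch {

    // Order service side: publish an OrderPlaced event; no knowledge of downstream consumers.
    static void publishOrderPlaced(String orderId, String payloadJson) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for full replication before acknowledging

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by orderId keeps all events for one order in the same partition, preserving order.
            producer.send(new ProducerRecord<>("order-placed", orderId, payloadJson));
            producer.flush();
        }
    }

    // Inventory service side: subscribe independently; the order service never calls it directly.
    static void consumeOrderPlaced() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "inventory-service"); // each service gets its own consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("order-placed"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Reserve stock here, then emit an InventoryUpdated event on its own topic (omitted).
                    System.out.printf("Reserving stock for order %s: %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

Because the order service only writes to the topic and never invokes the inventory service, either side can be deployed, scaled, or replaced independently, and additional consumers (fraud detection, analytics) can subscribe to the same topic without any change to the producer.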