When to Stop Calling APIs and Start Publishing Events

Series: Backend Engineering Fundamentals · Post 05 of 07 Level: Advanced · Read time: ~10 min
Picture a simple checkout flow: user places an order → charge the card → update inventory → send a confirmation email → notify the warehouse → update analytics.
In a synchronous world, your checkout endpoint calls each of those services in sequence. If the email service is slow, the checkout is slow. If the warehouse notification times out, do you roll back the charge? If analytics is down, does checkout fail?
Synchronous chains are brittle. They couple your system's availability to the availability of every downstream service. At small scale, this is manageable. At scale, it becomes the source of cascading failures, long tail latencies, and 3am incidents.
Message queues and event streaming are how you break these chains.
The Core Idea: Decouple Producers from Consumers
Instead of Service A calling Service B directly, A publishes an event to a queue or topic. B (and C, and D) subscribe and process that event independently, at their own pace.
❌ Synchronous — Tightly Coupled
OrderService → [HTTP] → PaymentService → [HTTP] → EmailService → [HTTP] → WarehouseService
(if any step fails, the whole chain fails)
✅ Event-Driven — Loosely Coupled
OrderService → [Publish: order.placed] → Message Broker
↓
┌───────────────┼────────────────┐
↓ ↓ ↓
PaymentService EmailService WarehouseService
(processes when (processes (processes when
ready) independently) ready)
This shift — from calling to publishing — fundamentally changes how your system scales and fails.
Message Queues vs Event Streaming
These are related but distinct concepts. Getting the distinction right matters for choosing the right tool.
| Message Queue | Event Stream | |
|---|---|---|
| Model | Work distribution — each message consumed by one consumer | Log — multiple consumers read the full stream independently |
| After consumption | Message is deleted | Message is retained (configurable duration) |
| Replay | Not supported | Supported — reprocess from any point |
| Ordering | Per-queue FIFO | Ordered within a partition |
| Best for | Task distribution, job queues | Event sourcing, audit logs, real-time pipelines |
| Tools | RabbitMQ, Amazon SQS, ActiveMQ | Kafka, Amazon Kinesis, Pulsar |
RabbitMQ — The Message Queue Standard
RabbitMQ is a mature, AMQP-based message broker. The mental model: producers send messages to exchanges, exchanges route them to queues, consumers read from queues.
import pika
# Producer: Publishing a task
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='email_notifications', durable=True)
# durable=True: queue survives broker restart
channel.basic_publish(
exchange='',
routing_key='email_notifications',
body='{"type": "order_confirmation", "orderId": "789", "userId": "123"}',
properties=pika.BasicProperties(delivery_mode=2) # 2 = persistent message
)
# Consumer: Processing tasks
def process_email(ch, method, properties, body):
data = json.loads(body)
send_confirmation_email(data['userId'], data['orderId'])
ch.basic_ack(delivery_tag=method.delivery_tag) # Acknowledge success
channel.basic_qos(prefetch_count=1) # Process one message at a time
channel.basic_consume(queue='email_notifications', on_message_callback=process_email)
channel.start_consuming()
Key RabbitMQ concepts:
- Acknowledgments (ack/nack): Consumer explicitly confirms it processed the message. If it crashes before acking, the message is redelivered. If it nacks, it can be requeued or sent to a dead-letter exchange.
- Dead Letter Exchange (DLX): Messages that fail processing (after retry limits) are routed here. Critical for debugging and not silently dropping failures.
- Exchange types: Direct (exact routing key match), Topic (wildcard routing), Fanout (broadcast to all bound queues).
# Dead Letter Queue setup
channel.queue_declare(
queue='email_notifications',
durable=True,
arguments={
'x-dead-letter-exchange': 'dlx',
'x-message-ttl': 60000, # Messages expire after 60s if not consumed
'x-max-retries': 3 # Custom header for retry counting
}
)
Apache Kafka — Event Streaming at Scale
Kafka is fundamentally different from RabbitMQ. It's a distributed log: events are appended to topics (partitioned, replicated logs), and consumers read from those logs at their own offset.
Topic: order-events (3 partitions)
Partition 0: [order.placed, order.placed, order.cancelled]
Partition 1: [order.placed, order.shipped, order.delivered]
Partition 2: [order.placed, order.paid]
Consumer Group A (Order Fulfillment): reads all partitions, tracks offset
Consumer Group B (Analytics): reads all partitions, independent offset
Consumer Group C (Fraud Detection): reads all partitions, independent offset
Each group processes the FULL stream independently.
Adding a new consumer group doesn't affect existing ones.
from confluent_kafka import Producer, Consumer
# Producer
producer = Producer({'bootstrap.servers': 'kafka:9092'})
def publish_order_event(order_id: str, event_type: str, data: dict):
producer.produce(
topic='order-events',
key=order_id, # Same key → same partition → ordered for this order
value=json.dumps({"type": event_type, "orderId": order_id, **data}),
callback=delivery_report
)
producer.flush()
# Consumer
consumer = Consumer({
'bootstrap.servers': 'kafka:9092',
'group.id': 'order-fulfillment-service',
'auto.offset.reset': 'earliest' # Start from beginning if no committed offset
})
consumer.subscribe(['order-events'])
while True:
msg = consumer.poll(1.0)
if msg and not msg.error():
event = json.loads(msg.value())
process_order_event(event)
consumer.commit() # Commit offset after successful processing
Kafka's superpower — replay: Because events are retained in the log, you can:
- Replay events to rebuild a corrupted database
- Add a new downstream service and backfill it from the beginning of time
- Debug production issues by replaying the exact event sequence
Kafka vs RabbitMQ — Choosing the Right Tool
| Scenario | Use |
|---|---|
| Background job processing (email, notifications, PDF generation) | RabbitMQ / SQS |
| Multiple services need to react to the same event independently | Kafka |
| You need to replay or audit events | Kafka |
| Simple task queue, low throughput | RabbitMQ / SQS |
| Real-time data pipelines, event sourcing | Kafka |
| You want managed, minimal ops overhead | Amazon SQS or Google Pub/Sub |
| Microservices with complex routing rules | RabbitMQ |
| >100k events/second | Kafka |
💡 Amazon SQS is the "just works" option for AWS shops. No broker to manage, virtually unlimited scale, pay-per-use. For most task queue use cases, it's the practical default.
Delivery Guarantees — This Matters More Than Most Teams Realize
Not all message systems deliver the same guarantee:
| Guarantee | Meaning | Risk |
|---|---|---|
| At-most-once | Message delivered 0 or 1 times | Messages can be lost |
| At-least-once | Message delivered 1 or more times | Duplicate processing possible |
| Exactly-once | Message delivered exactly once | Hard to guarantee end-to-end; Kafka transactions support this |
Most systems use at-least-once delivery. This means your consumers must be idempotent — processing the same message twice must produce the same result as processing it once.
def process_payment(payment_id: str, amount: float):
# ❌ NOT idempotent — charges twice if message is redelivered
charge_card(payment_id, amount)
# ✅ Idempotent — check if already processed
if db.payment_exists(payment_id):
return # Already processed, safe to skip
with db.transaction():
charge_card(payment_id, amount)
db.record_payment(payment_id, amount)
Common Patterns
Fan-Out
One event triggers multiple independent consumers:
order.placed
├── EmailService (send confirmation)
├── InventoryService (reserve stock)
├── AnalyticsService (track purchase)
└── LoyaltyService (award points)
Saga Pattern — Distributed Transactions
When you need a transaction across multiple services without a distributed lock:
Choreography-based Saga:
1. OrderService publishes: order.created
2. PaymentService consumes, processes payment, publishes: payment.completed
3. InventoryService consumes, reserves stock, publishes: inventory.reserved
4. FulfillmentService consumes, ships order, publishes: order.fulfilled
On failure at step 3:
3b. InventoryService publishes: inventory.failed
4b. PaymentService consumes inventory.failed, issues refund, publishes: payment.refunded
Outbox Pattern — Reliable Event Publishing
The classic dual-write problem: how do you update the database AND publish an event atomically?
# ❌ WRONG — race condition
def place_order(order: Order):
db.save(order) # Succeeds
kafka.publish(order_event) # Fails → event never published, DB inconsistent
# ✅ CORRECT — Transactional Outbox Pattern
def place_order(order: Order):
with db.transaction():
db.save(order)
db.outbox.insert({ # Write event to outbox table in same transaction
"topic": "order-events",
"payload": order_event_json,
"published": False
})
# Separate process polls outbox and publishes to Kafka reliably
Key Takeaways
- Message queues decouple services — a slow downstream service no longer blocks your upstream caller
- RabbitMQ is the right choice for task distribution, complex routing, and lower-throughput workloads
- Kafka is for high-throughput event streaming, replay, audit, and fan-out at scale
- SQS/Pub Sub for managed simplicity with minimal operational overhead
- Idempotency is mandatory with at-least-once delivery — design your consumers to handle duplicates safely
- The Outbox Pattern solves reliable event publishing without distributed transactions
- Don't go event-driven prematurely — if your system has 3 services, synchronous calls are probably fine
Have you dealt with a cascade failure in a synchronous service chain that made you switch to async? What was the tipping point?
Next in the series → Post 06: Scaling — Before You Buy More Servers, Read This
You've decoupled your services with events. Now: how do you scale the services themselves?



