Skip to main content

Command Palette

Search for a command to run...

When to Stop Calling APIs and Start Publishing Events

Published
8 min read
When to Stop Calling APIs and Start Publishing Events
A
Real-world engineering insights from 20+ years building scalable systems. Focused on AI, RAG architectures, and production-ready system design.

Series: Backend Engineering Fundamentals · Post 05 of 07 Level: Advanced · Read time: ~10 min


Picture a simple checkout flow: user places an order → charge the card → update inventory → send a confirmation email → notify the warehouse → update analytics.

In a synchronous world, your checkout endpoint calls each of those services in sequence. If the email service is slow, the checkout is slow. If the warehouse notification times out, do you roll back the charge? If analytics is down, does checkout fail?

Synchronous chains are brittle. They couple your system's availability to the availability of every downstream service. At small scale, this is manageable. At scale, it becomes the source of cascading failures, long tail latencies, and 3am incidents.

Message queues and event streaming are how you break these chains.


The Core Idea: Decouple Producers from Consumers

Instead of Service A calling Service B directly, A publishes an event to a queue or topic. B (and C, and D) subscribe and process that event independently, at their own pace.

❌ Synchronous — Tightly Coupled

OrderService → [HTTP] → PaymentService → [HTTP] → EmailService → [HTTP] → WarehouseService
  (if any step fails, the whole chain fails)


✅ Event-Driven — Loosely Coupled

OrderService → [Publish: order.placed] → Message Broker
                                              ↓
                              ┌───────────────┼────────────────┐
                              ↓               ↓                ↓
                        PaymentService   EmailService   WarehouseService
                    (processes when     (processes      (processes when
                       ready)           independently)    ready)

This shift — from calling to publishing — fundamentally changes how your system scales and fails.


Message Queues vs Event Streaming

These are related but distinct concepts. Getting the distinction right matters for choosing the right tool.

Message Queue Event Stream
Model Work distribution — each message consumed by one consumer Log — multiple consumers read the full stream independently
After consumption Message is deleted Message is retained (configurable duration)
Replay Not supported Supported — reprocess from any point
Ordering Per-queue FIFO Ordered within a partition
Best for Task distribution, job queues Event sourcing, audit logs, real-time pipelines
Tools RabbitMQ, Amazon SQS, ActiveMQ Kafka, Amazon Kinesis, Pulsar

RabbitMQ — The Message Queue Standard

RabbitMQ is a mature, AMQP-based message broker. The mental model: producers send messages to exchanges, exchanges route them to queues, consumers read from queues.

import pika

# Producer: Publishing a task
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

channel.queue_declare(queue='email_notifications', durable=True)
# durable=True: queue survives broker restart

channel.basic_publish(
    exchange='',
    routing_key='email_notifications',
    body='{"type": "order_confirmation", "orderId": "789", "userId": "123"}',
    properties=pika.BasicProperties(delivery_mode=2)  # 2 = persistent message
)

# Consumer: Processing tasks
def process_email(ch, method, properties, body):
    data = json.loads(body)
    send_confirmation_email(data['userId'], data['orderId'])
    ch.basic_ack(delivery_tag=method.delivery_tag)  # Acknowledge success

channel.basic_qos(prefetch_count=1)  # Process one message at a time
channel.basic_consume(queue='email_notifications', on_message_callback=process_email)
channel.start_consuming()

Key RabbitMQ concepts:

  • Acknowledgments (ack/nack): Consumer explicitly confirms it processed the message. If it crashes before acking, the message is redelivered. If it nacks, it can be requeued or sent to a dead-letter exchange.
  • Dead Letter Exchange (DLX): Messages that fail processing (after retry limits) are routed here. Critical for debugging and not silently dropping failures.
  • Exchange types: Direct (exact routing key match), Topic (wildcard routing), Fanout (broadcast to all bound queues).
# Dead Letter Queue setup
channel.queue_declare(
    queue='email_notifications',
    durable=True,
    arguments={
        'x-dead-letter-exchange': 'dlx',
        'x-message-ttl': 60000,        # Messages expire after 60s if not consumed
        'x-max-retries': 3             # Custom header for retry counting
    }
)

Apache Kafka — Event Streaming at Scale

Kafka is fundamentally different from RabbitMQ. It's a distributed log: events are appended to topics (partitioned, replicated logs), and consumers read from those logs at their own offset.

Topic: order-events (3 partitions)

Partition 0: [order.placed, order.placed, order.cancelled]
Partition 1: [order.placed, order.shipped, order.delivered]
Partition 2: [order.placed, order.paid]

Consumer Group A (Order Fulfillment): reads all partitions, tracks offset
Consumer Group B (Analytics): reads all partitions, independent offset
Consumer Group C (Fraud Detection): reads all partitions, independent offset

Each group processes the FULL stream independently.
Adding a new consumer group doesn't affect existing ones.
from confluent_kafka import Producer, Consumer

# Producer
producer = Producer({'bootstrap.servers': 'kafka:9092'})

def publish_order_event(order_id: str, event_type: str, data: dict):
    producer.produce(
        topic='order-events',
        key=order_id,          # Same key → same partition → ordered for this order
        value=json.dumps({"type": event_type, "orderId": order_id, **data}),
        callback=delivery_report
    )
    producer.flush()

# Consumer
consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'order-fulfillment-service',
    'auto.offset.reset': 'earliest'  # Start from beginning if no committed offset
})

consumer.subscribe(['order-events'])

while True:
    msg = consumer.poll(1.0)
    if msg and not msg.error():
        event = json.loads(msg.value())
        process_order_event(event)
        consumer.commit()  # Commit offset after successful processing

Kafka's superpower — replay: Because events are retained in the log, you can:

  • Replay events to rebuild a corrupted database
  • Add a new downstream service and backfill it from the beginning of time
  • Debug production issues by replaying the exact event sequence

Kafka vs RabbitMQ — Choosing the Right Tool

Scenario Use
Background job processing (email, notifications, PDF generation) RabbitMQ / SQS
Multiple services need to react to the same event independently Kafka
You need to replay or audit events Kafka
Simple task queue, low throughput RabbitMQ / SQS
Real-time data pipelines, event sourcing Kafka
You want managed, minimal ops overhead Amazon SQS or Google Pub/Sub
Microservices with complex routing rules RabbitMQ
>100k events/second Kafka

💡 Amazon SQS is the "just works" option for AWS shops. No broker to manage, virtually unlimited scale, pay-per-use. For most task queue use cases, it's the practical default.


Delivery Guarantees — This Matters More Than Most Teams Realize

Not all message systems deliver the same guarantee:

Guarantee Meaning Risk
At-most-once Message delivered 0 or 1 times Messages can be lost
At-least-once Message delivered 1 or more times Duplicate processing possible
Exactly-once Message delivered exactly once Hard to guarantee end-to-end; Kafka transactions support this

Most systems use at-least-once delivery. This means your consumers must be idempotent — processing the same message twice must produce the same result as processing it once.

def process_payment(payment_id: str, amount: float):
    # ❌ NOT idempotent — charges twice if message is redelivered
    charge_card(payment_id, amount)
    
    # ✅ Idempotent — check if already processed
    if db.payment_exists(payment_id):
        return  # Already processed, safe to skip
    
    with db.transaction():
        charge_card(payment_id, amount)
        db.record_payment(payment_id, amount)

Common Patterns

Fan-Out

One event triggers multiple independent consumers:

order.placed
    ├── EmailService (send confirmation)
    ├── InventoryService (reserve stock)
    ├── AnalyticsService (track purchase)
    └── LoyaltyService (award points)

Saga Pattern — Distributed Transactions

When you need a transaction across multiple services without a distributed lock:

Choreography-based Saga:

1. OrderService publishes: order.created
2. PaymentService consumes, processes payment, publishes: payment.completed
3. InventoryService consumes, reserves stock, publishes: inventory.reserved
4. FulfillmentService consumes, ships order, publishes: order.fulfilled

On failure at step 3:
3b. InventoryService publishes: inventory.failed
4b. PaymentService consumes inventory.failed, issues refund, publishes: payment.refunded

Outbox Pattern — Reliable Event Publishing

The classic dual-write problem: how do you update the database AND publish an event atomically?

# ❌ WRONG — race condition
def place_order(order: Order):
    db.save(order)              # Succeeds
    kafka.publish(order_event)  # Fails → event never published, DB inconsistent

# ✅ CORRECT — Transactional Outbox Pattern
def place_order(order: Order):
    with db.transaction():
        db.save(order)
        db.outbox.insert({       # Write event to outbox table in same transaction
            "topic": "order-events",
            "payload": order_event_json,
            "published": False
        })
    # Separate process polls outbox and publishes to Kafka reliably

Key Takeaways

  • Message queues decouple services — a slow downstream service no longer blocks your upstream caller
  • RabbitMQ is the right choice for task distribution, complex routing, and lower-throughput workloads
  • Kafka is for high-throughput event streaming, replay, audit, and fan-out at scale
  • SQS/Pub Sub for managed simplicity with minimal operational overhead
  • Idempotency is mandatory with at-least-once delivery — design your consumers to handle duplicates safely
  • The Outbox Pattern solves reliable event publishing without distributed transactions
  • Don't go event-driven prematurely — if your system has 3 services, synchronous calls are probably fine

Have you dealt with a cascade failure in a synchronous service chain that made you switch to async? What was the tipping point?


Next in the series → Post 06: Scaling — Before You Buy More Servers, Read This

You've decoupled your services with events. Now: how do you scale the services themselves?

Backend Engineering Fundamentals

Part 3 of 6

Backend systems don't fail because of bad code alone — they fail because of bad decisions. This series breaks down the foundational concepts every developer, architect, and engineer needs to build systems that scale, stay secure, and survive production: APIs, caching, security, databases, message queues, scalability, and observability. No fluff, no vendor pitches — just the tradeoffs that actually matter.

Up next

SQL or NoSQL? Wrong Question. Here's the Right One.

Series: Backend Engineering Fundamentals · Post 04 of 07 Level: Intermediate · Read time: ~9 min Every few years the industry declares SQL dead, or NoSQL dead, or NewSQL the future. Meanwhile, produ