When to Stop Calling APIs and Start Publishing Events

Series: Backend Engineering Fundamentals · Post 05 of 07 Level: Advanced · Read time: ~10 min

Picture a simple checkout flow: user places an order → charge the card → update inventory → send a confirmation email → notify the warehouse → update analytics.

In a synchronous world, your checkout endpoint calls each of those services in sequence. If the email service is slow, the checkout is slow. If the warehouse notification times out, do you roll back the charge? If analytics is down, does checkout fail?

Synchronous chains are brittle. They couple your system's availability to the availability of every downstream service. At small scale, this is manageable. At scale, it becomes the source of cascading failures, long tail latencies, and 3am incidents.

Message queues and event streaming are how you break these chains.

The Core Idea: Decouple Producers from Consumers

Instead of Service A calling Service B directly, A publishes an event to a queue or topic. B (and C, and D) subscribe and process that event independently, at their own pace.

❌ Synchronous — Tightly Coupled

OrderService → [HTTP] → PaymentService → [HTTP] → EmailService → [HTTP] → WarehouseService
  (if any step fails, the whole chain fails)


✅ Event-Driven — Loosely Coupled

OrderService → [Publish: order.placed] → Message Broker
                                              ↓
                              ┌───────────────┼────────────────┐
                              ↓               ↓                ↓
                        PaymentService   EmailService   WarehouseService
                    (processes when     (processes      (processes when
                       ready)           independently)    ready)

This shift — from calling to publishing — fundamentally changes how your system scales and fails.

Message Queues vs Event Streaming

These are related but distinct concepts. Getting the distinction right matters for choosing the right tool.

	Message Queue	Event Stream
Model	Work distribution — each message consumed by one consumer	Log — multiple consumers read the full stream independently
After consumption	Message is deleted	Message is retained (configurable duration)
Replay	Not supported	Supported — reprocess from any point
Ordering	Per-queue FIFO	Ordered within a partition
Best for	Task distribution, job queues	Event sourcing, audit logs, real-time pipelines
Tools	RabbitMQ, Amazon SQS, ActiveMQ	Kafka, Amazon Kinesis, Pulsar

RabbitMQ — The Message Queue Standard

RabbitMQ is a mature, AMQP-based message broker. The mental model: producers send messages to exchanges, exchanges route them to queues, consumers read from queues.

import pika

# Producer: Publishing a task
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

channel.queue_declare(queue='email_notifications', durable=True)
# durable=True: queue survives broker restart

channel.basic_publish(
    exchange='',
    routing_key='email_notifications',
    body='{"type": "order_confirmation", "orderId": "789", "userId": "123"}',
    properties=pika.BasicProperties(delivery_mode=2)  # 2 = persistent message
)

# Consumer: Processing tasks
def process_email(ch, method, properties, body):
    data = json.loads(body)
    send_confirmation_email(data['userId'], data['orderId'])
    ch.basic_ack(delivery_tag=method.delivery_tag)  # Acknowledge success

channel.basic_qos(prefetch_count=1)  # Process one message at a time
channel.basic_consume(queue='email_notifications', on_message_callback=process_email)
channel.start_consuming()

Key RabbitMQ concepts:

Acknowledgments (ack/nack): Consumer explicitly confirms it processed the message. If it crashes before acking, the message is redelivered. If it nacks, it can be requeued or sent to a dead-letter exchange.
Dead Letter Exchange (DLX): Messages that fail processing (after retry limits) are routed here. Critical for debugging and not silently dropping failures.
Exchange types: Direct (exact routing key match), Topic (wildcard routing), Fanout (broadcast to all bound queues).

# Dead Letter Queue setup
channel.queue_declare(
    queue='email_notifications',
    durable=True,
    arguments={
        'x-dead-letter-exchange': 'dlx',
        'x-message-ttl': 60000,        # Messages expire after 60s if not consumed
        'x-max-retries': 3             # Custom header for retry counting
    }
)

Apache Kafka — Event Streaming at Scale

Kafka is fundamentally different from RabbitMQ. It's a distributed log: events are appended to topics (partitioned, replicated logs), and consumers read from those logs at their own offset.

Topic: order-events (3 partitions)

Partition 0: [order.placed, order.placed, order.cancelled]
Partition 1: [order.placed, order.shipped, order.delivered]
Partition 2: [order.placed, order.paid]

Consumer Group A (Order Fulfillment): reads all partitions, tracks offset
Consumer Group B (Analytics): reads all partitions, independent offset
Consumer Group C (Fraud Detection): reads all partitions, independent offset

Each group processes the FULL stream independently.
Adding a new consumer group doesn't affect existing ones.

from confluent_kafka import Producer, Consumer

# Producer
producer = Producer({'bootstrap.servers': 'kafka:9092'})

def publish_order_event(order_id: str, event_type: str, data: dict):
    producer.produce(
        topic='order-events',
        key=order_id,          # Same key → same partition → ordered for this order
        value=json.dumps({"type": event_type, "orderId": order_id, **data}),
        callback=delivery_report
    )
    producer.flush()

# Consumer
consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'order-fulfillment-service',
    'auto.offset.reset': 'earliest'  # Start from beginning if no committed offset
})

consumer.subscribe(['order-events'])

while True:
    msg = consumer.poll(1.0)
    if msg and not msg.error():
        event = json.loads(msg.value())
        process_order_event(event)
        consumer.commit()  # Commit offset after successful processing

Kafka's superpower — replay: Because events are retained in the log, you can:

Replay events to rebuild a corrupted database
Add a new downstream service and backfill it from the beginning of time
Debug production issues by replaying the exact event sequence

Kafka vs RabbitMQ — Choosing the Right Tool

Scenario	Use
Background job processing (email, notifications, PDF generation)	RabbitMQ / SQS
Multiple services need to react to the same event independently	Kafka
You need to replay or audit events	Kafka
Simple task queue, low throughput	RabbitMQ / SQS
Real-time data pipelines, event sourcing	Kafka
You want managed, minimal ops overhead	Amazon SQS or Google Pub/Sub
Microservices with complex routing rules	RabbitMQ
>100k events/second	Kafka

💡 Amazon SQS is the "just works" option for AWS shops. No broker to manage, virtually unlimited scale, pay-per-use. For most task queue use cases, it's the practical default.

Delivery Guarantees — This Matters More Than Most Teams Realize

Not all message systems deliver the same guarantee:

Guarantee	Meaning	Risk
At-most-once	Message delivered 0 or 1 times	Messages can be lost
At-least-once	Message delivered 1 or more times	Duplicate processing possible
Exactly-once	Message delivered exactly once	Hard to guarantee end-to-end; Kafka transactions support this

Most systems use at-least-once delivery. This means your consumers must be idempotent — processing the same message twice must produce the same result as processing it once.

def process_payment(payment_id: str, amount: float):
    # ❌ NOT idempotent — charges twice if message is redelivered
    charge_card(payment_id, amount)
    
    # ✅ Idempotent — check if already processed
    if db.payment_exists(payment_id):
        return  # Already processed, safe to skip
    
    with db.transaction():
        charge_card(payment_id, amount)
        db.record_payment(payment_id, amount)

Common Patterns

Fan-Out

One event triggers multiple independent consumers:

order.placed
    ├── EmailService (send confirmation)
    ├── InventoryService (reserve stock)
    ├── AnalyticsService (track purchase)
    └── LoyaltyService (award points)

Saga Pattern — Distributed Transactions

When you need a transaction across multiple services without a distributed lock:

Choreography-based Saga:

1. OrderService publishes: order.created
2. PaymentService consumes, processes payment, publishes: payment.completed
3. InventoryService consumes, reserves stock, publishes: inventory.reserved
4. FulfillmentService consumes, ships order, publishes: order.fulfilled

On failure at step 3:
3b. InventoryService publishes: inventory.failed
4b. PaymentService consumes inventory.failed, issues refund, publishes: payment.refunded

Outbox Pattern — Reliable Event Publishing

The classic dual-write problem: how do you update the database AND publish an event atomically?

# ❌ WRONG — race condition
def place_order(order: Order):
    db.save(order)              # Succeeds
    kafka.publish(order_event)  # Fails → event never published, DB inconsistent

# ✅ CORRECT — Transactional Outbox Pattern
def place_order(order: Order):
    with db.transaction():
        db.save(order)
        db.outbox.insert({       # Write event to outbox table in same transaction
            "topic": "order-events",
            "payload": order_event_json,
            "published": False
        })
    # Separate process polls outbox and publishes to Kafka reliably

Key Takeaways

Message queues decouple services — a slow downstream service no longer blocks your upstream caller
RabbitMQ is the right choice for task distribution, complex routing, and lower-throughput workloads
Kafka is for high-throughput event streaming, replay, audit, and fan-out at scale
SQS/Pub Sub for managed simplicity with minimal operational overhead
Idempotency is mandatory with at-least-once delivery — design your consumers to handle duplicates safely
The Outbox Pattern solves reliable event publishing without distributed transactions
Don't go event-driven prematurely — if your system has 3 services, synchronous calls are probably fine

Have you dealt with a cascade failure in a synchronous service chain that made you switch to async? What was the tipping point?

Next in the series → Post 06: Scaling — Before You Buy More Servers, Read This

You've decoupled your services with events. Now: how do you scale the services themselves?

When to Stop Calling APIs and Start Publishing Events

The Core Idea: Decouple Producers from Consumers

Message Queues vs Event Streaming

RabbitMQ — The Message Queue Standard

Apache Kafka — Event Streaming at Scale

Kafka vs RabbitMQ — Choosing the Right Tool

Delivery Guarantees — This Matters More Than Most Teams Realize

Common Patterns

Fan-Out

Saga Pattern — Distributed Transactions

Outbox Pattern — Reliable Event Publishing

Key Takeaways

Comments

Backend Engineering Fundamentals

Scaling: Before You Buy More Servers, Read This

More from this blog

The Vibe Coding Trap: Why AI-Driven Development Needs Strong Architecture

You Can't Manage What You Can't See: The Three Pillars of Observability

Scaling: Before You Buy More Servers, Read This

SQL or NoSQL? Wrong Question. Here's the Right One.

Command Palette

The Core Idea: Decouple Producers from Consumers

Message Queues vs Event Streaming

RabbitMQ — The Message Queue Standard

Apache Kafka — Event Streaming at Scale

Kafka vs RabbitMQ — Choosing the Right Tool

Delivery Guarantees — This Matters More Than Most Teams Realize

Common Patterns

Fan-Out

Saga Pattern — Distributed Transactions

Outbox Pattern — Reliable Event Publishing

Key Takeaways

Comments

Backend Engineering Fundamentals

Scaling: Before You Buy More Servers, Read This

More from this blog