<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ajitabh Singh]]></title><description><![CDATA[Ajitabh Singh]]></description><link>https://ajitabh.net</link><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 06:59:36 GMT</lastBuildDate><atom:link href="https://ajitabh.net/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[You Can't Manage What You Can't See: The Three Pillars of Observability]]></title><description><![CDATA[Series: Backend Engineering Fundamentals · Post 07 of 07
Level: Intermediate · Read time: ~9 min


It's 2am. An alert fires. Your service is down.
You open your dashboard: CPU is fine, memory is fine.]]></description><link>https://ajitabh.net/you-cant-manage-what-you-cant-see-three-pillars-of-observability</link><guid isPermaLink="true">https://ajitabh.net/you-cant-manage-what-you-cant-see-three-pillars-of-observability</guid><category><![CDATA[observability]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[#prometheus]]></category><category><![CDATA[OpenTelemetry]]></category><category><![CDATA[Backend Engineering]]></category><category><![CDATA[distributed tracing]]></category><category><![CDATA[logging]]></category><category><![CDATA[System Design]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Ajitabh Singh]]></dc:creator><pubDate>Thu, 26 Mar 2026 17:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69c0b381d9da55a9a5203e04/d784a5ef-db44-4648-9a61-4e99d6f2744a.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>Series:</strong> Backend Engineering Fundamentals · Post 07 of 07
<strong>Level:</strong> Intermediate · <strong>Read time:</strong> ~9 min</p>
</blockquote>
<hr />
<p>It's 2am. An alert fires. Your service is down.</p>
<p>You open your dashboard: CPU is fine, memory is fine. You check the logs: thousands of lines of <code>INFO</code> messages and a few <code>ERROR</code> lines with stack traces — none of them obviously the root cause. You open your tracing tool and realize you only instrumented the main API service, not the three downstream services it calls.</p>
<p>Forty-five minutes later, you find it: a database connection pool exhausted in a service that has no alerting on it, caused by a slow query introduced in yesterday's deployment. You had no visibility into that service, no alert on the metric that would have told you, and no trace that showed where the latency was accumulating.</p>
<p>This is what poor observability looks like. You're debugging in the dark.</p>
<p>Observability isn't a feature you add later. It's the practice of making your system <strong>understandable from the outside</strong> — so that when something goes wrong, you can ask questions of your system and get useful answers.</p>
<hr />
<h2>The Three Pillars</h2>
<p>Observability is commonly structured around three signal types:</p>
<table>
<thead>
<tr>
<th>Pillar</th>
<th>Answers</th>
<th>Examples</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Logs</strong></td>
<td>What happened?</td>
<td>Error traces, audit events, debug output</td>
</tr>
<tr>
<td><strong>Metrics</strong></td>
<td>How is it behaving over time?</td>
<td>Request rate, error %, CPU, latency histograms</td>
</tr>
<tr>
<td><strong>Traces</strong></td>
<td>Where did this request go and how long did each step take?</td>
<td>Distributed request spans across services</td>
</tr>
</tbody></table>
<p>Each pillar answers different questions. All three are needed for a complete picture.</p>
<hr />
<h2>Pillar 1: Logs</h2>
<p>Logs are the most familiar observability tool — and the most frequently done wrong.</p>
<h3>Structured Logging</h3>
<p>The difference between logs that help you and logs that don't is <strong>structure</strong>.</p>
<pre><code class="language-python"># ❌ Unstructured — fast to write, painful to query at scale
print(f"Processing order {order_id} for user {user_id} failed: {error}")
# Output: "Processing order 789 for user 123 failed: Connection timeout"

# ✅ Structured — queryable, filterable, alertable
import structlog

logger = structlog.get_logger()
logger.error(
    "order_processing_failed",
    order_id=order_id,
    user_id=user_id,
    error_type="ConnectionTimeout",
    service="payment-service",
    duration_ms=3240,
    retry_count=3
)
# Output: {"event": "order_processing_failed", "order_id": "789", "user_id": "123",
#          "error_type": "ConnectionTimeout", "duration_ms": 3240, ...}
</code></pre>
<p>With structured logs, you can query: <em>show me all orders that failed with ConnectionTimeout in the last hour where retry_count &gt; 2</em>. With unstructured logs, you're writing regex.</p>
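<p>As a toy illustration (real systems run this query in the log aggregator, not in application code), that query becomes a simple filter over parsed JSON instead of a regex:</p>
<pre><code class="language-python">import json

lines = [
    '{"event": "order_processing_failed", "error_type": "ConnectionTimeout", "retry_count": 3, "order_id": "789"}',
    '{"event": "order_processing_failed", "error_type": "ValidationError", "retry_count": 0, "order_id": "790"}',
]

# "All orders that failed with ConnectionTimeout where retry_count > 2"
matches = [
    log for log in map(json.loads, lines)
    if log["event"] == "order_processing_failed"
    and log["error_type"] == "ConnectionTimeout"
    and log["retry_count"] > 2
]
# matches contains only order 789
</code></pre>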
<h3>Log Levels — Use Them Correctly</h3>
<pre><code class="language-python">logger.debug(...)    # Detailed diagnostic, disabled in production
logger.info(...)     # Normal operation milestones (request received, order placed)
logger.warning(...)  # Unexpected but handled (retried 2x, using fallback)
logger.error(...)    # Failed operation, requires attention
logger.critical(...) # System cannot continue, immediate action required
</code></pre>
<p><strong>Common mistake:</strong> Using <code>INFO</code> for everything. At scale, an INFO log for every request is millions of log entries per hour — expensive to store and slow to search. Log meaningful state changes and errors, not every heartbeat.</p>
<h3>Correlation IDs</h3>
<p>In a distributed system, a single user action triggers logs across multiple services. Without a correlation ID, you can't connect them.</p>
<pre><code class="language-python">import uuid
from contextvars import ContextVar

request_id: ContextVar[str] = ContextVar('request_id', default='')

# Middleware: Generate ID at the edge
@app.middleware("http")
async def add_correlation_id(request: Request, call_next):
    req_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    request_id.set(req_id)
    response = await call_next(request)
    response.headers["X-Request-ID"] = req_id
    return response

# Every log includes it automatically
logger.info("payment_initiated", request_id=request_id.get(), ...)

# Pass it downstream
httpx.post(payment_service_url, headers={"X-Request-ID": request_id.get()})
</code></pre>
<p>Now you can search your log aggregator for a single <code>request_id</code> and see every log line across every service for that user's request.</p>
<hr />
<h2>Pillar 2: Metrics</h2>
<p>Metrics are <strong>aggregated numerical measurements over time</strong>. They answer: is the system healthy right now, and how does that compare to last week?</p>
<h3>The Four Golden Signals</h3>
<p>Google's SRE book identified four metrics that, together, give you a complete picture of service health:</p>
<pre><code>1. Latency      — How long are requests taking?
                  (Distinguish: successful requests vs error requests)

2. Traffic      — How many requests per second?
                  (Understand normal baselines)

3. Errors       — What percentage of requests are failing?
                  (Both 5xx errors and application-level failures)

4. Saturation   — How "full" is the service?
                  (CPU, memory, queue depth, connection pool usage)
</code></pre>
<p><strong>If you only instrument one thing, instrument these four for every service.</strong></p>
<h3>Prometheus — The De-Facto Standard</h3>
<pre><code class="language-python">from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Counter: always increasing (requests, errors)
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint', 'status'])

# Histogram: distribution of values (latency, request size)
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'Request latency',
    ['endpoint'],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]  # Bucket boundaries in seconds
)

# Gauge: current value (active connections, queue depth)
ACTIVE_CONNECTIONS = Gauge('db_active_connections', 'Active database connections')

# Instrument your endpoints
@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    start = time.time()
    response = await call_next(request)
    duration = time.time() - start
    
    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path,
        status=response.status_code
    ).inc()
    
    REQUEST_LATENCY.labels(endpoint=request.url.path).observe(duration)
    
    return response
</code></pre>
<h3>Percentiles Beat Averages</h3>
<p>Average latency hides your users' actual experience. If 95% of requests are fast but 5% are extremely slow, the average looks fine while 5% of your users are having a terrible time.</p>
<pre><code>Average latency: 120ms  ← Looks fine
P50 (median):   80ms    ← Most users are fine
P95:            450ms   ← 5% of users waiting 450ms
P99:            2,100ms ← 1% of users waiting 2+ seconds
</code></pre>
<p>Always alert on <strong>P95 and P99 latency</strong>, not averages.</p>
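<p>The gap is easy to reproduce with synthetic latencies (a sketch using Python's <code>statistics</code> module):</p>
<pre><code class="language-python">import statistics

# 95 fast requests, 5 very slow ones (milliseconds)
latencies = [80] * 95 + [2000] * 5

avg = statistics.mean(latencies)               # 176 (the average looks fine)
cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]   # p50 is 80; p95 and p99 are in the thousands

# The average sits near the fast majority while 5% of users wait 2 seconds.
</code></pre>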
<hr />
<h2>Pillar 3: Distributed Tracing</h2>
<p>Logs tell you what happened. Metrics tell you how things are trending. Traces tell you <strong>where time is actually going</strong> across your services for a specific request.</p>
<pre><code>Trace for request_id: abc-123 (total: 1,240ms)

├── API Gateway                              [10ms]
│   └── OrderService.createOrder()          [1,210ms]
│       ├── validateUser() → UserService     [15ms]
│       ├── checkInventory() → InventoryService [45ms]
│       ├── processPayment() → PaymentService  [980ms]  ← HERE
│       │   ├── validateCard()               [12ms]
│       │   ├── chargeCard() → Stripe API    [952ms]  ← External call slow
│       │   └── recordTransaction() → DB     [16ms]
│       └── sendNotification() → EmailService [35ms]
</code></pre>
<p>Without tracing, you'd see "order creation is slow (1,240ms)" in your metrics. With tracing, you see "Stripe API is taking 952ms." Two very different problems to solve.</p>
<h3>OpenTelemetry — The Standard</h3>
<p>OpenTelemetry (OTel) is the vendor-neutral instrumentation standard for traces, metrics, and logs.</p>
<pre><code class="language-python">from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Setup (once at application start)
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Instrument a function
def process_payment(order_id: str, amount: float):
    with tracer.start_as_current_span("process_payment") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("payment.amount", amount)
        
        try:
            result = stripe.charge(amount)
            span.set_attribute("stripe.charge_id", result.id)
            return result
        except StripeError as e:
            span.record_exception(e)
            span.set_status(StatusCode.ERROR, str(e))
            raise
</code></pre>
<p>OTel data can be exported to Jaeger, Zipkin, Datadog, Honeycomb, Grafana Tempo — your choice of backend.</p>
<hr />
<h2>SLOs, SLAs, and Error Budgets</h2>
<p>Metrics are more useful when tied to explicit reliability targets.</p>
<ul>
<li><strong>SLA (Service Level Agreement):</strong> A <em>contract</em> with users or customers. "We guarantee 99.9% uptime." Breaking this has business consequences.</li>
<li><strong>SLO (Service Level Objective):</strong> An internal <em>target</em> for reliability. "We aim for P99 latency &lt; 500ms, measured over 30 days."</li>
<li><strong>Error Budget:</strong> The amount of unreliability you're <em>allowed</em> before you break your SLO.</li>
</ul>
<pre><code>SLO: 99.9% availability over 30 days

Total minutes in 30 days: 43,200
Allowed downtime (0.1%): 43.2 minutes

Error budget: 43.2 minutes

If you've used 40 minutes this month:
→ Feature freezes, focus on reliability
→ Any risky deployments wait until next month's budget resets

If you've used 5 minutes:
→ You have headroom for risky changes, experiments
</code></pre>
<p>Error budgets create a shared language between engineering and product: reliability isn't free, it consumes budget, and you have to choose how to spend it.</p>
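<p>The arithmetic above generalizes to a one-liner (a hypothetical helper, not a standard API):</p>
<pre><code class="language-python">def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo)

error_budget_minutes(0.999)   # 43.2 minutes, matching the example above
error_budget_minutes(0.9999)  # 4.32 minutes (each extra nine is 10x harder)
</code></pre>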
<hr />
<h2>Alerting — Noise Is the Enemy</h2>
<p>A team that receives 50 alerts per day learns to ignore alerts. The goal is <strong>high signal, low noise</strong>.</p>
<pre><code class="language-yaml"># ❌ Alert on symptoms, not causes — creates noise
- alert: CpuHigh
  expr: cpu_usage &gt; 80
  for: 5m
  # High CPU can be the cause of 100 different problems — what do you do with this?

# ✅ Alert on user-visible impact — actionable
- alert: HighErrorRate
  expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) &gt; 0.05
  for: 2m
  annotations:
    summary: "Error rate above 5% for 2 minutes"
    runbook: "https://wiki/runbooks/high-error-rate"

- alert: HighLatency
  expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) &gt; 2
  for: 3m
  annotations:
    summary: "P99 latency above 2 seconds"
</code></pre>
<p><strong>Runbooks matter:</strong> Every alert should link to a runbook — a documented set of steps for investigating and resolving that specific alert. Runbooks reduce MTTR (mean time to resolve) and mean the on-call engineer doesn't have to improvise at 2am.</p>
<hr />
<h2>Observability Stack — Common Combinations</h2>
<table>
<thead>
<tr>
<th>Stack</th>
<th>Use Case</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Prometheus + Grafana + Jaeger</strong></td>
<td>Open source, self-hosted, full control</td>
</tr>
<tr>
<td><strong>Datadog</strong></td>
<td>Managed, all-in-one, expensive but powerful</td>
</tr>
<tr>
<td><strong>Grafana Cloud (Loki + Tempo + Mimir)</strong></td>
<td>Open source stack, managed hosting</td>
</tr>
<tr>
<td><strong>AWS CloudWatch</strong></td>
<td>Good enough for AWS-native teams; be aware of the vendor lock-in</td>
</tr>
<tr>
<td><strong>Honeycomb</strong></td>
<td>Best-in-class for traces and exploratory analysis</td>
</tr>
<tr>
<td><strong>OpenTelemetry → Any backend</strong></td>
<td>Instrument once, switch backends freely</td>
</tr>
</tbody></table>
<blockquote>
<p>💡 <strong>Start with OpenTelemetry instrumentation</strong> regardless of which backend you choose. OTel is vendor-neutral — instrument your code with OTel, export to wherever makes sense today, and migrate backends later without changing application code.</p>
</blockquote>
<hr />
<h2>Key Takeaways</h2>
<ul>
<li><strong>Logs, metrics, and traces</strong> each answer different questions — you need all three</li>
<li><strong>Structured logging</strong> (JSON, key-value) makes logs queryable; unstructured logs don't scale</li>
<li><strong>Correlation IDs</strong> connect a user action across every service in your system</li>
<li><strong>The four golden signals</strong> (latency, traffic, errors, saturation) are the minimum metrics for every service</li>
<li><strong>Alert on P95/P99 latency</strong>, not averages — averages hide the tail experience</li>
<li><strong>Distributed tracing</strong> shows you where time actually goes across service boundaries</li>
<li><strong>OpenTelemetry</strong> is the vendor-neutral standard — instrument with it, export anywhere</li>
<li><strong>SLOs and error budgets</strong> give reliability a shared language across engineering and product</li>
<li><strong>Alert on user-visible symptoms</strong> — too many alerts = all alerts get ignored</li>
</ul>
<hr />
<p><strong>What's the metric or log line you wish you'd added <em>before</em> your first major incident? What would have cut your MTTR in half?</strong></p>
<hr />
<h2>Wrapping Up the Series</h2>
<p>This was Post 7 of 7 in <strong>Backend Engineering Fundamentals</strong>. Here's where we've been:</p>
<ol>
<li><strong>APIs</strong> — Choosing the right communication paradigm</li>
<li><strong>Caching</strong> — What to cache, how to invalidate, what can go wrong</li>
<li><strong>Security</strong> — Auth patterns and the vulnerabilities that actually cause breaches</li>
<li><strong>Databases</strong> — Access patterns, CAP theorem, when to use what</li>
<li><strong>Message Queues</strong> — Decoupling services with events</li>
<li><strong>Scalability</strong> — Scaling strategies before and after you need them</li>
<li><strong>Observability</strong> — Making your system understandable from the outside <em>(you are here)</em></li>
</ol>
<hr />
<p><em>If this series was useful, share it with your team or anyone who'd benefit. And if there's a topic you'd like covered next — drop it in the comments.</em></p>
]]></content:encoded></item><item><title><![CDATA[Scaling: Before You Buy More Servers, Read This]]></title><description><![CDATA[Series: Backend Engineering Fundamentals · Post 06 of 07
Level: Beginner-friendly · Read time: ~8 min


"We need to scale" is one of the most expensive sentences in engineering.
It triggers infrastruc]]></description><link>https://ajitabh.net/scaling-before-you-buy-more-servers-read-this</link><guid isPermaLink="true">https://ajitabh.net/scaling-before-you-buy-more-servers-read-this</guid><category><![CDATA[scalability]]></category><category><![CDATA[Load Balancing]]></category><category><![CDATA[Backend Engineering]]></category><category><![CDATA[System Design]]></category><category><![CDATA[Redis]]></category><category><![CDATA[horizontal scaling]]></category><category><![CDATA[auto scaling]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Ajitabh Singh]]></dc:creator><pubDate>Thu, 26 Mar 2026 16:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69c0b381d9da55a9a5203e04/70c7141b-81ca-4b2a-bb57-96fa13f09005.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>Series:</strong> Backend Engineering Fundamentals · Post 06 of 07
<strong>Level:</strong> Beginner-friendly · <strong>Read time:</strong> ~8 min</p>
</blockquote>
<hr />
<p>"We need to scale" is one of the most expensive sentences in engineering.</p>
<p>It triggers infrastructure discussions, migration projects, and architectural rewrites — often before anyone has looked at whether the current system is actually running at capacity.</p>
<p>Before scaling your infrastructure, understand what you're actually scaling <em>for</em>. Most systems that feel slow are bottlenecked by code problems (N+1 queries, missing indexes, synchronous calls that should be async) — not infrastructure capacity. Scaling a slow system gives you a more expensive slow system.</p>
<p>This post covers the actual mechanics of scaling, the tradeoffs between approaches, and how to think about it before opening a cloud console.</p>
<hr />
<h2>Vertical vs Horizontal Scaling</h2>
<p><strong>Vertical scaling (Scale Up):</strong> Add more resources to existing servers — bigger CPU, more RAM, faster disk.</p>
<p><strong>Horizontal scaling (Scale Out):</strong> Add more servers and distribute the load across them.</p>
<pre><code>Vertical Scaling                    Horizontal Scaling

[Server: 8 CPU, 32GB]      →       [Server: 4 CPU, 16GB] ×3
         ↓                                   ↓
[Server: 32 CPU, 128GB]             [Server: 4 CPU, 16GB] ×10
(one big machine)                   (many smaller machines)
</code></pre>
<table>
<thead>
<tr>
<th></th>
<th>Vertical</th>
<th>Horizontal</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Simplicity</strong></td>
<td>Simple — no code changes</td>
<td>Complex — requires stateless design</td>
</tr>
<tr>
<td><strong>Cost</strong></td>
<td>Expensive at high end (premium hardware)</td>
<td>Cheaper per unit at scale</td>
</tr>
<tr>
<td><strong>Failure impact</strong></td>
<td>Single point of failure</td>
<td>Redundant — one server failure is minor</td>
</tr>
<tr>
<td><strong>Ceiling</strong></td>
<td>Hard limit on available hardware</td>
<td>Theoretically unlimited</td>
</tr>
<tr>
<td><strong>Database</strong></td>
<td>Works well (most DBs scale vertically first)</td>
<td>Sharding required for DBs</td>
</tr>
</tbody></table>
<p><strong>In practice:</strong> Start with vertical scaling. It's simpler, faster, and often sufficient. Switch to horizontal when you hit the vertical ceiling or need high availability.</p>
<hr />
<h2>The Stateless Requirement for Horizontal Scaling</h2>
<p>Horizontal scaling only works if your application is <strong>stateless</strong> — each request can be handled by any server, with no local state that makes one server "special."</p>
<pre><code>❌ Stateful — Can't Scale Horizontally

Server 1: User session in memory → [Request for user A] works
Server 2: No session for user A  → [Request for user A] fails

✅ Stateless — Scales Horizontally

Server 1: No local state → reads session from Redis
Server 2: No local state → reads session from Redis
Server 3: No local state → reads session from Redis

Any server can handle any request.
Load balancer distributes freely.
</code></pre>
<p><strong>The rule:</strong> Move all state out of your application servers and into shared storage (Redis for sessions, S3 for files, your database for persistent data). Your servers should be interchangeable.</p>
<pre><code class="language-python"># ❌ Stateful — in-memory session
app.sessions[user_id] = {"cart": items}  # Lives on one server only

# ✅ Stateless — session in Redis
redis.setex(f"session:{session_id}", 3600, json.dumps({"cart": items}))
</code></pre>
<hr />
<h2>Load Balancers — The Front Door to Your Scaled System</h2>
<p>A load balancer distributes incoming requests across your pool of servers.</p>
<pre><code>Internet
   ↓
[Load Balancer]
   ├── Server 1
   ├── Server 2
   └── Server 3
</code></pre>
<p><strong>Load balancing algorithms:</strong></p>
<table>
<thead>
<tr>
<th>Algorithm</th>
<th>How it works</th>
<th>Use when</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Round Robin</strong></td>
<td>Requests distributed in sequence (1→2→3→1→2→3)</td>
<td>Servers have equal capacity and similar request costs</td>
</tr>
<tr>
<td><strong>Least Connections</strong></td>
<td>Routes to server with fewest active connections</td>
<td>Requests have variable processing time</td>
</tr>
<tr>
<td><strong>IP Hash</strong></td>
<td>Routes same client IP to same server</td>
<td>You need session stickiness and can't use a shared session store</td>
</tr>
<tr>
<td><strong>Weighted</strong></td>
<td>Servers get traffic proportional to weight</td>
<td>Servers have different capacities</td>
</tr>
<tr>
<td><strong>Random</strong></td>
<td>Random server selection</td>
<td>Surprisingly effective at scale; simple to implement</td>
</tr>
</tbody></table>
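<p>The first two algorithms reduce to a few lines each (toy sketches, not a real load balancer; server names are illustrative):</p>
<pre><code class="language-python">import itertools

servers = ["app1", "app2", "app3"]

# Round robin: hand out servers in a fixed cycle
rr = itertools.cycle(servers)
picks = [next(rr) for _ in range(5)]   # app1, app2, app3, app1, app2

# Least connections: pick the server with the fewest active connections
active = {"app1": 12, "app2": 3, "app3": 7}
target = min(active, key=active.get)   # app2
</code></pre>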
<p><strong>Layer 4 vs Layer 7:</strong></p>
<ul>
<li><strong>L4 (TCP/UDP):</strong> Routes based on IP address and port. Extremely fast, no content inspection. AWS NLB, HAProxy in TCP mode.</li>
<li><strong>L7 (HTTP):</strong> Routes based on HTTP content (URL, headers, cookies). More flexible — route <code>/api</code> to one pool, <code>/static</code> to another. AWS ALB, NGINX, Traefik.</li>
</ul>
<pre><code class="language-nginx"># NGINX: Layer 7 load balancing with upstream pools
upstream api_servers {
    least_conn;  # Least connections algorithm
    server app1.internal:8080 weight=3;
    server app2.internal:8080 weight=3;
    server app3.internal:8080 weight=1;  # Lower weight = less traffic
    
    keepalive 32;  # Connection pool to upstream servers
}

upstream static_servers {
    server static1.internal:8080;
    server static2.internal:8080;
}

server {
    location /api/ {
        proxy_pass http://api_servers;
    }
    location /static/ {
        proxy_pass http://static_servers;
    }
}
</code></pre>
<hr />
<h2>Database Scaling — Where It Gets Hard</h2>
<p>Application servers are stateless and easy to scale. Databases are stateful and hard.</p>
<h3>Read Replicas — The First Move</h3>
<p>Most applications are read-heavy. Add read replicas and route SELECT queries there.</p>
<pre><code>Primary DB (writes)
    ↓ replication
Replica 1 (reads)
Replica 2 (reads)
Replica 3 (reads)

Application:
  - INSERT / UPDATE / DELETE → Primary
  - SELECT → Random replica
</code></pre>
<pre><code class="language-python"># Connection routing example
def get_db_connection(read_only: bool = False):
    if read_only:
        return random.choice(replica_connections)
    return primary_connection
</code></pre>
<p><strong>Limitation:</strong> Replication lag. Replicas are slightly behind the primary (usually milliseconds, but can grow under load). Don't read from a replica immediately after a write if you need the result.</p>
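<p>One common mitigation is "read-your-own-writes" routing: send a user's reads to the primary for a short window after that user writes. A sketch (the window length is an assumption you tune to your observed lag):</p>
<pre><code class="language-python">import time

RECENT_WRITE_WINDOW = 5.0   # seconds; tune to your replication lag
last_write_at: dict = {}    # user_id -> timestamp of that user's last write

def record_write(user_id: str) -> None:
    last_write_at[user_id] = time.time()

def choose_target(user_id: str, read_only: bool) -> str:
    age = time.time() - last_write_at.get(user_id, 0.0)
    if read_only and age > RECENT_WRITE_WINDOW:
        return "replica"    # safe: any recent write has replicated by now
    return "primary"        # writes, and reads just after a write
</code></pre>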
<hr />
<h3>Connection Pooling — Before You Add Replicas</h3>
<p>Before adding replicas, make sure you're not wasting connections. Databases have a hard limit on concurrent connections. Without pooling, a spike in traffic can exhaust connections instantly.</p>
<pre><code class="language-python"># SQLAlchemy connection pool
engine = create_engine(
    DATABASE_URL,
    pool_size=20,          # Normal pool size
    max_overflow=30,       # Extra connections under load
    pool_timeout=30,       # Wait up to 30s for a connection before error
    pool_recycle=3600      # Recycle connections after 1 hour
)
</code></pre>
<p>For PostgreSQL at scale, use <strong>PgBouncer</strong> — a lightweight connection pooler that sits between your app and the database, multiplexing thousands of application connections onto a smaller number of actual DB connections.</p>
<hr />
<h3>Sharding — The Last Resort</h3>
<p>When a single primary + replicas isn't enough, you shard: split your data across multiple databases.</p>
<pre><code>User IDs         1 – 1,000,000  → Database Shard 1
User IDs 1,000,001 – 2,000,000  → Database Shard 2
User IDs 2,000,001 – 3,000,000  → Database Shard 3
</code></pre>
<p><strong>The costs are real:</strong></p>
<ul>
<li>Cross-shard queries (JOINs across shards) become application logic</li>
<li>Transactions across shards require distributed transaction handling</li>
<li>Resharding (when a shard gets too large) is painful</li>
<li>Every query needs shard-routing logic</li>
</ul>
<p>Sharding adds enormous operational complexity. Exhaust all other options first: indexing, query optimization, read replicas, caching, connection pooling, vertical scaling.</p>
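<p>The routing logic that every query now needs looks something like this (a hash-based sketch; the range-based scheme above works too, and the shard names are illustrative):</p>
<pre><code class="language-python">import hashlib

SHARDS = ["orders_shard_1", "orders_shard_2", "orders_shard_3"]

def shard_for(user_id: int) -> str:
    # Hash-based routing spreads users evenly; range-based routing keeps
    # ranges contiguous but can concentrate hot users on one shard.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every data-access call now goes through shard_for(user_id) first,
# and a JOIN across two users may span two databases.
</code></pre>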
<hr />
<h2>Auto-Scaling — Elasticity, Not Magic</h2>
<p>Auto-scaling adds or removes servers based on load. This is valuable for variable traffic patterns (traffic spikes on product launches, Black Friday, etc.).</p>
<pre><code class="language-yaml"># AWS Auto Scaling Group (simplified)
AutoScalingGroup:
  MinSize: 2          # Always at least 2 servers
  MaxSize: 20         # Never exceed 20 servers
  DesiredCapacity: 4  # Start with 4

ScalingPolicy:
  ScaleOut:
    Trigger: CPUUtilization &gt; 70% for 2 minutes
    Action: Add 2 instances
  ScaleIn:
    Trigger: CPUUtilization &lt; 30% for 10 minutes
    Action: Remove 1 instance
</code></pre>
<p><strong>Auto-scaling pitfalls:</strong></p>
<ol>
<li><p><strong>Cold start time:</strong> If spinning up a new instance takes 3 minutes, it won't help with a traffic spike that peaks in 1 minute. Pre-warm with a higher minimum capacity.</p>
</li>
<li><p><strong>Scale-in aggressiveness:</strong> Removing servers too aggressively causes thrashing (scale up, scale down, scale up again). Add a cooldown period.</p>
</li>
<li><p><strong>Database doesn't scale automatically:</strong> Auto-scaling your app tier is useless if your database becomes the bottleneck. Ensure your DB can handle the connection surge from new instances.</p>
</li>
<li><p><strong>Stateful sessions:</strong> If you forgot the stateless requirement, auto-scaling will cause session loss when a server is removed.</p>
</li>
</ol>
<hr />
<h2>CDN for Static Assets — The Easiest Win</h2>
<p>Before spending time on application scaling, ask: how much of your traffic is serving static files (JS, CSS, images)?</p>
<p>A CDN serves these from edge locations close to users, eliminating the load from your application servers entirely.</p>
<pre><code>Without CDN:
User (Tokyo) → [Internet] → App Server (US East) → serve image (300ms)

With CDN:
User (Tokyo) → CDN Edge (Tokyo) → serve cached image (8ms)
</code></pre>
<p>This also reduces bandwidth costs, since CDN egress is typically cheaper than cloud server egress.</p>
<p><strong>What to cache on CDN:</strong></p>
<ul>
<li>All static assets with content-hash filenames (infinite TTL, cache-busted on deploy)</li>
<li>API responses that are public and change infrequently (product catalog, pricing)</li>
<li>Rendered HTML pages for anonymous users (massive scale lever for content sites)</li>
</ul>
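<p>Those three cases map to different <code>Cache-Control</code> headers (the header values are standard HTTP; the mapping and asset-type names are a sketch):</p>
<pre><code class="language-python">def cache_control(asset_type: str) -> str:
    if asset_type == "hashed_static":    # e.g. app.3f9a1c.js, content never changes
        return "public, max-age=31536000, immutable"
    if asset_type == "public_api":       # product catalog, pricing
        return "public, max-age=300, stale-while-revalidate=60"
    if asset_type == "anonymous_html":   # rendered pages for anonymous users
        return "public, max-age=60"
    return "private, no-store"           # everything user-specific
</code></pre>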
<hr />
<h2>Scaling Checklist — Before Adding Servers</h2>
<p>Run through this before any infrastructure change:</p>
<ul>
<li> Are queries using indexes? (<code>EXPLAIN ANALYZE</code> your slow queries)</li>
<li> Is there N+1 query behavior in the application?</li>
<li> Is connection pooling configured? (PgBouncer, HikariCP, SQLAlchemy pool)</li>
<li> Are static assets served via CDN?</li>
<li> Is read traffic separated to replicas?</li>
<li> Are expensive computations cached?</li>
<li> Are long-running operations async (queues) instead of blocking request threads?</li>
<li> Is the application stateless (sessions in Redis, files in S3)?</li>
</ul>
<p>Tick all of these before scaling horizontally. You'll likely find the bottleneck isn't what you thought.</p>
<hr />
<h2>Key Takeaways</h2>
<ul>
<li><strong>Scale vertically first</strong> — it's simpler and often enough</li>
<li><strong>Stateless design is the prerequisite</strong> for horizontal scaling — move all state to shared storage</li>
<li><strong>Load balancers</strong> distribute traffic; Layer 7 gives you routing flexibility</li>
<li><strong>Read replicas</strong> are the first database scaling move — they solve most read-heavy bottlenecks</li>
<li><strong>Connection pooling</strong> (PgBouncer) often eliminates "database can't scale" problems cheaply</li>
<li><strong>Sharding is a last resort</strong> — the complexity cost is real</li>
<li><strong>CDN and query optimization</strong> have better ROI than new servers in most systems</li>
<li>Profile first. Most slow systems are code problems, not infrastructure problems.</li>
</ul>
<hr />
<p><strong>What bottleneck surprised you most when your system first started struggling under load — was it what you expected?</strong></p>
<hr />
<p><em>Next in the series → <strong>Post 07: You Can't Manage What You Can't See — The Three Pillars of Observability</strong></em></p>
<p><em>You've built and scaled your system. Now: how do you know it's working?</em></p>
]]></content:encoded></item><item><title><![CDATA[When to Stop Calling APIs and Start Publishing Events]]></title><description><![CDATA[Series: Backend Engineering Fundamentals · Post 05 of 07
Level: Advanced · Read time: ~10 min


Picture a simple checkout flow: user places an order → charge the card → update inventory → send a confi]]></description><link>https://ajitabh.net/when-to-stop-calling-apis-and-start-publishing-events</link><guid isPermaLink="true">https://ajitabh.net/when-to-stop-calling-apis-and-start-publishing-events</guid><category><![CDATA[kafka]]></category><category><![CDATA[message queue]]></category><category><![CDATA[rabbitmq]]></category><category><![CDATA[event-driven-architecture]]></category><category><![CDATA[Backend Engineering]]></category><category><![CDATA[System Design]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Ajitabh Singh]]></dc:creator><pubDate>Thu, 26 Mar 2026 15:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69c0b381d9da55a9a5203e04/ff10ffe0-8ae5-4cde-9269-b043d805dd27.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>Series:</strong> Backend Engineering Fundamentals · Post 05 of 07
<strong>Level:</strong> Advanced · <strong>Read time:</strong> ~10 min</p>
</blockquote>
<hr />
<p>Picture a simple checkout flow: user places an order → charge the card → update inventory → send a confirmation email → notify the warehouse → update analytics.</p>
<p>In a synchronous world, your checkout endpoint calls each of those services in sequence. If the email service is slow, the checkout is slow. If the warehouse notification times out, do you roll back the charge? If analytics is down, does checkout fail?</p>
<p>Synchronous chains are brittle. They couple your system's availability to the availability of every downstream service. At small scale, this is manageable. At scale, it becomes the source of cascading failures, long tail latencies, and 3am incidents.</p>
<p>Message queues and event streaming are how you break these chains.</p>
<hr />
<h2>The Core Idea: Decouple Producers from Consumers</h2>
<p>Instead of Service A calling Service B directly, A <strong>publishes an event</strong> to a queue or topic. B (and C, and D) <strong>subscribe</strong> and process that event independently, at their own pace.</p>
<pre><code>❌ Synchronous — Tightly Coupled

OrderService → [HTTP] → PaymentService → [HTTP] → EmailService → [HTTP] → WarehouseService
  (if any step fails, the whole chain fails)


✅ Event-Driven — Loosely Coupled

OrderService → [Publish: order.placed] → Message Broker
                                              ↓
                              ┌───────────────┼────────────────┐
                              ↓               ↓                ↓
                        PaymentService   EmailService   WarehouseService
                    (processes when     (processes      (processes when
                       ready)           independently)    ready)
</code></pre>
<p>This shift — from calling to publishing — fundamentally changes how your system scales and fails.</p>
<hr />
<h2>Message Queues vs Event Streaming</h2>
<p>These are related but distinct concepts. Getting the distinction right matters for choosing the right tool.</p>
<table>
<thead>
<tr>
<th></th>
<th>Message Queue</th>
<th>Event Stream</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Model</strong></td>
<td>Work distribution — each message consumed by one consumer</td>
<td>Log — multiple consumers read the full stream independently</td>
</tr>
<tr>
<td><strong>After consumption</strong></td>
<td>Message is deleted</td>
<td>Message is retained (configurable duration)</td>
</tr>
<tr>
<td><strong>Replay</strong></td>
<td>Not supported</td>
<td>Supported — reprocess from any point</td>
</tr>
<tr>
<td><strong>Ordering</strong></td>
<td>Per-queue FIFO</td>
<td>Ordered within a partition</td>
</tr>
<tr>
<td><strong>Best for</strong></td>
<td>Task distribution, job queues</td>
<td>Event sourcing, audit logs, real-time pipelines</td>
</tr>
<tr>
<td><strong>Tools</strong></td>
<td>RabbitMQ, Amazon SQS, ActiveMQ</td>
<td>Kafka, Amazon Kinesis, Pulsar</td>
</tr>
</tbody></table>
<hr />
<h2>RabbitMQ — The Message Queue Standard</h2>
<p>RabbitMQ is a mature, AMQP-based message broker. The mental model: producers send messages to <strong>exchanges</strong>, exchanges route them to <strong>queues</strong>, consumers read from queues.</p>
<pre><code class="language-python">import json
import pika

# Producer: Publishing a task
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

channel.queue_declare(queue='email_notifications', durable=True)
# durable=True: queue survives broker restart

channel.basic_publish(
    exchange='',
    routing_key='email_notifications',
    body='{"type": "order_confirmation", "orderId": "789", "userId": "123"}',
    properties=pika.BasicProperties(delivery_mode=2)  # 2 = persistent message
)

# Consumer: Processing tasks
def process_email(ch, method, properties, body):
    data = json.loads(body)
    send_confirmation_email(data['userId'], data['orderId'])
    ch.basic_ack(delivery_tag=method.delivery_tag)  # Acknowledge success

channel.basic_qos(prefetch_count=1)  # Process one message at a time
channel.basic_consume(queue='email_notifications', on_message_callback=process_email)
channel.start_consuming()
</code></pre>
<p><strong>Key RabbitMQ concepts:</strong></p>
<ul>
<li><strong>Acknowledgments (ack/nack):</strong> Consumer explicitly confirms it processed the message. If it crashes before acking, the message is redelivered. If it nacks, it can be requeued or sent to a dead-letter exchange.</li>
<li><strong>Dead Letter Exchange (DLX):</strong> Messages that fail processing (after retry limits) are routed here. Critical for debugging and not silently dropping failures.</li>
<li><strong>Exchange types:</strong> Direct (exact routing key match), Topic (wildcard routing), Fanout (broadcast to all bound queues).</li>
</ul>
<pre><code class="language-python"># Dead Letter Queue setup
channel.exchange_declare(exchange='dlx', exchange_type='fanout')
channel.queue_declare(queue='email_notifications_dlq', durable=True)
channel.queue_bind(queue='email_notifications_dlq', exchange='dlx')

channel.queue_declare(
    queue='email_notifications',
    durable=True,
    arguments={
        'x-dead-letter-exchange': 'dlx',
        'x-message-ttl': 60000   # Messages expire after 60s if not consumed
    }
)
# RabbitMQ has no built-in retry limit. To cap retries, inspect the
# 'x-death' header the broker adds each time a message is dead-lettered.
</code></pre>
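<p>The topic-exchange matching rules are easy to internalize from a small sketch. This is an illustration of the semantics (<code>*</code> matches exactly one word, <code>#</code> matches zero or more), written in plain Python, not pika or broker code:</p>
<pre><code class="language-python">def topic_matches(binding_key, routing_key):
    """Mimic RabbitMQ topic matching: '*' is one word, '#' is zero or more words."""
    def match(bind, route):
        if not bind:
            return not route
        if bind[0] == '#':
            # '#' can absorb any number of words, including none
            return any(match(bind[1:], route[i:]) for i in range(len(route) + 1))
        if route and bind[0] in ('*', route[0]):
            return match(bind[1:], route[1:])
        return False
    return match(binding_key.split('.'), routing_key.split('.'))

# A queue bound with 'order.*' receives order.placed but not order.eu.placed;
# a queue bound with 'order.#' receives both.
</code></pre>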
<hr />
<h2>Apache Kafka — Event Streaming at Scale</h2>
<p>Kafka is fundamentally different from RabbitMQ. It's a <strong>distributed log</strong>: events are appended to topics (partitioned, replicated logs), and consumers read from those logs at their own offset.</p>
<pre><code>Topic: order-events (3 partitions)

Partition 0: [order.placed, order.placed, order.cancelled]
Partition 1: [order.placed, order.shipped, order.delivered]
Partition 2: [order.placed, order.paid]

Consumer Group A (Order Fulfillment): reads all partitions, tracks offset
Consumer Group B (Analytics): reads all partitions, independent offset
Consumer Group C (Fraud Detection): reads all partitions, independent offset

Each group processes the FULL stream independently.
Adding a new consumer group doesn't affect existing ones.
</code></pre>
<pre><code class="language-python">import json

from confluent_kafka import Producer, Consumer

# Producer
producer = Producer({'bootstrap.servers': 'kafka:9092'})

def delivery_report(err, msg):
    if err is not None:
        handle_failed_delivery(err, msg)  # Don't silently drop failed publishes

def publish_order_event(order_id: str, event_type: str, data: dict):
    producer.produce(
        topic='order-events',
        key=order_id,          # Same key → same partition → ordered for this order
        value=json.dumps({"type": event_type, "orderId": order_id, **data}),
        callback=delivery_report
    )
    producer.flush()  # Fine at low volume; flush in batches for high throughput

# Consumer
consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'order-fulfillment-service',
    'auto.offset.reset': 'earliest'  # Start from beginning if no committed offset
})

consumer.subscribe(['order-events'])

while True:
    msg = consumer.poll(1.0)
    if msg and not msg.error():
        event = json.loads(msg.value())
        process_order_event(event)
        consumer.commit()  # Commit offset after successful processing
</code></pre>
<p><strong>Kafka's superpower — replay:</strong> Because events are retained in the log, you can:</p>
<ul>
<li>Replay events to rebuild a corrupted database</li>
<li>Add a new downstream service and backfill it from the beginning of time</li>
<li>Debug production issues by replaying the exact event sequence</li>
</ul>
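<p>The model that makes replay possible (a retained log plus per-group offsets) fits in a short sketch. This is an in-memory stand-in for illustration, not the Kafka client API:</p>
<pre><code class="language-python">class ToyLog:
    """Append-only log: events are retained; each consumer group has its own offset."""
    def __init__(self):
        self.events = []
        self.offsets = {}                      # group_id mapped to committed offset

    def append(self, event):
        self.events.append(event)              # nothing is ever deleted

    def poll(self, group_id):
        offset = self.offsets.get(group_id, 0)
        batch = self.events[offset:]
        self.offsets[group_id] = len(self.events)   # commit
        return batch

    def seek_to_beginning(self, group_id):
        self.offsets[group_id] = 0             # replay the full history

log = ToyLog()
log.append('order.placed')
log.append('order.paid')

fulfillment = log.poll('fulfillment')      # both events
analytics = log.poll('analytics')          # same events, independent offset
log.seek_to_beginning('fulfillment')
replayed = log.poll('fulfillment')         # full history again
</code></pre>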
<hr />
<h2>Kafka vs RabbitMQ — Choosing the Right Tool</h2>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Use</th>
</tr>
</thead>
<tbody><tr>
<td>Background job processing (email, notifications, PDF generation)</td>
<td><strong>RabbitMQ / SQS</strong></td>
</tr>
<tr>
<td>Multiple services need to react to the same event independently</td>
<td><strong>Kafka</strong></td>
</tr>
<tr>
<td>You need to replay or audit events</td>
<td><strong>Kafka</strong></td>
</tr>
<tr>
<td>Simple task queue, low throughput</td>
<td><strong>RabbitMQ / SQS</strong></td>
</tr>
<tr>
<td>Real-time data pipelines, event sourcing</td>
<td><strong>Kafka</strong></td>
</tr>
<tr>
<td>You want managed, minimal ops overhead</td>
<td><strong>Amazon SQS</strong> or <strong>Google Pub/Sub</strong></td>
</tr>
<tr>
<td>Microservices with complex routing rules</td>
<td><strong>RabbitMQ</strong></td>
</tr>
<tr>
<td>&gt;100k events/second</td>
<td><strong>Kafka</strong></td>
</tr>
</tbody></table>
<blockquote>
<p>💡 <strong>Amazon SQS</strong> is the "just works" option for AWS shops. No broker to manage, virtually unlimited scale, pay-per-use. For most task queue use cases, it's the practical default.</p>
</blockquote>
<hr />
<h2>Delivery Guarantees — This Matters More Than Most Teams Realize</h2>
<p>Not all message systems deliver the same guarantee:</p>
<table>
<thead>
<tr>
<th>Guarantee</th>
<th>Meaning</th>
<th>Risk</th>
</tr>
</thead>
<tbody><tr>
<td><strong>At-most-once</strong></td>
<td>Message delivered 0 or 1 times</td>
<td>Messages can be lost</td>
</tr>
<tr>
<td><strong>At-least-once</strong></td>
<td>Message delivered 1 or more times</td>
<td>Duplicate processing possible</td>
</tr>
<tr>
<td><strong>Exactly-once</strong></td>
<td>Message delivered exactly once</td>
<td>Hard to guarantee end-to-end; Kafka transactions support this</td>
</tr>
</tbody></table>
<p><strong>Most systems use at-least-once delivery.</strong> This means your consumers must be <strong>idempotent</strong> — processing the same message twice must produce the same result as processing it once.</p>
<pre><code class="language-python"># ❌ NOT idempotent — charges twice if the message is redelivered
def process_payment(payment_id: str, amount: float):
    charge_card(payment_id, amount)

# ✅ Idempotent — payment_id doubles as an idempotency key
def process_payment_idempotent(payment_id: str, amount: float):
    if db.payment_exists(payment_id):
        return  # Already processed, safe to skip

    with db.transaction():
        charge_card(payment_id, amount)
        db.record_payment(payment_id, amount)
        # A unique constraint on payment_id closes the check-then-act race
</code></pre>
<hr />
<h2>Common Patterns</h2>
<h3>Fan-Out</h3>
<p>One event triggers multiple independent consumers:</p>
<pre><code>order.placed
    ├── EmailService (send confirmation)
    ├── InventoryService (reserve stock)
    ├── AnalyticsService (track purchase)
    └── LoyaltyService (award points)
</code></pre>
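<p>In-process, fan-out is just "every subscriber gets its own copy of the event." A minimal sketch with callbacks standing in for the services above (the handler logic is hypothetical):</p>
<pre><code class="language-python">subscribers = {}

def subscribe(topic, handler):
    subscribers.setdefault(topic, []).append(handler)

def publish(topic, event):
    for handler in subscribers.get(topic, []):
        handler(event)          # each consumer reacts independently

emails, reservations = [], []
subscribe('order.placed', lambda e: emails.append(e['orderId']))        # EmailService
subscribe('order.placed', lambda e: reservations.append(e['orderId']))  # InventoryService

publish('order.placed', {'orderId': '789'})
</code></pre>
<p>A real broker adds durability, retries, and independent pacing on top of this shape, but the routing idea is the same.</p>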
<h3>Saga Pattern — Distributed Transactions</h3>
<p>When you need a transaction across multiple services without a distributed lock:</p>
<pre><code>Choreography-based Saga:

1. OrderService publishes: order.created
2. PaymentService consumes, processes payment, publishes: payment.completed
3. InventoryService consumes, reserves stock, publishes: inventory.reserved
4. FulfillmentService consumes, ships order, publishes: order.fulfilled

On failure at step 3:
3b. InventoryService publishes: inventory.failed
4b. PaymentService consumes inventory.failed, issues refund, publishes: payment.refunded
</code></pre>
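<p>The happy path and the compensating path above can be traced with a toy in-memory bus. The event names match the diagram; the handlers and the <code>stock_available</code> flag are hypothetical:</p>
<pre><code class="language-python">handlers = {}
trace = []

def on(event_type, fn):
    handlers.setdefault(event_type, []).append(fn)

def emit(event_type):
    trace.append(event_type)
    for fn in handlers.get(event_type, []):
        fn()

stock_available = False   # force the failure path for this demo

on('order.created', lambda: emit('payment.completed'))
on('payment.completed', lambda: emit('inventory.reserved') if stock_available
                                else emit('inventory.failed'))
on('inventory.failed', lambda: emit('payment.refunded'))   # compensating action

emit('order.created')
# trace: order.created, payment.completed, inventory.failed, payment.refunded
</code></pre>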
<h3>Outbox Pattern — Reliable Event Publishing</h3>
<p>The classic dual-write problem: how do you update the database AND publish an event atomically?</p>
<pre><code class="language-python"># ❌ WRONG — race condition
def place_order(order: Order):
    db.save(order)              # Succeeds
    kafka.publish(order_event)  # Fails → event never published, DB inconsistent

# ✅ CORRECT — Transactional Outbox Pattern
def place_order(order: Order):
    with db.transaction():
        db.save(order)
        db.outbox.insert({       # Write event to outbox table in same transaction
            "topic": "order-events",
            "payload": order_event_json,
            "published": False
        })
    # Separate process polls outbox and publishes to Kafka reliably
</code></pre>
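<p>The "separate process" at the end is a small poll-and-mark loop. A sketch with <code>sqlite3</code> standing in for the application database and a list standing in for the broker (all names are illustrative):</p>
<pre><code class="language-python">import sqlite3

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, "
           "payload TEXT, published INTEGER DEFAULT 0)")
db.execute("INSERT INTO outbox (topic, payload) VALUES "
           "('order-events', '{\"orderId\": \"789\"}')")
db.commit()

broker = []   # stand-in for kafka.publish

def relay_once():
    rows = db.execute("SELECT id, topic, payload FROM outbox "
                      "WHERE published = 0 ORDER BY id").fetchall()
    for row_id, topic, payload in rows:
        broker.append((topic, payload))   # publish; at-least-once if a crash
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()                           # happens between these two steps

relay_once()
</code></pre>
<p>If the relay crashes after publishing but before marking, the event is published again on the next pass, which is exactly why consumers must be idempotent.</p>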
<hr />
<h2>Key Takeaways</h2>
<ul>
<li><strong>Message queues decouple services</strong> — a slow downstream service no longer blocks your upstream caller</li>
<li><strong>RabbitMQ</strong> is the right choice for task distribution, complex routing, and lower-throughput workloads</li>
<li><strong>Kafka</strong> is for high-throughput event streaming, replay, audit, and fan-out at scale</li>
<li><strong>SQS / Pub/Sub</strong> for managed simplicity with minimal operational overhead</li>
<li><strong>Idempotency is mandatory</strong> with at-least-once delivery — design your consumers to handle duplicates safely</li>
<li><strong>The Outbox Pattern</strong> solves reliable event publishing without distributed transactions</li>
<li>Don't go event-driven prematurely — if your system has 3 services, synchronous calls are probably fine</li>
</ul>
<hr />
<p><strong>Have you dealt with a cascade failure in a synchronous service chain that made you switch to async? What was the tipping point?</strong></p>
<hr />
<p><em>Next in the series → <strong>Post 06: Scaling — Before You Buy More Servers, Read This</strong></em></p>
<p><em>You've decoupled your services with events. Now: how do you scale the services themselves?</em></p>
]]></content:encoded></item><item><title><![CDATA[SQL or NoSQL? Wrong Question. Here's the Right One.]]></title><description><![CDATA[Series: Backend Engineering Fundamentals · Post 04 of 07
Level: Intermediate · Read time: ~9 min


Every few years the industry declares SQL dead, or NoSQL dead, or NewSQL the future. Meanwhile, produ]]></description><link>https://ajitabh.net/sql-or-nosql-wrong-question-heres-the-right-one</link><guid isPermaLink="true">https://ajitabh.net/sql-or-nosql-wrong-question-heres-the-right-one</guid><category><![CDATA[Databases]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[MongoDB]]></category><category><![CDATA[NoSQL]]></category><category><![CDATA[SQL]]></category><category><![CDATA[Backend Engineering]]></category><category><![CDATA[backend developments]]></category><category><![CDATA[System Design]]></category><category><![CDATA[CAP-Theorem]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Ajitabh Singh]]></dc:creator><pubDate>Thu, 26 Mar 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69c0b381d9da55a9a5203e04/f7c7e59e-78de-4fc8-a19c-a0d57b889196.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>Series:</strong> Backend Engineering Fundamentals · Post 04 of 07
<strong>Level:</strong> Intermediate · <strong>Read time:</strong> ~9 min</p>
</blockquote>
<hr />
<p>Every few years the industry declares SQL dead, or NoSQL dead, or NewSQL the future. Meanwhile, production systems quietly keep running on PostgreSQL, with a Redis cache, a MongoDB collection for one specific use case, and an Elasticsearch index for search.</p>
<p>The SQL vs NoSQL debate is the wrong frame. The right question is: <strong>what are your data access patterns, consistency requirements, and team capabilities?</strong></p>
<p>Answer those, and the database choice usually becomes obvious.</p>
<hr />
<h2>What SQL Actually Gives You (That's Often Taken for Granted)</h2>
<p>Relational databases aren't just "tables with foreign keys." The guarantees they provide are hard to replicate:</p>
<p><strong>ACID Transactions</strong></p>
<pre><code class="language-sql">BEGIN;
  UPDATE accounts SET balance = balance - 500 WHERE id = 'alice';
  UPDATE accounts SET balance = balance + 500 WHERE id = 'bob';
COMMIT;
-- Either both updates happen, or neither does. No partial state.
</code></pre>
<p>You don't appreciate ACID until you've debugged a distributed system where you transferred $500, debited Alice, and then the network failed before crediting Bob.</p>
<p><strong>Joins — Relationship Integrity Without Application Logic</strong></p>
<pre><code class="language-sql">SELECT o.id, o.total, u.name, u.email
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending'
  AND o.created_at &gt; NOW() - INTERVAL '24 hours';
</code></pre>
<p>In a document database, this query becomes application code — multiple fetches, assembled in memory, with no guarantee of consistency.</p>
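<p>Concretely, the "join in application code" looks something like this (plain dicts standing in for collections; no real driver API here):</p>
<pre><code class="language-python"># In-memory stand-ins for two collections
users = {'u1': {'name': 'Alice', 'email': 'alice@example.com'}}
orders = [
    {'id': 'o1', 'user_id': 'u1', 'total': 120.0, 'status': 'pending'},
    {'id': 'o2', 'user_id': 'u1', 'total': 35.0, 'status': 'shipped'},
]

# The SQL join, hand-rolled: fetch orders, then fetch each user, assemble in memory.
# Between the two reads the user document may change; nothing guarantees consistency.
result = []
for order in orders:
    if order['status'] == 'pending':
        user = users[order['user_id']]   # an extra round-trip per order
        result.append({'orderId': order['id'], 'total': order['total'],
                       'name': user['name'], 'email': user['email']})
</code></pre>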
<p><strong>Schema Enforcement</strong>
The database rejects data that doesn't fit the schema. This feels restrictive early in development; it becomes invaluable when your system is running 24/7 and a bug tries to write malformed data.</p>
<hr />
<h2>The CAP Theorem — A Useful Mental Model</h2>
<p>Distributed systems can guarantee at most two of three properties:</p>
<pre><code>            Consistency
       (every read returns
        the latest write)
              /\
             /  \
         CA /    \ CP
           /      \
          /___AP___\
  Availability      Partition Tolerance
  (every request    (system keeps working
  gets a response)  despite network failures)
</code></pre>
<p><strong>CP systems</strong> (Consistency + Partition Tolerance): Choose correctness over availability. HBase, MongoDB (with certain write concerns), etcd.</p>
<p><strong>AP systems</strong> (Availability + Partition Tolerance): Choose availability over strict consistency. Cassandra, CouchDB, DynamoDB (by default).</p>
<p><strong>CA systems</strong>: Only possible without network partitions — i.e., single-node systems or systems within a trusted network. Most traditional relational databases in non-distributed setups.</p>
<blockquote>
<p>⚠️ In practice, network partitions always <em>can</em> happen. The real choice is between <strong>consistency and availability</strong> when a partition occurs. Choose based on your domain: banking needs consistency; social media can tolerate eventual consistency.</p>
</blockquote>
<hr />
<h2>NoSQL Data Models — Picking the Right Tool</h2>
<p>"NoSQL" is not one thing. There are four fundamentally different data models:</p>
<h3>1. Document Stores (MongoDB, Firestore, CouchDB)</h3>
<p>Store data as JSON/BSON documents. Schema is flexible per document.</p>
<pre><code class="language-json">{
  "_id": "order_789",
  "userId": "user_123",
  "status": "shipped",
  "items": [
    {"productId": "prod_45", "name": "Keyboard", "qty": 1, "price": 79.99},
    {"productId": "prod_46", "name": "Mouse", "qty": 2, "price": 29.99}
  ],
  "shippingAddress": {
    "street": "123 Main St",
    "city": "New York"
  }
}
</code></pre>
<p><strong>Use when:</strong> Your data naturally fits a hierarchical, self-contained document. The order example above is a perfect fit — you almost always want the full order with its items, not a joined result.</p>
<p><strong>Avoid when:</strong> You need to query across relationships frequently, or your schema is highly relational.</p>
<hr />
<h3>2. Key-Value Stores (Redis, DynamoDB, Riak)</h3>
<p>The simplest model: a key maps to a value. Lightning-fast lookups.</p>
<pre><code class="language-python"># Redis: O(1) lookup by key
redis.set("session:abc123", json.dumps({"userId": "123", "role": "admin"}), ex=3600)
session = redis.get("session:abc123")

# DynamoDB: partition key + optional sort key
table.get_item(Key={"userId": "123", "orderId": "order_789"})
</code></pre>
<p><strong>Use when:</strong> You need ultra-fast single-key lookups, session storage, caching, or counters.</p>
<p><strong>Avoid when:</strong> You need complex queries, filtering, or joins.</p>
<hr />
<h3>3. Column-Family Stores (Cassandra, HBase, ScyllaDB)</h3>
<p>Data is stored in column families, optimized for time-series, write-heavy workloads.</p>
<pre><code class="language-sql">-- Cassandra: Schema designed around query patterns, not data normalization
CREATE TABLE sensor_readings (
  device_id UUID,
  timestamp TIMESTAMP,
  temperature FLOAT,
  humidity FLOAT,
  PRIMARY KEY (device_id, timestamp)  -- Partition by device, sort by time
) WITH CLUSTERING ORDER BY (timestamp DESC);

-- A single-partition read — it maps directly to the storage layout
SELECT * FROM sensor_readings WHERE device_id = ? LIMIT 100;
</code></pre>
<p><strong>Use when:</strong> You have massive write volumes, time-series data, or IoT workloads. Cassandra can handle millions of writes per second.</p>
<p><strong>Avoid when:</strong> You need complex queries that don't match your partition key, or ACID transactions.</p>
<hr />
<h3>4. Graph Databases (Neo4j, Amazon Neptune)</h3>
<p>Data is modeled as nodes and edges. Relationships are first-class citizens.</p>
<pre><code class="language-cypher">// Neo4j: Find all friends of Alice who also like "Distributed Systems"
MATCH (alice:User {name: "Alice"})-[:FRIENDS_WITH]-&gt;(friend:User)
WHERE (friend)-[:LIKES]-&gt;(:Topic {name: "Distributed Systems"})
RETURN friend.name
</code></pre>
<p><strong>Use when:</strong> Your domain is fundamentally relational in a graph sense — social networks, recommendation engines, fraud detection, knowledge graphs.</p>
<p><strong>Avoid when:</strong> Most other use cases. Graph databases are powerful but operationally complex.</p>
<hr />
<h2>PostgreSQL — Why It Often Wins Even Against NoSQL</h2>
<p>PostgreSQL has quietly absorbed many NoSQL use cases:</p>
<pre><code class="language-sql">-- JSONB column — document storage with SQL query capabilities
CREATE TABLE events (
  id UUID PRIMARY KEY,
  type VARCHAR(50),
  payload JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- GIN index on JSONB — fast document queries
CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- Query inside JSON
SELECT * FROM events
WHERE payload-&gt;&gt;'userId' = '123'
  AND type = 'purchase';

-- Full-text search (no Elasticsearch for basic cases)
CREATE INDEX idx_products_search ON products USING GIN (to_tsvector('english', name || ' ' || description));

SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('mechanical &amp; keyboard');

-- Time-series with partitioning (comparable to Cassandra for many workloads)
CREATE TABLE metrics (
  time TIMESTAMPTZ NOT NULL,
  device_id UUID NOT NULL,
  value FLOAT
) PARTITION BY RANGE (time);
</code></pre>
<p>Before adding a new database to your stack, check if PostgreSQL already handles it. Adding a database means another system to operate, monitor, backup, and train your team on.</p>
<hr />
<h2>Indexing — The Most Impactful Optimization Most Teams Underuse</h2>
<p>A missing index is the most common cause of a slow query. An unnecessary index slows down every write.</p>
<pre><code class="language-sql">-- EXPLAIN ANALYZE: your best friend for query performance
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = '123'
  AND status = 'pending'
ORDER BY created_at DESC;

-- If you see "Seq Scan" on a large table, you're missing an index
-- Seq Scan  (cost=0.00..45000.00 rows=5 width=200) -- ❌ scanning every row

-- Add a composite index matching your query
CREATE INDEX idx_orders_user_status_created
ON orders (user_id, status, created_at DESC);

-- Now: Index Scan — fast
-- Index Scan using idx_orders_user_status_created  (cost=0.42..8.50 rows=5) -- ✅
</code></pre>
<p><strong>Composite index rule:</strong> Column order matters. Put equality conditions first (user_id, status), range/sort columns last (created_at).</p>
<hr />
<h2>The Decision Framework</h2>
<table>
<thead>
<tr>
<th>Your primary need</th>
<th>Consider</th>
</tr>
</thead>
<tbody><tr>
<td>ACID transactions, complex queries, relational data</td>
<td><strong>PostgreSQL / MySQL</strong></td>
</tr>
<tr>
<td>Document storage, flexible schema, hierarchical data</td>
<td><strong>MongoDB</strong> (or PostgreSQL JSONB)</td>
</tr>
<tr>
<td>Ultra-fast key lookups, sessions, caching</td>
<td><strong>Redis</strong></td>
</tr>
<tr>
<td>Massive write throughput, time-series, IoT</td>
<td><strong>Cassandra / ScyllaDB</strong> (or Timescale on PG)</td>
</tr>
<tr>
<td>Full-text search, faceted search</td>
<td><strong>Elasticsearch / OpenSearch</strong> (or PG full-text for simpler cases)</td>
</tr>
<tr>
<td>Graph traversals, social networks</td>
<td><strong>Neo4j / Neptune</strong></td>
</tr>
<tr>
<td>Analytical queries over large datasets</td>
<td><strong>BigQuery / Redshift / ClickHouse</strong></td>
</tr>
</tbody></table>
<hr />
<h2>Polyglot Persistence — When Multiple Databases Make Sense</h2>
<p>Large systems often use multiple databases, each for a specific purpose:</p>
<pre><code>User Service      → PostgreSQL (relational, ACID, user accounts/billing)
Product Catalog   → Elasticsearch (full-text search, faceted filtering)
Session Store     → Redis (fast key-value, TTL-based expiry)
Activity Feed     → Cassandra (high write throughput, time-ordered)
Recommendations   → Neo4j (graph traversals)
Analytics         → BigQuery (analytical, columnar, petabyte-scale)
</code></pre>
<p><strong>The warning:</strong> Each database you add is a system you must operate. Start with the minimum. Introduce a new store only when you have a concrete, measurable pain point that your current database can't address.</p>
<hr />
<h2>Key Takeaways</h2>
<ul>
<li><strong>ACID transactions</strong> are invaluable — don't give them up unless you have a compelling reason</li>
<li><strong>CAP theorem</strong> is a useful frame: in a partition, choose consistency (banking) or availability (social feeds) based on your domain</li>
<li><strong>NoSQL solves specific problems</strong> — document stores, column families, key-value, graphs are each optimized for different access patterns</li>
<li><strong>PostgreSQL can handle more than you think</strong> — JSONB, full-text search, and partitioning cover many NoSQL use cases</li>
<li><strong>Indexing is the highest-ROI database optimization</strong> — understand your query patterns before adding hardware</li>
<li><strong>Polyglot persistence is real in large systems</strong> — but each database added is operational overhead</li>
</ul>
<hr />
<p><strong>What's the most painful database migration you've been through — either choosing the wrong one initially, or scaling beyond what it could handle?</strong></p>
<hr />
<p><em>Next in the series → <strong>Post 05: When to Stop Calling APIs and Start Publishing Events</strong></em></p>
<p><em>You've got your data store figured out. The next scaling inflection point is usually: synchronous calls don't compose well at scale.</em></p>
]]></content:encoded></item><item><title><![CDATA[Auth Is Not Security: What Engineers Get Wrong About Protecting APIs]]></title><description><![CDATA[Series: Backend Engineering Fundamentals · Post 03 of 07
Level: Advanced · Read time: ~10 min


Most API security bugs aren't cryptography failures. They're design failures.
The OWASP API Security Top]]></description><link>https://ajitabh.net/auth-is-not-security-what-engineers-get-wrong-about-protecting-apis</link><guid isPermaLink="true">https://ajitabh.net/auth-is-not-security-what-engineers-get-wrong-about-protecting-apis</guid><category><![CDATA[api security]]></category><category><![CDATA[authentication]]></category><category><![CDATA[authorization]]></category><category><![CDATA[JWT]]></category><category><![CDATA[oauth]]></category><category><![CDATA[Backend Engineering]]></category><category><![CDATA[System Design]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Web Security]]></category><category><![CDATA[backend developments]]></category><dc:creator><![CDATA[Ajitabh Singh]]></dc:creator><pubDate>Thu, 26 Mar 2026 13:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69c0b381d9da55a9a5203e04/dd18142b-9ea5-4a03-9354-3184a100ca44.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>Series:</strong> Backend Engineering Fundamentals · Post 03 of 07
<strong>Level:</strong> Advanced · <strong>Read time:</strong> ~10 min</p>
</blockquote>
<hr />
<p>Most API security bugs aren't cryptography failures. They're design failures.</p>
<p>The OWASP API Security Top 10 is the most authoritative list of real-world API vulnerabilities. It is dominated by problems like broken object-level authorization, excessive data exposure, and lack of rate limiting. Not broken TLS. Not weak encryption algorithms.</p>
<p>Engineers tend to conflate authentication ("who are you?") with security ("what can you actually do and what can go wrong?"). This post covers both: the auth patterns engineers deal with daily, and the security concerns that don't get enough attention until after the breach.</p>
<hr />
<h2>Authentication vs Authorization — Get This Right First</h2>
<p>These terms are often used interchangeably. They shouldn't be.</p>
<table>
<thead>
<tr>
<th>Concept</th>
<th>Question</th>
<th>Example</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Authentication (AuthN)</strong></td>
<td>Who are you?</td>
<td>Verifying a JWT token is valid</td>
</tr>
<tr>
<td><strong>Authorization (AuthZ)</strong></td>
<td>What are you allowed to do?</td>
<td>Checking if user can access <code>/orders/456</code></td>
</tr>
<tr>
<td><strong>Accounting</strong></td>
<td>What did you do?</td>
<td>Audit logs of actions taken</td>
</tr>
</tbody></table>
<p>Most auth bugs are authorization bugs. The token is valid — the user is who they say they are — but they can see data they shouldn't.</p>
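<p>The canonical example is broken object-level authorization (BOLA): the token verifies, but nobody checks ownership. A minimal sketch (the order store and exception are hypothetical):</p>
<pre><code class="language-python">ORDERS = {'456': {'owner': 'user_123', 'total': 99.0}}

class Forbidden(Exception):
    pass

def get_order(order_id, authenticated_user_id):
    order = ORDERS[order_id]                       # AuthN already happened upstream
    if order['owner'] != authenticated_user_id:    # AuthZ: the check teams forget
        raise Forbidden(order_id)
    return order
</code></pre>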
<hr />
<h2>The Three Main Auth Patterns</h2>
<h3>1. API Keys — Simple, Durable, Underrated</h3>
<p>A random string issued to a client, sent with every request.</p>
<pre><code class="language-http">GET /api/v1/orders
Authorization: Bearer sk_live_a8f3j2k9...
# or
X-API-Key: sk_live_a8f3j2k9...
</code></pre>
<p><strong>Best for:</strong> Server-to-server communication, developer-facing public APIs, internal service authentication.</p>
<p><strong>Key implementation details:</strong></p>
<ul>
<li>Store only the <strong>hash</strong> of the key in your database, never the plaintext (same principle as passwords)</li>
<li>Use a prefix that identifies the key type: <code>sk_live_</code>, <code>pk_test_</code>, <code>svc_</code> — makes secret scanning easier</li>
<li>Support key rotation without downtime: allow two active keys per client during a rotation window</li>
<li>Log key usage for anomaly detection</li>
</ul>
<pre><code class="language-python">import hashlib, secrets

def create_api_key() -&gt; tuple[str, str]:
    """Returns (plaintext_key_shown_once, hash_stored_in_db)"""
    key = f"sk_live_{secrets.token_urlsafe(32)}"
    key_hash = hashlib.sha256(key.encode()).hexdigest()
    return key, key_hash

def verify_api_key(provided_key: str, stored_hash: str) -&gt; bool:
    provided_hash = hashlib.sha256(provided_key.encode()).hexdigest()
    return secrets.compare_digest(provided_hash, stored_hash)
    # Use compare_digest to prevent timing attacks
</code></pre>
<hr />
<h3>2. JWT (JSON Web Tokens) — Powerful but Frequently Misused</h3>
<p>A JWT is a self-contained token with three parts: header, payload, signature.</p>
<pre><code>eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9   ← Header (alg + type)
.eyJ1c2VySWQiOiIxMjMiLCJyb2xlIjoiYWRtaW4iLCJleHAiOjE3MDAwMDAwMDB9   ← Payload
.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c   ← Signature
</code></pre>
<p>The server can verify the signature without a database lookup — this is why JWTs are popular in distributed systems and microservices.</p>
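<p>That database-free check is just an HMAC over the first two segments. A stdlib sketch of HS256 signing and verification, for illustration only (use a maintained JWT library in production):</p>
<pre><code class="language-python">import base64
import hashlib
import hmac
import json

def b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode()

def sign_hs256(payload, secret):
    header = b64url(json.dumps({'alg': 'HS256', 'typ': 'JWT'}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(secret, f'{header}.{body}'.encode(), hashlib.sha256).digest())
    return f'{header}.{body}.{sig}'

def verify_hs256(token, secret):
    header, body, sig = token.split('.')
    expected = b64url(hmac.new(secret, f'{header}.{body}'.encode(),
                               hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)   # no database lookup involved

token = sign_hs256({'userId': '123', 'role': 'admin'}, b'server-secret')
</code></pre>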
<p><strong>Common JWT pitfalls:</strong></p>
<pre><code class="language-python"># ❌ WRONG: Disabling signature verification
# Equivalent to accepting the "none" algorithm — any forged token passes
jwt.decode(token, options={"verify_signature": False})  # Never do this

# ❌ WRONG: Using the algorithm from the token header
# Attacker changes alg to "none" or "HS256" with your public key as secret
algorithm = jwt.get_unverified_header(token)['alg']  # Never trust this

# ✅ CORRECT: Always specify the expected algorithm explicitly
jwt.decode(token, public_key, algorithms=["RS256"])  # Pin the algorithm; RS256 verifies with the public key

# ❌ WRONG: Storing sensitive data in the payload
# JWT payload is base64-encoded, not encrypted — anyone can read it
{"userId": "123", "creditCardNumber": "4111..."}  # Don't do this

# ✅ CORRECT: Store only what's needed for authorization
{"userId": "123", "role": "admin", "exp": 1700000000}
</code></pre>
<p><strong>JWT vs Sessions trade-off:</strong></p>
<table>
<thead>
<tr>
<th></th>
<th>JWT (Stateless)</th>
<th>Sessions (Stateful)</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Revocation</strong></td>
<td>Hard — must wait for expiry or maintain a blocklist</td>
<td>Easy — delete from session store</td>
</tr>
<tr>
<td><strong>Scalability</strong></td>
<td>Any server can verify without coordination</td>
<td>Session store must be shared (Redis)</td>
</tr>
<tr>
<td><strong>Token size</strong></td>
<td>Larger (full claims in payload)</td>
<td>Smaller (just a session ID)</td>
</tr>
<tr>
<td><strong>Suitable for</strong></td>
<td>Microservices, mobile APIs</td>
<td>Traditional web apps</td>
</tr>
</tbody></table>
<blockquote>
<p>⚠️ <strong>The revocation problem is real.</strong> If you issue a JWT with a 24-hour expiry and a user changes their password or is suspended, that JWT is still valid until it expires. If revocation matters to you (it usually does), maintain a JWT blocklist in Redis or use short expiry times (5–15 minutes) with refresh tokens.</p>
</blockquote>
<hr />
<h3>3. OAuth 2.0 — Delegated Authorization Done Right</h3>
<p>OAuth 2.0 is not an authentication protocol (that's OpenID Connect on top of OAuth). It's a framework for <strong>delegated authorization</strong> — letting users grant third-party apps access to their data without sharing their password.</p>
<p>The four flows, matched to use cases:</p>
<pre><code>Authorization Code Flow
├── With PKCE (for SPAs, mobile apps)
└── Without PKCE (server-side web apps only — never expose client_secret in browser)

Client Credentials Flow
└── Machine-to-machine (no user involved)

Device Authorization Flow
└── Smart TVs, CLIs, IoT devices

Implicit Flow
└── ⚠️ Deprecated — never use for new implementations
</code></pre>
<p><strong>Most teams only need two:</strong></p>
<pre><code>User-facing apps → Authorization Code + PKCE
Service-to-service → Client Credentials
</code></pre>
<pre><code class="language-python"># Client Credentials — Service authenticating to another service
import httpx

def get_service_token(client_id: str, client_secret: str, token_url: str) -&gt; str:
    response = httpx.post(token_url, data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "orders:read inventory:write"
    })
    return response.json()["access_token"]
</code></pre>
<hr />
<h2>OWASP API Security Top 10 — What Actually Gets APIs Breached</h2>
<p>Authentication is one piece. Here are a few vulnerabilities that show up in real incidents:</p>
<h3>Broken Object-Level Authorization (BOLA) (Most Common)</h3>
<p>A user can access objects (records) they shouldn't by manipulating IDs.</p>
<pre><code class="language-http">GET /api/orders/12345   ← User's own order
GET /api/orders/12346   ← Another user's order — does your API check ownership?
</code></pre>
<pre><code class="language-python"># ❌ WRONG — only checks authentication, not authorization
@app.get("/orders/{order_id}")
def get_order(order_id: str, current_user: User = Depends(get_current_user)):
    return db.get_order(order_id)  # Returns ANY order if user is authenticated

# ✅ CORRECT — checks that the order belongs to the requesting user
@app.get("/orders/{order_id}")
def get_order(order_id: str, current_user: User = Depends(get_current_user)):
    order = db.get_order(order_id)
    if order.user_id != current_user.id:
        raise HTTPException(status_code=403, detail="Forbidden")
    return order
</code></pre>
<h3>Excessive Data Exposure</h3>
<p>Returning more data than the client needs, relying on them to filter it.</p>
<pre><code class="language-python"># ❌ WRONG — serializes the full User model
return db.get_user(user_id)
# Includes: password_hash, internal_notes, admin_flags, ...

# ✅ CORRECT — explicit response schema
class UserPublicResponse(BaseModel):
    id: str
    name: str
    email: str
    # Nothing else
</code></pre>
<h3>Lack of Rate Limiting</h3>
<p>Without rate limiting, your API is vulnerable to brute-force, credential stuffing, and scraping.</p>
<pre><code class="language-python"># Fixed-window counter with Redis
def check_rate_limit(client_id: str, limit: int = 100, window: int = 60) -&gt; bool:
    key = f"rate_limit:{client_id}"
    count = redis.incr(key)
    if count == 1:
        # Set the TTL only when the window opens; resetting it on every
        # request would let a steady stream of traffic keep the key alive forever
        redis.expire(key, window)
    return count &lt;= limit
</code></pre>
<p>Or at the infrastructure level with NGINX:</p>
<pre><code class="language-nginx"># Limit to 10 requests/second per IP
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=api burst=20 nodelay;
    }
}
</code></pre>
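<p>For smoother limiting than a per-window counter, a token bucket refills capacity continuously and absorbs short bursts. A single-process sketch of the algorithm (a real deployment would keep the bucket state in a shared store such as Redis):</p>

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token for this request
            return True
        return False
```

<p>This is the same family of algorithm NGINX's <code>limit_req</code> implements: <code>rate</code> maps to <code>10r/s</code> and <code>capacity</code> plays the role of <code>burst</code>.</p>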
<hr />
<h2>HTTPS, HSTS, and Transport Security</h2>
<p>HTTPS should be non-negotiable. But securing transport is not just about turning on TLS.
HSTS tells the browser to always use HTTPS and never fall back to HTTP.
A few headers help close common gaps:</p>
<pre><code class="language-http"># Force HTTPS for your domain + subdomains, 1 year, include in preload list
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

# Prevent MIME type sniffing
X-Content-Type-Options: nosniff

# Control what info leaks in the Referer header
Referrer-Policy: strict-origin-when-cross-origin

# Disable browser features you don't need
Permissions-Policy: geolocation=(), camera=(), microphone=()
</code></pre>
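<p>These headers are best set once in middleware rather than per endpoint. A framework-agnostic sketch of the idea (the merge is the whole trick; wire it into whatever response hook your framework exposes):</p>

```python
SECURITY_HEADERS = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains; preload",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Permissions-Policy": "geolocation=(), camera=(), microphone=()",
}

def with_security_headers(response_headers: dict) -> dict:
    # Endpoint-supplied headers win, so a route can deliberately
    # override a default policy when it has a reason to
    return {**SECURITY_HEADERS, **response_headers}
```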
<hr />
<h3>Secrets Management — Where Most Teams Cut Corners</h3>
<p>Hardcoded secrets are the most preventable security vulnerability.</p>
<pre><code class="language-bash"># ❌ Hardcoded in code — will end up in git history eventually
DATABASE_URL = "postgresql://admin:mypassword@prod-db:5432/app"

# ❌ In .env committed to repo
echo ".env" &gt;&gt; .gitignore   # This gets forgotten

# ✅ Fetched from a secrets manager at runtime
import boto3

def get_secret(secret_name: str) -&gt; str:
    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=secret_name)["SecretString"]

DATABASE_URL = get_secret("prod/database/url")
</code></pre>
<p>Use AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, or Azure Key Vault. The investment is low; the blast radius of a leaked secret can be catastrophic.</p>
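<p>One practical note: fetching from the secrets manager on every call adds latency and cost, so teams typically memoize per process. A sketch with an injectable fetcher (names here are illustrative); keep in mind a cached secret won't reflect rotation until the process restarts or the cache is cleared:</p>

```python
import functools
from typing import Callable

def make_secret_getter(fetch: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap any backend fetcher (AWS, Vault, ...) with per-process memoization."""
    @functools.lru_cache(maxsize=128)
    def get_secret(name: str) -> str:
        return fetch(name)  # hits the backend only on the first call per name
    return get_secret
```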
<hr />
<h2>Security Checklist for APIs</h2>
<p>Before shipping an API endpoint, run through this:</p>
<ul>
<li> Authentication required on all non-public routes</li>
<li> Object-level authorization: does the user own this resource?</li>
<li> Response schema is explicit — no extra fields leaking</li>
<li> Rate limiting on auth endpoints (login, token issuance)</li>
<li> Rate limiting on resource endpoints</li>
<li> Input validation on all parameters (types, lengths, allowed values)</li>
<li> No sensitive data in JWT payload</li>
<li> API keys hashed in storage, never in logs</li>
<li> HTTPS enforced with HSTS header</li>
<li> Secrets loaded from a secrets manager, not hardcoded or committed in <code>.env</code> files</li>
</ul>
<hr />
<h2>Key Takeaways</h2>
<ul>
<li><strong>Authentication ≠ Authorization</strong> — most breaches happen when you verify identity but don't verify permission</li>
<li><strong>API Keys</strong> are underrated for server-to-server auth — hash them, support rotation, prefix for scanning</li>
<li><strong>JWT pitfalls</strong> (none algorithm, payload exposure, no revocation) are more common than you'd think</li>
<li><strong>OAuth 2.0:</strong> Authorization Code + PKCE for users, Client Credentials for services — that's most of what you need</li>
<li><strong>BOLA</strong> (broken object-level auth) is the #1 real-world API vulnerability — always check resource ownership</li>
<li><strong>Rate limiting and secrets management</strong> are table stakes, not nice-to-haves</li>
</ul>
<hr />
<p><strong>What's the most memorable security incident you've seen or heard about that started with an API design mistake — not a cryptography failure?</strong></p>
<hr />
<p><em>Next in the series → <strong>Post 04: SQL or NoSQL? Wrong Question. Here's the Right One.</strong></em></p>
<p><em>You know who's talking to your API and what they're allowed to do. Now: where does the data actually live?</em></p>
]]></content:encoded></item><item><title><![CDATA[Cache Invalidation: The Problem That Humbles Every Engineer]]></title><description><![CDATA[Series: Backend Engineering Fundamentals · Post 02 of 07
Level: Intermediate · Read time: ~9 min


Phil Karlton famously said there are only two hard problems in computer science: cache invalidation a]]></description><link>https://ajitabh.net/cache-invalidation-the-problem-that-humbles-every-engineer</link><guid isPermaLink="true">https://ajitabh.net/cache-invalidation-the-problem-that-humbles-every-engineer</guid><category><![CDATA[caching]]></category><category><![CDATA[Redis]]></category><category><![CDATA[System Design]]></category><category><![CDATA[memcached]]></category><category><![CDATA[Cache Invalidation]]></category><category><![CDATA[backend developments]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[web performance]]></category><dc:creator><![CDATA[Ajitabh Singh]]></dc:creator><pubDate>Wed, 25 Mar 2026 12:31:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69c0b381d9da55a9a5203e04/88c1dc88-7096-4f3d-b1f0-b896663ee9c0.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>Series:</strong> Backend Engineering Fundamentals · Post 02 of 07
<strong>Level:</strong> Intermediate · <strong>Read time:</strong> ~9 min</p>
</blockquote>
<hr />
<p>Phil Karlton famously said there are only two hard problems in computer science: cache invalidation and naming things.</p>
<p>He was joking. But he wasn't wrong.</p>
<p>Caching seems simple. You store a result and serve the stored version next time. The hard part isn't storing data. It's knowing <em>when the stored version is no longer valid</em>, and handling that correctly at scale without bringing your database to its knees in the process.</p>
<p>This post covers the caching concepts that matter in production: where to cache, what to cache, how to invalidate it, and the failure modes that catch teams off guard.</p>
<hr />
<h3>Why Caching Matters (Beyond "It Makes Things Fast")</h3>
<p>Before diving into mechanisms, let's be clear about what caching actually protects:</p>
<ul>
<li><strong>Database load</strong> — Every cache hit is a DB query that didn't happen</li>
<li><strong>Latency</strong> — Memory reads are ~100x faster than a network round-trip to a DB</li>
<li><strong>Cost</strong> — Fewer DB operations = smaller instance sizes = real money at scale</li>
<li><strong>Resilience</strong> — A warm cache can serve traffic even when the DB is degraded</li>
</ul>
<p>But caching introduces its own risks: <strong>stale data</strong>, <strong>cache stampedes</strong>, <strong>memory pressure</strong>, and <strong>invalidation bugs</strong> that surface as subtle data inconsistencies. Understanding these tradeoffs is what separates a senior engineer from someone who just adds Redis to every problem.</p>
<hr />
<h2>The Caching Layers</h2>
<p>Modern systems have caching at multiple levels, and understanding each layer helps you place data in the right one.</p>
<pre><code>Client Request
     ↓
[Browser Cache]        ← Layer 1: HTTP Cache-Control headers
     ↓
[CDN / Edge Cache]     ← Layer 2: Cloudflare, Fastly, CloudFront
     ↓
[API Gateway Cache]    ← Layer 3: Optional, for high-traffic APIs
     ↓
[Application Cache]    ← Layer 4: Redis, Memcached (your code controls this)
     ↓
[Database Buffer Pool] ← Layer 5: MySQL/Postgres keeps hot pages in memory
     ↓
[Disk]
</code></pre>
<p>Most teams operate actively at Layers 2 and 4. The decisions you make there have the biggest impact.</p>
<hr />
<h2>Redis vs Memcached — The Honest Comparison</h2>
<p>Both are in-memory key-value stores. Most teams should just use <strong>Redis</strong>. Here's why:</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Redis</th>
<th>Memcached</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Data structures</strong></td>
<td>Strings, hashes, lists, sets, sorted sets, streams</td>
<td>Strings only</td>
</tr>
<tr>
<td><strong>Persistence</strong></td>
<td>Optional (RDB snapshots, AOF logs)</td>
<td>None</td>
</tr>
<tr>
<td><strong>Replication</strong></td>
<td>Built-in primary/replica</td>
<td>None (third-party)</td>
</tr>
<tr>
<td><strong>Clustering</strong></td>
<td>Redis Cluster (built-in)</td>
<td>Client-side sharding</td>
</tr>
<tr>
<td><strong>Pub/Sub</strong></td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td><strong>Lua scripting</strong></td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td><strong>Memory efficiency</strong></td>
<td>Good</td>
<td>Slightly better for simple strings</td>
</tr>
<tr>
<td><strong>Multithreading</strong></td>
<td>Single-threaded (I/O event loop)</td>
<td>Multi-threaded</td>
</tr>
</tbody></table>
<p><strong>Use Memcached when:</strong> You have a very specific use case — pure string caching at enormous scale — and you've benchmarked that Memcached's multi-threaded architecture genuinely outperforms Redis for your workload. This is rare.</p>
<p><strong>Use Redis for everything else.</strong> The richer data structures alone (sorted sets for leaderboards, streams for queues) make it the practical default.</p>
<hr />
<h2>Caching Strategies</h2>
<h3>Cache-Aside (Lazy Loading)</h3>
<p>The most common pattern. Your application manages the cache explicitly.</p>
<pre><code class="language-python">def get_user(user_id: str) -&gt; User:
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return User.from_json(cached)
    
    # 2. Cache miss — fetch from DB
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    
    # 3. Populate cache for next time
    redis.setex(f"user:{user_id}", 3600, user.to_json())  # TTL: 1 hour
    
    return user
</code></pre>
<p><strong>Pros:</strong> Only caches data that's actually requested. Simple to reason about.<br /><strong>Cons:</strong> First request always hits the DB (cold cache). Race condition possible if multiple requests miss simultaneously.</p>
<hr />
<h3>Write-Through</h3>
<p>Write to the cache and DB simultaneously on every write.</p>
<pre><code class="language-python">def update_user(user_id: str, data: dict) -&gt; User:
    user = db.update("UPDATE users SET ... WHERE id = %s", user_id, data)
    redis.setex(f"user:{user_id}", 3600, user.to_json())  # Sync write to cache
    return user
</code></pre>
<p><strong>Pros:</strong> Cache is always consistent with DB. No stale reads after writes.<br /><strong>Cons:</strong> Write latency increases. Cache fills with data that might never be read.</p>
<hr />
<h3>Write-Behind (Write-Back)</h3>
<p>Write to cache immediately, write to DB asynchronously.</p>
<p><strong>Pros:</strong> Extremely fast writes.<br /><strong>Cons:</strong> Risk of data loss if cache fails before async write completes. Complex error handling. Use only when you fully understand the durability tradeoff.</p>
<hr />
<h3>Read-Through</h3>
<p>The cache layer itself fetches from DB on a miss — your application always talks to the cache.</p>
<pre><code class="language-python"># Cache library handles DB fallback automatically
user = cache.get(f"user:{user_id}", loader=lambda: db.find_user(user_id))
</code></pre>
<p><strong>Pros:</strong> Application code stays clean. Cache and DB logic are centralized.<br /><strong>Cons:</strong> Requires a cache library or proxy that supports this pattern.</p>
<hr />
<h2>Cache Invalidation — The Hard Part</h2>
<p>There are three approaches, each with different tradeoffs:</p>
<h3>1. TTL (Time-To-Live) — Simplest</h3>
<p>Set an expiry time. The data becomes stale after that window.</p>
<pre><code class="language-python">redis.setex("product:456:price", 300, "29.99")  # Expires in 5 minutes
</code></pre>
<p><strong>Works well for:</strong> Data that can tolerate slight staleness — product listings, user profile data, search results.<br /><strong>Fails for:</strong> Anything that needs immediate consistency after a write — account balances, inventory levels, permissions.</p>
<hr />
<h3>2. Event-Driven Invalidation — Most Correct</h3>
<p>When data changes, explicitly invalidate or update the cached version.</p>
<pre><code class="language-python">def update_product_price(product_id: str, new_price: float):
    db.update("UPDATE products SET price = %s WHERE id = %s", new_price, product_id)
    redis.delete(f"product:{product_id}:price")  # Explicit invalidation
    # Or: redis.set(...) to update immediately rather than wait for next read
</code></pre>
<p><strong>Works well for:</strong> Data that must be fresh after writes.<br /><strong>Fails for:</strong> Systems with complex invalidation logic across many cache keys — one update triggers a cascade of invalidations that's hard to track.</p>
<hr />
<h3>3. Cache Tags / Dependency Tracking — Advanced</h3>
<p>Group related cache entries under a tag. Invalidate the tag, and all entries under it expire.</p>
<pre><code class="language-python"># Pseudo-code — some Redis libraries support this natively
cache.set("user:123:orders", data, tags=["user:123", "orders"])
cache.invalidate_tag("user:123")  # Clears user:123:orders and all other tagged entries
</code></pre>
<p><strong>Works well for:</strong> Complex, nested data that comes from a single entity.<br /><strong>Requires:</strong> A cache library or framework that supports this pattern (Symfony Cache, Django's cache framework, etc.)</p>
<hr />
<h2>The Cache Stampede Problem</h2>
<p>Imagine 10,000 concurrent users hit your app. A popular cache key expires. All 10,000 requests miss the cache simultaneously and hammer your database at once.</p>
<p>This is a <strong>cache stampede</strong> (also called dogpiling). It can bring down a database that was otherwise healthy.</p>
<pre><code>T=0: Cache key expires
T=0.001: 10,000 requests arrive, all miss cache
T=0.001: 10,000 DB queries fire simultaneously
T=0.5: Database CPU spikes to 100%
T=1.0: DB starts timing out requests
T=1.5: Your PagerDuty alert fires
</code></pre>
<p><strong>Solutions:</strong></p>
<p><strong>Mutex / Locking</strong> — Only one request rebuilds the cache. Others wait.</p>
<pre><code class="language-python">def get_with_lock(key: str, loader_fn, retries: int = 50):
    for _ in range(retries):
        value = redis.get(key)
        if value:
            return value

        lock_key = f"lock:{key}"
        if redis.set(lock_key, "1", nx=True, ex=10):  # Acquire lock (auto-expires)
            try:
                value = loader_fn()
                redis.setex(key, 3600, value)
                return value
            finally:
                redis.delete(lock_key)
        time.sleep(0.1)  # Someone else is rebuilding; wait and re-check the cache
    raise TimeoutError(f"cache rebuild for {key} did not complete")
</code></pre>
<p><strong>Probabilistic Early Expiration</strong> — Start refreshing the cache <em>before</em> it expires, with a small random probability as TTL approaches.</p>
<p><strong>Stale-While-Revalidate</strong> — Serve the stale value immediately, refresh in the background. The user gets a fast (slightly stale) response while the next request will get fresh data.</p>
<hr />
<h2>CDN Caching — Don't Forget the Edge</h2>
<p>For static assets, API responses, and server-rendered pages, CDN-level caching is often more impactful than application caching.</p>
<pre><code class="language-http"># Response headers that control CDN behavior
Cache-Control: public, max-age=3600, s-maxage=86400
# public = CDN can cache this
# max-age = browser TTL (1 hour)
# s-maxage = CDN TTL (1 day)

Cache-Control: private, no-store
# private = only the browser caches this, not CDNs
# no-store = don't cache anywhere (for sensitive data)

Surrogate-Key: product-456 category-shoes
# Fastly/Varnish: tag-based purging at the CDN edge
</code></pre>
<p><strong>Cache-busting for static assets:</strong> Use content hashes in filenames so you can set long TTLs without worrying about stale JS/CSS.</p>
<pre><code># Build output
app.js  → app.a3f9c2d1.js    ← Hash changes when content changes
app.css → app.b8e4d6a2.css
</code></pre>
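<p>The hash is just a digest of the file contents, so the name changes only when the bytes do. A sketch of what a bundler computes:</p>

```python
import hashlib
from pathlib import PurePosixPath

def hashed_filename(name: str, content: bytes, length: int = 8) -> str:
    # Same content -> same name (cacheable forever); new content -> new name
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(name)
    return f"{p.stem}.{digest}{p.suffix}"
```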
<hr />
<h2>What NOT to Cache</h2>
<p>Caching everything is an anti-pattern. Some things should never be cached:</p>
<ul>
<li><strong>User-specific sensitive data</strong> (auth tokens, payment info) — unless isolated per-user with short TTLs</li>
<li><strong>Write-heavy data</strong> — cache churn (constant invalidations) adds overhead with no benefit</li>
<li><strong>Uniqueness checks</strong> — "is this username taken?" must always hit the source of truth</li>
<li><strong>Random or time-sensitive outputs</strong> — <code>NOW()</code>, <code>UUID()</code>, anything that must be unique per request</li>
</ul>
<hr />
<h2>Quick Reference: Eviction Policies</h2>
<p>When Redis runs out of memory, it evicts keys based on its configured policy:</p>
<table>
<thead>
<tr>
<th>Policy</th>
<th>Behavior</th>
<th>Use When</th>
</tr>
</thead>
<tbody><tr>
<td><code>noeviction</code></td>
<td>Returns error on write when full</td>
<td>You need strict control</td>
</tr>
<tr>
<td><code>allkeys-lru</code></td>
<td>Evicts least recently used keys</td>
<td>General-purpose cache</td>
</tr>
<tr>
<td><code>volatile-lru</code></td>
<td>LRU eviction only for keys with TTL</td>
<td>You have a mix of TTL and permanent keys</td>
</tr>
<tr>
<td><code>allkeys-lfu</code></td>
<td>Evicts least <em>frequently</em> used (Redis 4+)</td>
<td>Access patterns are skewed</td>
</tr>
<tr>
<td><code>volatile-ttl</code></td>
<td>Evicts keys closest to expiry</td>
<td>You want to preserve recently-refreshed data</td>
</tr>
</tbody></table>
<p>For a pure cache workload, <strong><code>allkeys-lru</code></strong> or <strong><code>allkeys-lfu</code></strong> are usually the right defaults.</p>
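<p>LRU itself is simple enough to sketch: an ordered map where reads move a key to the "recent" end and inserts evict from the "old" end. (This shows the idea; Redis actually uses an approximated, sampled LRU rather than exact ordering.)</p>

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
```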
<hr />
<h2>Key Takeaways</h2>
<ul>
<li><strong>Redis is the practical default</strong> — richer data structures, replication, and pub/sub make it worth the marginal overhead over Memcached</li>
<li><strong>TTL-based expiration</strong> is simple and works well for data that tolerates some staleness</li>
<li><strong>Event-driven invalidation</strong> is correct but requires discipline to maintain as systems evolve</li>
<li><strong>Cache stampedes are real</strong> — use locks, early expiration, or stale-while-revalidate for high-traffic keys</li>
<li><strong>CDN caching</strong> is often more impactful than application-level caching for read-heavy, public data</li>
<li><strong>Don't cache everything</strong> — cache what's expensive to recompute and safe to serve slightly stale</li>
</ul>
<hr />
<p><strong>Have you been bitten by a cache invalidation bug in production? What was the data inconsistency and how long did it take to find it?</strong></p>
<p>Those are the stories the comments were made for.</p>
<hr />
<p><em>Next in the series → <strong>Post 03: Auth Is Not Security — A Guide for Teams Who Ship Fast</strong></em></p>
<p><em>You've cached your data efficiently. Now: who's allowed to see it?</em></p>
]]></content:encoded></item><item><title><![CDATA[The API Decision That Haunts Your Architecture]]></title><description><![CDATA[Series: Backend Engineering Fundamentals · Post 01 of 07
Level: Intermediate · Read time: ~8 min


A team I know spent nine months migrating their mobile backend from REST to GraphQL. Two engineers de]]></description><link>https://ajitabh.net/rest-vs-soap-vs-graphql-vs-grpc</link><guid isPermaLink="true">https://ajitabh.net/rest-vs-soap-vs-graphql-vs-grpc</guid><category><![CDATA[api]]></category><category><![CDATA[REST API]]></category><category><![CDATA[GraphQL]]></category><category><![CDATA[gRPC]]></category><category><![CDATA[System Design]]></category><category><![CDATA[backend]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Web Development]]></category><dc:creator><![CDATA[Ajitabh Singh]]></dc:creator><pubDate>Tue, 24 Mar 2026 13:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69c0b381d9da55a9a5203e04/fd0877ff-c501-47fb-a314-3eb5d537a894.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>Series:</strong> Backend Engineering Fundamentals · Post 01 of 07
<strong>Level:</strong> Intermediate · <strong>Read time:</strong> ~8 min</p>
</blockquote>
<hr />
<p>A team I know spent nine months migrating their mobile backend from REST to GraphQL. Two engineers dedicated full-time. At the end of it, their core performance problem of slow dashboard loads was unchanged. The culprit was N+1 queries in the database layer they had never touched.</p>
<p>API decisions feel reversible. They rarely are. By the time you have built client SDKs, versioning contracts, and downstream integrations, switching paradigms is a full rewrite. That is why your API style is an architectural decision and not an implementation detail.</p>
<p>Let us break down the four major API paradigms honestly so you can choose based on your actual constraints and not the current hype cycle.</p>
<hr />
<h2>The Four Paradigms at a Glance</h2>
<table>
<thead>
<tr>
<th>Aspect</th>
<th>REST</th>
<th>SOAP</th>
<th>GraphQL</th>
<th>gRPC</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Protocol</strong></td>
<td>HTTP</td>
<td>HTTP, SMTP</td>
<td>HTTP</td>
<td>HTTP/2</td>
</tr>
<tr>
<td><strong>Data Format</strong></td>
<td>JSON, XML</td>
<td>XML (strict)</td>
<td>JSON</td>
<td>Protobuf (binary)</td>
</tr>
<tr>
<td><strong>Flexibility</strong></td>
<td>Medium</td>
<td>Low and rigid</td>
<td>High and client-driven</td>
<td>Medium with strong contract</td>
</tr>
<tr>
<td><strong>Performance</strong></td>
<td>Moderate</td>
<td>Heavy due to XML overhead</td>
<td>Good as it avoids over-fetching</td>
<td>Excellent with binary and streaming</td>
</tr>
<tr>
<td><strong>Team Overhead</strong></td>
<td>Low</td>
<td>High</td>
<td>Medium</td>
<td>Medium</td>
</tr>
<tr>
<td><strong>Best For</strong></td>
<td>Web and mobile APIs</td>
<td>Enterprise and legacy</td>
<td>Complex UI data needs</td>
<td>Internal microservices</td>
</tr>
</tbody></table>
<hr />
<h2>REST — The Default, For Good Reason</h2>
<p>REST (Representational State Transfer) is stateless, resource-based, and uses standard HTTP methods: GET, POST, PUT, DELETE. JSON is the lingua franca. Almost every developer knows it, every framework supports it, and tooling is mature.</p>
<p><strong>The real strength:</strong> REST's simplicity is the feature itself. Low cognitive overhead means faster onboarding, easier debugging, and predictable behavior in production.</p>
<p><strong>The honest weakness:</strong> REST can lead to over-fetching where the response contains more fields than the client needs, or under-fetching where the client needs multiple round trips to assemble a view. For most teams this is manageable. For teams with high-traffic and data-heavy mobile apps it becomes real latency.</p>
<pre><code class="language-http">GET /api/v1/users/123
GET /api/v1/users/123/orders
GET /api/v1/users/123/preferences
# Three requests to build one profile page
</code></pre>
<p><strong>Use REST when:</strong></p>
<ul>
<li>You are building a public-facing API</li>
<li>Your team is mixed seniority or onboarding quickly</li>
<li>You need broad tooling, documentation, and ecosystem support</li>
<li>You do not yet know all the ways your data will be consumed</li>
</ul>
<hr />
<h2>SOAP — Not Dead, Just Misunderstood</h2>
<p>SOAP has a deserved reputation: it is verbose, WSDLs are painful, and XML parsing is heavy. And yet it still runs banking systems, healthcare integrations, and government infrastructure worldwide.</p>
<p>Why? Because SOAP has built-in standards for things REST leaves entirely to you:</p>
<ul>
<li><strong>WS-Security</strong> for message-level encryption and signing</li>
<li><strong>WS-AtomicTransaction</strong> for distributed transaction support</li>
<li><strong>WSDL contracts</strong> that are machine-readable, strongly typed, and version-controlled</li>
</ul>
<pre><code class="language-xml">&lt;soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"&gt;
  &lt;soapenv:Body&gt;
    &lt;pay:ProcessPayment&gt;
      &lt;pay:AccountId&gt;ACC-9821&lt;/pay:AccountId&gt;
      &lt;pay:Amount&gt;500.00&lt;/pay:Amount&gt;
    &lt;/pay:ProcessPayment&gt;
  &lt;/soapenv:Body&gt;
&lt;/soapenv:Envelope&gt;
</code></pre>
<p>If you are integrating with a payment processor, a hospital EMR system, or any legacy enterprise platform built before 2010, you are using SOAP whether you planned to or not.</p>
<p><strong>Use SOAP when:</strong></p>
<ul>
<li>You are in a compliance-heavy domain such as finance, healthcare, or government</li>
<li>You are integrating with enterprise systems that mandate it</li>
<li>You need formal, auditable contracts and built-in security standards</li>
</ul>
<hr />
<h2>GraphQL — Powerful, But Only With Discipline</h2>
<p>GraphQL flips the data fetching model. Instead of the server defining what data an endpoint returns, the client declares exactly what it needs. One request, precisely the data you asked for, nothing more.</p>
<pre><code class="language-graphql">query {
  user(id: "123") {
    name
    email
    orders(last: 5) {
      id
      total
      status
    }
  }
}
</code></pre>
<p>This is genuinely powerful for complex UIs such as dashboards, news feeds, and mobile apps where different views need different data shapes.</p>
<p><strong>The honest tradeoffs:</strong></p>
<ol>
<li><strong>N+1 query problem:</strong> naive GraphQL resolvers can fire a database query per item in a list. You need DataLoader or a similar batching pattern to fix this.</li>
<li><strong>Schema governance:</strong> as your graph grows, schema discipline becomes a team-wide practice, not a one-time setup.</li>
<li><strong>Authorization complexity:</strong> REST's approach of protecting the endpoint is simpler than protecting each individual field on the graph.</li>
<li><strong>Caching:</strong> HTTP-level caching is trivial with REST, but GraphQL's POST-based queries need persisted queries or a custom caching layer.</li>
</ol>
<pre><code class="language-graphql"># Without DataLoader: N+1 queries for 100 orders
type Query {
  orders: [Order]  # 1 query
}
type Order {
  customer: User  # 100 queries, one per order
}
</code></pre>
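<p>The fix is batching: collect the keys requested during one resolution pass, then load them all with a single query. A minimal DataLoader-style sketch (real libraries add per-request caching and event-loop scheduling on top of this):</p>

```python
class BatchLoader:
    """Collects keys during a resolution pass, then resolves them with one batched call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # list of keys -> list of values, same order
        self.keys: list = []

    def want(self, key) -> None:
        if key not in self.keys:  # deduplicate repeated keys
            self.keys.append(key)

    def dispatch(self) -> dict:
        values = self.batch_fn(self.keys)  # one query instead of len(keys) queries
        return dict(zip(self.keys, values))
```

<p>Resolvers call <code>want()</code> while walking the order list, and <code>dispatch()</code> fires once per level of the query: 100 orders, one customer query.</p>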
<p><strong>Use GraphQL when:</strong></p>
<ul>
<li>Your frontend teams are strong and own the client-side queries</li>
<li>You have multiple clients such as mobile, web, and partner apps needing different data shapes</li>
<li>Your backend team can invest in schema governance and query depth limiting</li>
<li>You are building a product and not a generic public API</li>
</ul>
<hr />
<h2>gRPC — The Microservices Workhorse</h2>
<p>gRPC uses Protocol Buffers (Protobuf), a binary serialization format, over HTTP/2. The result is significantly faster than JSON over HTTP/1.1, with native support for streaming in both directions.</p>
<pre><code class="language-protobuf">// Define your contract in .proto
service OrderService {
  rpc GetOrder (OrderRequest) returns (OrderResponse);
  rpc StreamOrders (Empty) returns (stream Order);
}

message OrderRequest {
  string order_id = 1;
}
</code></pre>
<p>The contract is defined in <code>.proto</code> files that generate client and server code in multiple languages. This makes gRPC exceptional in polyglot environments where your Go service and your Python service speak the same typed contract.</p>
<p><strong>The tradeoffs:</strong></p>
<ul>
<li>Browser support is limited and requires a gRPC-Web proxy for browser clients</li>
<li>Protobuf binary is not human-readable and is harder to debug without proper tooling</li>
<li>Managing <code>.proto</code> files adds workflow overhead for smaller teams</li>
</ul>
<p><strong>Use gRPC when:</strong></p>
<ul>
<li>You are building internal service-to-service communication</li>
<li>Performance and latency are critical</li>
<li>You have or expect a polyglot architecture</li>
<li>You need bidirectional streaming for real-time data or event feeds</li>
</ul>
<hr />
<h2>How to Actually Choose</h2>
<p>Stop asking which API is best. Start asking what your team needs to operate reliably at your current scale.</p>
<table>
<thead>
<tr>
<th>Your situation</th>
<th>Recommended</th>
</tr>
</thead>
<tbody><tr>
<td>Public API, general purpose, mixed team</td>
<td><strong>REST</strong></td>
</tr>
<tr>
<td>Complex UI, multiple client types, strong frontend</td>
<td><strong>GraphQL</strong></td>
</tr>
<tr>
<td>Internal microservices, high throughput, polyglot</td>
<td><strong>gRPC</strong></td>
</tr>
<tr>
<td>Enterprise integration, compliance-driven, legacy</td>
<td><strong>SOAP</strong></td>
</tr>
<tr>
<td>Mobile app with internal services</td>
<td><strong>REST externally, gRPC internally</strong></td>
</tr>
<tr>
<td>Startup moving fast with a small team</td>
<td><strong>REST until it hurts</strong></td>
</tr>
</tbody></table>
<blockquote>
<p>Most mature systems use more than one paradigm. REST for the public API, gRPC for the service mesh, webhooks for async consumers. There is no rule against mixing paradigms, just the cost of maintaining each one.</p>
</blockquote>
<hr />
<h2>Three Pitfalls to Avoid</h2>
<p><strong>1. Migrating for hype and not for pain</strong></p>
<p>GraphQL and gRPC are excellent, but adopting them before you have the problem they solve is expensive. Slow queries? Fix the database. Mobile over-fetching on five endpoints? REST versioning or sparse fieldsets might be cheaper than a full migration.</p>
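<p>As a concrete example of the cheaper option, sparse fieldsets are a small change in most REST stacks: let the client name the fields it wants and filter the response. A minimal sketch, with the resource shape and parameter name invented for illustration:</p>
<pre><code class="language-python">def sparse_fields(resource, fields_param):
    """Return only the fields the client asked for, e.g. ?fields=id,status."""
    if not fields_param:
        return resource                         # no filter: full resource
    wanted = {f.strip() for f in fields_param.split(",")}
    return {k: v for k, v in resource.items() if k in wanted}

order = {"id": 42, "status": "shipped", "items": ["sku-1"], "customer": {"id": 7}}
slim = sparse_fields(order, "id,status")        # a mobile client asks for two fields
</code></pre>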
<p><strong>2. Treating API design as a junior task</strong></p>
<p>API contracts outlive the engineers who write them. Resource naming, versioning strategy, and error envelope structure all become your team's debt or foundation depending on the care taken in the first sprint.</p>
<p><strong>3. One style for everything</strong></p>
<p>A GraphQL API that also powers your internal service mesh is the wrong tool in the wrong place. Match the paradigm to the use case, even if that means maintaining two styles.</p>
<hr />
<h2>Key Takeaways</h2>
<ul>
<li><strong>REST</strong> is still the right default and you should not fix what is not broken</li>
<li><strong>GraphQL</strong> pays off when UI needs are complex and your team has schema discipline</li>
<li><strong>gRPC</strong> wins for internal microservice communication at scale</li>
<li><strong>SOAP</strong> survives because enterprise compliance demands it and that reality deserves respect</li>
<li>Most production systems run multiple API styles so match the tool to the context</li>
<li>The best API is the one your team can build, operate, and debug at 2am</li>
</ul>
<hr />
<p><strong>What is an API migration you have lived through? Did switching paradigms solve the original problem or did it just reveal a different one?</strong></p>
<p>Drop your thoughts in the comments. The best real-world examples will make it into Part 2.</p>
<hr />
<p><em>Next in the series: <strong>Post 02 — Cache Invalidation, The Problem That Humbles Everyone</strong></em></p>
<p><em>After you have decided how clients talk to your system, the next question is what do you do when those requests are expensive?</em></p>
]]></content:encoded></item><item><title><![CDATA[RAG vs Vectorless RAG: How AI Systems Retrieve Knowledge]]></title><description><![CDATA[How AI finds answers — and why the next generation is rethinking the approach.


Introduction
LLMs are powerful, but they only know what they were trained on — once training ends, new documents, compa]]></description><link>https://ajitabh.net/rag-vs-vectorless-rag</link><guid isPermaLink="true">https://ajitabh.net/rag-vs-vectorless-rag</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[llm]]></category><category><![CDATA[vector database]]></category><category><![CDATA[RAG ]]></category><dc:creator><![CDATA[Ajitabh Singh]]></dc:creator><pubDate>Mon, 23 Mar 2026 05:22:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69c0b381d9da55a9a5203e04/caa03ab6-195e-448f-b878-43af57bc820f.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>How AI finds answers — and why the next generation is rethinking the approach.</p>
</blockquote>
<hr />
<h2>Introduction</h2>
<p>LLMs are powerful, but they only know what they were trained on — once training ends, new documents, company updates, and recently uploaded PDFs are invisible to them. RAG solves this.</p>
<p><strong>Retrieval-Augmented Generation (RAG)</strong> lets the AI search for relevant information <em>before</em> answering. It uses what it finds to write accurate, grounded responses — no retraining required.</p>
<p>💡 <strong>In short: RAG = Search first, then generate.</strong></p>
<hr />
<h2>What is RAG?</h2>
<p>RAG makes AI "up-to-date" without retraining it constantly. It works in five steps:</p>
<h3>1. Indexing: Prepare Your Documents</h3>
<p>All documents (PDFs, web pages, text files) are organised into a searchable index — like building a card catalogue in a library. It happens once upfront and updates whenever new documents arrive.</p>
<h3>2. Chunking: Split Documents into Pieces</h3>
<p>LLMs can only process a limited amount of text at once. So documents are split into <strong>chunks</strong> — paragraphs or sections — to fit the AI's context window.</p>
<ul>
<li><strong>Too small</strong> → loses surrounding context</li>
<li><strong>Too large</strong> → wastes the AI's limited memory</li>
</ul>
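<p>A minimal sliding-window chunker, character-based with overlap, which is the simplest common scheme (production pipelines usually split on tokens or sentence boundaries instead):</p>
<pre><code class="language-python">def chunk_text(text, size=200, overlap=40):
    """Split text into overlapping windows so sentences near a boundary
    appear in two chunks instead of being cut in half."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "RAG pipelines retrieve relevant text before generating an answer. " * 20
pieces = chunk_text(doc, size=120, overlap=30)
</code></pre>
<p>Tuning <code>size</code> and <code>overlap</code> is exactly the tradeoff described above: a bigger window preserves more context but spends more of the model's limited budget per chunk.</p>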
<h3>3. Embeddings: Turn Text into Numbers</h3>
<p>To find <em>meaning</em>, not just keywords, each chunk is converted into a <strong>vector</strong> — a list of numbers that represents its meaning. Similar concepts produce similar vectors, even when the words are completely different.</p>
<p><em>Example:</em> "The cat sat on the mat" and "A feline rested on the rug" → nearly identical vectors.</p>
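<p>Similarity between vectors is usually measured with cosine similarity, the cosine of the angle between them. A toy sketch with invented 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):</p>
<pre><code class="language-python">import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented 3-d vectors standing in for real embeddings.
cat_mat = [0.90, 0.10, 0.30]   # "The cat sat on the mat"
feline = [0.85, 0.15, 0.35]    # "A feline rested on the rug"
weather = [0.10, 0.90, 0.20]   # "It rained all day"

cosine_similarity(cat_mat, feline)    # high: similar meaning
cosine_similarity(cat_mat, weather)   # low: different meaning
</code></pre>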
<h3>4. Vector Database: Store the Meaning</h3>
<p>Vectors are stored in a <strong>vector database</strong> (Pinecone, Weaviate, Qdrant, FAISS) that enables fast semantic search across thousands — or millions — of chunks.</p>
<h3>5. Query Time: Answering Questions</h3>
<ol>
<li>User asks a question</li>
<li>Question is converted into a vector</li>
<li>Semantic search finds the <strong>top-k most similar chunks</strong></li>
<li>Chunks + question are combined into a prompt</li>
<li>LLM generates a grounded answer</li>
</ol>
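<p>The five steps above can be sketched end to end. This toy version uses a bag-of-words count as the "embedding" so it runs without a model; a real system would call an embedding model and a vector database instead, and the chunks here are invented:</p>
<pre><code class="language-python">import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm = norm * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Steps 1-4: index the chunks (this list stands in for a vector database).
chunks = [
    "Employees accrue 20 days of annual leave per year.",
    "The cafeteria opens at 8am on weekdays.",
    "Sick leave requires a doctor's note after three days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Step 5: embed the question, rank by similarity, keep the top-k chunks.
question = "How many days of annual leave do employees get?"
q_vec = embed(question)
ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
top_k = [chunk for chunk, _ in ranked[:2]]   # these plus the question go to the LLM
</code></pre>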
<p>✅ Works well for many applications — but it has real limitations.</p>
<hr />
<h2>Problems with Traditional RAG</h2>
<p>The entire RAG pipeline is only as good as its weakest link — <strong>retrieval</strong>. Here's where it breaks down:</p>
<ul>
<li><strong>Chunks lose context</strong> — fragments miss the surrounding meaning that gives them significance</li>
<li><strong>Semantic search isn't perfect</strong> — embeddings can miss relevant sections, especially in specialised domains</li>
<li><strong>Information spans multiple chunks</strong> — answers often need several sections combined, but RAG treats each chunk independently</li>
<li><strong>Chunking is tricky</strong> — too big, too small, or overlapping chunks all introduce errors</li>
<li><strong>Vector databases need maintenance</strong> — updating, deleting, and re-indexing adds operational complexity over time</li>
<li><strong>Confident mistakes</strong> — AI writes fluent, authoritative answers even when the retrieved chunks are slightly off-topic</li>
</ul>
<blockquote>
<p><em>"The weakness of RAG is not the generation — it is the retrieval. If the right information was never found, the best AI in the world cannot save you."</em></p>
</blockquote>
<hr />
<h2>Vectorless RAG: A Different Approach</h2>
<p>Vectorless RAG skips vectors entirely. Instead of searching by similarity, it <strong>reasons</strong> through documents to find answers — like a detective working a case, not a search engine matching keywords.</p>
<p>💡 <strong>Core idea:</strong> Break the question into sub-questions, navigate to the exact document sections, read them in full, then combine everything into one complete answer.</p>
<h3>How It Works</h3>
<p>Think of how a doctor diagnoses a patient:</p>
<p><em>Fever → infection? → what type? → check bloodwork → treat accordingly</em></p>
<p>Each step guides the next. Vectorless RAG applies this same logic to documents:</p>
<pre><code>Question: What is our employee leave policy?
├── Sick days?          → HR Manual, Section 3.2
├── Annual leave?       → HR Manual, Section 4.1
└── Approval process?   → Policy Doc, Approval Workflow

              ↓
   Read each section in full
              ↓
   Synthesise one complete, context-rich answer
</code></pre>
<p>Because each section is read in full, the final answer is grounded in complete context — no guessing from fragments.</p>
<p>No embeddings. No vector database. The "index" is simply a clear, hierarchical map of your documents — easy to read, easy to update.</p>
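<p>The navigation idea can be sketched with the index as a plain nested dict. In a real system an LLM generates the sub-questions and picks the sections; here that routing is hard-coded so the sketch stays self-contained, and the documents and section names are invented:</p>
<pre><code class="language-python"># The whole "index" is a human-readable map of the documents.
doc_map = {
    "HR Manual": {
        "3.2 Sick days": "Employees get 10 paid sick days per year.",
        "4.1 Annual leave": "Employees accrue 20 days of annual leave.",
    },
    "Policy Doc": {
        "Approval workflow": "Leave requests are approved by the line manager.",
    },
}

# Each sub-question is routed to the section that answers it.
sub_questions = [
    ("Sick days?", ("HR Manual", "3.2 Sick days")),
    ("Annual leave?", ("HR Manual", "4.1 Annual leave")),
    ("Approval process?", ("Policy Doc", "Approval workflow")),
]

# Read each routed section in full, then synthesise one answer.
findings = [f"{q} {doc_map[doc][section]}" for q, (doc, section) in sub_questions]
answer = " ".join(findings)
</code></pre>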
<p>✅ <strong>Accurate, context-rich, low maintenance</strong>
⚠️ Works best with well-structured documents</p>
<hr />
<h2>RAG vs Vectorless RAG: At a Glance</h2>
<table>
<thead>
<tr>
<th>Factor</th>
<th>Traditional RAG</th>
<th>Vectorless RAG</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Speed</strong></td>
<td>✅ Fast (1–3 sec)</td>
<td>⚠️ Slower (10–30 sec)</td>
</tr>
<tr>
<td><strong>Accuracy</strong></td>
<td>⚠️ Moderate</td>
<td>✅ High</td>
</tr>
<tr>
<td><strong>Infrastructure</strong></td>
<td>❌ Complex (vector DB)</td>
<td>✅ Simple</td>
</tr>
<tr>
<td><strong>Context quality</strong></td>
<td>❌ Fragmented chunks</td>
<td>✅ Full sections</td>
</tr>
<tr>
<td><strong>Document types</strong></td>
<td>✅ Any format</td>
<td>⚠️ Structured docs work best</td>
</tr>
<tr>
<td><strong>Multi-step reasoning</strong></td>
<td>❌ Not supported</td>
<td>✅ Built-in</td>
</tr>
</tbody></table>
<h3>Rule of Thumb</h3>
<ul>
<li><strong>Fast &amp; large-scale?</strong> → Traditional RAG</li>
<li><strong>Accurate &amp; structured?</strong> → Vectorless RAG</li>
</ul>
<p>A slightly slower, accurate answer beats a fast, wrong one — especially in legal, medical, compliance, or technical domains.</p>
<hr />
<h2>Conclusion</h2>
<p>RAG opened the door for AI to answer questions about new information. But chunking and vector search create real challenges that limit accuracy in high-stakes situations.</p>
<p>Vectorless RAG bets on reasoning over retrieval — and for structured documents, that bet pays off. It delivers full-context answers with simpler infrastructure and less ongoing maintenance.</p>
<p>The future of AI retrieval may not be in bigger vector databases — <strong>it may be in smarter navigation and reasoning.</strong></p>
<hr />
<p><em>Found this helpful? Share it with someone building AI systems. Questions or thoughts? Drop a comment below.</em></p>
]]></content:encoded></item></channel></rss>