System Design Fundamentals

Resource: Neetcode — System Design for Beginners, Background Videos 0–2

These notes cover the foundational concepts required before diving into designing real-world systems at scale. Think of this as the vocabulary and mental models you need before tackling anything more complex.


Contents

  1. Scaling Systems
  2. The Three Core Design Operations
  3. Availability & Uptime
  4. Reliability, Fault Tolerance & Redundancy
  5. Throughput & Latency

1. Scaling Systems

As a product grows in popularity, its underlying system must handle ever-increasing traffic. There are two primary strategies for scaling: vertical and horizontal.

Vertical Scaling (Scale Up)

Vertical scaling means upgrading the hardware of a single existing server — adding more CPU cores, more RAM, a faster network card, or a faster storage drive.

Pros:

  • Simple to implement — no changes to application code required
  • No network overhead between multiple machines

Cons:

  • Hard physical ceiling — there's only so powerful a single machine can get
  • Creates a single point of failure: if that one server goes down, service is completely unavailable
  • Expensive at higher tiers with diminishing returns

Horizontal Scaling (Scale Out)

Horizontal scaling means adding more servers to share the load. A load balancer sits in front of those servers and routes each incoming request to whichever server currently has the smallest load.

Pros:

  • Virtually unlimited capacity — just keep adding servers
  • Adds redundancy: if one server fails, the others keep serving traffic
  • Enables geographic distribution: deploy servers near users to reduce latency

Cons:

  • More complex infrastructure and potentially more complex application code
  • Requires a load balancer (another component to manage)
  • Data consistency challenges across servers (sessions, caches, writes)
[Diagram: Horizontal scaling — a load balancer distributes traffic across multiple servers; more servers can be added at any time]

💡Why Horizontal Scaling Wins

Horizontal scaling is almost always preferred in production systems. It not only increases total capacity but also adds redundancy — if Server 1 crashes, Servers 2 and 3 keep serving users without interruption. Vertical scaling creates a single point of failure: one very powerful machine going down means complete outage.

Load Balancers

A load balancer is a program (or dedicated service) that:

  • Accepts all incoming requests from users
  • Tracks the current load (number of active requests) on each backend server
  • Forwards each new request to the server with the smallest current load
  • Performs health checks and stops routing to unhealthy/down servers

Popular load balancers: Nginx, HAProxy, AWS Elastic Load Balancing (ELB), Google Cloud Load Balancing.
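
The least-load routing described above can be sketched as a small Python class. This is a toy model, not a production load balancer, and the server names are hypothetical:

```python
# Toy least-connections load balancer: route each request to the
# healthy backend with the fewest active requests.
class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.active = {name: 0 for name in servers}  # in-flight requests
        self.healthy = set(servers)                  # passed last health check

    def route(self):
        # Only consider servers that are currently healthy.
        candidates = [s for s in self.active if s in self.healthy]
        target = min(candidates, key=lambda s: self.active[s])
        self.active[target] += 1  # request is now in flight
        return target

    def finish(self, server):
        self.active[server] -= 1  # request completed

    def mark_down(self, server):
        self.healthy.discard(server)  # stop routing to a failed server

lb = LeastConnectionsBalancer(["server-1", "server-2", "server-3"])
first = lb.route()   # all idle, so the first minimally loaded server wins
second = lb.route()  # a different, still-idle server
```

Real load balancers layer retries, timeouts, and weighted strategies on top of this core idea, but the routing decision itself is this simple.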


2. The Three Core Design Operations

Every system — regardless of complexity — can be described in terms of three fundamental operations:

[Diagram: Every system design problem reduces to some combination of moving, storing, and transforming data]

Move Data

Moving data means transmitting information from one place to another — between processes on the same machine, between computers on a local network, or across the internet between a client and a server.

Core protocols: HTTP/HTTPS (request-response), WebSockets (persistent bi-directional), TCP (reliable), UDP (fast, lossy).
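
To make "moving data" concrete, here is a minimal request-response exchange over TCP on the loopback interface, the same pattern HTTP builds on, stripped down to raw bytes:

```python
# Minimal TCP request-response: a client sends bytes, a server
# reads them and replies. HTTP is this pattern plus a text format.
import socket
import threading

def run_server(server_sock):
    conn, _ = server_sock.accept()         # wait for one client
    with conn:
        request = conn.recv(1024)          # read the request bytes
        conn.sendall(b"echo: " + request)  # send a response back

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=run_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)  # b"echo: hello"
client.close()
server.close()
```

TCP gives this exchange ordering and delivery guarantees; the same sketch over UDP would be faster per packet but could silently drop the request or the reply.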

Store Data

Data storage is about persisting information durably so it can be retrieved later.

| Storage Type | Description | Best For |
|---|---|---|
| SQL Database | Structured, relational data with ACID guarantees | Transactions, user accounts, orders |
| NoSQL Database | Flexible schemas, built for horizontal scale | Unstructured, high-volume, or graph data |
| Blob Store | Unstructured binary objects stored by key | Images, videos, files, backups |
| Data Structures | In-memory BSTs, arrays, hash maps | Caching, search indices, leaderboards |

What is a Blob Store?

A Blob Store (Binary Large Object Store) is designed for unstructured data — files like .mp4, .jpg, .pdf, and .txt. Key characteristics:

  • No file hierarchy — a flat namespace where each object has a unique key
  • Cannot be queried or searched like a database
  • Designed to scale to petabytes of data at low cost
  • Optimized for large sequential reads and writes

Examples: Amazon S3, Google Cloud Storage, Azure Blob Storage.
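
The flat-namespace model can be illustrated with a toy in-memory store. Real blob stores persist objects across many machines; this sketch only shows the interface:

```python
# Toy blob store: a flat namespace mapping keys to binary objects.
# No hierarchy, no queries, just put/get/delete by key (S3-style).
class BlobStore:
    def __init__(self):
        self._objects = {}  # key -> bytes

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)

store = BlobStore()
# "Paths" like "videos/cat.mp4" are just opaque keys, not directories:
# the slash has no meaning to the store itself.
store.put("videos/cat.mp4", b"\x00\x01\x02")
data = store.get("videos/cat.mp4")
```

Note what is missing: there is no way to ask "find all videos longer than 5 minutes." Anything queryable belongs in a database, with the blob key stored alongside it.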

Transform Data

Transforming data means processing raw inputs into useful, derived outputs:

  • Logs: Time-series records of system events (every log entry has a timestamp). Used for debugging, auditing, and tracing.
  • Metrics: Aggregate measurements over time — average response time, error rate, CPU utilization.
  • Analytics: Higher-level business insights derived from data — "15% of users click this button", "revenue is up 8% this week".
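
The log-to-metric relationship above can be shown with a few hypothetical log entries (the field names are illustrative):

```python
# Deriving metrics from raw logs: each log entry is a timestamped
# event; metrics are aggregates computed over many entries.
logs = [
    {"ts": 1700000000, "route": "/home",  "status": 200, "ms": 45},
    {"ts": 1700000001, "route": "/login", "status": 500, "ms": 120},
    {"ts": 1700000002, "route": "/home",  "status": 200, "ms": 55},
    {"ts": 1700000003, "route": "/home",  "status": 200, "ms": 60},
]

# Metrics: aggregates over the raw events.
avg_latency_ms = sum(e["ms"] for e in logs) / len(logs)
error_rate = sum(e["status"] >= 500 for e in logs) / len(logs)
```

Analytics sits one level higher still: the same aggregation idea, but grouped by business dimensions (route, user segment, time window) rather than by system health.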

3. Availability & Uptime

Availability is the percentage of time a system is operational and accessible. It is one of the most important requirements to nail down early in any system design discussion.

The "Nines" of Availability

Availability is commonly expressed as a count of "nines" — the number of 9s in the percentage:

| Uptime | Downtime / Year | Downtime / Month | Common Name |
|---|---|---|---|
| 99% | 3.65 days | 7.2 hours | Two nines |
| 99.9% | 8.77 hours | 43.8 minutes | Three nines |
| 99.99% | 52.6 minutes | 4.38 minutes | Four nines |
| 99.999% | 5.26 minutes | 26.3 seconds | Five nines |
| 99.9999% | 31.6 seconds | 2.6 seconds | Six nines |
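
The downtime figures in the table follow directly from the percentage; a quick sketch:

```python
# Downtime budget implied by an availability percentage.
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years

def downtime_per_year_minutes(availability_pct):
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * SECONDS_PER_YEAR / 60

# Four nines allows roughly 52.6 minutes of downtime per year,
# which is why on-call response time starts to matter at this tier.
four_nines = downtime_per_year_minutes(99.99)
```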

⚠️Diminishing Returns on Uptime

Going from 99% → 99.9% uptime can cost as much as going from 95% → 99%. Each additional "nine" requires exponentially more engineering investment. Before demanding high availability, ask: would the revenue saved by the reduced downtime actually cover the cost of reaching the next tier?

📝Questions to Ask About Availability Requirements

  • How much does each hour of downtime actually cost the business (revenue, reputation)?
  • What SLA (Service Level Agreement) do customers expect or have contractually?
  • Is the system mission-critical (healthcare, finance) or best-effort (content streaming)?
  • Are there scheduled maintenance windows that count against uptime?

🔑Design Mistakes Are Expensive to Fix Later

System design decisions made early — before a product is at scale — are very hard to reverse once the system is deployed and serving millions of users. Getting the architecture right before launch is critical. Retrofitting a poorly-chosen database, splitting a monolith into microservices, or adding geographic redundancy after the fact are all extremely costly and risky.


4. Reliability, Fault Tolerance & Redundancy

These three concepts are closely related and work together to keep systems available even when individual components fail.

Reliability

Reliability is the probability that a system performs its intended function correctly without failure over a given time period.

  • A single server with no redundancy is maximally unreliable — one hardware fault and the service is gone
  • Adding more servers (horizontal scaling) increases total request capacity and, because the servers are redundant copies of each other, also increases overall reliability
  • Reliability is not just about uptime — it also includes correctness: the system should return accurate results, not just stay online
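
The reliability gain from redundancy can be quantified. Assuming server failures are independent (a simplification; correlated failures such as a shared power outage break this), the combined availability of n replicas is 1 − (1 − a)^n:

```python
# With n independent servers, each available with probability a,
# the chance that at least one is up is 1 - (1 - a)^n.
# Independence is an assumption; correlated failures reduce the gain.
def combined_availability(a, n):
    return 1 - (1 - a) ** n

single = combined_availability(0.99, 1)  # one server: two nines
triple = combined_availability(0.99, 3)  # three replicas: six nines
```

This is why three ordinary servers behind a load balancer can beat one extremely expensive server: the math of redundancy compounds quickly.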

Fault Tolerance

Fault tolerance is the ability of a system to continue functioning correctly even when one or more of its components fail.

  • Multiple servers: if Server 1 crashes, Servers 2 and 3 continue serving users
  • The service degrades gracefully rather than going completely offline
  • Requires health checks, circuit breakers, and automatic failover mechanisms

Redundancy

Redundancy means having backup capacity that activates when primary capacity fails. This includes:

  • Server redundancy: multiple identical servers behind a load balancer
  • Database replication: primary + replica databases; if the primary fails, a replica is promoted
  • Geographic redundancy: servers in multiple data centers/regions
[Diagram: Geographic redundancy — users are routed to the nearest region; if one region fails, DNS redirects traffic to another]

ℹ️Geographic Distribution and DNS Routing

DNS can return different IP addresses based on the user's location, routing them to the nearest server. This is how CDNs (Content Delivery Networks) and multi-region deployments achieve low global latency. If the Vancouver server goes down, DNS automatically starts routing West Coast users to Los Angeles instead.
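
The failover behavior described above can be modeled as an ordered, health-checked lookup. This is a simplification (real DNS failover involves TTLs and external health probes), and the region names and orderings are hypothetical:

```python
# Regional failover sketch: route each user to the nearest healthy
# region; if it is down, fall back to the next nearest.
REGIONS_BY_PROXIMITY = {
    # user location -> regions ordered nearest-first
    "vancouver": ["yvr", "lax", "iad"],
    "new_york":  ["iad", "lax", "yvr"],
}

healthy = {"yvr": False, "lax": True, "iad": True}  # yvr is down

def route(user_location):
    for region in REGIONS_BY_PROXIMITY[user_location]:
        if healthy[region]:
            return region
    raise RuntimeError("all regions down")

# West Coast traffic fails over from Vancouver to Los Angeles.
west = route("vancouver")
```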


5. Throughput & Latency

Throughput

Throughput is the number of operations a system can handle per unit of time. It's commonly measured as:

  • Requests per second (RPS) — for web/API servers
  • Queries per second (QPS) — for databases
  • Bytes per second (MB/s or Gbps) — for network or storage throughput

Horizontal scaling is the most effective way to increase throughput because each additional server adds roughly linear capacity. Vertical scaling has a hard ceiling and yields diminishing returns at higher tiers.
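
Because each server adds roughly linear capacity, sizing a fleet is back-of-the-envelope arithmetic. The figures below are made up for illustration:

```python
# Capacity planning sketch: how many servers sustain a target
# throughput? All numbers here are illustrative.
import math

target_rps = 50_000      # expected peak requests per second
per_server_rps = 4_000   # measured capacity of one server
headroom = 0.7           # run servers at 70% utilization, max

servers_needed = math.ceil(target_rps / (per_server_rps * headroom))
```

The headroom factor matters: a fleet sized for 100% utilization has no slack to absorb a traffic spike or the loss of one server.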

📝Database Throughput

When designing a system, always consider data transfer capacity and processing capacity per second separately. A high-throughput web tier can still be bottlenecked by a low-QPS database. This is why database design — indexing, sharding, caching — is a core part of system design.

Latency

Latency is the total time it takes to complete an operation from start to finish, measured as end-to-end time.

For example: a user clicks a button at t=0 and receives a response at t=1s → latency = 1 second.

Factors That Affect Latency

  1. Geographic distance — the speed of light is a real constraint. Data traveling across continents adds 50–150ms of unavoidable latency.
  2. Network hops — each router or switch between client and server adds a small delay.
  3. Server processing time — CPU computation, database queries, external API calls.
  4. Storage tier — where the data lives in the hardware hierarchy (see below).
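
These factors are roughly additive; the user experiences their sum. A latency budget with illustrative figures:

```python
# End-to-end latency is approximately the sum of its components.
# All figures are illustrative, in milliseconds.
latency_ms = {
    "network_round_trip": 80,  # geographic distance + network hops
    "server_processing":  20,  # CPU work, parsing, business logic
    "database_query":     40,  # depends heavily on the storage tier
}

total_ms = sum(latency_ms.values())  # what the user actually feels
```

Budgets like this show where optimization pays off: shaving 5 ms from server code is invisible next to an 80 ms cross-continent round trip, which is why geographic proximity matters so much.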

Hardware Latency Hierarchy

The type of storage used to retrieve data has a massive impact on latency. Each tier in the hardware hierarchy is orders of magnitude slower than the one above it:

[Diagram: Hardware latency hierarchy — each layer is orders of magnitude slower than the one above; cache is ~10 million times faster than HDD]

| Storage Tier | Typical Latency | Notes |
|---|---|---|
| CPU Cache (L1/L2/L3) | ~1 nanosecond | Stored on the chip itself |
| RAM (Memory) | ~100 nanoseconds | Fast but volatile (lost on power off) |
| SSD (Flash Storage) | ~100 microseconds | Persistent, ~1,000× slower than RAM |
| HDD (Spinning Disk) | ~10 milliseconds | Cheapest per GB, but ~100× slower than SSD |

💡Caching = Trading Storage Cost for Lower Latency

A fundamental system design pattern is caching — storing frequently accessed data in a faster storage tier so subsequent requests skip the slow path entirely. For example, caching database query results in Redis (in-memory) means reads that would take 10ms from disk now take under 1ms. This can be the difference between a snappy and a sluggish user experience.
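
The read-through caching pattern can be sketched with plain dictionaries. The "database" here is a stand-in dict; a real system would put something like Redis in front of a SQL database:

```python
# Read-through cache: check the fast tier first; on a miss, read
# from the slow tier and populate the cache for next time.
cache = {}                               # fast tier (in-memory)
database = {"user:42": {"name": "Ada"}}  # slow tier (stand-in for a DB)
stats = {"hits": 0, "misses": 0}

def get(key):
    if key in cache:            # fast path: served from memory
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1
    value = database[key]       # slow path: hit the database
    cache[key] = value          # populate the cache
    return value

get("user:42")  # miss: goes to the database
get("user:42")  # hit: served from memory
```

The hard part in practice is not this lookup logic but invalidation: deciding when a cached value is stale and must be evicted or refreshed.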


Summary

| Concept | Key Takeaway |
|---|---|
| Vertical Scaling | Upgrade one server — simple but limited and a single point of failure |
| Horizontal Scaling | Add more servers + load balancer — scalable, redundant, preferred |
| Move / Store / Transform | The three fundamental operations in every system |
| Blob Store | Unstructured binary storage (S3-style) — scales massively, not queryable |
| Availability | Measured in "nines" — each extra nine costs exponentially more |
| Reliability | Probability the system works correctly over time |
| Fault Tolerance | Continues functioning when components fail |
| Redundancy | Backup capacity + geographic distribution |
| Throughput | Operations per second — scale horizontally to increase |
| Latency | End-to-end time — minimize with caching and geographic proximity |