System Designintermediate

Designing TinyURL: A URL Shortening Service

A comprehensive system design case study for building a URL shortening service like Bitly or TinyURL, covering requirements, architecture, database design, and scaling strategies.

February 10, 20244 min read
system-designurl-shortenerdistributed-systemsdatabasescaching

Problem Statement

Design a URL shortening service that takes long URLs and generates short, unique aliases. When users access the short URL, they should be redirected to the original URL. The system should be highly available, support analytics, and handle billions of redirections.

Requirements

Functional Requirements

  • Given a long URL, generate a unique short URL
  • When a user accesses a short URL, redirect to the original URL
  • Users can optionally choose a custom short alias
  • Links should expire after a configurable time period
  • Track click analytics (count, referrer, geo, device)

Non-Functional Requirements

  • Availability: 99.99% uptime — redirects must work even during partial failures
  • Latency: Redirect should complete in under 50ms (p99)
  • Scale: 100M new URLs per month, 10B redirects per month (100:1 read-write ratio)
  • Durability: URLs must not be lost once created

Back-of-the-Envelope Estimation

MetricValue
New URLs per second~40 (100M / 30 days / 86400 sec)
Redirects per second~4,000 (10B / 30 days / 86400 sec)
Storage per URL~500 bytes (short code + long URL + metadata)
Storage per year~600 GB (100M × 12 × 500 bytes)
Cache size (20% hot URLs)~120 GB

Architecture

The system consists of four main components:

1. URL Shortening Service (Write Path)

Receives long URL → generates short code → stores mapping → returns short URL.

2. Redirect Service (Read Path)

Receives short URL → looks up mapping → returns HTTP redirect.

3. Analytics Service

Asynchronously processes click events for reporting.

4. Cleanup Service

Background job that removes expired URLs.

Database Design

URL Mapping Table

CREATE TABLE url_mappings (
    short_code  VARCHAR(7) PRIMARY KEY,
    long_url    TEXT NOT NULL,
    user_id     BIGINT,
    created_at  TIMESTAMP DEFAULT NOW(),
    expires_at  TIMESTAMP,
    click_count BIGINT DEFAULT 0
);

Database Choice

PostgreSQL for the primary store with Redis as a cache layer.

  • PostgreSQL handles the write workload and provides ACID guarantees for URL creation
  • Redis caches the read-heavy redirect lookups with a 24-hour TTL
  • With a 100:1 read-write ratio, Redis absorbs ~99% of read traffic

Short Code Generation

Approach: Base62 Encoding of Unique IDs

Using a distributed ID generator (similar to Twitter Snowflake):

  1. Generate a globally unique 64-bit ID
  2. Convert to base62 (a-z, A-Z, 0-9) to produce a 7-character string
  3. 62^7 = ~3.5 trillion possible codes — sufficient for decades of growth

Why Not MD5/SHA256 Hashing?

  • Hash collisions require detection and retry logic
  • Truncating hashes increases collision probability
  • Base62 encoding of unique IDs guarantees uniqueness without collision handling

Scaling Strategies

Read Scaling

  • Redis cluster: Partition the cache across multiple Redis nodes using consistent hashing
  • CDN: For extremely popular URLs, serve redirects from edge locations
  • Read replicas: PostgreSQL read replicas for cache misses

Write Scaling

  • Database sharding: Shard by short_code hash for even distribution
  • Write-behind caching: Buffer writes in Redis and flush to PostgreSQL in batches

Caching Strategy

Use a read-through cache pattern:

  1. Client requests redirect for short code
  2. Check Redis cache
  3. Cache hit → return URL (99% of requests)
  4. Cache miss → query PostgreSQL → populate cache → return URL

Cache invalidation: TTL-based (24 hours) with active invalidation on URL deletion.

Failure Handling

  • Database failure: Redis cache continues serving redirects for cached URLs. New URL creation returns 503.
  • Redis failure: Fall back to PostgreSQL directly. Latency increases but service remains available.
  • ID generator failure: Pre-generate and buffer IDs. Each application server holds a pool of 1,000 pre-generated IDs.

Cost Estimation

For 10B redirects/month:

ComponentMonthly Cost
3x Application servers (c5.xlarge)~$450
PostgreSQL RDS (r5.large)~$200
Redis ElastiCache (r5.large, 2 nodes)~$400
Load Balancer~$50
Data transfer~$100
Total~$1,200/month

Key Tradeoffs

  1. Consistency vs. Performance: We accept eventual consistency for analytics (click counts may lag) but require strong consistency for URL creation (no duplicate short codes)
  2. Storage vs. Computation: Pre-computing and storing short codes is cheaper than computing them on every request
  3. Simplicity vs. Features: Starting with 301 redirects is simpler but prevents accurate click tracking; 302 redirects enable analytics at the cost of the client not caching the redirect