Daily Learning

Learning Journal

Documenting what I'm learning — notes, insights, and progress across distributed systems, databases, and more

Redis
July 1, 2024

Redis Data Structures Under the Hood

Exploring how Redis implements its data structures — from simple dynamic strings to skip lists and hash tables — and why these choices matter for performance.

What I Learned

Redis isn't just a key-value store — it's a data structure server. Understanding the implementation behind each type reveals why certain operations have surprising performance characteristics.

Strings (SDS - Simple Dynamic Strings)

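  • Redis doesn't use plain C strings. SDS stores the length in a header and pre-allocates spare capacity, so reading the length is O(1) and repeated APPENDs are amortized O(1) per appended byte instead of forcing a reallocation and full copy each time.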
  • Binary safe — can store any bytes, not just text.
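
To make the pre-allocation idea concrete for myself, here is a minimal Python sketch of an SDS-like buffer. It is a toy model, not Redis's C implementation: the class name, the `reallocations` counter, and the method names are my own. The growth rule (double while under 1 MiB, then add 1 MiB at a time) follows what I read in `sdsMakeRoomFor`.

```python
SDS_MAX_PREALLOC = 1024 * 1024  # 1 MiB cap on extra preallocation, as in Redis's sds.h


class SimpleDynamicString:
    """Toy model of an SDS: a buffer with an explicit length and spare capacity."""

    def __init__(self, initial: bytes = b""):
        self.buf = bytearray(initial)  # allocated storage (capacity == len(self.buf))
        self.length = len(initial)     # bytes actually in use, tracked explicitly
        self.reallocations = 0         # hypothetical counter, only for illustration

    def __len__(self) -> int:
        # O(1): no scan for a terminating NUL byte is needed.
        return self.length

    def _make_room_for(self, extra: int) -> None:
        needed = self.length + extra
        if needed <= len(self.buf):
            return  # enough spare capacity, no reallocation
        # Grow generously so the next few appends are free.
        if needed < SDS_MAX_PREALLOC:
            new_capacity = needed * 2
        else:
            new_capacity = needed + SDS_MAX_PREALLOC
        self.buf.extend(bytes(new_capacity - len(self.buf)))
        self.reallocations += 1

    def append(self, data: bytes) -> None:
        self._make_room_for(len(data))
        # Same-length slice assignment writes into the existing storage.
        self.buf[self.length:self.length + len(data)] = data
        self.length += len(data)

    def value(self) -> bytes:
        return bytes(self.buf[:self.length])
```

Appending one byte a thousand times triggers only a handful of reallocations, and because the length is stored rather than inferred, embedded zero bytes are preserved.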

Hash Tables

  • Redis uses chained hashing with incremental rehashing. When a hash table needs to grow, Redis creates a new table and migrates entries gradually across multiple commands, avoiding a single expensive resize operation.
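
A rough sketch of incremental rehashing, as I understand it. This is a simplified Python model under my own assumptions: the class and method names are hypothetical, the resize trigger is a load factor of 1, and exactly one non-empty bucket is migrated per operation. Redis's real `dict` has more nuance (for example, it bounds how many empty buckets a single step may skip).

```python
class IncrementalDict:
    """Toy chained hash table that migrates one bucket per operation while resizing."""

    def __init__(self, size: int = 4):
        self.tables = [[[] for _ in range(size)], None]  # [current table, resize target]
        self.used = [0, 0]                               # entry count per table
        self.rehash_idx = -1                             # -1 means no rehash in progress

    def _is_rehashing(self) -> bool:
        return self.rehash_idx != -1

    def _rehash_step(self) -> None:
        """Move one non-empty bucket from the old table to the new one."""
        if not self._is_rehashing():
            return
        old, new = self.tables
        # Skip empty buckets (unbounded here; a real implementation would cap this).
        while self.rehash_idx < len(old) and not old[self.rehash_idx]:
            self.rehash_idx += 1
        if self.rehash_idx < len(old):
            for entry in old[self.rehash_idx]:
                new[hash(entry[0]) % len(new)].append(entry)
                self.used[0] -= 1
                self.used[1] += 1
            old[self.rehash_idx] = []
            self.rehash_idx += 1
        if self.used[0] == 0:
            # Migration finished: the new table becomes the only table.
            self.tables = [new, None]
            self.used = [self.used[1], 0]
            self.rehash_idx = -1

    def _find(self, key):
        # During a rehash the key may live in either table, so check both.
        for table in self.tables:
            if table is not None:
                for entry in table[hash(key) % len(table)]:
                    if entry[0] == key:
                        return entry
        return None

    def set(self, key, value) -> None:
        self._rehash_step()
        entry = self._find(key)
        if entry is not None:
            entry[1] = value
            return
        # New keys go straight to the new table while a rehash is running.
        target = 1 if self._is_rehashing() else 0
        table = self.tables[target]
        table[hash(key) % len(table)].append([key, value])
        self.used[target] += 1
        if not self._is_rehashing() and self.used[0] >= len(self.tables[0]):
            # Start a resize: allocate the bigger table, but move nothing yet.
            self.tables[1] = [[] for _ in range(len(self.tables[0]) * 2)]
            self.rehash_idx = 0

    def get(self, key, default=None):
        self._rehash_step()
        entry = self._find(key)
        return default if entry is None else entry[1]
```

The point of the design is that no single `set` or `get` pays for the whole resize; the cost is spread across whatever commands happen to arrive next.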

Sorted Sets (Skip Lists)

  • The most interesting data structure. A skip list provides O(log n) expected-time insert, delete, and search, similar to a balanced BST but simpler to implement. Range scans are cheap because the bottom level is an ordered linked list.
  • Redis uses a skip list + hash table combination for sorted sets: the skip list serves score-based range queries, and the hash table gives O(1) member-to-score lookups.
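
Here is a small Python sketch of the skip list + hash table pairing. It is illustrative only: the `SortedSet` class and its method names are my own, and it omits things the real implementation has (spans for rank queries, backward pointers, the compact encoding for small sets). The maximum level of 32 and promotion probability of 0.25 match the constants I found in the Redis source.

```python
import random

MAX_LEVEL = 32  # ZSKIPLIST_MAXLEVEL in Redis
P = 0.25        # ZSKIPLIST_P: chance of promoting a node one level up


class _Node:
    __slots__ = ("score", "member", "forward")

    def __init__(self, score, member, level):
        self.score = score
        self.member = member
        self.forward = [None] * level  # one "next" pointer per level


class SortedSet:
    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(float("-inf"), None, MAX_LEVEL)
        self._level = 1
        self._scores = {}  # member -> score, for O(1) lookups

    def _random_level(self):
        level = 1
        while level < MAX_LEVEL and self._rng.random() < P:
            level += 1
        return level

    def _predecessors(self, score, member):
        """For each level, the last node ordered before (score, member)."""
        update = [self._head] * MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            nxt = node.forward[i]
            while nxt is not None and (nxt.score, nxt.member) < (score, member):
                node, nxt = nxt, nxt.forward[i]
            update[i] = node
        return update

    def add(self, member, score):
        if member in self._scores:
            self.remove(member)  # updating a score is modeled as remove + insert
        update = self._predecessors(score, member)
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(score, member, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        self._scores[member] = score

    def remove(self, member):
        if member not in self._scores:
            return False
        score = self._scores.pop(member)
        update = self._predecessors(score, member)
        target = update[0].forward[0]
        for i in range(len(target.forward)):
            update[i].forward[i] = target.forward[i]
        return True

    def score(self, member):
        return self._scores.get(member)  # hash table path, no list traversal

    def range_by_score(self, lo, hi):
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].score < lo:
                node = node.forward[i]
        node = node.forward[0]
        result = []
        while node is not None and node.score <= hi:  # walk the bottom level
            result.append((node.member, node.score))
            node = node.forward[0]
        return result
```

`score` never touches the skip list and `range_by_score` never touches the dictionary, which is why keeping both structures is worth the extra memory.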

Streams

  • Streams store entries in a radix tree keyed by entry ID, which suits append-heavy time-series data. Consumer groups are inspired by Kafka's design.

Connection to My Work

My distributed cache project uses similar patterns. Understanding Redis's approach to incremental rehashing influenced my design for handling hash table resizes without blocking client requests.

Still Learning

I'm currently studying Redis Cluster's gossip protocol and slot migration mechanism for resharding. This connects directly to my distributed cache project's need for online rebalancing.

Source: Redis in Action + Source Code
#redis #data-structures #performance #databases

Kafka
June 15, 2024

Kafka Internals: Log-Structured Storage

Deep dive into how Apache Kafka stores messages using a log-structured architecture with segments, indexes, and compaction.

What I Learned

Kafka's storage model is fundamentally simple yet powerful. Each partition is an append-only log divided into segments. Key insights:

  • Segments: Each partition is split into segment files (default 1GB). Only the active segment accepts writes; older segments are immutable.
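  • Index files: Each segment has a sparse offset index mapping logical offsets to physical file positions. A lookup binary-searches the index for the nearest entry at or below the target offset, then scans forward a short distance in the log file, so reads never scan a whole segment.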
  • Zero-copy transfer: Kafka uses sendfile() to transfer data directly from the page cache to the network socket, avoiding user-space copies.
  • Retention: Messages are retained by time (default 7 days) or size. On compacted topics, log compaction instead keeps at least the latest value for each key.
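
To check my understanding of segments and offset lookups, here is a toy Python model. Everything in it is an assumption for illustration: the `Segment` and `Partition` classes, the 12-byte record header, and the tiny `max_segment_bytes` and `index_every` values. Real Kafka adds index entries by bytes written (`index.interval.bytes`) rather than by record count, and its on-disk format is quite different.

```python
import bisect
import struct

HEADER = struct.Struct(">qi")  # toy record header: 8-byte offset, 4-byte payload length


class Segment:
    def __init__(self, base_offset: int, index_every: int = 4):
        self.base_offset = base_offset
        self.next_offset = base_offset
        self.log = bytearray()     # stands in for the segment's .log file
        self.index_every = index_every
        self.index_offsets = []    # sparse index: logical offsets...
        self.index_positions = []  # ...and their byte positions in self.log

    def append(self, payload: bytes) -> int:
        offset = self.next_offset
        if (offset - self.base_offset) % self.index_every == 0:
            # Only every Nth record gets an index entry (the index is sparse).
            self.index_offsets.append(offset)
            self.index_positions.append(len(self.log))
        self.log += HEADER.pack(offset, len(payload)) + payload
        self.next_offset += 1
        return offset

    def read(self, target: int) -> bytes:
        if not (self.base_offset <= target < self.next_offset):
            raise KeyError(target)
        # Binary search for the last index entry at or below the target...
        i = bisect.bisect_right(self.index_offsets, target) - 1
        pos = self.index_positions[i]
        # ...then scan forward record by record.
        while True:
            offset, length = HEADER.unpack_from(self.log, pos)
            pos += HEADER.size
            if offset == target:
                return bytes(self.log[pos:pos + length])
            pos += length


class Partition:
    def __init__(self, max_segment_bytes: int = 64):
        self.max_segment_bytes = max_segment_bytes
        self.segments = [Segment(0)]

    def append(self, payload: bytes) -> int:
        active = self.segments[-1]  # only the newest segment accepts writes
        if len(active.log) >= self.max_segment_bytes:
            active = Segment(active.next_offset)  # roll to a new segment
            self.segments.append(active)
        return active.append(payload)

    def read(self, offset: int) -> bytes:
        bases = [s.base_offset for s in self.segments]
        i = bisect.bisect_right(bases, offset) - 1  # segment whose range covers offset
        if i < 0:
            raise KeyError(offset)
        return self.segments[i].read(offset)
```

A read therefore costs two binary searches (pick the segment, pick the index entry) plus a short scan bounded by the index interval.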

Connection to My Work

At Oracle, we used Kafka for event-driven communication between OCI Compute services. Understanding the storage internals helped me tune retention policies and diagnose slow consumer issues that turned out to be caused by segment rotation during peak traffic.

Key Insight

Kafka's performance comes from working with the OS, not against it. Sequential disk writes, the OS page cache, and zero-copy networking let a disk-backed log sustain very high throughput, because recently written data is usually served from memory anyway and the hardware is used the way it performs best.
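
A small Python sketch of the zero-copy idea. Kafka itself is written in Java and goes through `FileChannel.transferTo`; here I use Python's `socket.sendfile`, which uses `os.sendfile` where the platform supports it and otherwise falls back to ordinary sends. The function names and the temp-file demo are my own, purely for illustration.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock: socket.socket, path: str, offset: int = 0, count=None) -> int:
    """Send a file region over a socket, letting the kernel move the bytes when it can."""
    with open(path, "rb") as f:
        # No read() into a Python buffer: the data is not copied through user space
        # when the os.sendfile path is taken.
        return sock.sendfile(f, offset=offset, count=count)


def demo():
    # Hypothetical stand-in for a log segment file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"segment-bytes" * 100)
        path = tmp.name
    left, right = socket.socketpair()
    try:
        sent = send_file_zero_copy(left, path)
        left.close()  # signal end-of-stream to the reader
        received = b""
        while True:
            chunk = right.recv(4096)
            if not chunk:
                break
            received += chunk
        return sent, received
    finally:
        left.close()
        right.close()
        os.remove(path)
```

Calling `demo()` returns the number of bytes sent and the bytes the other end received, which should match the file contents.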

Source: Kafka: The Definitive Guide
#kafka #distributed-systems #storage #architecture