Sadalage P., Fowler M., NoSQL Distilled (2013)
NoSQL is not a "go-to" choice for every project. While relational databases are largely similar to one another, NoSQL databases come in several types, each designed for a specific purpose. The book is a right-sized introduction to when and why a NoSQL database should be selected.
Summary
NoSQL Types
There are many NoSQL databases for different purposes. Here is a quick overview. For a larger review see 1 and 2.
[HINT] Add the SVG Navigator extension to your Chrome to comfortably view the diagrams with zoom and pan in the separate tab.
Purpose and Use Cases
For details refer to "Part II. Implement" (p. 79).
Database Type | Purpose | Use Cases | Data Storage | Processing |
---|---|---|---|---|
Key-Value Store | Ultra-fast reads/writes for simple data models | Caching (Redis), session storage (DynamoDB), real-time leaderboards | Stores data as unique key → value pairs (e.g., user123: {name: "…"}) | Optimized for O(1) lookup by key; no complex queries |
Not to Use: | Relationships among data, multi-operation transactions, query by data, operations by sets | |||
Document Databases | Flexible schema for hierarchical, JSON-like data | User profiles (MongoDB), content management, IoT device logs | Stores self-contained documents (e.g., JSON/BSON) with nested structures | Query via document fields (e.g., db.users.find({age: {$gt: 30}}) ) |
Not to Use: | Complex transactions, querying varying aggregate structure | |||
Column-Family Store | Scalable writes/reads for large, sparse datasets | Time-series data (Cassandra), analytics (Bigtable), audit logs | Data grouped in column families (rows with dynamic columns) | Efficient column scans and partition-level queries (e.g., by timestamp) |
Not to Use: | Write/read ACID transactions, early prototypes or initial tech spikes | |||
Graph Databases | Traverse complex relationships efficiently | Social networks (Neo4j), fraud detection, recommendation engines | Stores **nodes** (entities) + **edges** (relationships) with properties | Uses graph algorithms (e.g., shortest path, PageRank) via a graph query language such as Cypher |
Not to Use: | Update all or a subset of entities |
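The four storage shapes in the table can be contrasted with plain Python structures. This is only an in-memory sketch, not the API of any of the databases named above; the sample users, sensors, and function names are hypothetical.

```python
# Plain-Python stand-ins for the four storage shapes; no real database is used.
# All sample data and helper names below are hypothetical.

# Key-value: the value is opaque to the store and reachable only by its key.
kv_store = {"user123": b'{"name": "Ann", "age": 34}'}

# Document: the store can see inside the value, so fields are queryable.
documents = [
    {"_id": "user123", "name": "Ann", "age": 34},
    {"_id": "user456", "name": "Bob", "age": 27},
]

# Column-family: row key -> column family -> sparse, per-row columns.
column_families = {
    "sensor-1": {"readings": {"2024-01-01T00:00": 20.5, "2024-01-01T00:05": 20.7}},
    "sensor-2": {"readings": {"2024-01-01T00:00": 18.1}},
}

# Graph: nodes plus explicit edges, kept here as an adjacency list.
follows = {"ann": ["bob"], "bob": ["cid"], "cid": []}


def find_older_than(docs, age):
    """Document-style query by field, in the spirit of find({age: {$gt: age}})."""
    return [d["_id"] for d in docs if d["age"] > age]


def reachable(graph, start):
    """Graph-style traversal: every node that can be reached from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

The key-value entry can only be fetched whole by `"user123"`, while the document list supports a field query and the adjacency list supports a traversal, which mirrors the "Processing" column.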
Strengths vs Tradeoffs
Each type excels in its niche but struggles outside it (e.g., Graph DBs are poor for simple key lookups).
Database Type | Strengths | Tradeoffs |
---|---|---|
Key-Value | Speed | Complexity (no queries beyond key lookup) |
Document | Flexibility | Relationships (joins are manual/inefficient) |
Column | Scale | Schema design (column families are defined upfront, even though columns vary by row) |
Graph | Relationships | Isolated data (slow for non-graph queries) |
Data Model: Aggregates
Relational databases organize data without an explicit aggregate concept 1. In contrast, key-value, document, and column-family NoSQL databases are explicitly designed around the concept of aggregates (Evans DDD, p. 125), which facilitates distributed computing on clusters.
The different NoSQL data models determine these strengths and tradeoffs (p. 25).
- Aggregate-oriented databases make inter-aggregate relationships more difficult to handle than intra-aggregate relationships.
- Graph databases organize data into node and edge graphs; they work best for data that has complex relationship structures.
- Schemaless databases allow you to freely add fields to records, but there is usually an implicit schema expected by users of the data.
- Aggregate-oriented databases often compute materialized views to provide data organized differently from their primary aggregates. This is often done with map-reduce computations.
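The points about aggregates and materialized views can be illustrated with a small sketch. It assumes a hypothetical order aggregate that embeds its own line items, and it runs the map and reduce steps in a single process rather than on a cluster; all names and data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical order aggregates: each order carries its own line items,
# so one order can be stored and retrieved as a single unit.
orders = [
    {"id": 1, "customer": "ann", "items": [
        {"product": "tea", "qty": 2, "price": 3.0},
        {"product": "mug", "qty": 1, "price": 8.0},
    ]},
    {"id": 2, "customer": "bob", "items": [
        {"product": "tea", "qty": 1, "price": 3.0},
    ]},
]


def map_order(order):
    """Map: emit (product, revenue) pairs from a single aggregate."""
    for item in order["items"]:
        yield item["product"], item["qty"] * item["price"]


def reduce_revenue(pairs):
    """Reduce: sum the emitted values per product."""
    totals = defaultdict(float)
    for product, revenue in pairs:
        totals[product] += revenue
    return dict(totals)


# The materialized view: data organized by product instead of by order.
revenue_by_product = reduce_revenue(
    pair for order in orders for pair in map_order(order)
)
```

Because the map step only ever looks at one aggregate, it could run independently on whichever node holds that order; only the reduce step needs to combine results across aggregates.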
Need for Distributed Storage
NoSQL gained popularity primarily by enabling database operation across large clusters. As data grows, scaling up (upgrading servers) becomes costly and impractical. Scaling out (distributing across multiple servers) offers a better solution. Aggregate-oriented NoSQL models align perfectly with this approach, as aggregates (p. 13) serve as natural distribution units.
Distribution Models
(p. 37)
- Single Server
- Sharding
- Replication
  - Master-Slave
  - Peer-to-Peer
- Combining Sharding and Replication
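Combining sharding with replication might look roughly like the sketch below. It assumes hash-based placement by aggregate key and a ring-order replica choice; the function names, node names, and placement scheme are illustrative assumptions, not something the book prescribes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map an aggregate key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Pick the first node by hash, then the following nodes in ring order."""
    start = shard_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


# Hypothetical four-node cluster holding three copies of each aggregate.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = replicas_for("order:42", nodes, replication_factor=3)
```

A plain modulo like this relocates most keys whenever the node count changes, so it is only meant to show the idea of using the aggregate key as the unit of distribution.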
Data Consistency
(p. 47)
Because NoSQL data is distributed, it is subject to consistency issues across replicas. Ideally, a read from any replica returns the latest written value. The basic forms of, and approaches to, consistency are:
- Update Consistency
- Read Consistency
- Relaxing Consistency
- Relaxing Durability
- Quorums
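One optimistic way to protect update consistency is a conditional update: a write succeeds only if the record still has the version the client originally read. The sketch below is a single-process illustration under that assumption; `VersionedStore` and `VersionConflict` are hypothetical names, not part of any real database API.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the client read."""


class VersionedStore:
    """In-memory store that keeps a version counter next to each value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Write only if nobody else has updated the key since it was read."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1
```

If two clients read version 0 and both try to write, the first write moves the record to version 1 and the second raises `VersionConflict`, leaving that client to re-read and decide how to resolve the conflict.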
Key Points
- Write-write conflicts occur when two clients try to write the same data at the same time. Read-write conflicts occur when one client reads inconsistent data in the middle of another client’s write.
- Pessimistic approaches lock data records to prevent conflicts. Optimistic approaches detect conflicts and fix them.
- Distributed systems see read-write conflicts due to some nodes having received updates while other nodes have not. Eventual consistency means that at some point the system will become consistent once all the writes have propagated to all the nodes.
- Clients usually want read-your-writes consistency, which means a client can write and then immediately read the new value. This can be difficult if the read and the write happen on different nodes.
- To get good consistency, you need to involve many nodes in data operations, but this increases latency. So you often have to trade off consistency versus latency.
- The CAP theorem states that if you get a network partition, you have to trade off availability of data versus consistency.
- Durability can also be traded off against latency, particularly if you want to survive failures with replicated data.
- You do not need to contact all replicas to preserve strong consistency with replication; you just need a large enough quorum.
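The quorum point reduces to two inequalities, commonly written with N as the replication factor, W as the number of nodes confirming a write, and R as the number of nodes contacted for a read: W > N/2 for writes, and R + W > N for strongly consistent reads. A minimal check, with hypothetical function names:

```python
def has_write_quorum(n: int, w: int) -> bool:
    """A write quorum needs a majority of the n replicas: W > N/2."""
    return w > n / 2


def has_read_quorum(n: int, r: int, w: int) -> bool:
    """Reads overlap the latest write when R + W > N."""
    return r + w > n
```

With a replication factor of 3, writing to 2 nodes and reading from 2 nodes satisfies both conditions, whereas reading from a single node after a 2-node write does not guarantee that the read sees the latest value.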
The Rest
These are the basics to begin with. The following chapters provide additional details and are left for self-study:
- Chapter 6: Version Stamps
- Chapter 7: Map-Reduce
- Chapter 12: Schema Migrations
- Chapter 13: Polyglot Persistence
- Chapter 14: Beyond NoSQL
- Chapter 15: Choosing Your Database
Footnotes
1. Nevertheless, in relational databases the aggregates an application needs should be identified upfront so that the relationships in the application data can be modeled effectively. This ensures that aggregates can be conveniently constructed via SQL queries. NoSQL databases, by contrast, essentially store pre-built aggregates rather than the separate fields found in RDBMS tables. This naturally limits their querying capabilities and steers usage toward more static data shapes than an RDBMS would allow.