A couple of days ago I tweeted about the CAP theorem which sparked some interest and a small conversation on Twitter. Since tweets tend to disappear in the void pretty quickly, I thought I also put this here:
1/ The CAP theorem in distributed systems states that you cannot satisfy more than two of these three properties:
- Partition Tolerance
But here's what's commonly misunderstood about the CAP theorem:
2/ Being Partition Tolerant or not is not really a choice. In distributed systems, partitions may occur at any time. It’s more of a choice what to give up during a network partition: Consistency or Availability.
3/ Giving up Availability implies a part of your system will become unresponsive or return an error during a network partition.
4/ Giving up Consistency implies your system might return stale data (=dirty reads) because data was written to some nodes but not yet to all due to a network partition.
5/ Consistency according to CAP is Linearizability – not what the ‘C’ in ACID means. Linearizability enforces a linear order to operations, e.g. read(key) should return the latest written value of key.
6/ Consistency in A[C]ID is ensured through Serializability which is a mechanism to ensure concurrent transactions in databases can be split up into multiple serial executions. No strict order is required.
7/ Whether a system is actually CP or CA is often very hard to tell. It depends on the configuration of the system, e.g. number of replicas, consensus algorithms, quorum sizes, bugs, etc.
8/ The CAP theorem is a starting point for characterizing systems, but to properly understand guarantees of distributed systems, you have to dive deeper. Here’s an excellent article on this by @martinkl: martin.kleppmann.com/2015/05/11/ple…
Hope you found this useful. Let me know on Twitter if you have any remarks.