What's an ACID in database?
They are four critical properties of any relational database system.
Its acronym refers to the presence of four properties: Atomicity, Consistency, Isolation, and durability.
In order to understand an ACID concept well, you must know first what is a database transaction
What is Database Trasnaction?
A transaction is a collection of queries that are treated as one unit of work, if one query has failed all other queries must be discarded.
So why treat a collection of queries as one unit of work?
when you write or read from a database several things can go wrong, i.e. you lost your connection to a remote instance, encounter value error, incorrect syntax, timeout expiry, and so many other errors can happen.
so let's look at an example from a bank that client1 wants to transaction 50EGP to client2
- Check if the client1 has more than or equal to 50EGP in his balance.
- Subtract 50EGP from the client1 balance.
- Add 50EGP to the client2 balance.
If something goes wrong in the middle of this operation disaster will occur, if step3 has failed client1 will lose 50EGP and if step2 has failed and step3 success, client1 will get 50EGP from nowhere.
Transaction lifespan
-
Transaction BEGIN: here you tell the database engine: hey I want to open a new transaction to write some series of logically grouped database operations.
-
Transaction COMMIT: here you tell the database to commit your operation and persist it to disk.
-
Transaction ROLLBACK: for any reason you might to cancel these operations you can roll back and the database engine will discard your queries and will not persist them on disk.
The question here how does the database engine does that, does it saves queries on the memory or does it saves it on the disk, and what happen when we commit or rollback
- Every SQL database engine makes it different depending on the engineering team's perspective, of course, there are tradeoffs to consider, for example, PostgreSQL is the fastest regarding committing changes, any query in the transaction they are trying to persist on the disk(they are very optimistic that you are going to commit every transaction), so this comes for a price, PostgreSQL rollback is slow because it must wipe down all data again.
What will happen if database was crashed on rollback or on commit This is how the durability ("D" in ACID) is implemented
All committed transactions are already written to disks (in redo logs). This is how a database guarantees no data loss at the crash. The instance will perform instance recovery at startup. Uncommitted transactions will be rolled back.
Fact
PostgreSQL treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN command, then each statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. A group of statements surrounded by BEGIN and COMMIT is sometimes called a transaction block.
After we learned about what is transaction, let's begin to dig deep into ACID properties.
1. Atomicity
Remember when we said a transaction is treated as one unit of work and if one query failed all other queries in the transaction must be rollbacked, that is what Atomicity means..
Atomicity is all or nothing — the transaction either happens completely or doesn’t happen at all, literally like an atom was defined back that it can't be split, but an atomic nucleus can be split. Walton and Cockroft were the first ones to do it. The process is referred to as nuclear fission.
The lack of Atomicity leads to inconsistancy.
2. Consistency
Consisteny in transaction means that when you make a changes and commit it, any new transaction must sees these changes.
Also the database must be valid after the transaction, all constraints are enforced for every committed transaction.
All data written to the database must be valid according to predefined rules including constraints, cascades, triggers , and checks;
3. Isolation
So if you have multiple transactions running at the same time(concurrently), can the transaction sees the changes made by other transactions? (it depends on your case).
Here database isolation comes into play, the goal is to prevent reads and writes of temporary, aborted, or otherwise incorrect data written by concurrent transactions.
Database isolation: multiple transactions can happen concurrently without reading the wrong data
Read Phenomena
Concurrent transactions accessing the same resource at the same time can lead to reading issues. To understand database isolation, we need to understand the different read phenomena which may occur during the execution of concurrent transactions. These are:
- Dirty reads
- Non-repeatable (or fuzzy) reads
- Phantom reads
Dirty read
Dirty read when one transaction reads uncommitted changes from other running transactions.
This happens because there is no locking preventing it.
Non-Repeatable read
If you’ve got two consecutive reads in one transaction with a concurrent update in between, those reads are going to show different results even though they’re part of the same transaction. This is undesirable since we end up using stale data. This is prevented by holding a shared lock (read lock) on the read record for the whole duration of the current transaction.
Database engines solve that behave differently, PostgreSQL made a version from the client1 state, PostgreSQL creates a new version on any update and never changes the same value. while MySQL and oracle made it different they changes the actual value in the client table and maintain another table called undo table that stores the previous value.
Phantom read
Phantom reads happen when one transaction read data and then a concurrent transaction inserts a row in that table that will affect the next read for the first transaction and the first read it will be stale.
Isolation Levels:
The SQL standard defines number of Isolation levels to solves read phenomena issues:
- READ_UNCOMMITTED
- READ_COMMITTED
- REPEATABLE_READ
- SERIALIZABLE
- SNAPSHOT
You set these isolation levels when you began your transaction.
READ_UNCOMMITTED
No isolation, any change from outside that is committed or not is visable to current transaction. e.g. Dirty Read.
READ_COMMITTED
Only sees data that has been committed by other transaction.
REPEATABLE_READ
The transaction will sure that if i read something twice it will be the same with the transaction, that solves the unrepeatble read problem in read phonemena. but of course this comes for price.
SERIALIZABLE
Only one transaction will run at a time (no concurreny)
SNAPSHOT
It's like if we take a snapshot of the database at the moment we start the transaction, each query will sees changes that committed up to start of the transaction.
Every DBMS implements isolation levels differently
DATABASE | DEFAULT ISOLATION LEVEL |
---|---|
Oracle | READ_COMMITTED |
MySQL | REPEATABLE_READ |
Microsoft SQL Server | READ_COMMITTED |
PostgreSQL | READ_COMMITTED |
Isolation levels vs Read phenomena
READ_UNCOMMITTED | READ_COMMITTED | REPEATABLE_READ | SERIALIZABLE | |
---|---|---|---|---|
Dirty reads | allowed | prevented | prevented | prevented |
Non-Repeatable read | allowed | allowed | prevented | prevented |
Phantom read | allowed | allowed | allowed | prevented |
Lost update | allowed | allowed | prevented | prevented |
4. Durability
When a transaction is committed, the changes made in this transaction must be persisted on non-volatile storage. even if the system is shut down when committing, durability must guarantee that all changes that were made in the transaction are indelible.
The durability is achieved by writing these changes to a series of log files before the data operation is committed to the database file, so the database will read these log files at startup and decide whether to continue the transaction or rollback to any previous changes.