TOC
- What is a database?
- Features of Database
- Why Database?
- Different types of databases
- Important concepts
- ACID
- Data partition
- Data sharding
- BASE
What is Database:
A database is a system to efficiently store, retrieve, and search the data. A database can be imagined as an organised way of storing the data of any application.
Features
- Organized
- Availability
- easy to retrieve, search, read, and write data
- Low Latency
- Data recovery
- ACID*
- Flexible Schema*
*not all databases provide it.
Why Database
- to organized the data
- to store the data
- to retrieve the data
- to have a backup of data
- low latency search
Popular database
- RDBMS: MySQL, Oracle, etc.
- NoSQL: MongoDB, DynamoDB
Type of database
- Table
- Table based database consists of the rows and columns.
- Table based database has a primary key to link the data subsets or partition the database
- Best fit for the vertical scaling
- Schema-based
- Document
- The document-based database consists of the collection, documents, and fields
- The data is stored as key-value pair in fields.
- Best fit for the horizontal scaling
- Schema-less
- Graph
- The graph-based database store the data as nodes and relations
- It is schema-less
- Easy to scale
RDBMS vs NoSQL
RDBMS | NoSQL (Not Only SQL) | |
---|---|---|
Example | MySQL, Oracle, Maria, etc. | mongo, couch DB, Dynamo DB |
Design | tables | document, key-value, graphs |
Scaling | vertical | horizontal |
Ease of scaling | hard | easy |
Flexible | not so flexible | very flexible |
Schema | need schema | no need for schema |
Primary key | Yes | no |
DBA | required | only if, DB is big |
Dynamic (doing any change at any time) | no | yes |
ACID | yes | no |
Sharding | no | yes |
Partition | yes | no |
complexity | High | Low |
Language | Query language | MQL (mongo query language) |
Integrated Caching | Needs to implement separately | inbuilt |
Security | Yes | not as MySQL |
Store procedures | Yes | No |
Usecases | Where the ACID properties are required eg: financial banks | Where ACID is not the priority |
Performance | Faster at selecting the number of data | Faster in inserting and updating the data. Writes are faster |
When to use | Best fit for data in suitable for table/rows | Best fit for unstructured data |
Best suitable for the small data set | Best suitable for the large data set | |
- | hight availability during unstable environments | |
Strong dependency on multi-row transactions | - |
Important concepts:
1) Transactions
For any database, we communicate with terms of ‘transactions’. How many transactions are happening?
eg: There are A and B accounts with Rs 500 and Rs. 1000. A will transfer the 200 to the B account and increase the balance of B 1200.
Here, we have 3 steps:
- Account A transfer 200 Rs
- Account B will receive the amount from A
- Account B's balance will increase
The above 3 steps are one transaction.
2) ACID
a. Atomicity
Atomicity means that all the transactions should be either 100% successful or fail. There should not be any partial state. Basically All or none.
eg: In the example of transactions, if 1 step will fail then the whole transaction should be failed. For atomicity, the whole transaction should be 100% done
b. Consistency
The database must be consistent before and after the transaction always. Concurrency should not impact the database integrity.
eg: If a user has deposited 100 in an account and it failed. The database should reflect the last value to keep the database consistency.
c. Isolation
Multiple transactions occur independently without interference. In the end, the database should have the data after all the transactions.
eg: Bank balance is 600. User A withdraws the 100 and user B is adding 50. For user A the balance would be 600 and after withdrawing it would be 500. For user B the balance is 500 and on that 50 will get added and the new balance would be 550. It would look like the transactions are running sequentially.
d. Durability
The changes made by a transaction should persist.
eg: A user has done a transaction of Rs 200. If there is an outage still the transaction success/failure should be persisted in the database. This can be done by the WALs (Write-Ahead-logs).
3) Sharding
Data Sharding is a way of distributing a single database across multiple databases which can be stored across multiple machines. It is a way of reducing the load to a single database.
Data sharding is a kind of ‘Horizontal scaling’. While doing sharding, the data is distributed into multiple machines known as ‘physical shards’. Each shard would be ‘autonomous’ - they don’t share any data or compute resources. Sharding can be implemented at the application level or a few databases supports it natively.
Advantages of sharding:
- Fast search
- Avoiding SPoF
- Easy to upgrade horizontal
The disadvantages of sharding:
- Unbalanced shard
- Not natively supported by many databases
- Shard architecture is hard
- Unsharding is difficult
Techniques for Sharding:
- Key-based
- Range-based
- Directory-based
4) Partition
The data partition is a way of distributing the data into multiple subsets of data under a single instance. MySQL supports the partition. There are different types of partition.
The way to link the different tables together is by the primary key.
Eg: There is data of users. Instead of storing whole data in one table. We can use data-partition to avoid making one table too big, avoid SPoF, and easy maintenance of the database. We can break it into multiple databases.
Benefits:
- Query performance
- Data availability
- Performance
- Data manageability
- Improve security
Data Partition methods:
- Horizontal partition (Database Sharding)
- Vertical partition
- Functional partition
Techniques:
- Range-based
- Hash-based
- List-based
Difference between data-partition and data-sharding.
Data sharding and partition goal is the same to distribute the data. The data partition is the backbone of the distributed system architecture. Where Data-partition focuses on distributing the database into multiple data subsets under one single instance. Data sharding is about distributing the data into multiple machines.
5 BASE
BA: Basically Available
S: Soft State
E: Eventually consistent
Use Cases:
RDBMS
- MySQL will be a great fit for financial transactions where ACID compliance is required
- Data is structured
- Horizontal scaling is not the requirement
- Search is faster
NoSQL
- Where ACID compliance is not the priority. Eg: blog application, e-commerce (product)
- Data is unstructured. Eg: products
- Read and Writes are higher. Eg: Blog where we will be creating blogs and reading them
- Looking for horizontal scaling of Database.
A big thanks to folks who reviewed this and gave valuable feedback: Ahsan, Tauseef, Naman,Anand,Uddeshya
Liked the blog? Follow me on [Twitter],(https://twitter.com/hellonehha)
Top comments (0)