Choosing the right database for a service

Introduction

Selecting a database for a small application is usually straightforward, but the choice gets harder as the application grows. Each database has its own advantages and disadvantages, so the tradeoffs need careful consideration before you commit to one.

Common Questions

Here are some of the common tradeoffs and questions to ask yourself when choosing a database:

Data Structure
- Is it a simple key-value pair, document, graph, or relational?
Query Patterns
- How complex are your query patterns?
- Do you just need retrieval by key, or also by various other parameters?
- Do you also need fuzzy search on the data?
- Do you need to do joins between different data sets?
Consistency
- Do you need strong consistency (read after write), or is eventual consistency acceptable?
Storage Capacity
- How much data do you need to store?
Performance
- How much throughput do you need?
- How much latency can you tolerate?
Maturity
- Is the database mature and well supported? Is it battle tested?
Cost
- Managed or self-hosted?
- If managed, how much does it cost to support your use case?
- If self-hosted, how much hardware do you need to support your use case? Does DBA expertise exist in your organization?
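
The storage and throughput questions above can usually be answered roughly with back-of-envelope arithmetic before comparing products. The following is a minimal sketch, not a sizing method: the Workload class, its fields, the peak factor, and the replica count are all hypothetical assumptions chosen for illustration, and the estimate ignores indexes and storage overhead.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical workload description; every field is an assumption."""
    records: int              # number of records expected
    avg_record_bytes: int     # average size of one record
    requests_per_day: int     # total reads + writes per day
    peak_factor: float = 3.0  # assumed ratio of peak to average traffic
    replicas: int = 3         # assumed number of copies kept of each record


def storage_gib(w: Workload) -> float:
    """Raw storage across all replicas in GiB (ignores indexes and overhead)."""
    return w.records * w.avg_record_bytes * w.replicas / 2**30


def peak_rps(w: Workload) -> float:
    """Estimated peak requests per second."""
    return w.requests_per_day / SECONDS_PER_DAY * w.peak_factor


if __name__ == "__main__":
    # Made-up numbers: 100 million 1 KiB records, 50 million requests per day.
    w = Workload(records=100_000_000, avg_record_bytes=1_024,
                 requests_per_day=50_000_000)
    print(f"storage: {storage_gib(w):.1f} GiB, peak: {peak_rps(w):.0f} req/s")
```

Numbers like these help separate candidates early: a few hundred GiB fits on one server, while tens of TiB pushes you toward a database that shards.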

Databases

MySQL Database

When to use?
- To store both relational tables (if you know the schema upfront) and JSON documents (schemaless).
- When the workload is read-heavy rather than write-heavy and you need strong read consistency.
Advantages:
- MySQL is highly battle tested and mature.
- Fast read performance.
- Support for JSON/Document.
- Cross data center write consistency using ProxySQL.
Disadvantages:
- Does not scale horizontally; capacity is limited by the disk space of a single server.
- Not good for fuzzy search.
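
To illustrate keeping fixed relational columns and a schemaless JSON document in the same table, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The users table, its columns, and the helper functions are hypothetical. In MySQL the attrs column would typically be declared with the native JSON type and could be queried server-side with functions such as JSON_EXTRACT; here the document is stored as text and decoded in Python.

```python
import json
import sqlite3

# In-memory SQLite database, used only as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")

# id and email form the fixed schema known upfront.
# attrs holds a free-form JSON document (a JSON column in MySQL).
conn.execute(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        attrs TEXT NOT NULL
    )
    """
)


def add_user(email: str, attrs: dict) -> int:
    """Insert a user and return the generated row id."""
    cur = conn.execute(
        "INSERT INTO users (email, attrs) VALUES (?, ?)",
        (email, json.dumps(attrs)),
    )
    conn.commit()
    return cur.lastrowid


def get_attrs(email: str) -> dict:
    """Look up a user by the relational key and decode the JSON document."""
    row = conn.execute(
        "SELECT attrs FROM users WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        raise KeyError(email)
    return json.loads(row[0])
```

The relational columns keep their constraints (here, a unique email), while the attrs document can differ from row to row without a schema migration.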

MongoDB

When to use?
- If the data schema is expected to keep changing and evolving.
- To store dynamic JSON documents.
- Data normalization is not required, and eventual consistency is acceptable.
Advantages:
- Schemaless and flexible.
- Easy to scale using sharding.
Disadvantages:
- High memory usage, because a lot of denormalized data is kept in memory.
- Document size is limited to 16MB.
- No good replication strategy is available across data centers.
- Consistency issues can occur when traffic switches to another data center.
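
The 16 MB document limit is worth checking at design time, especially for documents that embed growing arrays. Below is a minimal sketch of such a check. It approximates size with compact UTF-8 JSON rather than real BSON encoding, so the result is only an estimate. The function names and the 90% headroom value are assumptions made for illustration.

```python
import json

# MongoDB's documented maximum BSON document size: 16 mebibytes.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def approx_size_bytes(doc: dict) -> int:
    """Approximate size via compact UTF-8 JSON; real BSON size will differ."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def fits_in_one_document(doc: dict, headroom: float = 0.9) -> bool:
    """True if the approximate size stays under an assumed safety margin."""
    return approx_size_bytes(doc) <= MAX_DOCUMENT_BYTES * headroom


if __name__ == "__main__":
    order = {"order_id": 42, "items": [{"sku": "A1", "qty": 2}]}
    print(approx_size_bytes(order), fits_in_one_document(order))
```

If a document gets close to the limit, the usual options are to split it into several documents that reference each other or, for large binary payloads, to store them with GridFS.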