Types of Databases & When to Use Each One
According to the latest estimates, 328.77 million terabytes of data are created every single day. Data from customer transactions, internal operations, and all kinds of communications arrives at a business’s doorstep 24/7.
Most organizations find that the best way to manage tons of incoming data is in a database. However, just as data types can vary significantly, so can databases. You need at least a basic understanding of what a database is and how different types of databases are tailored to manage different data characteristics. Ultimately, choosing the right database type for your data requires a thorough understanding of what your data needs.
This article explores the different types of databases, provides examples of the most common types of databases, and discusses when to choose each type of database.
What is a database?
A database is an organized collection of structured information (data), stored electronically and managed by a database management system (DBMS).
Databases are an offshoot of the early electronic spreadsheets. When personal computers first came to market in the late 1970’s, one of the first office applications was an electronic spreadsheet. Small businesses loved these spreadsheets, because they enabled businesses to post once and use many, easily generating statistics and reports based on a single copy of their data. However, these spreadsheets were still single-user applications, capable of relatively simple data manipulation, and limited in how much data could be stored.
Electronic databases are designed to make it easy to add and manipulate much more data than a typical spreadsheet ever could. At its core, a database is an organized collection of structured data stored and managed on a computer system. It provides features that make it easy for users to input, retrieve, update, and delete data. Organizations use database software to ensure data integrity and scalability.
Key components of a database
Data model
The data model is a conceptual representation of the data, its entities, and relationships, used to organize and structure the data in a database. The goal of data modeling is to identify the types of data used and stored within the system, the relationships between these data types, the way the data is grouped and organized, and its formats and attributes. This enables the organization to create a common, consistent, and predictable way of managing data resources.
An important part of data model design is assigning “keys” as needed. A key can be used to link data from different tables or data collections, that shows a common relationship. For example, a customer might be assigned a key that can then be linked to their address and their order history, eliminating the need to repeat this information in the table of customer names. This cuts down on repetition within the database, which not only saves room but also provides fewer opportunities for error.
Database management system (DBMS)
A DBMS is software that acts as an intermediary between users or applications, and the physical storage of data. The DBMS provides tools and utilities to manipulate and manage the data and ensure it stays secure.
The interface between the end user and the database is provided via a database query language. The query language provides ways to insert new data, update existing data, delete data, merge data (insert new rows while updating existing rows).
Schema
The schema is a comprehensive “blueprint” that describes the complete logical structure and organization of data within a DBMS. By defining the architecture of your particular database, the schema provides:
- Effective querying capabilities for data engineers
- Overall governance of database policies and standards
- Accurate access control administration by database managers
- A roadmap for extending database features and functions
Tables or collections
Tables (in relational databases) or collections (in NoSQL databases) hold the actual data in a structured format that enables users to work with more than one table or collection at a time. Each table consists of rows (records) and columns (fields) that define the attributes of the stored data.
Queries
Queries are commands or statements used to extract information from the database, based on the specified conditions or criteria. In relational databases, queries are written in SQL; in NoSQL databases, queries can be written in other query languages.
It is possible to use a query to create a virtual table, or view, based on the result set of a query.
Indexes
Indexes are data structures that optimize the retrieval of data from a database. They work just like a book index, by enabling the database to find data quickly without scanning the entire table. However, because databases are dynamic, entire databases cannot be indexed. Instead, users can index tables to help their current work, on a per-table, per-session basis.
The syntax for index creation usually involves asking the database to CREATE INDEX, specifying a name for the index, the associated table, and the table column to be used as the index key. In fact, it’s possible to have more than one index per table, as long as each index uses a different column as a key
Transactions
Transactions group multiple database operations into a single logical unit, such that if all the transactions aren’t completed successfully, none of them are applied to the database. This protects the data in environments where multiple users might be accessing the data at the same time.
Data security and access control
Data security and access control define roles and privileges to ensure that only authorized users can read, write (add), or update data. This creates a hierarchy of data “roles” that account for everyone from people who have no need to even see the data, to the people who maintain the database structures.
Data backup and recovery
Database management software includes the means to backup and recover data, to guard against the loss of data in the event of a hardware failure and the corruption of data through software bugs or human errors. They also act as insurance against losing data if a system migrates to different hardware.
Backup strategies include full backups, incremental backups, and differential backups. Recovery mechanisms such as point-in-time recovery and transaction logs enable the database to recover to a consistent state after a failure.
Types of databases
Modern databases can handle a wide array of data, providing powerful tools for data manipulation, analysis, and visualization. The following sections provide a general overview of the types of databases currently available, and what they’re good for.
Relational databases
Relational databases are the oldest type of database. These databases are laid out similarly to spreadsheets, with rows for each entry and columns that provide attributes of that entry. For example, a row for the user John Doe might have columns for Name, Address, Telephone, etc. Data is organized as a set of tables, which can be joined together via a primary key (a unique identifier for each record) or a foreign key (a key that references a primary key in another table).
Users interact with relational databases using queries written in Structured Query Language (SQL)
Key features:
- ACID compliant: The ACID standard (Atomicity, Consistency, Isolation, and Durability) ensures that all database transactions are processed and reconciled before changes to the database are written. This reduces the chance of collisions between users working on the same data.
- Range of data types: These databases can store a wide range of data types and support complex queries.
- Collaborative: More than one user can access a database project at the same time.
- Secure: Access is restricted through the use of user permissions.
- Stable: These databases are well-understood and well-documented.
Common use cases:
- Online transaction systems, which support many users and very frequent queries.
- The Internet of Things (IoT, or edge computing)
- Data warehouses, where the critical components are storage, optimization for massive queries, and easy integration.
The most popular relational databases include Oracle, MySQL, Microsoft SQL Server, ProsgreSQL, IBM Db2, SQLite, Microsoft Access, MariaDB, Apache Hive, and Microsoft Azure SQL Database.
Non-relational (NoSQL) databases
Non-relational databases are built for use with massive amounts of data and variable data structures. Thus, they model data relationships and store data differently from relational databases. They may be column-based, key-value based, graph-based, or document-based.
Users interact with non-relational databases using queries written in languages other than SQL. (NoSQL means “Not Only SQL”.)
Key features:
- Flexible: NoSQL databases can easily handle a mix of structured, semi-structured, and unstructured data.
- Scalable and responsive: Massive data storage scales well with on-demand servers and provides quick query responses.
- High availability for minimal downtime due to near real-time data replication.
- Cloud compatible: Non-relational databases work well in a highly scalable cloud architecture.
- Support for multiple data structures: Different information types and multi-model database formats are available.
Common use cases:
- Real-time systems, which benefit from the ability of non-relational architecture to combine operational and analytical database systems into one.
- Personalized database experience: Elastic scaling accommodates the massive amounts of data needed for any customized experience.
- Fraud detection: High performance is vital in fraud detection. Non-relational databases are responsive and reliably meet the low latency requirements of financial systems.
The most popular NoSQL databases include MongoDB, Redis, Cassandra, HBase, Neo4j, Oracle NoSQL, RavenDB, Riak, OrientDB, and CouchDB.
Object-oriented databases (OOD)
Object-oriented databases combine object-oriented programming (OOP) concepts with database capabilities. They model object data using four critical components not seen in SQL and NoSQL databases:
- Objects are reusable blueprints for individual physical or non-physical entities with descriptive and functional elements. Although similar objects may share certain characteristics, each object is unique.
- Classes are schemas that define object behavior.
- Methods describe the behavior of a class of objects.
- Pointers help access elements of an object database and establish relations between objects.
Object-oriented databases view each piece of data as a complete object. Instead of multiple tables, all information is packaged as one instantly available object package. Users interact with object data using the OOD’s API.
Key features:
- ACID compliant: As with SQL databases, Object databases follow the ACID standard (Atomicity, Consistency, Isolation, and Durability). This reduces the chance of collisions between users working on the same data.
- Complex and custom data types: User-defined classes allow for both custom and complex data types to coexist in the same OOD.
- Accessible data: Easy to save and retrieve.
- Easier modeling of real-world problems through the use of user-defined objects and classes.
Common use cases:
OODs perform best with complex data types, where one entity includes a massive amount of information. Some everyday use cases for this database model type are:
- High-performance applications where fast data retrieval is vital benefit from object databases, since data is stored and retrieved as-is.
- Scientific data and calculations are complex, and benefit from the generous storage and quick retrieval of OODs.
- Complex data structures: An OOD works seamlessly with complex data structures, eliminating the need to rework the database model when storage needs to expand.
The most popular object-oriented databases include DB4o, ObjectStore, Matisse, Gemstone/S, ObjectDB, ObjectDatabase++, Objectivity/DB, Versant, Perst, and Jade.
Network databases
Network databases represent data in a graph-like structure, visually connecting records via pointers. In this model, data is organized into nodes (representing entities or records) and edges (representing the relationship between nodes).
Network databases are well-suited to handling complex relationships and queries. Since nodes can have multiple connections to other nodes, the relationships between them can get quite sophisticated.
Key features:
- Visually distinct presentation: A network database’s non-hierarchal structure of nodes and edges creates a visually distinct presentation. When records are closely related, they may be grouped into sets, creating a visual aggregation that is obvious to the viewer.
- Facility for complex relationships and queries: Since nodes can have multiple connections, data retrieval is faster, and more flexible and intuitive, than hierarchically organized database.
Keep in mind that network databases also have some drawbacks:
- Complex to design, implement, and maintain
- Lack of standardization and use of proprietary query languages may make it difficult to scale
- Data redundancy is more likely due to the lack of data normalization
- Limited support and tools
- Performance concerns due to the time it takes queries to traverse multiple nodes and relationships
The most popular network databases include IDMS, IMS, UNIDATA/UniVerse, Relex, and Model 204.
Cloud databases
A cloud database is any style database hosted in the cloud, rather than on-premises. The database is hosted by a third-party provider, with access provided in a pay-as-you go basis. This creates a business model called Database as a Service (DaaS). It is an agile approach that minimizes the initial investment to capitalize a data center and makes it simpler to expand as more resources are required.
Key features:
- Scalability in the cloud enables a company to add much more data without affecting the database’s performance. Some cloud databases scale horizontally (adding more of the same type of processing), which is more economical than vertical scaling (upgrading hardware to boost capacity).
- Management flexibility: The provider manages the database, which minimizes the management needed from the client. You might also have the option to outsource database maintenance.
- Cost: With a cloud database, you pay only for what you need. The cost of investing in technical staff is minimized.
The most popular cloud database hosting services include Amazon RDS, Microsoft Azure SQL Database, Google Cloud SQL, IBM Db2 on Cloud, and Oracle Database.
Choosing a database
The following table summarizes the information we just presented to help you choose a database based on your organization’s priorities.
Use case
|
Database Type
|
Complex data structures
|
OOD
|
Cost
|
Cloud
|
Data warehouse
|
Relational
|
Fraud detection
|
NoSQL
|
High-performance applications
|
OOD
|
Internet of Things (edge computing)
|
Relational
|
Management flexibility
|
Cloud
|
Online transaction systems
|
Relational
|
Organizational structures
|
Network
|
Personalized database experience
|
NoSQL
|
Real-time systems
|
NoSQL
|
Scalability
|
Cloud
|
Scientific data and calculations
|
OOD
|
Social networks
|
Network
|
Supply chain networks
|
Network
|
Real-time connectivity and data virtualization with CData
This article introduces some of the most popular types of databases available today. As an organization’s needs change over time, the types of data it generates may also change, requiring horizontal scaling, vertical scaling, or even additional databases. And each time you scale, you wade into unfamiliar territory. One of the easiest ways to reduce the complexity of your operation is using data virtualization.
CData Virtuality is a data virtualization platform built for evolving data requirements. It provides:
- Instant connections to sources, metadata tools, and data consumers
- A semantic data layer for complete data visibility
- Centralized access control and audit trails for robust data governance
- An integrated data shop to empower business users to retrieve datasets
- Cloud-native data integration that makes it easy to shift from on-premises to cloud hosting
- Deployment on Saas, self-hosted cloud, or on-premises
Explore CData Virtuality
Ready to experience a sophisticated data virtualization approach? Start your product t tour today.
Start the product tour