Databases store, organize, and process information in a way that makes it easy for us to go back and find what we’re looking for. We encounter databases, both simple and complex, all the time, whether in the form of library card catalogs, financial records, employee directories, or even phone books.
But, what are databases in the context of a website? In this quick guide to modern database technology, you’ll get an understanding of how databases work, common terms to know, a look at SQL vs. NoSQL, and how to determine which database is best for your web application.
A quick overview of modern database technology
Spreadsheets process numbers; databases process information, specifically structured information. Databases can be designed to do just about anything with information: track, organize, and edit data; collect data and produce reports; or serve as the foundation for information-rich, dynamic websites.
Increasing complexity: Single-file vs. Multi-file databases
Take the phone book, for example: It’s got items of information like names, addresses, and phone numbers, all organized in the same format. In database terms, the book is the table, each person is a record, and their name, address, and phone number are all fields. The last name—how the book is organized, alphabetically—is the key field, which sorts the records.
Because the last name was chosen as the key field, that’s how the phone book is sorted—sorting by phone number or house number wouldn’t generate what we were looking for, and we’d never find anything out of the thousands of entries. And that’s one cornerstone of database technology—smart sorting.
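As a rough sketch of the phone book analogy, the snippet below models the book as a list of records and sorts it on a chosen key field. The names, numbers, and the `sort_by_key_field` helper are made up for illustration.

```python
# Hypothetical phone book: the list is the table, each dict is a record,
# and each dict key ("last_name", "first_name", "phone") is a field.
phone_book = [
    {"last_name": "Nguyen", "first_name": "Linh", "phone": "555-0142"},
    {"last_name": "Adams", "first_name": "Rosa", "phone": "555-0117"},
    {"last_name": "Martin", "first_name": "Joel", "phone": "555-0199"},
]


def sort_by_key_field(records, key_field):
    """Return a new list of records ordered by the chosen key field."""
    return sorted(records, key=lambda record: record[key_field])


# Sorting on the last name gives the familiar alphabetical phone book.
by_last_name = sort_by_key_field(phone_book, "last_name")

# Sorting on the phone number is just as easy, but far less useful for lookups.
by_phone = sort_by_key_field(phone_book, "phone")

for record in by_last_name:
    print(record["last_name"], record["first_name"], record["phone"])
```

The same records can be ordered by any field; what makes the last name the key field is simply that it matches how people look entries up.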
But a phone book is just a flat, single-file database. When would you need more complex databases with multiple tables that can interact with one another? Say you want a shipment status update on an order you placed with an e-commerce site. That website has multi-file databases set up for orders, dates, payments, shipment tracking, inventory, suppliers, and customers. By linking these tables together, if a query is made about an order’s status, the database can generate a report with data from the tables:
“[Customer]’s order of [product] purchased with a [payment method] on [purchase date] is being shipped to [address] via [method], due to arrive [tracking date].”
That, in a nutshell, is a relational database.
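The order-status report above can be sketched with Python's built-in `sqlite3` module. The table names, columns, and sample rows here are assumptions chosen for illustration; a real e-commerce schema would have many more tables and fields.

```python
import sqlite3

# In-memory database with three linked tables (a hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product TEXT,
    payment_method TEXT,
    purchase_date TEXT
);
CREATE TABLE shipments (
    order_id INTEGER REFERENCES orders(id),
    method TEXT,
    arrival_date TEXT
);
INSERT INTO customers VALUES (1, 'Rosa Adams', '12 Elm St');
INSERT INTO orders VALUES (100, 1, 'desk lamp', 'credit card', '2024-03-01');
INSERT INTO shipments VALUES (100, 'ground', '2024-03-06');
""")


def order_status(connection, order_id):
    """Join the three tables and build the status sentence for one order."""
    row = connection.execute(
        """
        SELECT c.name, o.product, o.payment_method, o.purchase_date,
               c.address, s.method, s.arrival_date
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        JOIN shipments AS s ON s.order_id = o.id
        WHERE o.id = ?
        """,
        (order_id,),
    ).fetchone()
    if row is None:
        return None  # no such order, or it has no shipment yet
    name, product, payment, purchased, address, method, arrival = row
    return (
        f"{name}'s order of {product} purchased with a {payment} on "
        f"{purchased} is being shipped to {address} via {method}, "
        f"due to arrive {arrival}."
    )


status = order_status(conn, 100)
print(status)
```

Each table holds one kind of information, and the `JOIN` clauses link them through shared IDs so a single query can assemble the full report.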
Relational Databases and Database Management Systems: Building Powerful Data-Driven Websites
The foundation for modern database technology began in the 1970s with the first “relational data model.” Its emphasis was on careful organization. Today, relational databases remain important to how websites are built: any website that displays data from a database has to have (a) server-side scripting, (b) HTML & CSS, (c) SQL, a database language, and (d) a database management system (DBMS).
Relational databases consist of two or more tables with connected information, each with columns and rows. These tables, along with related structures such as views and indexes, are called database objects, and in order to create and manage them, you need a relational database management system (RDBMS). One example of an RDBMS is Microsoft's SQL Server. RDBMSs allow relational database developers to create and maintain a database program, including tools to:
- Query data
- Sort and edit data
- Design the entire database structure
- Produce reports
- Validate data points and check for inconsistencies
- Automate some of these functions, often through a built-in language such as SQL
SQL: The Language of Database Access
Structured Query Language (SQL) is a standardized programming language for accessing and manipulating databases. In an RDBMS like MySQL, Sybase, Oracle, or IBM DB2, SQL is used to write statements that manage data and streamline data processing. SQL is like a database’s own version of a server-side script and is responsible for:
- Executing queries, which are “questions” asked of the database
- Retrieving data
- Editing data: inserting, updating, or deleting records
- Creating views
- Setting permissions
- Creating new databases
SQL is a standard programming language, but has a number of variations—including some databases’ own proprietary SQL extensions.
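Several of the tasks listed above can be sketched with Python's built-in `sqlite3` module. The `employees` table, its columns, and the sample rows are assumptions for illustration. SQLite does not implement permission statements such as `GRANT`, so that item is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # creates a new, empty database in memory
cur = conn.cursor()

# Create a table, then insert, update, and delete records.
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
cur.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [("Ana", "Sales"), ("Ben", "Support"), ("Cho", "Sales")],
)
cur.execute("UPDATE employees SET dept = ? WHERE name = ?", ("Marketing", "Ben"))
cur.execute("DELETE FROM employees WHERE name = ?", ("Cho",))

# Create a view: a saved query that can be read like a table.
cur.execute("CREATE VIEW sales_team AS SELECT name FROM employees WHERE dept = 'Sales'")
conn.commit()

# Execute queries and retrieve data.
all_rows = cur.execute("SELECT name, dept FROM employees ORDER BY id").fetchall()
sales_names = [row[0] for row in cur.execute("SELECT name FROM sales_team")]

print(all_rows)
print(sales_names)
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection when values come from users.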
Note: When it comes to creating new databases from the ground up, planning ahead is key. In the same way you need to plan ahead for the future of your site when choosing a software stack, how your database is structured from day one will have major implications for the health of your site down the road. Questions to consider include: What information will you have? How should it be stored? What data will your site need to retrieve regularly, and how?
NoSQL database: Non-relational and distributed data
Relational databases are great at organizing and retrieving structured data, but what happens when your data is inconsistent, incomplete, or massive?
In these cases, you need a more flexible database solution. As the kinds and amounts of data that we gather have exploded, the NoSQL database has evolved to solve the challenges of Big Data. NoSQL means “Not only SQL.” These databases are non-relational and distributed. They deviate from the traditional relational model, addressing the issue that most modern data harvested from the web is not structured information. NoSQL lends flexibility, scalability, and variety, which are major advantages from a business standpoint when you consider that growing data is a direct result of a growing business.
How does a NoSQL database work? Instead of tables, many NoSQL databases are document-oriented (others use key-value, wide-column, or graph models). This way, unstructured data such as articles, photos, social media data, or videos can be stored in a single document. An entire blog post, for example, can live in one document that can be easily found but isn’t necessarily categorized into a bunch of pre-set fields. It’s more intuitive, but note that storing data in bulk like this requires extra processing effort and more storage than the highly organized SQL data. That’s why Hadoop, an open-source platform for distributed storage and processing, is so helpful and often integrated into database platforms.
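To make the document idea concrete, here is a minimal sketch of a document store built on a plain Python dictionary and JSON. It is not how any particular NoSQL product works internally; `save_document`, `load_document`, and the sample posts are hypothetical names and data.

```python
import json

# Toy document store: maps a document ID to a JSON string.
documents = {}


def save_document(doc_id, document):
    """Serialize the whole document and store it under one ID."""
    documents[doc_id] = json.dumps(document)


def load_document(doc_id):
    """Fetch a document by ID and turn it back into Python objects."""
    return json.loads(documents[doc_id])


# Two blog posts with different shapes: no shared, pre-set list of fields.
save_document("post-1", {
    "title": "Hello",
    "body": "First post",
    "tags": ["intro"],
    "comments": [{"author": "sam", "text": "Welcome!"}],
})
save_document("post-2", {
    "title": "Photos",
    "images": ["beach.jpg"],
})

post = load_document("post-1")
print(post["title"], len(post["comments"]))
```

Everything about a post, including its comments, comes back in one lookup, whereas a relational design would typically spread it across several tables.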
Common database terms to know
Here are a few major database features that are helpful to know when weighing one database against another: how databases grow, how they protect against failure, and how they duplicate data for speed, safety, and accessibility.
How much do you expect your data to grow (and how soon)? Do you need a highly scalable database that will be reliable even as the amount of data you’re processing grows exponentially? Will one server be enough, or do you anticipate needing to add additional ones? Horizontal scaling spreads data out across a distributed network of affordable commodity hardware rather than concentrating it in one massive (and expensive) server.
Related to horizontal scaling, sharding is a technique for storing massive databases across multiple servers. It achieves this by splitting a table’s rows across different servers, called shards. For instance, a database of customer names might store customers with last names A-M on one shard, while N-Z are stored on another. Sharding can help minimize response times for queries while also allowing data to be stored across a large number of cost-effective servers.
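The A-M / N-Z example can be sketched as a small routing function. The two dictionaries stand in for separate servers, and the shard names and helper functions are assumptions for illustration only.

```python
# Each dictionary stands in for a separate database server (a shard).
shards = {"shard_a_m": {}, "shard_n_z": {}}


def shard_for(last_name):
    """Pick a shard from the first letter of the last name.

    Names starting with A-M go to one shard; everything else
    (N-Z, and any non-letter) goes to the other.
    """
    first_letter = last_name[0].upper()
    return "shard_a_m" if "A" <= first_letter <= "M" else "shard_n_z"


def store_customer(last_name, record):
    shards[shard_for(last_name)][last_name] = record


def find_customer(last_name):
    # Only one shard has to be searched, which keeps lookups fast.
    return shards[shard_for(last_name)].get(last_name)


store_customer("Adams", {"phone": "555-0117"})
store_customer("Nguyen", {"phone": "555-0142"})

print(shard_for("Adams"), shard_for("Nguyen"))
print(find_customer("Nguyen"))
```

Splitting by first letter is easy to follow but can leave shards unevenly loaded; production systems often route on a hash of the key instead.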
Does your app need real-time access to update and synchronize data? Replication is the process of frequently copying data from one database to databases on other servers. When data is replicated from an app’s primary server, it’s synchronized with secondary database servers, making that information accessible in real time (and safe, in the event of a crash). Queries to these secondary databases won’t slow the network or the performance of the app.
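Here is a minimal sketch of primary-to-secondary replication, using in-memory objects in place of real servers. The `Primary` and `Replica` classes are hypothetical, and the copy happens synchronously for simplicity; real systems often replicate asynchronously over a network.

```python
class Replica:
    """A secondary database that receives copies of writes and serves reads."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Primary:
    """The primary database: accepts writes and copies them to every replica."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.apply(key, value)  # synchronous copy, for simplicity


replica_a, replica_b = Replica(), Replica()
primary = Primary([replica_a, replica_b])

primary.write("order-100", "shipped")

# Reads can be served by a replica, leaving the primary free for writes.
print(replica_a.read("order-100"))
```

If the primary were lost, each replica would still hold a full copy of the data.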
Latency refers to the time it takes for data to complete a “round trip” between the database server and the application server. When an app queries its database for data, this is how long it takes the server to return that data. The lower the latency, the better, but low latency often comes at a cost to other features, like consistency and availability.
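One simple way to observe latency is to time a query from the application side. The sketch below uses an in-memory SQLite database, so there is no network round trip and the numbers will be far smaller than with a remote server; the `timed_query` helper and `items` table are assumptions for illustration.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1000)],
)


def timed_query(connection, sql, params=()):
    """Run a query and return its rows plus the elapsed time in seconds."""
    start = time.perf_counter()
    rows = connection.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed


rows, latency_seconds = timed_query(conn, "SELECT name FROM items WHERE id = ?", (42,))
print(f"{len(rows)} row(s) in {latency_seconds * 1000:.3f} ms")
```

Against a real database server, the same wrapper would also capture the time spent on the network between the application and the database.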
When writing to a database, it’s important that changes to the data don’t violate the rules of the database. Consistency ensures that transactions don’t produce errors that can make the entire database invalid. A fully consistent system means that as soon as you successfully write a record to a database, you’ll also be able to request it. This is especially important for things like financial transactions. Consistency comes at a cost to speed and availability, however. That’s why many NoSQL databases opt for an eventually consistent model that allows for faster reading and writing.
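The idea that a transaction must not break the database's rules can be sketched with SQLite. Here the rule is a `CHECK` constraint that balances can never go negative; the `accounts` table and `transfer` function are hypothetical. If any step of a transfer breaks the rule, the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, sender, receiver, amount):
    """Move money between accounts; undo everything if a rule is violated."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected the debit


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))


overdraft_ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_overdraft = balances(conn)
valid_ok = transfer(conn, "alice", "bob", 30)
after_valid = balances(conn)

print(overdraft_ok, after_overdraft)
print(valid_ok, after_valid)
```

In the failed transfer, the credit to the receiver had already been applied when the debit was rejected, and the rollback undoes it so the data never ends up half-updated.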
Availability refers to whether or not the system is able to quickly respond to a request, even when failures occur. In databases that are spread across multiple servers, this can result in out-of-date or incorrect data being displayed, especially in an eventually consistent system. Depending on your business needs, however, slightly out-of-date data may be preferable to delays that prevent the whole system from functioning.
Failures are inevitable, but there are plenty of contingency plans you can put in place to ensure that data is still available and your app doesn’t crash. Having “no single point of failure” ensures that an app can keep functioning without interruption, usually through replication or redundancy. Databases do this differently, with varying degrees of cost and footprint.