10 Common SQL Programming Mistakes and How to Avoid Them

Image for 10 Common SQL Programming Mistakes and How to Avoid Them
SQL (structured query language) is a simple language, but it’s not so simple when you’re working with data sets that contain millions of records. When you work with mid- to large-size tables, writing top-performing SQL statements is a priority. Before you sit down to write your SQL statements for any platform—Oracle, SQL Server, MySQL, or any others—here are the top 10 common SQL programming mistakes to avoid.

1. Forgotten Primary Keys

Every table requires a primary key for performance. If you don’t have a primary key in any table, your tables don’t follow standard SQL requirements and performance suffers. Primary keys are automatically set as clustered indexes, which speeds up queries. They are always unique, so you can use an auto-incremented numeric value if you have no other column in the table that meets the unique requirement.

Primary keys are your first step to relational databases. They link to foreign keys in relational tables. For instance, if you have a table that has a list of customers, the “CustomerId” column would be unique to every customer. This could be your primary key column. Your CustomerId value would then be placed in the Orders table to link the two tables together. Always use a primary key in every table you create regardless of its size.

2. Poorly Managed Data Redundancy

Data redundancy is good for backups but not for table data. Each table should contain a unique data set that doesn’t repeat data in other table locations. This is one of the more difficult ideas for a new SQL developer to follow. It’s very easy to forget normalization rules and repeat data across multiple tables for conveniences, but it’s not necessary and represents poor table design.

For instance, suppose you have a customer table that contains the customer’s address. Since the address relates to the customer, it’s in the proper location. You then create an “Order” table and add the customer’s address in the Order table. This type of data redundancy is poorly designed. The Customer and Order table can link together using relationships across primary and foreign keys. What happens when you forget to update the customer’s address in the Order table? The result is that you now have two addresses for the customer and you don’t know which one is accurate.

Bottom line: Always keep data in one location and use relationships between primary and foreign keys to query the data.

3. Avoid NOT IN or IN and Use JOIN Instead

NOT IN and IN statements are not well optimized. They are convenient, but usually they can be replaced by a simple JOIN statement. Take a look at a sample query.

SELECT *FROM Customer
WHERE NOT IN (SELECT CustomerId FROM Order)

In the statement above, the data set returned is all customers that don’t have an order. In this statement, the SQL database grabs all orders from the Order table and then filters out the record set based on the main outer query on the Customer table. If you have millions of orders, this is an extremely slow query.

The alternative, better performing option is the following.

SELECT *FROM Customer c
LEFT JOIN Order o on c.CustomerId = o.CustomerId
WHERE o.CustomerId IS NULL

This LEFT JOIN statement returns the same data set as the previous statement, but it’s much more optimized. It joins two tables on a primary and foreign key, which improves the speed of the query and avoids the NOT IN and IN clauses.

4. Forgotten NULL vs. Empty String Values

The NULL vs. empty string debate between database administrators has been ongoing for decades. You can choose to use NULL values when no value is present or you can use actual literal values such as zero-length strings or 0 integer values. What you use in the database should be uniform across all tables or queries can get messy. Remember that NULL values aren’t the same as, for instance, a zero-length string, so your queries must account for these values if there is no standard in your table design.

When you determine what you want to use, make sure your queries account for these values. For instance, if you allow NULLs for a user’s last name, you must query using the NULL filter (either NOT NULL or IS NULL) in your clauses to include or exclude these records.

5. The Asterisk Character in SELECT Statements

Always define the columns you want to return in your queries. This standard is for performance and security reasons. Take the following query, for example.

SELECT * FROM Customer

The query returns all customer values including any sensitive data you might keep in the table. What if you keep the customer’s password or social security number in the table? Hopefully, these values are encrypted, but even having the hashed value can help hackers. It’s also a performance issue if you have dozens of columns in the table.

Instead of the above query, always define each column. The following statement is an example.

SELECT CustomerId, FirstName, LastName FROM Customer

The above statement defines each column and limits the size of the record set as well as what a hacker can see should there be a data breach.

6. Looping with Too Many Cursors

Cursors, the looping structures in the SQL language, are the bane of database performance. They allow you to loop through millions of records and run statements against each one individually. While this may sound like an advantage, it can destroy database performance. Loops are common in programming languages, but they’re inefficient in SQL programming. Most database admins reject SQL procedures with cursors implemented.

It’s best to write the procedure in another way to avoid adversely affecting database performance if possible. Most cursors can be replaced by a well-written SQL statement. If you can’t avoid it, then cursors should be kept to scheduled jobs that run during off-peak hours. Cursors are used in reporting queries and data transformation jobs, so they can’t always be avoided. Just limit them as much as possible in production databases that perform daily queries against your database.

great job post

7. Data Mismatches in Field Assignment Procedures

When you declare your table columns, you must give each column a data type. Make sure this data type covers all necessary values you need to store. Once you define a data type, you can only store that type of value in the column.

For instance, you probably need decimal precision by 2-3 points in a column that stores a total order cost. If you assign this column as an integer, your database will only be able to store whole numbers with no decimal values. What happens with the decimals depends on your database platform. It could automatically truncate the values or give you an error. Either alternative could create a major bug in your application. Always account for what you need to store when you design your tables.

This includes writing queries—when you write your queries and pass parameter values to a stored procedure, the variable must be declared with the right data type. Variables that don’t represent the column data type will also throw errors or truncate data in the process.

8. Logical OR and AND Operations

When writing your queries, it’s easy to miss simple logic order. AND and OR statements can greatly change a data set. You can avoid common SQL bugs by either using parenthesis or organizing your statements to represent the logic that should be applied.

Let’s take a look at a SQL statement that mixes the AND and OR operators.

SELECT CustomerId
FROM Customer
WHERE FirstName = 'John' AND LastName = 'Smith' OR CustomerId > 0

The goal for the statement above is to get any customers with a first and last name of “John” and “Smith” and the customer ID is greater than zero. However, since we mixed the AND statement with an OR, all records where the customer ID is greater than zero are returned. We can overcome this logic hurdle by using parenthesis. Let’s add them to the above statement.

SELECT CustomerId
FROM Customer
WHERE (FirstName = 'John' OR LastName = 'Smith') AND CustomerId > 0

We’ve changed the logic for this statement. Now, the first parenthesis set returns all records where a customer’s first name is John or the last name is Smith. With this filter, we then tell SQL to only return values where the CustomerId is greater than zero.

These types of logic statements should be well tested before releasing them to production.

9. Subqueries Must Return One Record

Subqueries aren’t an optimal way to write SQL statements, but sometimes they can’t be avoided. When you use subqueries, they must always return one record or your query will fail.

Let’s take a look at an example.

SELECT CustomerId,
(SELECT OrderId FROM Order o WHERE c.CustomerId = o.CustomerId)
FROM Customer c

In the query above, we get a list of Customer IDs from the Customer table. Notice that we get an Order ID from the Order table where the customer ID matches. If there is only one order, this query runs fine. However, if there is more than one order for the customer, the subquery returns more than one record and the query fails. You can avoid this issue by using the “Top 1” operator.

Let’s change the query to the following.

SELECT CustomerId,
(SELECT Top 1 OrderId FROM Order o WHERE c.CustomerId = o.CustomerId ORDER BY OrderDate)
FROM Customer c

In the above query, we only pull one record and we order the records by date. This query gets the first order placed by the customer.

10. JOIN on Indexes

A table should always be well indexed, and you can take advantage of these performance enhancements by using the JOIN statement on columns assigned to an index. In the previous examples, we used the primary and foreign keys of a table. The primary key column is always an index, so you don’t need to worry about performance there. However, the foreign key should have an index on it as well.

Any JOIN statements you use should have an index set on the column. If there is no index, consider adding one to the table.

Conclusion

Relational databases are perfect for most back-end procedures, but you need to create the right SQL statement and optimize tables and queries for the best performance. Avoid these ten SQL pitfalls, and you’ll be on your way to creating a fast, efficient database for any small, medium, or large online business. Looking for a SQL programmer? Read this article to help you write a stellar SQL job description to attract the best pro for your project.

Upwork is a freelancing website where businesses of all sizes can find talented independent professionals across multiple disciplines and categories. Ready to let freelance experts help you get more done? Start today!

Jennifer Marsh

by

Jennifer is a software engineer who writes high-level technical content for companies such as Adobe, Rackspace and IBM. Her expertise is with technical writing, also covering topics such… more