I get asked a lot about how to learn SQL or what is the best way to learn SQL. I learned SQL in about two days from a simple book. The syntax, that is, that’s the easy bit because SQL language has been designed to be easy and human friendly to make it easier o understand and remember. It took me another few years to know how to use it effectively, understand data relations and how databases work.
What is SQL
SQL stands for Structured Query Language. It was invented in the 70s and was specifically designed to query and manipulate data in relational databases naturally. It was initially called SEQUEL and then reduced to just three letters: SQL.
For example, to read data from a database, we use the SELECT statement. To write data into the database, we use INSERT statement and to modify data in the database, use UPDATE statement. To create a new table where our data is stored, we can use (you guessed it) CREATE statement, and to change it, we use ALTER statement. To read data from more than one table at a time, we can use a JOIN statement. It’s all way too simple! We can also filter the data we are reading with the WHERE clause. For example, to get a list of all students with the name Marcin:
WHERE column_student_name = 'Marcin'
The * means “all columns”. You can think of a table as an Excel Spreadsheet. You want to see all columns in the spreadsheet, or you can be more specific:
SELECT column_student_surname, column_student_age
WHERE column_student_name = 'Marcin'
Because of its ease of use, SQL language is often used to query non-relational databases (NoSQL) too – for example, Google has developed an SQL interface for their NoSQL Spanner database because it made their developers more productive.
As a quick comparison, this is how you would run the same in a NoSQL database (using MongoDB as an example):
Even though the code takes less to write, it is less expressive and more challenging to read. If you want to explore more, MongoDB documentation has more examples.
With this out of the way, as we are not discussing SQL vs NoSQL, let’s focus on how.
What are all the different flavours of SQL
Each vendor will often have its version of SQL. Microsoft SQL Server will have T-SQL (Transact SQL), Oracle will have PL/SQL (Procedural SQL) and so on. They will all, however, also accept a standard ANSI SQL – the standard language.
Even if you know which platform you want to work with, you should always try and learn ANSI SQL as it’s platform-independent. It also offers the most portability. If you write an application using T-SQL, it can only run on SQL Server, but it can run on any database engine if you write it in ANSI SQL.
Once you establish yourself in a particular technology, you can start picking up vendor-specific flavours. Such an example would be CAST (ANSI) and CONVERT (T-SQL). Although they both do the same thing, CONVERT is newer and more flexible than CAST.
SELECT CONVERT(INT,column) FROM table;
SELECT CAST(column as INT) FROM table;
Relational Algebra and Set theory
This is the core of understanding how the logical view of the database works. In order to write a SELECT statement, we need to understand how the data is related. For example, we have to know how orders are related to customers. You may have heard about Venn diagrams used to express relations between sets in the set theory. On the other hand, relational algebra applies the set theory principles in set relations used in databases. Relational databases operate on set principles, but as scary as it sounds, this is all about joining data from all the different tables.
The above shows three sets: Customers, Orders and Customers with Orders. In a SQL language, the query would look like this:
The ON clause specifies JOIN criteria, and this is the most critical part of the relation. We have to understand how to JOIN data sets together to make sense of it. If we get that wrong, we will get incorrect results, duplicates, or missing data.
In a NoSQL database such as MongoDB joins are impossible to do because there is no relation between tables. You can still “glue” data together by doing lookups, a bit like a VLOOKUP in Excel but it is not as powerful as relations where you can do all sorts of set-based operations.
How the database engine works
All of the above is necessary to query data accurately. Understanding how the database engine works will allow you to query data more efficiently. This will be specific to the database engine you are working with, and they will all behave differently. For example, knowing what columns to select and what data types they hold is critical to reducing unnecessary IO saturation, i.e. reducing the amount of data we read from the database. Knowing what indexes are applied to what columns will allow you to improve your query performance even further.
Why you should learn SQL
SQL language is the most versatile and most popular way to interact with databases. It is not going away, and if you are thinking about a career with data, SQL is a must.
Learning SQL is all about understanding the set theory
Learn ANSI SQL syntax before you start learning vendor specific flavours
Understand the concept of set theory and relational algebra
Learn how database engines work on a physical level