DEV Community

Keith Kinuthia
Keith Kinuthia

Posted on

SQL 101: Introduction to SQL

Before understanding what SQL is, we must first understand the concept of data and data storage.

In today's world, data is everywhere—from the emails we send to the transactions we make, everything generates data. Data is simply a collection of facts or information, and it can take many forms: text, numbers, images, videos, or even signals.

To make sense of this vast amount of information, data needs to be organized and stored in a structured way. This is where databases come in.

Understanding Databases

A database is a collection of data that is stored in a form that allows for easy storage, retrieval, and management.

There exist two forms of databases: Relational databases and Non-Relational Databases.

  • Relational Databases are also called SQL databases. These are databases that store data in structured, predefined tables with rows and columns.
  • Non-Relational Databases are also called No-SQL databases. These are databases that store data without a predefined schema. These databases are designed to handle unstructured, semi-structured, or rapidly changing data.

To manage data in a database, we use database management software.

This is where we can begin understanding what SQL is.


SQL stands for Structured Query Language. It is the standard language used to communicate with relational databases. It allows users to perform operations such as querying, updating, and managing data stored in tables contained in databases.

A key thing to note is that SQL is a language, and like any language, it has various dialects. There exists a base language (ANSI standard) and then different flavors of SQL are determined by the Relational Database Management Software you use (RDBMS).

The most common RDBMS today are: Oracle, MySQL, Microsoft SQL Server, and PostgreSQL (in order of popularity as of June 2024).

The summary is that there exist tables that store data, and those tables are housed within databases, and those databases are managed using a relational database management software.


SQL Syntax

One of the best things about SQL is that it is a procedural language, meaning that it feels like writing human instructions. An example would be:

SELECT column_a
FROM table_a;
Enter fullscreen mode Exit fullscreen mode

Even if one didn't have technical computer knowledge, they would be able to understand what this snippet of code was written to do, which is to return column_a that exists within table_a.

Something to note is that key SQL statements are not case-sensitive, meaning the code we used above could still be correct if written as:

select column_a
from table_a;
Enter fullscreen mode Exit fullscreen mode

However, it is standard practice to capitalize key SQL statements like SELECT and FROM, not only to enhance neatness and readability, but also to ensure you can easily spot errors in your code.


Let's Get Down to Writing SQL Code!

The first statement you will encounter is SELECT. It is used to select what you specify it to select from a table.

While you can select one column like what we did above, you can also select multiple columns:

SELECT
    column_a,
    column_b,
    column_c
FROM table_a;
Enter fullscreen mode Exit fullscreen mode

Two things to note here:

  1. In one's head, one might be wondering why we indented the code. It is just to enhance neatness. The indent is not part of SQL syntax.
  2. Also, you might have noticed that the code always ends with a semicolon. This is because this is how the RDBMS knows where to terminate the 'procedure' or SQL code.

FROM is used to specify which table the data should come from.

Aliases - Let's say you want some columns to have a specific name when the results are returned after the query is run. This is where you would use aliases. The key word for alias is AS:

SELECT
    column_a AS a,
    column_b AS b,
    column_c AS c
FROM table_a;
Enter fullscreen mode Exit fullscreen mode

Filtering Results

Imagine you want to filter your results. This is where you would use the magic SQL keywords responsible for filtering: WHERE, LIKE, HAVING, and REGEXP.

We shall only cover WHERE, though I will give you some insights on the rest.

  • LIKE and REGEXP are related. REGEXP is a more advanced version of LIKE used to filter data using specific criteria, for example, giving records where values in a certain column end with 'th'.
  • HAVING, on the other hand, is another version of WHERE, but with more specific criteria.HAVING for example can take GROUP BY queries and filter them while WHERE cannot
SELECT
    column_a AS a,
    column_b AS b,
    column_c AS c
FROM table_a
WHERE column_a < 10;
Enter fullscreen mode Exit fullscreen mode
SELECT
    CustomerID,
    COUNT(SaleID) AS SalesCount
FROM Sales
GROUP BY CustomerID
HAVING SalesCount > 5;
Enter fullscreen mode Exit fullscreen mode

If you read the SQL code like human instructions, you get an idea of what it is trying to do:

We want to select column_a, column_b, and column_c, each with their own alias, and we are selecting from table_a. From those records we have selected, we want only those where the values in column_a are less than 10.


Creating Tables and Databases

To create anything, we use the keyword CREATE.

We can create tables and databases.

The syntax for this is:

CREATE database_name;
CREATE table_name;
Enter fullscreen mode Exit fullscreen mode

Whenever we create a table, we have to think about what data is going into the table and what data types each field will take on.

There are various data types, but we shall first divide them into number data types, character data types, and datetime data types.

  • Number data types include INT, AUTO_INCREMENT, FLOAT. The key thing to note is that INT represents integers. While we have subdivisions of INT, which include BIGINT and SMALLINT, they only represent the range of values that can be taken but not the integer aspect. AUTO_INCREMENT is an integer data type that increases every time a record is added to a table. It is at times used as a surrogate primary key in tables.

  • Character data types include CHAR, VARCHAR, and TEXT. CHAR and VARCHAR have the same characteristics, but there is one big difference. CHAR takes on the full specification of characters given, but VARCHAR does not.

    For example, if I specify a field to take on VARCHAR(10) and another to take CHAR(10), meaning each to take 10 characters, and then I input the string 'hello' in both fields, when I do a count of the characters in the VARCHAR, it returns 5, but for the CHAR, it returns 10. Basically, CHAR will add white space to the remaining spaces to fill the limit given, which is 10 characters. That is also why VARCHAR is named as it is, which stands for Variable Character.

  • Datetime data types include DATE, TIME, and DATETIME. You may ask why I haven't added the rest like INTERVAL and YEAR. This is because they are not part of the ANSI standard of SQL and vary across RDBMS. DATE represents dates, TIME represents time, and DATETIME is a combination of DATE and TIME.

Here is a snippet of code putting all these things together:

CREATE TABLE teachers (
    id BIGINT AUTO_INCREMENT PRIMARY KEY, -- This is autoincremental data type
    first_name VARCHAR(25),
    last_name VARCHAR(50),
    school CHAR(50),
    hire_date DATETIME,
    salary FLOAT
);

INSERT INTO teachers (first_name, last_name, school, hire_date, salary)
VALUES ('Ken', 'Hubert', "Murang'a", '2020-01-01 15:49:20', 65000);
Enter fullscreen mode Exit fullscreen mode

The LIMIT Clause

This is one of the most important clauses when querying any table. Imagine this: You work at Instagram and have access to their SQL databases (assuming they have those). Imagine querying a table of all Instagram users and you want to filter all the records where the age of the user is below 20. Logically, that could be in the tens of millions, if not hundreds. Imagine hitting run and just crashing the system because you've requested more than the system can handle. This is how the LIMIT clause could help you. It helps retrieve a subset of the records from that query, hence avoiding disaster!

SELECT
    name,
    age
FROM ig_users
WHERE age < 20
LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Here's your article with the requested formatting in markdown:


Joins in Relational Databases

One of the main aims of SQL databases is to reduce redundancy when storing data in tables. This will involve storing data in multiple tables that have relationships with each other. When querying the tables to get data, we might need to join tables to get the results we want.

There are a few types of JOINS:

  • JOIN or INNER JOIN brings columns that are matching in both tables.
  • LEFT JOIN brings every row in the LEFT Table and it finds matching rows in the RIGHT table.
  • RIGHT JOIN brings every row in the RIGHT Table and it finds matching rows in the LEFT table. Remember, a right join is just a left join with the tables inverted.
  • FULL OUTER JOIN
  • CROSS JOIN - Returns every possible combination of rows from both tables.

An example that will show the differences between these joins can be learnt by implementing the code snippet below:

CREATE TABLE schools_left (
    id INT,
    left_school VARCHAR(30),
    PRIMARY KEY (id)
);

CREATE TABLE schools_right (
    id INT,
    right_school VARCHAR(30),
    PRIMARY KEY (id)
);

INSERT INTO schools_left (id, left_school) 
VALUES
    (1, 'Oak Street School'),
    (2, 'Roosevelt High School'),
    (5, 'Washington Middle School'),
    (6, 'Jefferson High School');

INSERT INTO schools_right (id, right_school)
VALUES
    (1, 'Oak Street School'),
    (2, 'Roosevelt High School'),
    (3, 'Morrison Elementary'),
    (4, 'Chase Magnet Academy'),
    (6, 'Jefferson High School');
Enter fullscreen mode Exit fullscreen mode

INNER JOIN

SELECT *
FROM schools_left sl
JOIN schools_right sr 
    ON sl.id = sr.id;
Enter fullscreen mode Exit fullscreen mode

LEFT AND RIGHT JOIN

SELECT *
FROM schools_left sl
LEFT JOIN schools_right sr
    ON sl.id = sr.id;

SELECT *
FROM schools_left sl
RIGHT JOIN schools_right sr
    ON sl.id = sr.id;
Enter fullscreen mode Exit fullscreen mode

You’d use either of these join types in a few circumstances:

  • You want your query results to contain all the rows from one of the tables.
  • You want to look for missing values in one of the tables; for example, when you’re comparing data about an entity representing two different time periods.
  • When you know some rows in a joined table won’t have matching values.

FULL OUTER JOIN

SELECT *
FROM schools_left sl
FULL OUTER JOIN schools_right sr
    ON sl.id = sr.id;  -- FULL OUTER JOINS AREN'T PART OF MySQL But rather found in PostgreSQL
Enter fullscreen mode Exit fullscreen mode

CROSS JOIN

SELECT *
FROM schools_left sl
CROSS JOIN schools_right sr;
Enter fullscreen mode Exit fullscreen mode

A good way to find missing records is using the NULL operator combining the JOINS.

An example is finding columns that didn't have a match after a LEFT JOIN:

SELECT *
FROM schools_left sl 
LEFT JOIN schools_right sr 
    ON sl.id = sr.id
WHERE sr.right_school IS NULL;
Enter fullscreen mode Exit fullscreen mode

With this introduction, one should be able to play around with creating databases, tables, inputting data into them, and querying the data as a whole and also filtering the queried results. You should also be able to experiment with various joins and practice how to retrieve data stored in different tables.

Also, while playing around, you might encounter runtime errors which are very useful in guiding you to understand what not to do while writing SQL queries and also give you a better understanding of the language itself.

Thank you for reading!

Top comments (0)