What is SQL
SQL (Structured Query Language) is a programming language used to manage and manipulate relational databases. It is used to insert, update, and retrieve data from databases, as well as to create, modify, and manage database structures. In simple terms, SQL is a way to talk to databases and get information from them or put information into them.
In Data science, we used SQL to read data and manipulate it to get our desired result.
Example
Let's say you have a database of all the students in your school, and each student has a name, an age, and a grade level. You could use SQL to ask the database for a list of all the names of the students in your grade. Or you could use SQL to change a student's grade level if they got promoted.
Why we should use SQL?
SQL is the most common and easy method to access data in databases. In Data science, we will use SQL to read data and manipulate it to get our desired result. Here we will be focused on manipulating data rather than the creation and removal of data.
Some Key Features of SQL
- Easy to understanding- SQL is a simple and intuitive language to learn and use.
- Direct data access- Traditional databases allow users to easily access and retrieve specific data.
- Data audit and replication: It is easy to audit and replicate data in traditional databases.
- Multi-table analysis- SQL is powerful for analyzing data from multiple tables at once.
- Complex analysis- SQL allows users to analyze more complex data and questions than dashboard tools like Google Analytics.
SQL Queries
To understand and execute queries first we need to configure Parch and Posy Database
Database Schema:
1. SELECT & From
This statement is used to query a database and retrieve specific data from one or more tables. It is one of the most commonly used SQL commands. The basic syntax of this statement is as follows:
Query
SELECT column1, column2 From table_name;
In this query, the SELECT keyword is used to specify that you want to retrieve data from the database. The column1, column2, ... are the names of the columns that you want to retrieve data from. You can also use the * wildcard to select all columns. The FROM keyword is used to specify the table that you want to retrieve data from.
Query
SELECT * FROM orders;
This statement will retrieve all the columns in the order table as shown in figure below.
2. LIMIT
The LIMIT statement allows you to retrieve only a specific number of rows from a table, which can be useful when you only need to see the initial data and don't need to load the entire dataset. This can be much quicker for loading, as it reduces the amount of data that needs to be loaded.
LIMIT Statement will always used at the last of query.
Query
SELECT * FROM orders LIMIT 10;
This command will limit first 10 rows of all the columns of table.
3. ORDER BY
The ORDER BY statement in SQL allows sorting of query results based on data in any column. However, the sorting effect is only temporary and specific to that query, unlike sorting in spreadsheet software which permanently alters the data. This difference highlights the purpose of a SQL query.
DESC
can be used after the column in your ORDER BY statement to sort column in descending order, as the default ORDER BY query will sort in ascending order.
Query
SELECT id, sales_rep_id
FROM accounts
ORDER BY id
LIMIT 10;
In this statement the columns in the table will be sorted in ascending order with respect to id column.
We can even sort by multiple columns using ORDER BY
by providing a list of columns. The sorting process first uses the leftmost column in the list, then the next column and so on. Additionally, it is still possible to reverse the order using the DESC
keyword.
Query
SELECT account_id, total_amt_usd
FROM orders
ORDER By account_id, total_amt_usd DESC;
Here account _id
will be sorted in ascending order as usual and total_amt_usd
will be sorted in descending order as shown in figure below.
4. WHERE
The WHERE statement in SQL is used to filter the results of a query. It allows you to specify certain conditions that the data in your query must meet in order to be included in the final output.
Common symbols used in WHERE statements include:
>
(greater than)
<
(less than)
>=
(greater than or equal to)
<=
(less than or equal to)
=
(equal to)
!=
(not equal to)
Query
SELECT *
FROM orders
WHERE account_id = 4251
ORDER BY occurred_at
LIMIT 1000;
In this statement, we are extracting the information, where account_id is equal to 4251.
WHERE
statement will work with non-numeric data as well.
Query
SELECT *
FROM accounts
WHERE name = 'Walmart;'
5. Arithmetic Operators
In SQL, arithmetic operators are used to perform mathematical operations on values in a query. The most common arithmetic operators in SQL are:
*
(Multiplication)
+
(Addition)
-
(Subtraction)
/
(Division)Order of all the arithmetic operation will follow PADMAS rule
Query
SELECT id, (standard_amt_usd/total_amt_usd)*100
FROM orders
LIMIT 10;
In the output, you will notice that it created a new column after multiplication and the name of column is unknown. This can be solved using derived column.
Derived Column:
A derived column, also known as a calculated or computed column, is a new column created by combining existing columns in a table. This new column can be given a name, known as an alias, using the AS
keyword.
Query
SELECT id, (standard_amt_usd/total_amt_usd)*100 AS std_percent
FROM orders
LIMIT 10;
Here, we are dividing the dollar amount of standard paper by the total order amount to calculate the percentage of standard paper used in the order. We named this new column "std_percent" using the AS
keyword.
6. Logical Operators
In SQL, logical operators are used to combine multiple conditions in a query. Some of the commonly used logical operators are:
(i) Like
The LIKE operator in SQL is used to match a specific pattern in a column, it allows you to perform operations similar to using WHERE and =, but when you don't know the exact value.
It's particularly useful for working with text data. The LIKE operator is frequently used with %
which means search for the text that is in between %
no matter what comes on either side of the text.
It's case-sensitive, so searching for 'T' is different from searching for 't'
Query:
SELECT *
FROM accounts
WHERE website LIKE '%google%';
This query will select row that contains the google in it, no matter what comes before or after the google.
(ii) IN
The IN operator is useful for working with multiple values in both numeric and text columns, it allows you to check for one, two or many values within the same query. It's similar to using = but for multiple values of a particular column. It's a cleaner way of writing queries compared to using the OR operator which also allows you to perform similar tasks.
Query:
SELECT *
FROM orders
WHERE account_id IN (1001,4251);
It will extract the row from 1001 to 4251.
(iii) NOT
The NOT operator in SQL is used to negate a condition, it can be used in combination with other operators like IN and LIKE to retrieve rows that do not match specific criteria.
For example, the NOT IN operator can be used to retrieve rows that do not have a specific value in a column and the NOT LIKE operator can be used to retrieve rows that do not match a specific pattern in a column. It's useful for filtering and retrieving data that does not meet specific conditions.
Query:
SELECT *
FROM orders
WHERE account_id NOT IN (1001,4251);
In this query, all the row that not from 1001 and 4251 will be retrieved.
(iv) AND
In SQL, the AND operator is used to combine multiple conditions in a query. The AND operator is used to match rows where all conditions are true.
Query:
SELECT *
FROM orders
WHERE account_id NOT IN (1001,4251) AND id IN(17,26);
In this query, two conditions will be checked if both of them are true then it will retrieve the information as shown in figure.
(iv) BETWEEN
In SQL, the BETWEEN operator is used to match a range of values within a column. The BETWEEN operator is used to match rows where a column value is between two specified values, inclusive of the specified values.
Query:
SELECT *
FROM orders
WHERE account_id BETWEEN 1001 AND 4251
In this query, all the row between 1001 and 4251 in account_id will be retrieved.
(v) OR
The OR operator in SQL is used to combine multiple conditions in a query, it matches rows where at least one condition in query is true.
Query:
SELECT standard_qty, gloss_qty
FROM orders
WHERE (standard_qty = 0 OR gloss_qty = 0);
In this query, it will select all the rows where either gloss quantity is zero or standard quantity is zero.
These operator can be used in combination with other operators such as arithmetic operators (+, *, -, /)
Best Practice For Formatting Query
Following are the best practices that one should follow while writing the SQL queries:
1. Capitalize SQL commands like SELECT and FROM, and keep everything else in lower case to make the query more readable.
SELECT account_id
FROM orders;
2. Use underscores instead of spaces in column and table names.
SELECT account_id, standard_qty
FROM orders;
3. Always include a semicolon at the end of each statement, as it may be required in some SQL environments.
SELECT account_id
FROM orders;
4. Use double quotes or square brackets to reference tables and columns that have spaces in their names.
SELECT "full name", "age"
FROM "employee information";
5. Use comments in your query for better understanding of your code.
-- This query selects all information from 'customers' table
SELECT * FROM customers;
6. Be consistent in formatting throughout your queries and scripts.
7. Use white space in queries to make them more readable.
8. Use indentation to make the query structure more clear.
9. Use meaningful name for columns, tables and variables in order to make it more understandable and readable.
10. Avoid using too many subqueries and joins, use them only when it's necessary.
Top comments (0)