SQL Approach to Perform Data Analysis and Data Science Part-2

Vervit Khandelwal
The Startup
Published in
6 min readJan 23, 2021

--

Basics of SQL is mentioned in Part-1 of SQL Approach.

In this article we will be covering various types of Joins and Subqueries

We will be working with Hr schema to demonstrate examples.

Multiple Table Queries

JOINS Clause is used to join two or more table, bases on a related column between different tables.

Types of Joins

  1. Simple Joins
  2. Natural Joins
  3. Equi Joins
  4. Non Equi Join
  5. Self Join
  6. Left & Right Join
  7. Inner & Outer Join

Simple Join

SELECT t1. column_n, t2.columns_n ,….

FROM table_1 as t1

JOIN table_2 as t2

ON t1.column_n = t2.column_n;

Natural Join

Natural Join maps the rows implicitly among common columns in both the tables

  • ON clause is not used while using Natural join
  • Natural join usually takes all the keys column of the first table and then tries to match with other table columns.

SELECT * FROM table_1 NATURAL JOIN table_2;

Restriction of Natural Join

  • All the columns should have unique data without duplicates.
  • High chances to retrieve too many rows without meaningful relationship.
  • Cannot handle NULL values in Joining key.

Queries with more than two Tables

Natural Join / simple Join can be used to join multiple Tables as ones.

SELECT * FROM table_1 NATURAL JOIN table_2 NATURAL JOIN table_3;

Inner Joins / Equi Joins

  • Inner joins efficiently handles null values and comparions.
  • Assign default values by replacing null values in joinging key columns and ensures rows are not droped out.

SELECT * FROM table_1 JOIN table_2 USING (column_n);

Non Equi- Joins

Non Equi joins uses comparison operators (> ,< , NOT , <>) in order to filter the records in one table and map the remaining rows across the other table rows.

SELECT * FROM table1 JOIN table_2 WHERE column_n > [int.]

Join with Multiple Queries

Self Join

Self Join means joining the same table to itself ,used when we see meaning full data in same table.

SELECT * FROM table_1 a JOIN table_1 b ON a.column_n = b.column_n AND condition;

Inner Join

The INNER JOIN selects records that have matching values in both tables.

SELECT column_name(s)FROM table_1 t1 INNER JOIN table_2 t2
ON t1.column_n = t2.column_n;

Highlighted section represents Inner Join

Left Join

  • Left join selects all records from left table (table_1)and matching records which are in right table (table_2).
  • If no match from right table (table_2) the result is NULL.

SELECT column_n FROM table_1 t1 LEFT JOIN table2 t2
ON t1.column_n = t2.column_n;

Highlighted section represents Left Join

Right Join

  • Right join selects all records from Right table (table_2)and matching records which are in left table (table_1).
  • If no match from Left table (table_1) the result is NULL.

SELECT column_n FROM table_1 t1 RIGHT JOIN table_2 t2
ON t1.column_n = t2.column_n;

Highlighted section represents Right Join

Full Outer Join

Full outer Joins helps to retrieve combination of LEFT and RIGHT join results.

  • Retrieve result set of ALL from LEFT JOIN
  • Retrieve result set of ALL from RIGHT JOIN
  • Records can have NULL values

The purpose of FULL Join is to get a glance on all accounts and transaction

SELECT column_n FROM table_1 t1 LEFT JOIN table2 t2
ON t1.column_n = t2.column_n;

UNION

SELECT column_n FROM table_1 t1 RIGHT JOIN table_2 t2
ON t1.column_n = t2.column_n;

Subquery

  • A select query when it is nested in another main query, then it is called as a subquery
  • Like joins, subqueries need common key columns for joining with main queries
  • Subqueries are otherwise called as virtual table enclosed with an independent business logic
  • Subqueries execute independently and share its results with the main query, so that the complexity reduces while writing the queries
  • Subqueries can be written in WHERE clause of another query by using multi-row operators including EXISTS, IN , ANY and ALL.
  • They can also use single-row comparison operators including <, >, =

IN operator is well used when the main query searches all of the multiple rows returned by subquery

Subquery in WHERE clause

SELECT a.column_n FROM table_1 as a WHERE a.column_n IN

(SELECT b.column_n FROM table_2 as b WHERE b.column_n = ‘ x’);

  • Retrieve employee_id , first_name and salary details of those employees whose salary is greater than the average salary of all the employees ?

Subquery with Outer Reference

SELECT a.column_n FROM table_1 as a WHERE a.column_n IN

(SELECT b.column_n FROM table_2 as b WHERE b.column_n = ‘ x’)

AND condition;

  • Find first name, department ID where first name starts from ‘M’ and location id is 1700 ?

Subqueries with ‘WITH’ Clause

  • The WITH clause, or subquery factoring clause.
  • WITH clause may be exercised as an inline view or committed as a temporary table.
  • The benefit of the latter is that repeated instance of the subquery may be more efficient as the data is easily retrieved from the temporary table, rather than being needed by each instance.
  • WITH clause plays a major role in complex queries when there is a need for calling the subqueries for multiple times.

WITH column_n as

(SELECT column_n FROM table_1),

SELECT column_n FROM table_1 WHERE condition;

Subqueries Comparison Test

  • Sub Queries’ results are dynamic rather than constant
  • That means, developer will not manually pass any input in order to fetch the records, the dynamic input is retrieved from subquery
  • These subqueries when used in WHERE clause, uses its dynamic input in the condition and then fetches the records

SELECT * FROM table_1 WHERE column_n =

(SELECT column_n FROM table_2 WHERE condition);

Subquery Existence Test

  • The subquery is study by each record of the main query using common key columns.
  • The common key column exist in main query and subquery.
  • The subquery returns true (1) or false (0) when its conditions are satisfied with main query input column values.

SELECT * FROM table_1 WHERE EXISTS

(SELECT column_n FROM table_2 WHERE condition);

Scalar Values Expression using Subquery

  • Scalar value expression is the subquery enclosed in anotherSELECT query as a derived value
  • Scalar Query is executed for each record returned by main query

SELECT column_n,…..,

CASE WHEN column_n = (SELECT column_n FROM table_1)

THEN ’text’ ELSE ’text’ FROM table_2 WHERE condition;

I am really glad we made it so far. In this article, I made use of MySQL workbench 8.0 to establish the examples.

--

--

Vervit Khandelwal
The Startup

Analyst Quantitative Research | Data Scientist |