SQL - Continued

Recap - last lecture - SELECT

Basic Syntax

SELECT <attribute list> 
FROM <table list>
[WHERE <condition>];

Projection in SQL

Selection in SQL Selection (WHERE clause)

Complete WHERE condition clauses

Operand Example
=, <> ID <> 324, price = 42.30
<, > ID > 324, price < 42.30
LIKE, NOT LIKE name LIKE '%Mary%' will select all the tuples substrings containing the name 'Mary', such as 'Mary-Ann' or 'Maryam' e.g.
IN name IN ('Mary', 'John', 'Steve') acts as a ternary OR operation
BETWEEN salary BETWEEN 10000 AND 30000
IS name IS NULL acts as a ternary OR operation

GROUP BY

Aggregation and grouping

Aggregates are functions that produce summary values from a set of tuples.

Aggregate fn Description
COUNT Counts the number of tuples that the query returns
SUM/AVG Calculates the sum/average of a set of numeric values
MAX/MIN Returns the maximum/minimum value from a set of values which have a total ordering. Note that the domain of values can be non-numeric

A couple examples:

-- Select the average salaries and combined employee salaries
SELECT  AVG(salary), SUM(salary)
FROM  Employee;

-- the number of employees in department number 5
SELECT  COUNT(*)
FROM  Employee
WHERE  dNum = 5;

Aggregation and Grouping

aggregation functions are often needed to be applied to groups of rows within a table

Solution: use GROUP BY

SELECT [DISTINCT] <target list>
FROM <table list> 
[WHERE search condition]
[GROUP BY <grouping attributes> ]
[HAVING <group conditions> ]
[ORDER BY column [ASC|DESC] {, column [ASC|DESC]}];

Important!

When GROUP BY is used in an SQL statement, any attribute which appears in the SELECT clause must also appear in the GROUP BY clause or be in an aggregation function.

alt text

-- Find the average capacity of each container type
SELECT  ContainerType, AVG(Capacity)
FROM  CONTAINERS
GROUP BY ContainerType;

Think of the GROUP BY clause as a way to organise your data into groups. When you use GROUP BY, SQL wants to make sure that each item (or attribute) you select in your SELECT clause either:

  1. Belongs to the same category (group) as specified in the GROUP BY clause
  2. You're doing something special with it, like counting how many items are in each category (e.g. COUNT). If an item doesn't fit into one of these categories, SQL gets confused because it doesn't know how to group or aggregate that item, and it asks you to either specify the category (by including it in the GROUP BY clause) or explain what you want to do with it (by using an aggregation function). Else, you have to omit it from the query
-- Find the number of employees with a salary > 40,000 in each department
SELECT  dNum, COUNT(*)
FROM  Employee
WHERE  salary > 40000
GROUP BY  dNum;

HAVING

Conditions on Groups

-- The total number of employees in departments with more than 2 employees
SELECT  dNum, COUNT(*)
FROM  Employee
GROUP BY   dNum
HAVING  COUNT(*) > 2;