Group and Aggregation

Section 6: Grouping and Aggregation

In this section, we'll explore the powerful concepts of grouping and aggregation in SQL. These functionalities allow us to summarize and analyze data at a higher level, making it easier to derive insights from large datasets.

Grouping Data with GROUP BY

The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows, like creating a summary or report. We often use aggregate functions in conjunction with GROUP BY to perform calculations on each group of rows.

Grouping results based on one or more columns

For example, grouping employees by their department:

SELECT department_id, COUNT(*) AS employee_count

FROM employees

GROUP BY department_id;

This query counts the number of employees in each department.

Using aggregate functions with GROUP BY

You can use aggregate functions like SUM, AVG, MAX, etc., to perform calculations on each group. For instance, finding the total salary expense per department:

SELECT department_id, SUM(salary) AS total_salary

FROM employees

GROUP BY department_id;

This query calculates the total salary for each department.


Filtering Grouped Data with HAVING

The HAVING clause is used in conjunction with GROUP BY to filter the results of aggregate functions. It allows you to specify conditions that must be met by the groups included in the result set.

Applying conditions to grouped data using the HAVING clause

For instance, finding departments with an average salary greater than 50000:

SELECT department_id, AVG(salary) AS average_salary

FROM employees

GROUP BY department_id

HAVING AVG(salary) > 50000;

This query filters out departments with an average salary less than or equal to 50000.


Combining GROUP BY and HAVING for complex queries

You can combine GROUP BY and HAVING for more complex queries. For example, finding departments where the average salary is above 50000 and the total number of employees is greater than 5:

SELECT department_id, AVG(salary) AS average_salary, COUNT(*) AS employee_count

FROM employees

GROUP BY department_id

HAVING AVG(salary) > 50000 AND COUNT(*) > 5;

This query provides a more detailed analysis, filtering departments based on both average salary and the number of employees.

Understanding GROUP BY and HAVING is crucial for performing data analysis in SQL. These clauses empower you to group data based on certain criteria and apply aggregate functions, allowing for insightful summaries and analyses of your datasets.