While searching for the best way to write some backend methods I stumbled on an amazing, yet not so new feature of PostgreSQL . Since version 8.4 postgreSQL has supported Common Table Expressions (CTE) and recursion, which is useful for creating complex yet readable queries of hierarchical data. Straight from the postgreSQL docs: “Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query”.
An Example
the table company_members stores the relationships of a company’s organizational structure.
company_members
manager | name | job_title |
Michael | NULL | CEO |
Jan | Michael | CTO |
Creed | Michael | COO |
Dwight | Creed | Asst to the Regional Manager |
Pam | Jan | secretary |
Stanley | Dwight | sales |
The following recursive query will list the company_members hierarchically:
WITH RECURSIVE get_company_rank AS (
SELECT manager, name, job_title FROM company_members WHERE job_title = 'CEO'
UNION
SELECT cm.name, cm.manager, cm.job_title FROM company_members cm
INNER JOIN get_company_rank gcr ON gcr.name = cm.manager
)
SELECT * FROM get_company_rank;
The Query Explained
- The WITH keyword creates the CTE
- The RECURSIVE keyword allows the CTE to refer to its own output and allows you to define a base case and a recursive expression
- The base case (above UNION) returns (Michael, CEO)
- The recursive statement (below UNION) will continue to execute until no rows are returned
WARNING! : If there are any cycles in the table data you will recurse infinitely through the dataset. Below I will go into detail on adding a breakpoint to the recursive query
Adding Depth
If you do not want to select the entire table you will need to add a depth parameter to end the recursion after a certain depth is reached. The break point technique can also be used to end the query if an infinite loop has occurred by breaking at a certain number of iterations or by setting a timeout.
IMPORTANT! : The ordering of the parameters must match the ordering of the columns in the select statements. If you receive any “operator does not exist” errors that is most likely the cause.
WITH RECURSIVE get_company_rank(depth,name,manager) AS (
SELECT 1, cm.name, cm.manager,cm.job_title FROM company_members cm WHERE job_title = 'CEO'
UNION
SELECT (gcr.depth + 1), cm.name, cm.manager, cm.job_title FROM company_members cm, get_company_rank gcr
WHERE cm.manager = gcr.name AND depth < 5
)
SELECT * FROM get_company_rank;
As you can see below, the depth of the employee from the root has been added to each row of the resulting set.