While searching for the best way to write some backend methods I stumbled on an amazing, yet not so new feature of PostgreSQL . Since version 8.4 postgreSQL has supported Common Table Expressions (CTE) and recursion, which is useful for creating complex yet readable queries of hierarchical data. Straight from the postgreSQL docs: “Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query”.

An Example

the table company_members stores the relationships of a company’s organizational structure.

company_members

manager name job_title
Michael NULL CEO
Jan Michael CTO
Creed Michael COO
Dwight Creed Asst to the Regional Manager
Pam Jan secretary
Stanley Dwight sales

The following recursive query will list the company_members hierarchically:

WITH RECURSIVE get_company_rank AS (

    SELECT manager, name, job_title FROM  company_members WHERE job_title = 'CEO'

    UNION

    SELECT cm.name, cm.manager, cm.job_title FROM company_members cm

    INNER JOIN get_company_rank gcr ON gcr.name = cm.manager

)

SELECT * FROM get_company_rank;

The Query Explained

  • The WITH keyword creates the CTE
  • The RECURSIVE keyword allows the CTE to refer to its own output and allows you to define a base case and a recursive expression
  • The base case (above UNION) returns (Michael, CEO)
  • The recursive statement (below UNION) will continue to execute until no rows are returned

WARNING! : If there are any cycles in the table data you will recurse infinitely through the dataset. Below I will go into detail on adding a breakpoint to the recursive query

Adding Depth

If you do not want to select the entire table you will need to add a depth parameter to end the recursion after a certain depth is reached. The break point technique can also be used to end the query if an infinite loop has occurred by breaking at a certain number of iterations or by setting a timeout.

IMPORTANT! : The ordering of the parameters must match the ordering of the columns in the select statements. If you receive any “operator does not exist” errors that is most likely the cause.

WITH RECURSIVE get_company_rank(depth,name,manager) AS (

    SELECT 1, cm.name, cm.manager,cm.job_title FROM  company_members cm WHERE job_title = 'CEO'

    UNION

    SELECT (gcr.depth + 1), cm.name, cm.manager, cm.job_title FROM company_members cm, get_company_rank gcr

    WHERE cm.manager = gcr.name AND depth < 5

)

SELECT * FROM get_company_rank;

As you can see below, the depth of the employee from the root has been added to each row of the resulting set.