pl/python metaprogramming

In episode 2 I mentioned three aspects of pl/python that are reasons to use it instead of pl/pgsql: access to Python modules, metaprogramming, and introspection.

Although this episode focuses on metaprogramming — by which I mean using Python to dynamically compose and run SQL queries — my favorite example combines all three aspects.

The context for the example is an analytics dashboard with a dozen panels, each driven by a pl/plython function that’s parameterized by the id of a school or a course. So, for example, the Questions and Answers panel on the course dashboard is driven by a function, questions_and_answers_for_group(group_id), which wraps a SQL query that:

– calls another pl/python function, questions_for_group(group_id), to find notes in the group that contain question marks

– finds the replies to those notes

– builds a table that summarizes the question/answer pairs

Here’s the SQL wrapped by the questions_and_answers_for_group(group_id) function.

sql = f"""
  with questions as ( 
    select *
    from questions_for_group('{_group_id}')
  ids_and_refs as (
      unnest ("references") as ref
      from annotation
      where groupid = '{_group_id}'
  combined as (
      array_agg( as reply_ids
    from ids_and_refs ir
    inner join questions q on = ir.ref
    group by, q.url, q.title, q.questioner, q.question, q.quote
  unnested as (
      unnest(reply_ids) as reply_id
    from combined c
  select distinct
    course_for_group('{_group_id}') as course,
    teacher_for_group('{_group_id}') as teacher,
    clean_url(u.url) as url,
    (regexp_matches(u.question, '.+\?'))[1] as question,
    display_name_from_anno(u.reply_id) as answerer,
    text_from_anno(u.reply_id) as answer,
    app_host() || '/course/render_questions_and_answers/{_group_id}' as viewer
  from unnested u
  order by course, teacher, url, title, questioner, question

This isn’t yet what I mean by pl/python metaprogramming. You could as easily wrap this SQL code in a pl/pgsql function. More easily, in fact, because in pl/pgsql you could just write _group_id instead of '{_group_id}'.

To get where we’re going, let’s zoom out and look at the whole questions_and_and_answer_for_group(group_id) function.

questions_and_answers_for_group(_group_id text)
  returns setof question_and_answer_for_group as $$
  from plpython_helpers import (
  base_view_name = get_caller_name()
  view_name = f'{base_view_name}_{_group_id}'
  if exists_group_view(plpy, view_name):
    sql = f""" select * from {view_name} """
    sql = f"""
    memoize_view_name(sql, view_name)
    sql = f""" select * from {view_name} """
  return plpy.execute(sql)
$$ language plpython3u;

This still isn’t what I mean by metaprogramming. It introduces introspection — this is a pl/python function that discovers its own name and works with an eponymous materialized view — but that’s for a later episode.

It also introduces the use of Python modules by pl/python functions. A key thing to note here is that this is an example of what I call a memoizing function. When called it looks for a materialized view that captures the results of the SQL query shown above. If yes, it only needs to use a simple SELECT to return the cached result. If no, it calls memoize_view_name to run the underlying query and cache it in a materialized view that the next call to questions_and_answers_for_group(group_id) will use in a simple SELECT. Note that memoize_view_name is a special function that isn’t defined in Postgres using CREATE FUNCTION foo() like a normal pl/python function. Instead it’s defined using def foo() in a Python module called plpython_helpers. The functions there can do things — like create materialized views — that pl/python functions can’t. More about that in another episode.

The focus in this episode is metaprogramming, which is used in this example to roll up the results of multiple calls to questions_and_answers_for_group(group_id). That happens when the group_id refers to a course that has sections. If you’re teaching the course and you’ve got students in a dozen sections, you don’t want to look at a dozen dashboards; you’d much rather see everything on the primary course dashboard.

Here’s the function that does that consolidation.

create function consolidated_questions_and_answers_for_group(group_id text)
  returns setof question_and_answer_for_group as $$
  from plpython_helpers import (
  base_view_name = get_caller_name()
  sql = sql_for_consolidated_and_memoized_function_for_group(
    plpy, base_view_name, 'questions_and_answers_for_group', _group_id)
  sql += ' order by course, url, title, questioner, answerer'
  return plpy.execute(sql)
$$ language plpython3u;

This pl/python function not only memoizes its results as above, it also consolidates results for all sections of a course. The memoization happens here.

def sql_for_consolidated_and_memoized_function_for_group(plpy, 
    base_view_name, function, group_id):
  view_name = f'{base_view_name}_{group_id}'
  sql = f""" select exists_view('{view_name}') as exists """
  exists = row_zero_value_for_colname(plpy, sql, 'exists')
  if exists:
    sql = f""" select * from {view_name} """
    sql = consolidator_for_group_as_sql(plpy, group_id, function)
    memoize_view_name(sql, view_name)
    sql = f""" select * from {view_name} """
  return sql

The consolidation happens here, and this is finally what I think of as classical metaprogramming: using Python to compose SQL.

def consolidator_for_group_as_sql(plpy, _group_id, _function):
  sql = f"select type_for_group('{_group_id}') as type"
  type = row_zero_value_for_colname(plpy, sql, 'type')
  if type == 'section_group' or type == 'none':
    sql = f"select * from {_function}('{_group_id}')"
  if type == 'course_group' or type == 'course':
    sql = f"select has_sections('{_group_id}')"
    has_sections = row_zero_value_for_colname(plpy, sql, 'has_sections')
    if has_sections:
      sql = f"""
        select array_agg(group_id) as group_ids 
        from sections_for_course('{_group_id}')
      group_ids = row_zero_value_for_colname(plpy, sql, 'group_ids')
      selects = [f"select * from {_function}('{_group_id}') "]
      for group_id in group_ids:
        selects.append(f"select * from {_function}('{group_id}')")
      sql = ' union '.join(selects)
      sql = f"select * from {_function}('{_group_id}')"
  return sql

If the inbound _groupid is p1mqaeep, the inbound _function is questions_and_answers_for_group, and the group has no sections, the SQL will just be select * from questions_and_answers_for_group('p1mqaeep').

If the group does have sections, then the SQL will instead look like this:

select * from questions_and_answers_for_group('p1mqaeep')
select * from questions_and_answers_for_group('x7fe93ba')
select * from questions_and_answers_for_group('qz9a4b3d')

This is a very long-winded way of saying that pl/python is an effective way to compose and run arbitarily complex SQL code. In theory you could do the same thing using pl/pgsql, in practice it would be insane to try. I’ve entangled the example with other aspects — modules, introspection — because that’s the real situation. pl/python’s maximal power emerges from the interplay of all three aspects. That said, it’s a fantastic way to extend Postgres with user-defined functions that compose and run SQL code.


2 thoughts on “pl/python metaprogramming

Leave a Reply