In A virtuous cycle for analytics I noted that our library of Postgres functions is written in two languages: Postgres’ built-in pl/pgsql and the installable alternative pl/python. These share a common type system and can be used interchangeably.
Here’s a pl/pgsql classifier that tries to match the name of a course against a list of patterns that characterize the humanities.
create function humanities_classifier(course_name text) returns boolean as $$ begin return lower(course_name) ~ any(array[ 'psych', 'religio', 'soci' ]); end; $$ language plpgsql; # select humanities_classifier('Religious Studies 101') as match; match ----- t # select humanities_classifier('Comparative Religions 200') as match; match ----- t
Here is that same classifier in Python.
create function humanities_classifier(course_name text) returns boolean as $$ sql = f""" select lower('{course_name}') ~ any(array[ 'psych', 'religio', 'soci' ]) as match""" results = plpy.execute(sql) return results[0]['match'] $$ language plpython3u; # select humanities_classifier('Religious Studies 101') as match; match ----- t # select humanities_classifier('Comparative Religions 200') as match; match ----- t
The results are exactly the same. In this case, Python is only wrapping the SQL used in the orginal function and interpolating course_name into it. So why use pl/python here? I wouldn’t. The pl/pgsql version is cleaner and simpler because the SQL body doesn’t need to be quoted and course_name doesn’t need to be interpolated into it.
Here’s a more Pythonic version of the classifier.
create function humanities_classifier(course_name text) returns boolean as $$ import re regexes = [ 'psych', 'religio', 'soci' ] matches = [r for r in regexes if re.search(r, course_name, re.I)] return len(matches) $$ language plpython3u;
There’s no SQL here, this is pure Python. Is there any benefit to doing things this way? In this case probably not. The native Postgres idiom for matching a string against a list of regular expressions is cleaner and simpler than the Python technique shown here. A Python programmer will be more familiar with list comprehensions than with the Postgres any and ~ operators but if you’re working in Postgres you’ll want to know about those, and use them not just in functions but in all SQL contexts.
What about performance? You might assume as I did that a pl/pgsql function is bound to be way faster than its pl/python equivalent. Let’s check that assumption. This SQL exercises both flavors of the function, which finds about 500 matches in a set of 30,000 names.
with matching_courses as ( select humanities_classifier(name) as match from lms_course_groups ) select count(*) from matching_courses where match;
Here are the results for three runs using each flavor of the function:
pl/pgsql: 159ms, 201ms, 125ms pl/python: 290ms, 255ms, 300ms
The Python flavor is slower but not order-of-magnitude slower; I’ve seen cases where a pl/python function outperforms its pl/pgsql counterpart.
So, what is special about Python functions inside Postgres? In my experience so far there are three big reasons to use it.
Python modules
The ability to wield any of Python’s built-in or loadable modules inside Postgres brings great power. That entails great responsibility, as the Python extension is “untrusted” (that’s the ‘u’ in ‘plpython3u’) and can do anything Python can do on the host system: read and write files, make network requests.
Here’s one of my favorite examples so far. Given a set of rows that count daily or weekly annotations for users in a group — so for weekly accounting each row has 52 columns — the desired result for the whole group is the element-wise sum of the rows. That’s not an easy thing in SQL but it’s trivial using numpy, and in pl/python it happens at database speed because there’s no need to transfer SQL results to an external Python program.
Metaprogramming
Functions can write and then run SQL queries. It’s overkill for simple variable interpolation; as shown above pl/pgsql does that handily without the cognitive overhead and visual clutter of poking values into a SQL string. For more advanced uses that compose queries from SQL fragments, though, pl/pgsql is hopeless. You can do that kind of thing far more easily, and more readably, in Python.
Introspection
A pl/python function can discover and use its own name. That’s the key enabler for a mechanism that memoizes the results of a function by creating a materialized view whose name combines the name of the function with the value of a parameter to the function. This technique has proven to be wildly effective.
I’ll show examples of these scenarios in later installments of this series. For now I just want to explain why I’ve found these two ways of writing Postgres functions to be usefully complementary. The key points are:
– They share a common type system.
– pl/pgsql, despite its crusty old syntax, suffices for many things.
– pl/python leverages Python’s strengths where they are most strategic
When I began this journey it wasn’t clear when you’d prefer one over the other, or why it might make sense to use both in complementary ways. This installment is what I’d like to have known when I started.
—
1 https://blog.jonudell.net/2021/07/21/a-virtuous-cycle-for-analytics/
2 https://blog.jonudell.net/2021/07/24/pl-pgsql-versus-pl-python-heres-why-im-using-both-to-write-postgres-functions/
3 https://blog.jonudell.net/2021/07/27/working-with-postgres-types/
4 https://blog.jonudell.net/2021/08/05/the-tao-of-unicode-sparklines/
5 https://blog.jonudell.net/2021/08/13/pl-python-metaprogramming/
6 https://blog.jonudell.net/2021/08/15/postgres-and-json-finding-document-hotspots-part-1/
7 https://blog.jonudell.net/2021/08/19/postgres-set-returning-functions-that-self-memoize-as-materialized-views/
8 https://blog.jonudell.net/2021/08/21/postgres-functional-style/
9 https://blog.jonudell.net/2021/08/26/working-in-a-hybrid-metabase-postgres-code-base/
10 https://blog.jonudell.net/2021/08/28/working-with-interdependent-postgres-functions-and-materialized-views/
11 https://blog.jonudell.net/2021/09/05/metabase-as-a-lightweight-app-server/
12 https://blog.jonudell.net/2021/09/07/the-postgres-repl/
Thank you for this information! It was helpful and a good starting place to strategically using pl/python in postgres.
I should update that post, or add a new one in the series, to reflect something I’ve learned since. For a great many cases where I was using `language plpgsql` I could have more simply been using `language sql`.