Print

Writing PostgreSQL Functions with PL/pgSQL
Pages: 1, 2, 3

### Accessing Data

As a recursive function, the `fib()` example is none too fast. Of course it short-circuits when the argument is less than two, but otherwise its recursion can be quite slow. Why? Because each time you call it, it must calculate and return many of the same numbers. On my PowerBook, for example, it takes nearly 3.8 seconds to find the 26th Fibonacci number:

``````try=% explain analyze select fib(26);
QUERY PLAN
------------------------------------------------------------------------------------------
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=3772.062..3772.063 rows=1 loops=1)
Total runtime: 3772.315 ms
(2 rows)``````

Why does it take so long? Like any recursive Fibonacci function, it has to make 392,835 recursive calls to itself to calculate the 26th Fibonacci number. Those recursive calls add up! Because the function always returns the same values for the same arguments, it would really improve the performance of the function to memoize it. Memoization caches the results of a function call for a given set of arguments so that when the function is called again with the same arguments, it can simply return the value from the cache without recalculating it--in this case, minimizing the need for recursion.

PL/pgSQL itself has no ability to store data outside of a function, but this is database programming--take advantage of it! The trick is to create a table to function as the cache, then access it from the function. The new function, `fib_cached()`, does exactly that:

`````` 1   CREATE TABLE fib_cache (
2        num integer PRIMARY KEY,
3        fib integer NOT NULL
4   );
5
6   CREATE OR REPLACE FUNCTION fib_cached(
7       fib_for integer
8   ) RETURNS integer AS \$\$
9   DECLARE
10      ret integer;
11  BEGIN
12      if fib_for < 2 THEN
13          RETURN fib_for;
14      END IF;
15
16      SELECT INTO ret fib
17      FROM   fib_cache
18      WHERE  num = fib_for;
19
20      IF ret IS NULL THEN
21          ret := fib_cached(fib_for - 2) + fib_cached(fib_for - 1);
22          INSERT INTO fib_cache (num, fib)
23          VALUES (fib_for, ret);
24      END IF;
25      RETURN ret;
26
27  END;
28  \$\$ LANGUAGE plpgsql;``````

Lines 1-4 create the table for caching the Fibonacci sequence. The `num` column represents the sequence number for which the corresponding Fibonacci number is stored in the `fib` column. The `num` column is a primary key because it should be unique.

The `fib_cached()` function defined from lines 6-28 introduces quite a bit more syntax. The first line with something new is line five's `DECLARE` statement. As you may have ascertained by the previous discussion of argument aliases, this statement introduces a block for declaring variables for use in the function body. All variables used in the function but not declared in the argument signature must be declared here. You can give them a preliminary assignment using the PL/pgSQL assignment operator, `:=` (so named to avoid conflicts with the SQL `=` comparison operator). You can use any PostgreSQL data type for your variables, but this example again keeps things quite simple. There is a single integer variable, `ret`, which keeps track of a value for the function to return.

The `BEGIN` statement on line 11 ends the variable declaration block and starts the function body. Line 12 contains the familiar `IF-THEN` block that once again short-circuits the function if the argument to the function (stored in `fib_for`) is less than two. Then things get more interesting.

As shown in the `DECLARE` block, you can assign a value to a PL/pgSQL variable using `:=`, but what if you want to assign a value from a `SELECT` statement to a variable? Lines 16-18 demonstrate the approach. A variation of the standard `SELECT INTO` statement allows you to select values into one or more PL/pgSQL variables rather than into a table. A comma-delimited list of variables follows the `INTO` expression, while the rest of the statement remains a normal `SELECT` statement. There are a couple of caveats to `SELECT INTO` assignment, however: the `SELECT` statement must return no more than one row, and the selected columns must match the number and types of the variables.

Here it's relatively straightforward. The code looks in its cache (the `fib_cache` table) to see if it has already calculated and cached the Fibonacci number for the sequence number `fib_for`. The `SELECT` statement selects the `fib` column from `fib_cached` where the number is `fib_for` and stores the result in the `ret` variable.

Now, I mentioned that the `SELECT INTO` statement can return no more than one row, which also means that it can return zero rows. If this is the case, then the value of `ret` will be `NULL` in this function. Accordingly, line 20 begins an `IF-THEN` block by checking to see if the value of `ret` is indeed `NULL`. If it is, the function needs to calculate that Fibonacci number. Line 21 thus recursively calls `fib_cached()` just as `fib()` used recursion. Instead of returning the value, the code uses the `:=` assignment operator to store the value in `ret`.

With the new value calculated, the code needs to insert it into the cache so that it never has to be calculated again. Lines 22-23 do just that with a standard SQL `INSERT` statement. The variables `fib_for` and `ret` can be used right in the `INSERT` statement, just as `fib_for` was used in the `SELECT` statement at lines 16-18. One of the great features of PL/pgSQL is that it precompiles SQL statements embedded in it, using variables as if they were passed to a prepared SQL statement as arguments for placeholders. In other words, the `INSERT` statement magically becomes akin to:

``````PREPARE some_insert(integer, integer) AS
INSERT  INTO fib_cache (num, fib)
VALUES  (\$1, \$2);

EXECUTE some_insert(fib_for, ret);``````

The great thing about this feature of PL/pgSQL is that it makes embedded SQL statements extremely fast. The database can reuse the precompiled query plan for each call to the function (on a per-connection basis) without recompiling and planning.

At any rate, line 25 returns the value of `ret`, regardless of where it came from. Adding the caching support makes the function far faster:

``````try=% explain analyze select fib_cached(26);
QUERY PLAN
--------------------------------------------------------------------------------------
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=50.837..50.838 rows=1 loops=1)
Total runtime: 50.889 ms
(2 rows)

try=% explain analyze select fib_cached(26);
QUERY PLAN
------------------------------------------------------------------------------------
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=2.197..2.198 rows=1 loops=1)
Total runtime: 2.249 ms
(2 rows)``````

The first call to `fib_cached()` finds no cached values, and so it must create them all as it goes along. This simply means that it needs to calculate the values for each number up to 26 once, practically eliminating that recursion (indeed, `fib_cached()` has only 24 recursive calls, once each for the numbers 2 through 26). The result is a much faster query: only .05 seconds, as opposed to the nearly 3.8 seconds for `fib()`. Of course the second call to `fib_cached()` needs only to look up and return the 26th Fibonacci number, because the first call has already cached it. That cuts the time down to 0.0025 seconds. Not bad, eh?

 Pages: 1, 2, 3

-->