I love subselects / subqueries

by Derek Sivers

Related link: http://www.postgresql.org/docs/7.2/static/queries.html



I spent so long in MySQL without the option of subselects - that I got so used to JOINing tables as the only way of doing things. Tonight (in PostgreSQL) I replaced a JOIN query with the sublime power of subselects.

PREVIOUS:
SELECT DISTINCT items.id, items.cache_sold
FROM item_subgenre_links isl
INNER JOIN catalogs_items ci ON isl.item_id=ci.item_id
INNER JOIN items ON isl.item_id=items.id
INNER JOIN subgenres ON isl.subgenre_id=subgenres.id
WHERE ci.catalog_id=1
AND subgenres.genre_id = 9
ORDER BY cache_sold DESC LIMIT 10;

SUBSELECT:
SELECT items.id, items.cache_sold
FROM items WHERE id IN (SELECT item_id
FROM catalogs_items
WHERE catalog_id = 1
AND item_id IN (SELECT item_id
FROM item_subgenre_links isl
WHERE isl.subgenre_id IN (SELECT id
FROM subgenres WHERE genre_id = 9)))
ORDER BY cache_sold DESC LIMIT 10;

As far as my non-developer-brain understands it, the reason that the subselect approach is more efficient is that you're limiting the available choices first, instead of joining all four tables and finding the intersection.

3 Comments

nyamada
2005-10-22 04:56:47
Where exists is faster than where in in Postgresql
I would be tempted to say that

select id, cache_sold
from items i
WHERE exists
(select * from
catalogs_items c
where catalog_id=1
and items_id = i.items_id
and exists (SELECT *
FROM item_subgenre_links isl
WHERE id = c.items_id
and exists (SELECT id
FROM subgenres
WHERE id = isl.items_id and genre_id = 9)))
ORDER BY cache_sold DESC LIMIT 10;


might be nominally faster, just because EXISTS is usually faster than IN in postgresql.


But I'm really wondering if you're right to use so many subselects. With proper indexing, it seems to me the join way is easier to read and easier to maintain.


EXPLAIN and EXPLAIN ANALYZE are invaluable tools in postgresql to tune your queries.



RobbyRussell
2005-10-22 08:10:34
Agreed
It's not to have sub-selects. I'd be curious on how your query runs in a JOIN, IN, and EXISTS. You should post some stats from your big db. :-)
xzilla
2005-10-25 12:11:02
I feel skeptical about this
As stated above, istm that using explicit joins should be just as fast and easier to read, I'd love to see some EXPLAIN ANALYZE output. I am guessing either you need more frequent analyze/vacuum or you have mismatched datatypes (or something). Potentially pushing some of your where clause into the join clause could help but something seems fishy here.


BTW this isn't because your non-developer brain is misunderstanding anything, it's just that I think the planner should be able to handle your first query a little better.


On the third hand though...ISTR Fabian Pascal telling me one time that any query that requires DISTINCT to work is fndementally incorrect. Of course he's kind of an abslutest, but I do approach distinct queries skeptically, I wonder if the whole query could be rewritten using joins and subqueries combined to be even faster.