Programming with Spotlight
Subject:   spotlight programming
Date:   2005-07-19 18:24:03
From:   ptwobrussell
I haven't seen much written about the actual search algorithm used in Spotlight, although I haven't scoured any sources for such a thing either. From what I can gather, however, its search appears to use simple Boolean queries, and by default it looks like it might be using "AND" as the operator with no practical limit on the distance between indexed words. For example, I created a file like this with 1,002 words in it:

peanut aaa aaa aaa aaa aaa ...(1,000 times total)... aaa butter
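
If you want to reproduce the test, a few lines of Python will generate the file. This is only a sketch: the function name make_test_document and the file name peanut_test.txt are hypothetical, and the file has to be saved in a folder Spotlight indexes before it will show up in searches.

```python
def make_test_document(filler_count: int = 1000, filler: str = "aaa") -> str:
    """Build 'peanut <filler repeated filler_count times> butter' as one line."""
    words = ["peanut"] + [filler] * filler_count + ["butter"]
    return " ".join(words)


if __name__ == "__main__":
    text = make_test_document()
    # 1,000 filler words plus "peanut" and "butter" gives 1,002 words.
    print(len(text.split()))
    # Hypothetical output file; move it somewhere Spotlight indexes.
    with open("peanut_test.txt", "w", encoding="utf-8") as f:
        f.write(text + "\n")
```

Passing filler_count=100000 produces the larger variant mentioned further down.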

OK, so when I search in Spotlight for the phrase "peanut butter" (without quotes), the test document I created is indeed found. This would seem to confirm that "AND" is the default operator, unless there's some more sophisticated proprietary black magic going on (but as application developers, that's out of our hands anyway, so it's somewhat irrelevant). If you search for "peanut butter" with the quotes, however, Spotlight doesn't find the test document, which is what we'd expect since it's then searching for an exact phrase.
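
To make the distinction concrete, here's a minimal Python model of the two behaviors observed above: a Boolean AND that ignores how far apart the words are, and an exact-phrase match that requires them to be adjacent and in order. It only models what the test showed; it is not Spotlight's actual implementation, and the tokenizer and function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_and(text: str, query: str) -> bool:
    """True if every query term appears somewhere in text, at any distance."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))


def matches_phrase(text: str, query: str) -> bool:
    """True if the query terms appear consecutively and in order."""
    words = tokenize(text)
    terms = tokenize(query)
    n = len(terms)
    # Slide a window of n words across the document.
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


if __name__ == "__main__":
    doc = "peanut " + "aaa " * 1000 + "butter"
    print(matches_and(doc, "peanut butter"))     # True
    print(matches_phrase(doc, "peanut butter"))  # False
```

Because the AND check reduces the document to a set of words, distance plays no role at all, which would be consistent with the 100,000-word result below.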

Just as an FYI: if you do the same thing but put 100,000 "aaa"s between "peanut" and "butter", Spotlight still finds "peanut butter" (no quotes) without any issues. Is there possibly some summarization or other processing going on in there? I don't know. Try inserting 100,000 randomly generated character sequences instead of "aaa" and see whether you get anything different.
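
For the random-sequence experiment, a sketch like the following could generate the file. The 8-letter token length, the fixed seed, and the file name peanut_random_test.txt are all arbitrary, hypothetical choices; 8-letter tokens also can never collide with the 6-letter words "peanut" or "butter".

```python
import random
import string


def random_token(rng: random.Random, length: int = 8) -> str:
    # A random lowercase sequence; length 8 is an arbitrary choice.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_random_filler_document(filler_count: int = 100_000, seed: int = 0) -> str:
    """'peanut <filler_count random tokens> butter'; seeded so runs repeat."""
    rng = random.Random(seed)
    fillers = [random_token(rng) for _ in range(filler_count)]
    return " ".join(["peanut"] + fillers + ["butter"])


if __name__ == "__main__":
    with open("peanut_random_test.txt", "w", encoding="utf-8") as f:
        f.write(make_random_filler_document() + "\n")
```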

Here's a link with some interesting Spotlight discussion that touches on search implementation issues, ever so slightly: