ONJava.com -- The Independent Source for Enterprise Java
Article:
  Introduction to Text Indexing with Apache Jakarta Lucene
Subject:   How about more "fuzzy" operators and proximity weighting
Date:   2003-03-06 16:27:26
From:   anonymous2
Maybe a misspelling/typo operator, or maybe Soundex, or maybe a regex match.


I guess most operators would require that each token be reduced to some type of "hash code", and that hash code would then be stored in a separate field. The search would then hash the query terms and check them against the hash-code field.
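
To make the "hash code" idea concrete, here is a rough sketch of a Soundex-style code in plain Java. The class and method names are made up for illustration and this is not part of Lucene. It assumes the usual American Soundex rules (keep the first letter, map consonants to digits, collapse adjacent duplicates, pad to four characters). The same function would be applied to tokens at index time and to query terms at search time.

```java
import java.util.Locale;

// Hypothetical helper, not a Lucene class: reduces a word to a 4-character Soundex code.
final class SoundexSketch {
    // Code for each letter A..Z: '0' marks vowels and Y, '-' marks H and W.
    private static final String CODES = "0123012-02245501262301-202";

    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not an ASCII letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);   // the first letter is kept as-is
                prev = code;
            } else if (code != '-') {
                // H and W are skipped entirely; vowels only reset 'prev'
                if (code != '0' && code != prev) {
                    out.append(code);
                }
                prev = code;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

With this, soundex("misspelling") and soundex("mispelling") both come out as M214, so a query containing the typo would still hit the same value in the hash-code field.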


But some operators seem difficult to implement when a source word cannot be mapped directly to a single "hash code". A regex match is one example.
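
One possible workaround, sketched here only as an assumption about how it could be done, is to skip the hash entirely and test the regex against every distinct term in the index, then collect the documents for the terms that match. The toy index below (a map from term to document ids) and the class name are hypothetical and stand in for a real term dictionary.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch: regex search by scanning all terms of a toy inverted index.
final class RegexTermScan {
    // index maps each distinct term to the ids of the documents that contain it.
    static SortedSet<Integer> search(Map<String, Set<Integer>> index, String regex) {
        Pattern pattern = Pattern.compile(regex);
        SortedSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> entry : index.entrySet()) {
            // The whole term must match the pattern, not just a substring of it.
            if (pattern.matcher(entry.getKey()).matches()) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }
}
```

The cost is a pass over every distinct term per query, which is the price of not having a single code to look up.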


Also, it would be nice to boost documents where two matching words are closer together.
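
As a rough illustration of what such a boost might look like, the sketch below takes the token positions of two matching words within one document, finds the smallest gap between them, and turns that into a multiplier. The class name and the 1 + 1/distance formula are assumptions picked for the example, not anything Lucene defines.

```java
// Hypothetical sketch: boost a document more when two matching words sit closer together.
final class ProximityBoost {
    // Smallest distance between any position in a and any position in b.
    // Both arrays must be sorted in ascending order.
    static int minDistance(int[] a, int[] b) {
        int i = 0;
        int j = 0;
        int best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            // Advance whichever list is behind, so the gap can only shrink.
            if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return best;
    }

    // Returns 1.0 (no boost) if either word is absent, else 1 + 1/distance.
    static double boost(int[] a, int[] b) {
        int d = minDistance(a, b);
        if (d == Integer.MAX_VALUE) {
            return 1.0;
        }
        return 1.0 + 1.0 / Math.max(d, 1);
    }
}
```

Adjacent words (distance 1) get a factor of 2.0, and the factor falls back toward 1.0 as the words move apart.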