Ranking and SQL... incompatible worlds?

In intergeo, we recently realized that ranking does not come for free in SQL databases. This is not a new story, just an important fact which might influence the choice of tools!

Let us start with a simple example. I want to be able to search for apple in a database of documents. My “intelligent query engine”, when asked for apple, will also look for documents about fruit, but with a lower rank. In Lucene, that would expand into a query in which fruit carries a lower weight than apple.

Another example for intergeo: when searching for a concept such as latin-thales-theorem (e.g. by typing “Thalès” as a French user, or by choosing the corresponding chapter in a book; this theorem is called the intercepting lines theorem in English), we also need to match resources that are annotated with the concept of enlargement (dilation, homothétie, etc.), but with a weaker rank. Thus, if the user is searching in a language which has few resources about this theorem, those would be shown first, with the resources about enlargement shown next, all of this only a page at a time.

This is, to my taste, a very important form of fuzziness, one which has made the retrieval paradigm of web-based search engines successful. This form of matching is not text-based; it is really a symbolic match. But text search engines apply such ranking algorithms routinely: not only do they compute a score, they also sort the results by score. My proposal for intergeo is that our search engine should do the same.

I have asked a database person about it and he agreed that SQL databases do not do the necessary ranking. Some tools build retrieval engines on top of databases (that is what TopX does, among others), but that is a big programme of its own, which takes the time to decompose queries individually and match each part at the SQL level, already properly arranged. So, unless we use (or make?) such a retrieval tool, there is no chance that an SQL engine is of use for the promised cross-curriculum search of intergeo.
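
To make the idea of weighted concept expansion concrete, here is a minimal Python sketch of what such a ranked search could look like. Everything in it is an assumption made for illustration: the expansion table, the weights (0.4 and 0.5), the `Resource` type and the function names are hypothetical and are not taken from intergeo, Lucene or TopX. It only shows the three steps discussed above: expand the concept, score each resource, then sort by score and return a single page. The small `to_boosted_query` helper renders the expansion in the `term^boost` notation of Lucene's classic query syntax; it is not the query this post originally showed.

```python
from dataclasses import dataclass

# Hypothetical expansion table: a searched concept maps to itself (weight 1.0)
# and to related concepts with a lower weight. All weights are invented.
EXPANSIONS = {
    "apple": {"apple": 1.0, "fruit": 0.4},
    "latin-thales-theorem": {"latin-thales-theorem": 1.0, "enlargement": 0.5},
}


@dataclass(frozen=True)
class Resource:
    """A resource annotated with a set of concept identifiers."""
    title: str
    concepts: frozenset


def expand(concept):
    """Return the weighted concepts to match; unknown concepts match only themselves."""
    return EXPANSIONS.get(concept, {concept: 1.0})


def score(resource, weights):
    """Score a resource by the best-weighted concept it is annotated with (0.0 if none)."""
    return max((w for c, w in weights.items() if c in resource.concepts), default=0.0)


def search(resources, concept, page=0, page_size=10):
    """Return one page of (score, resource) pairs, highest score first."""
    weights = expand(concept)
    hits = [(score(r, weights), r) for r in resources]
    hits = [(s, r) for s, r in hits if s > 0.0]
    # Sort by descending score; ties are broken by title to keep the order stable.
    hits.sort(key=lambda hit: (-hit[0], hit[1].title))
    start = page * page_size
    return hits[start:start + page_size]


def to_boosted_query(concept):
    """Render the expansion as a boosted query string, e.g. 'term^1.0 other^0.4'."""
    return " ".join(f"{c}^{w}" for c, w in expand(concept).items())
```

With a handful of resources, a call such as `search(resources, "latin-thales-theorem", page=0, page_size=10)` would list the resources annotated with the theorem first and those annotated only with enlargement after them. Taking the best matching weight as the score is just one simple choice; a real engine would combine several signals.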
Thoughts welcome.
Relational DB with XML capabilities?
Are you looking for something along the lines of IBM® DB2® 9 (formerly known as “Viper”), a relational database that also includes XML capabilities?
Albert
no... just ranked results
Albert,
Whether it’s XML under the hood or not is irrelevant for search, I think.
As far as I know, DB2 is not open source; it is free only as in free beer (which means that if an update breaks it, we’re kind of dead). Its XML support has long been known, though.
paul