Quality benchmarking relational databases and Lucene in the TREC4 adhoc task environment

Arslan A., YILMAZEL Ö.

Proceedings of the International Multiconference on Computer Science and Information Technology, IMCSIT 2010, vol. 5, pp. 365-372, 2010 (Scopus)


This work compares the text retrieval quality of open source relational databases and Lucene, a full-text search engine library, over English documents. The TREC-4 adhoc task is used to compare both search effectiveness and search efficiency. Two relational database management systems and four well-known English stemming algorithms were tried. Language-specific preprocessing was found to improve retrieval quality for all systems. The English text retrieval results obtained with Lucene are on par with the top six results presented in the TREC-4 automatic adhoc task. Although open source relational databases have integrated full-text retrieval technology, their relevancy ranking mechanisms are not as good as Lucene's. © 2010 IEEE.