Information retrieval from digital libraries in SQL
Source
Proceedings of the 10th ACM Workshop on Web Information and Data Management
Napa Valley, California, USA
Session: System issues
Pages 55-62
Year of Publication: 2008
ISBN: 978-1-60558-260-3
Authors
Carlos Garcia-Alvarado, University of Houston, Houston, TX, USA
Carlos Ordonez, University of Houston, Houston, TX, USA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458502.1458512

ABSTRACT

Information retrieval techniques have traditionally been implemented outside relational database systems because of storage overhead, the complexity of programming them inside the database system, and their slow performance in SQL implementations. This project supports the idea that searching and querying digital libraries with information retrieval models in relational database systems can be performed with optimized SQL queries and User-Defined Functions. We propose several techniques divided into two phases: storing and retrieving. The storing phase comprises document pre-processing, stop-word removal, and term extraction. The retrieval phase is implemented with three fundamental IR models: the popular Vector Space Model, the Okapi Probabilistic Model, and the Dirichlet Prior Language Model. We conduct experiments using article abstracts from the DBLP bibliography and the ACM Digital Library. We evaluate several query optimizations, compare the on-demand and static weighting approaches, and study performance with conjunctive and disjunctive queries under the three ranking models. Our prototype showed linear scalability and satisfactory performance with medium-sized document collections. Our implementation of the Vector Space Model is competitive with the two other models.
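
To make the two phases concrete, here is a minimal sketch of the general idea, not the authors' implementation. It uses Python's built-in sqlite3 module only as a convenient way to run SQL. The table name term_freq, the helper names open_db, build_index and rank, the tiny stop-word list, and the sample documents are all assumptions made for illustration. The score is a simplified TF-IDF sum with no document-length normalization, computed on demand at query time for a disjunctive query; it stands in for the Vector Space Model and does not reproduce the Okapi or Dirichlet Prior formulas.

```python
import math
import re
import sqlite3

# Hypothetical stop-word list and sample abstracts, for illustration only.
STOP_WORDS = {"the", "a", "of", "in", "and", "with", "for"}
DOCS = {
    1: "SQL queries for information retrieval in digital libraries",
    2: "Indexing digital libraries with inverted files",
    3: "Relational query optimization in SQL",
}


def open_db():
    """Open an in-memory database and register ln() as a user-defined function."""
    conn = sqlite3.connect(":memory:")
    # Not every SQLite build ships math functions, so ln() is supplied here.
    conn.create_function("ln", 1, math.log)
    return conn


def build_index(conn, docs):
    """Storing phase: tokenize, drop stop words, store per-document term counts."""
    conn.execute(
        "CREATE TABLE term_freq ("
        "doc_id INTEGER, term TEXT, freq INTEGER, PRIMARY KEY (doc_id, term))"
    )
    for doc_id, text in docs.items():
        counts = {}
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS:
                counts[token] = counts.get(token, 0) + 1
        conn.executemany(
            "INSERT INTO term_freq VALUES (?, ?, ?)",
            [(doc_id, term, freq) for term, freq in counts.items()],
        )


def rank(conn, query_terms):
    """Retrieval phase: disjunctive query, TF-IDF weights computed on demand."""
    placeholders = ",".join("?" * len(query_terms))
    sql = f"""
        SELECT t.doc_id,
               SUM(t.freq * ln(n.total * 1.0 / df.cnt)) AS score
        FROM term_freq AS t
        JOIN (SELECT term, COUNT(*) AS cnt
              FROM term_freq GROUP BY term) AS df ON df.term = t.term
        CROSS JOIN (SELECT COUNT(DISTINCT doc_id) AS total
                    FROM term_freq) AS n
        WHERE t.term IN ({placeholders})
        GROUP BY t.doc_id
        ORDER BY score DESC, t.doc_id
    """
    return conn.execute(sql, list(query_terms)).fetchall()


if __name__ == "__main__":
    conn = open_db()
    build_index(conn, DOCS)
    for doc_id, score in rank(conn, ["sql", "retrieval"]):
        print(doc_id, round(score, 3))
```

A static weighting variant would instead materialize the per-term weights in a table when documents are stored, trading extra storage and update cost for less work at query time. A conjunctive query could be approximated by adding a HAVING COUNT(*) = (number of query terms) clause to the same statement.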



