ABSTRACT
Traditional keyword search, where a query is a list of keywords and the results are a relevance-ordered list of documents, is a powerful query paradigm for text databases and the Web. However, more expressive query paradigms, in which both queries and their results exhibit richer structure than in traditional keyword search, are often desirable. Information extraction systems identify and extract the intrinsically structured data embedded in natural-language text documents, hence enabling these alternative query paradigms. Unfortunately, information extraction is a time-consuming process, often involving complex text analysis, so exhaustively processing all documents in a large text database (or on the Web) could be prohibitively expensive. Beyond efficiency, query result quality also matters: information extraction is error-prone, and not all extracted data is equally likely to be correct. In this talk, I will discuss recent work on cost-based optimization of structured queries in this information extraction scenario, where modeling query result quality, in addition to execution efficiency, is a distinctive and important challenge.
INDEX TERMS
Primary Classification:
Additional Classification: