A search engine for natural language applications
Source: Proceedings of the 14th International Conference on World Wide Web (International World Wide Web Conference archive), Chiba, Japan
Session: Semantic search
Pages: 442-452
Year of Publication: 2005
ISBN: 1-59593-046-9
Authors:
Michael J. Cafarella, University of Washington, Seattle, WA
Oren Etzioni, University of Washington, Seattle, WA
Sponsor: ACM (Association for Computing Machinery)
Publisher: ACM, New York, NY, USA
Bibliometrics: Downloads (6 weeks): 12; Downloads (12 months): 107; Citation count: 17
DOI: http://doi.acm.org/10.1145/1060745.1060811

ABSTRACT

Many modern natural language-processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search engines are designed and optimized for simple human queries; they are not well suited to support such applications. As a result, these applications are forced to issue millions of successive queries, resulting in unnecessary search engine load and in slow applications with limited scalability.

In response, this paper introduces the Bindings Engine (BE), which supports queries containing typed variables and string-processing functions. For example, in response to the query "powerful ‹noun›", BE will return all the nouns in its index that immediately follow the word "powerful", sorted by frequency. In response to the query "Cities such as ProperNoun(Head(‹NounPhrase›))", BE will return a list of proper nouns likely to be city names.

BE's novel neighborhood index enables it to do so with O(k) random disk seeks and O(k) serial disk reads, where k is the number of non-variable terms in its query. As a result, BE can yield several orders of magnitude speedup for large-scale language-processing applications. The main cost is a modest increase in space to store the index. We report on experiments validating these claims, and analyze how BE's space-time tradeoff scales with the size of its index and the number of variable types. Finally, we describe how a BE-based application extracts thousands of facts from the Web at interactive speeds in response to simple user queries.
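
The abstract describes the neighborhood index only at a high level. The sketch below is a hypothetical, in-memory illustration of the idea, assuming that each posting stores the adjacent neighbor of its term occurrence together with a type tag. The names `Posting`, `build_neighborhood_index`, and `bind_after_phrase`, the tag set, and the toy corpus are invented for illustration; they are not BE's on-disk layout or API, and functions such as `Head()` and `‹NounPhrase›` are approximated by a single-token `PROPN` tag. It shows why a query like "powerful ‹noun›" needs one postings-list lookup and scan per literal term (loosely, the O(k) seeks and reads) rather than one follow-up query per candidate binding.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple

# A token is (word, type_tag). Tags are hand-assigned in this toy corpus (an
# assumption); a real indexer would assign types such as part of speech.
Token = Tuple[str, str]


class Posting(NamedTuple):
    doc_id: int
    position: int
    left: Optional[Token]   # typed neighbor immediately to the left (stored, unused below)
    right: Optional[Token]  # typed neighbor immediately to the right


def build_neighborhood_index(docs: List[List[Token]]) -> Dict[str, List[Posting]]:
    """Inverted index whose postings also carry each occurrence's typed neighbors."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for pos, (word, _tag) in enumerate(doc):
            left = doc[pos - 1] if pos > 0 else None
            right = doc[pos + 1] if pos + 1 < len(doc) else None
            index[word.lower()].append(Posting(doc_id, pos, left, right))
    return dict(index)


def bind_after_phrase(index: Dict[str, List[Posting]],
                      phrase: List[str], var_type: str) -> List[Tuple[str, int]]:
    """Answer 'w1 ... wk <var_type>' by reading exactly k postings lists.

    Phrase occurrences are found by aligning positions across the k lists;
    the variable's binding is read from the stored right neighbor of the
    last literal term, so no per-candidate follow-up query is issued.
    Results are sorted by frequency, most frequent first.
    """
    if not phrase:
        return []
    lists = [index.get(term.lower(), []) for term in phrase]  # one lookup per literal term
    # Candidate phrase starts as (doc_id, start_position) pairs.
    starts = {(p.doc_id, p.position) for p in lists[0]}
    for offset, postings in enumerate(lists[1:], start=1):
        starts &= {(p.doc_id, p.position - offset) for p in postings}
    last = len(phrase) - 1
    counts: Counter = Counter()
    for p in lists[-1]:
        start = (p.doc_id, p.position - last)
        if start in starts and p.right is not None and p.right[1] == var_type:
            counts[p.right[0]] += 1
    return counts.most_common()


# Hypothetical toy corpus of pre-tagged sentences.
docs = [
    [("a", "DET"), ("powerful", "ADJ"), ("engine", "NOUN")],
    [("powerful", "ADJ"), ("engine", "NOUN"), ("runs", "VERB")],
    [("very", "ADV"), ("powerful", "ADJ"), ("computers", "NOUN")],
    [("powerful", "ADJ"), ("and", "CCONJ"), ("fast", "ADJ")],
    [("cities", "NOUN"), ("such", "ADJ"), ("as", "ADP"), ("Paris", "PROPN")],
    [("big", "ADJ"), ("cities", "NOUN"), ("such", "ADJ"), ("as", "ADP"), ("Tokyo", "PROPN")],
    [("such", "ADJ"), ("as", "ADP"), ("Rome", "PROPN")],
]
index = build_neighborhood_index(docs)
print(bind_after_phrase(index, ["powerful"], "NOUN"))                # [('engine', 2), ('computers', 1)]
print(bind_after_phrase(index, ["cities", "such", "as"], "PROPN"))   # [('Paris', 1), ('Tokyo', 1)]
```

"Rome" is not returned because its sentence lacks "cities", so the phrase never aligns. The stored left neighbor is not used here; in this sketch it would serve queries whose variable precedes the literal terms. The cost of the approach is visible in `Posting`: every occurrence carries extra neighbor data, which mirrors the space-for-time tradeoff the abstract mentions.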



CITED BY  17
