Experiences in crawling deep web in the context of local search
Source
Workshop On Geographic Information Retrieval
Proceeding of the 2nd international workshop on Geographic information retrieval
Napa Valley, California, USA
SESSION: Geographic references and web crawling
Pages 35-42  
Year of Publication: 2008
ISBN: 978-1-60558-253-5
Authors
Dheerendranath Mundluru  Local.com Corporation, Irvine, CA, USA
Xiongwu Xia  Local.com Corporation, Irvine, CA, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1460007.1460016

ABSTRACT

Local search engines allow geographically constrained searching of businesses and their products or services. Some local search engines use crawlers to index Web page content. These crawlers mostly index Web pages that are accessible through hyperlinks and that include desirable location information. It is extremely important for local search engines to also crawl additional high-quality "local" content (e.g., user reviews) that is available in the Deep Web. Much of this content is hidden behind search forms and takes the form of structured data, the volume of which is growing very rapidly. In this paper, we present our experiences in crawling and extracting a wide variety of local structured data from a large number of Deep Web resources. We discuss the challenges in crawling such sources and, based on our experience, offer some effective principles to address them. Our experimental results on several Deep Web sources with local content show that the techniques discussed are highly effective.


