ABSTRACT
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases retrieval precision for queries that generate irrelevant results. We believe that reducing the number of irrelevant results encourages users to return to a given site to search. Our experimental results on several different web sites and on the whole cnnfn collection demonstrate the feasibility of our approach.