Computing block importance for searching on web sites
Source
Conference on Information and Knowledge Management
Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
Lisbon, Portugal
SESSION: Web retrieval I (IR)
Pages 165-174
Year of Publication: 2007
ISBN: 978-1-59593-803-9
Authors
David Fernandes, Federal University of Minas Gerais, Belo Horizonte, Brazil
Edleno S. de Moura, Federal University of Amazonas, Manaus, Brazil
Berthier Ribeiro-Neto, Federal University of Minas Gerais, Belo Horizonte, Brazil
Altigran S. da Silva, Federal University of Amazonas, Manaus, Brazil
Marcos André Gonçalves, Federal University of Minas Gerais, Belo Horizonte, Brazil
Bibliometrics
Downloads (6 Weeks): 7, Downloads (12 Months): 102, Citation Count: 1
ABSTRACT
In this paper we consider the problem of using the block structure of a Web page to improve ranking results when searching for information on Web sites. Given the block structure of the Web pages as input, we propose a method for computing the importance of each block (in the form of block weights) in a Web collection. As we show through experiments, the deployment of our method may allow a significant improvement in the quality of search results. We ran experiments to compare the quality of search results when using our method to the quality obtained when using no structure information. When compared to a ranking method that considered pages as monolithic units, our block-based ranking method led to improvements in the quality of search results in experiments with two sites with heterogeneous structures. Further, our method does not increase the cost of processing queries when compared to the systems using no structural information.
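The abstract does not say how the block weights are computed or how they enter the ranking function, so the following is only an illustrative sketch of the general idea of block-based ranking, not the authors' method. It assumes a simple term-frequency score in which each block's contribution is scaled by a weight; the block labels ("menu", "main"), the hand-picked weights, and the function names are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would do much more.
    return text.lower().split()


def block_weighted_score(query, page_blocks, block_weights):
    """Score a page as a weighted sum of per-block term frequencies.

    page_blocks: list of (block_label, text) pairs (hypothetical structure).
    block_weights: dict mapping block_label -> weight; labels that are
    missing from the dict default to 1.0.
    """
    terms = tokenize(query)
    score = 0.0
    for label, text in page_blocks:
        tf = Counter(tokenize(text))
        weight = block_weights.get(label, 1.0)
        score += weight * sum(tf[t] for t in terms)
    return score


def monolithic_score(query, page_blocks):
    # Baseline: every block counts equally, i.e. the page is one unit.
    return block_weighted_score(query, page_blocks, {})


# Two toy pages sharing the same navigation menu (assumed data).
page_a = [("menu", "cameras lenses tripods"),
          ("main", "review of a travel tripod")]
page_b = [("menu", "cameras lenses tripods"),
          ("main", "lenses lenses buying guide")]

# Hand-picked weights for illustration only; the paper computes them.
weights = {"menu": 0.1, "main": 1.0}

for name, page in (("page_a", page_a), ("page_b", page_b)):
    print(name,
          monolithic_score("lenses", page),
          block_weighted_score("lenses", page, weights))
```

Under these assumptions, a query term that appears only in the shared menu contributes far less to the weighted score than to the monolithic one, which is the kind of effect a block-based ranking is meant to capture. Because the weights are applied as a fixed multiplier per block, scoring a page costs the same number of term lookups as the unweighted baseline.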