ABSTRACT
This paper addresses the problem of Web document summarization. Existing summarization techniques rely on the textual content of a document, but the textual content of Web documents is often scarce or irrelevant, so many Web pages and websites cannot be suitably summarized. We define the context of a Web document as the textual content of all the documents linking to it. To summarize a target Web document, a context-based summarizer must first perform a preprocessing task that decides which pieces of information in the source documents are relevant to the content of the target. A context-based summarizer then faces two issues: first, the selected elements may deal only partially with the topic of the target; second, they may be related to the target and yet contain no clues about its content. In this paper we put forward two new algorithms for summarization by context. The first uses both the content and the context of the document; the second is based only on the elements of the context. We show that summaries that take the context into account are usually much more relevant than those made only from the content of the target document. We also study the optimal conditions for the proposed algorithms with respect to the sizes of the content and the context of the document to summarize.
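To make the idea of a context-based summary concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how relevant context elements are selected, so the sketch assumes a simple bag-of-words cosine similarity between each sentence of the linking documents and the target's own text. The names `tokens`, `cosine`, and `summarize_by_context`, and the parameter `k`, are hypothetical and introduced only for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize_by_context(target_text, linking_docs, k=2):
    """Return up to k sentences from the linking documents (the context)
    that are most similar to the target's content.

    Assumption: relevance is approximated by cosine similarity; the
    paper's actual selection criterion is not given in the abstract.
    """
    target_bag = Counter(tokens(target_text))
    candidates = []
    for doc in linking_docs:
        # Split each linking document into sentences on ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if sentence:
                score = cosine(Counter(tokens(sentence)), target_bag)
                candidates.append((score, sentence))
    # Highest-scoring sentences first; drop sentences with no overlap.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in candidates[:k] if score > 0]


if __name__ == "__main__":
    target = "Python tutorials for data analysis"
    context = [
        "This site has great Python tutorials. I had pizza yesterday.",
        "Learn data analysis here. The weather is nice.",
    ]
    for sentence in summarize_by_context(target, context, k=2):
        print(sentence)
```

The sketch uses the target's content only to filter the context. When the target has little or no usable text, which is the situation that motivates the paper, a filter of this kind would select nothing, so some other relevance signal would be needed in that case.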