OCELOT: a system for summarizing Web pages
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 144 - 151  
Year of Publication: 2000
ISBN:1-58113-226-3
Authors
Adam L. Berger  School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Vibhu O. Mittal  Just Research, 4616 Henry Street, Pittsburgh, PA
Sponsors
Athens U of Econ & Business : Athens University of Economics and Business
Greek Com Soc : Greek Computer Society
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 21,   Downloads (12 Months): 103,   Citation Count: 35
DOI: http://doi.acm.org/10.1145/345508.345565

ABSTRACT

We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has focused on news articles, web pages are quite different in both structure and content. Instead of coherent text with a well-defined discourse structure, they are more likely to be a chaotic jumble of phrases, links, graphics and formatting commands. Such text provides little foothold for extractive summarization techniques, which attempt to generate a summary of a document by excerpting a contiguous, coherent span of text from it. This paper builds upon recent work in non-extractive summarization, producing the gist of a web page by “translating” it into a more concise representation rather than attempting to extract a text span verbatim. OCELOT uses probabilistic models to guide it in selecting and ordering words into a gist. This paper describes a technique for learning these models automatically from a collection of human-summarized web pages.
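
The idea of selecting and then ordering words with models learned from human-summarized pages can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `train` and `gist`, the count-based selection score, the add-one-smoothed bigram ordering model, and the brute-force search over orderings are simple stand-ins, not OCELOT's actual models or decoder.

```python
import math
from collections import Counter
from itertools import permutations


def train(pairs):
    """Learn toy selection and ordering statistics (illustrative only).

    pairs: list of (page_tokens, gist_tokens) from human-summarized pages.
    """
    in_page = Counter()   # number of pages containing word w
    in_both = Counter()   # ... whose human gist also contains w
    bigrams = Counter()   # adjacent word pairs observed in gists
    contexts = Counter()  # how often each word starts a bigram
    vocab = set()
    for page, summary in pairs:
        summary_words = set(summary)
        for w in set(page):
            in_page[w] += 1
            if w in summary_words:
                in_both[w] += 1
        seq = ["<s>", *summary, "</s>"]
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return in_page, in_both, bigrams, contexts, len(vocab)


def gist(page, model, k=3):
    """Pick up to k page words likely to appear in a gist, then order them."""
    in_page, in_both, bigrams, contexts, vocab_size = model

    def keep_prob(w):
        # Estimated P(word appears in gist | word appears in page).
        return in_both[w] / in_page[w] if in_page[w] else 0.0

    # Selection: highest keep probability first, ties broken alphabetically.
    chosen = sorted(set(page), key=lambda w: (-keep_prob(w), w))[:k]

    def log_prob(order):
        # Add-one-smoothed bigram log-probability of a candidate ordering.
        seq = ["<s>", *order, "</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
            for a, b in zip(seq, seq[1:])
        )

    # Ordering: exhaustive search, feasible only for very small k.
    return list(max(permutations(chosen), key=log_prob))


# Hypothetical training data: two noisy "pages" with the same human gist.
pairs = [
    ("click here free music downloads home".split(), "free music downloads".split()),
    ("home links music free downloads click".split(), "free music downloads".split()),
]
model = train(pairs)
print(gist("downloads click music home free".split(), model, k=3))
```

In this sketch the gist may only reuse words found on the page and the ordering search is exponential in `k`; a practical system would need a vocabulary that extends beyond the page and a more efficient search.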


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Berger, A., and Lafferty, J. The Weaver system for document retrieval. In Proceedings of the Eighth Text REtrieval Conference (TREC-8) (1999).
 
3
 
4
Clarkson, P., and Rosenfeld, R. Statistical language modeling using the CMU-Cambridge toolkit. In Proceedings of Eurospeech '97 (1997).
 
5
DeJong, G. F. An overview of the FRUMP system. In Strategies for Natural Language Processing, W. G. Lehnert and M. H. Ringle, Eds. Lawrence Erlbaum Associates, 1982, pp. 149-176.
6
 
7
Forney, G. D. The Viterbi Algorithm. Proceedings of the IEEE (1973), 268-278.
8
 
9
Good, I. The population frequencies of species and the estimation of population parameters. Biometrika 40 (1953).
 
10
Hand, T. F. A proposal for task-based evaluation of text summarization systems. In ACL/EACL-97 Workshop on Intelligent Scalable Text Summarization (July 1997), pp. 31-36.
 
11
 
12
Jing, H., Barzilay, R., McKeown, K., and Elhadad, M. Summarization evaluation methods experiments and analysis. In AAAI Intelligent Text Summarization Workshop (Mar. 1998), pp. 60-68.
 
13
Luhn, H. P. Automatic creation of literature abstracts. IBM Journal (1958), 159-165.
 
14
Marcu, D. From discourse structures to text summaries. In Proceedings of the ACL'97/EACL'97 Workshop on Intelligent Scalable Text Summarization (1997), pp. 82-88.
 
15
Mathis, B. A., Rush, J. E., and Young, C. E. Improvement of automatic abstracts by the use of structural analysis. JASIS 24 (1973), 101-109.
 
16
Nathan, K., Beigi, H., Subrahmonia, J., Clary, G., and Maruyama, H. Real-time on-line unconstrained handwriting recognition using statistical methods. In Proceedings of the IEEE ICASSP-95 Conference (1995).
 
17
The Open Directory project: http://dmoz.org.
18
 
19
20
