ABSTRACT
We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
CITED BY 78
Rila Mandala , Takenobu Tokunaga , Hozumi Tanaka, Combining multiple evidence from different types of thesaurus for query expansion, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.191-197, August 15-19, 1999, Berkeley, California, United States
Keishi Tajima , Yoshiaki Mizuuchi , Masatsugu Kitagawa , Katsumi Tanaka, Cut as a querying unit for WWW, Netnews, e-mail, Proceedings of the ninth ACM conference on Hypertext and hypermedia: links, objects, time and space---structure in hypermedia systems, p.235-244, June 20-24, 1998, Pittsburgh, Pennsylvania, United States
Dongwook Shin , Sejin Nam , Munseok Kim, Hypertext construction using statistical and semantic similarity, Proceedings of the second ACM international conference on Digital libraries, p.57-63, July 23-26, 1997, Philadelphia, Pennsylvania, United States
Gerard Salton , Amit Singhal , Chris Buckley , Mandar Mitra, Automatic text decomposition using text segments and text themes, Proceedings of the seventh ACM conference on Hypertext, p.53-65, March 16-20, 1996, Bethesda, Maryland, United States
Staffan Björk , Lars Erik Holmquist , Johan Redström , Ivan Bretan , Rolf Danielsson , Jussi Karlgren , Kristofer Franzén, WEST: a Web browser for small terminals, Proceedings of the 12th annual ACM symposium on User interface software and technology, p.187-196, November 07-10, 1999, Asheville, North Carolina, United States
M. G. Brown , J. T. Foote , G. J. F. Jones , K. Sparck Jones , S. J. Young, Automatic content-based retrieval of broadcast news, Proceedings of the third ACM international conference on Multimedia, p.35-43, November 05-09, 1995, San Francisco, California, United States
Xiaoli Li , Tong-Heng Phang , Minqing Hu , Bing Liu, Using micro information units for internet search, Proceedings of the eleventh international conference on Information and knowledge management, November 04-09, 2002, McLean, Virginia, USA
Kazem Taghva , Thomas Nartker , Julie Borsack, Information access in the presence of OCR errors, Proceedings of the 1st ACM workshop on Hardcopy document processing, p.1-8, November 12, 2004, Washington, DC, USA
Susan Price , Lois Delcambre , Marianne Lykke Nielsen , Timothy Tolle , Vibeke Luk , Mathew Weaver, Using semantic components to facilitate access to domain-specific documents in government settings, Proceedings of the 2006 international conference on Digital government research, May 21-24, 2006, San Diego, California
Uddin Md. Sharif , Elmarhomy Ghada , Elsayed Atlam , Masao Fuketa , Kazuhiro Morita , Jun-ichi Aoe, Improvement of building field association term dictionary using passage retrieval, Information Processing and Management: an International Journal, v.43 n.6, p.1793-1807, November, 2007
Susan L. Price , Marianne Lykke Nielsen , Lois M. L. Delcambre , Peter Vedsted , Jeremy Steinhauer, Using semantic components to search for domain-specific documents: An evaluation from the system perspective and the user perspective, Information Systems, v.34 n.8, p.778-806, December, 2009