Efficient linear text segmentation based on information retrieval techniques
Source: Proceedings of the International Conference on Management of Emergent Digital EcoSystems
France
SESSION: Knowledge representation, reasoning and discovery (KRRD)
Article No.: 25  
Year of Publication: 2009
ISBN:978-1-60558-829-2
Authors
Roman Kern  Know-Center, Graz
Michael Granitzer  Graz University of Technology, Graz
Sponsor
The French Chapter of ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1643823.1643854

ABSTRACT

The task of linear text segmentation is to split a large text document into shorter fragments, usually blocks of consecutive sentences. The algorithms that have demonstrated the best performance on this task do so at the price of high computational complexity. In our work we present an algorithm with a computational complexity of O(n), where n is the number of sentences in a document. We evaluate our approach against algorithms of higher complexity on standard benchmark data sets and demonstrate that it provides comparable accuracy.
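
To make the task concrete, the following is a minimal Python sketch of a linear text segmenter. It is not the authors' algorithm, which the abstract does not describe. As an assumption for illustration, it compares a fixed-size window of sentences on each side of every sentence gap using cosine similarity of word counts, and places a boundary where the similarity falls below a threshold. The names `segment`, `window`, and `threshold`, and the default values, are hypothetical. Because each gap inspects at most 2 x `window` sentences, the number of window comparisons grows linearly with the number of sentences when `window` is fixed.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Count lowercase word tokens across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment(sentences, window=2, threshold=0.1):
    """Split sentences into blocks of consecutive sentences.

    Illustrative assumption: a boundary is placed at every gap where the
    windows before and after the gap have cosine similarity below `threshold`.
    """
    if not sentences:
        return []
    boundaries = []
    for gap in range(1, len(sentences)):
        left = bag_of_words(sentences[max(0, gap - window):gap])
        right = bag_of_words(sentences[gap:gap + window])
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    # Turn boundary positions into blocks of consecutive sentences.
    blocks, start = [], 0
    for end in boundaries + [len(sentences)]:
        blocks.append(sentences[start:end])
        start = end
    return blocks


if __name__ == "__main__":
    doc = [
        "The cat chased the mouse.",
        "The mouse escaped the cat.",
        "Stocks fell sharply today.",
        "Investors sold stocks today.",
    ]
    for block in segment(doc):
        print(block)
```

On this toy input, the only gap where the two windows share no words is the one between the second and third sentences, so the sketch returns two blocks of two sentences each. A fixed threshold is a simplification; practical segmenters typically need smoothing or a data-dependent cutoff.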


