Efficient linear text segmentation based on information retrieval techniques
Source
Proceedings of the International Conference on Management of Emergent Digital EcoSystems
France
SESSION: Knowledge representation, reasoning and discovery (KRRD)
Article No.: 25
Year of Publication: 2009
ISBN: 978-1-60558-829-2
ABSTRACT
The task of linear text segmentation is to split a large text document into shorter fragments, usually blocks of consecutive sentences. The algorithms that have demonstrated the best performance on this task do so at the price of high computational complexity. We present an algorithm with a computational complexity of O(n), where n is the number of sentences in a document. We evaluate our approach against algorithms of higher complexity on standard benchmark data sets and demonstrate that it provides comparable accuracy.
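The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of what a linear-time segmenter can look like, not the authors' method. It assumes a simple heuristic: compare each sentence with its immediate predecessor using cosine similarity over bag-of-words vectors, and start a new block when the similarity falls below a cutoff. The names `bag_of_words`, `cosine`, `segment`, and the `threshold` value are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased term-frequency vector for one sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.1):
    """Split a list of sentences into blocks of consecutive sentences.

    Assumed heuristic (not taken from the paper): cut wherever two
    adjacent sentences share too little vocabulary. Only n - 1 adjacent
    pairs are compared, so the number of comparisons grows linearly
    with the number of sentences n.
    """
    if not sentences:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    blocks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            blocks.append([])  # hypothesized segment boundary
        blocks[-1].append(sentences[i])
    return blocks


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stock prices fell sharply today.",
    "Investors sold stock in a panic.",
]
blocks = segment(sentences)
```

With bounded sentence length, each comparison takes constant time, which is what makes the overall pass O(n) in the number of sentences. Approaches that compare every sentence with every other sentence would need on the order of n² comparisons instead.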