| A graph-theoretic approach to extract storylines from search results |
| Full text |
Pdf
(162 KB)
|
| Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Seattle, WA, USA
SESSION: Research track papers
table of contents
Pages: 216 - 225
Year of Publication: 2004
ISBN:1-58113-888-1
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 13, Downloads (12 Months): 113, Citation Count: 9
|
|
|
ABSTRACT
We present a graph-theoretic approach to discover storylines from search results. Storylines are windows that offer glimpses into interesting themes latent among the top search results for a query; they are different from, and complementary to, clusters obtained through traditional approaches. Our framework is axiomatically developed and combinatorial in nature, based on generalizations of the maximum induced matching problem on bipartite graphs. The core algorithmic task involved is to mine for signature structures in a robust graph representation of the search results. We present a very fast algorithm for this task based on local search. Experiments show that the collection of storylines extracted through our algorithm offers a concise organization of the wealth of information hidden beyond the first page of search results.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
J-Y. Cai. On the impossibility of certain ranking functions. International Mathematical Journal, 3(2):119--128, 2003.
|
| |
4
|
T. Calishain. Clustering with search engines. http://www.llrx.com/features/clusteringsearch.htm.
|
| |
5
|
S. Chakrabarti, B. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Spectral filtering for resource discovery. In Proceedings of the ACM SIGIR Workshop on Hypertext Analysis, pages 13--21, 1998.
|
 |
6
|
Douglass R. Cutting , David R. Karger , Jan O. Pedersen , John W. Tukey, Scatter/Gather: a cluster-based approach to browsing large document collections, Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, p.318-329, June 21-24, 1992, Copenhagen, Denmark
[doi> 10.1145/133160.133214]
|
| |
7
|
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Fumas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391--407, 1990.
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
L. Stockmeyer and V. Vazirani. NP-Completeness of some generalizations of the maximum matching problem. Information Processing Letters, 15(1):14--19, 1982.
|
| |
13
|
N. Tishby, F.C. Pereira, and W. Blalek. The information bottleneck method. In Proc. 37th Annual Allerton Conference on Communication, Control, and Computing, pages 368--377, 1999.
|
| |
14
|
P. Tsaparas. Link Analysis Ranking Algorithms. PhD thesis, University of Toronto, 2003.
|
| |
15
|
|
| |
16
|
O. Zamir, O. Etzioni, O. Madani, and R. M. Karp. Fast and intuitive clustering of web documents. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pages 287--290, 1997.
|
CITED BY 9
|
|
|
Shizhu Liu , Yuval Merhav , Wai Gen Yee , Nazli Goharian , Ophir Frieder, A sentence level probabilistic model for evolutionary theme pattern mining from news corpora, Proceedings of the 2009 ACM symposium on Applied Computing, March 08-12, 2009, Honolulu, Hawaii
|
|
|
|
|
|
R. Guha , Ravi Kumar , D. Sivakumar , Ravi Sundaram, Unweaving a web of documents, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
|
|
|
|