Mining multi-faceted overviews of arbitrary topics in a text collection
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
SESSION: Research papers
Pages 497-505
Year of Publication: 2008
ISBN: 978-1-60558-193-4
Authors
Xu Ling  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Qiaozhu Mei  University of Illinois at Urbana-Champaign, Urbana, IL, USA
ChengXiang Zhai  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Bruce Schatz  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1401890.1401952

ABSTRACT

A common task in many text mining applications is to generate a multi-faceted overview of a topic in a text collection. Such an overview not only serves directly as an informative summary of the topic, but also provides a detailed view for navigating to different facets of the topic. Existing work has cast this problem as a categorization problem and requires training examples for each facet. This has three limitations: (1) All facets are predefined, which may not fit the needs of a particular user. (2) Training examples for each facet are often unavailable. (3) Such an approach only works for predefined types of topics. In this paper, we overcome these limitations and study a new, more realistic setup of the problem, in which a user can flexibly describe each facet with keywords for an arbitrary topic, and we attempt to mine a multi-faceted overview in an unsupervised way. We take a probabilistic approach to this problem. Experiments on different genres of text data show that our approach can effectively generate a multi-faceted overview for arbitrary topics; the generated overviews are comparable with those generated by supervised methods with training examples. They are also more informative than unstructured flat summaries. The method is quite general and can thus be applied to multiple text mining tasks in different application domains.



