Information shared by many objects
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
Session: KM: text mining
Pages 1213-1220
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Chong Long, Tsinghua University, Beijing, China
Xiaoyan Zhu, Tsinghua University, Beijing, China
Ming Li, University of Waterloo, Waterloo, ON, Canada
Bin Ma, University of Waterloo, Waterloo, ON, Canada
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9, Downloads (12 Months): 139, Citation Count: 0
DOI: http://doi.acm.org/10.1145/1458082.1458242

ABSTRACT

If Kolmogorov complexity [25] measures the information in one object and information distance measures the information shared by two objects, how do we measure the information shared by many objects? This paper provides an initial, pragmatic study of this fundamental data mining question. First, $E_m(x_1, x_2, \ldots, x_n)$ is defined as the minimum amount of thermodynamic energy needed to convert any $x_i$ into any $x_j$. With this definition, several theoretical problems are solved. Second, the new theory is applied to selecting a comprehensive review and a specialized review from a collection of reviews: (1) core feature words, expanded words, and dependent words are extracted; (2) comprehensive and specialized reviews are selected according to the information shared among them. The method for selecting a single review also extends to selecting multiple reviews. Finally, experiments show that this review-mining method, based on the new theory, selects comprehensive and specialized reviews efficiently.
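The abstract states the definition of $E_m$ only in words. As a point of orientation, the two-object information distance of Bennett et al. [4] is $E(x, y) = \max\{K(x \mid y), K(y \mid x)\}$, up to an additive logarithmic term. The sketch below assumes the n-object analogue $E_m(x_1, \ldots, x_n) = \max_i K(x_1 x_2 \cdots x_n \mid x_i)$, also up to logarithmic terms, which reduces to the two-object formula when n = 2; the exact theorem should be checked against the full text. Because K is uncomputable, the sketch substitutes zlib compressed length for K, in the style of the compression-based distances of [10] and [24]. All names (clen, cond_clen, em_estimate, central_index) are hypothetical. central_index is an illustrative selection rule only. It is not the paper's review-selection procedure, which also relies on the extracted feature, expanded, and dependent words.

```python
import zlib
from collections.abc import Sequence


def clen(data: bytes) -> int:
    """Compressed length in bytes: a computable proxy that, up to an
    additive constant, upper-bounds the Kolmogorov complexity K(data)."""
    return len(zlib.compress(data, 9))


def cond_clen(x: bytes, y: bytes) -> int:
    """Proxy for K(x | y): extra compressed bytes needed for x once y is seen."""
    return clen(y + x) - clen(y)


def em_estimate(objects: Sequence[bytes]) -> int:
    """Rough estimate of E_m(x_1, ..., x_n).

    ASSUMPTION (not taken from this record): E_m = max_i K(x_1 ... x_n | x_i)
    up to logarithmic terms, with K replaced by zlib compressed length.
    """
    if not objects:
        raise ValueError("need at least one object")
    joint = b"".join(objects)
    return max(cond_clen(joint, x) for x in objects)


def central_index(objects: Sequence[bytes]) -> int:
    """HYPOTHETICAL selection rule: index of the object given which the whole
    collection is cheapest to describe, i.e. argmin_i C(x_1 ... x_n | x_i)."""
    if not objects:
        raise ValueError("need at least one object")
    joint = b"".join(objects)
    return min(range(len(objects)), key=lambda i: cond_clen(joint, objects[i]))


if __name__ == "__main__":
    reviews = [
        b"Great battery life, sharp screen, fast camera, and a solid build.",
        b"Great battery life.",
        b"The camera is fast and focuses quickly in low light.",
    ]
    print("E_m estimate (bytes):", em_estimate(reviews))
    print("Most central review:", central_index(reviews))
```

Compressed lengths are only a coarse stand-in for K. On very short texts like the demo reviews the differences are noisy, so such estimates are more meaningful when comparing collections of longer documents.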


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
[1] C. Ané and M. Sanderson. Missing the forest for the trees: Phylogenetic compression and its implications for inferring complex evolutionary histories. Systematic Biology, 54(1):146-157, 2005.
[2] T. Arbuckle, A. Balaban, D. Peters, and M. Lawford. Software documents: Comparison and measurement. In The Nineteenth International Conference on Software Engineering and Knowledge Engineering, July 2007.
[3] D. Benedetto, E. Caglioti, and V. Loreto. Language trees and zipping. Physical Review Letters, 88(4):048702, 2002.
[4] C. Bennett, P. Gács, M. Li, P. Vitányi, and W. Zurek. Information distance. IEEE Transactions on Information Theory, 44(4):1407-1423, July 1998.
[5] C. Bennett, M. Li, and B. Ma. Chain letters and evolutionary histories. Scientific American, 288(6):76-81, June 2003.
[6] (entry not available)
[7] (entry not available)
[8] X. Chen, B. Francia, M. Li, B. McKinnon, and A. Seker. Shared information and program plagiarism detection. IEEE Transactions on Information Theory, 50(7):1545-1550, July 2004.
[9] (entry not available)
[10] R. Cilibrasi and P. Vitányi. Clustering by compression. IEEE Transactions on Information Theory, 51(4):1523-1545, 2005.
[11] (entry not available)
[12] (entry not available)
[13] M. C. de Marneffe, B. MacCartney, and C. D. Manning. Generating typed dependency parses from phrase structure parses. In The Fifth International Conference on Language Resources and Evaluation (LREC), May 2006.
[14] K. Emanuel, S. Ravela, E. Vivant, and C. Risi. A combined statistical-deterministic approach of hurricane risk assessment. Manuscript, Program in Atmospheres, Oceans, and Climate, MIT, 2005.
[15] M. Gamon, A. Aue, S. C. Oliver, and E. Ringger. Pulse: Mining customer opinions from free text. In International Symposium on Intelligent Data Analysis (IDA), pages 121-132, October 2005.
[16] M. Hayashida and T. Akutsu. Image compression-based approach to measuring the similarity of protein structures. In The 6th Asia-Pacific Bioinformatics Conference, pages 221-230, 2008.
[17] (entry not available)
[18] (entry not available)
[19] S. Kirk and S. Jenkins. Information theory-based software metrics and obfuscation. Journal of Systems and Software, 72:179-186, 2004.
[20] (entry not available)
[21] A. Kraskov, H. Stögbauer, R. Andrzejak, and P. Grassberger. Hierarchical clustering using mutual information. Europhys. Lett., 70(2):278-284, 2005.
[22] (entry not available)
[23] M. Li, J. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 17(2):149-154, 2001.
[24] M. Li, X. Chen, X. Li, B. Ma, and P. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12):3250-3264, 2004.
[25] (entry not available)
[26] (entry not available)
[27] (entry not available)
[28] M. Nykter, N. Price, M. Aldana, S. Ramsey, S. Kauffman, L. Hood, O. Yli-Harja, and I. Shmulevich. Gene expression dynamics in the macrophage exhibit criticality. PNAS, 105(6):1897-1900, 2008.
[29] M. Nykter, N. Price, A. Larjo, T. Aho, S. Kauffman, O. Yli-Harja, and I. Shmulevich. Critical networks exhibit maximal information diversity in structure-dynamics relationships. Physical Review Letters, 100:058702 (1-4), 2008.
[30] H. Otu and K. Sayood. A new sequence distance measure for phylogenetic tree construction. Bioinformatics, 19(6):2122-2130, 2003.
[31] H. Pao and J. Case. Computing entropy for ortholog detection. In International Conference on Computational Intelligence, December 2004.
[32] D. Parry. Use of Kolmogorov distance identification of web page authorship, topic and domain. In Workshop on Open Source Web Inf. Retrieval, 2005.
[33] (entry not available)
[34] S. Rahmati and J. Glasgow. Noise tolerance of universal similarity metric applied to protein contact maps comparison in two dimensions. Manuscript, Queen's Univ., 2008.
[35] (entry not available)
[36] (entry not available)
[37] (entry not available)
[38] W. Taha, S. Crosby, and K. Swadi. A new approach to data mining for software design. Manuscript, Rice Univ., 2006.
[39] (entry not available)
[40] (entry not available)
[41] (entry not available)
[42] (entry not available)
[43] (entry not available)

