ACM Home Page
Please provide us with feedback. Feedback
Discovering bucket orders from full rankings
Full text PdfPdf (324 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Research Session 2: Ranking table of contents
Pages 55-66  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Jianlin Feng  Huazhong University of Science and Technology, Wuhan, China
Qiong Fang  The Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Wilfred Ng  The Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 214,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376625
What is a DOI?

ABSTRACT

Discovering a bucket order B from a collection of possibly noisy full rankings is a fundamental problem that relates to various applications involving rankings. Informally, a bucket order is a total order that allows "ties" between items in a bucket. A bucket order B can be viewed as a "representative" that summarizes a given set of full rankings {T1, T2, ..., Tm}, or conversely B can be an "approximation" of some "ground truth" G where the rankings {T1, T2, ..., Tm} are simply the "linear extensions" of G.

Current work of finding bucket orders such as the dynamic programming algorithm is mainly developed from the "representative" perspective, which maximizes items' intra-bucket similarity when forming a bucket. The underlying idea of maximizing intra-bucket similarity is realized via minimizing the sum of the deviations of median ranks within a bucket. In contrast, from the "approximation" perspective, since each observed full ranking Ti is simply a linear extension of the given "ground truth" bucket order G, items in a big bucket b in G are forced to have different median ranks, and as a result b will have a big sum of deviations. Thus, minimizing the sum of deviations may result in an undesirable scenario that big buckets are mostly decomposed into small ones.

In this paper, we propose a novel heuristic called Abnormal Rank Gap to capture the inter-bucket dissimilarity for better bucket forming. In addition, we propose to use the "closeness" on multiple quantile ranks to determine if two items should be put into the same bucket. We develop a novel bucket order discovering method termed the Bucket Gap algorithm. Our extensive experiments demonstrate that the Bucket Gap algorithm significantly outperforms the major related work, i.e., the Bucket Pivot algorithm. In particular, the error distance of the generated bucket order can be reduced by about 30% on a real paleontological dataset and the noise tolerance can be increased from 30% to 50% in the synthetic dataset.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
 
4
 
5
6
7
8
 
9
M. Fortelius, A. Gionis, J. Jernvall, and H. Mannila. Spectral ordering and biochronology of european fossil mammals. In Paleobiology, pages 206--214, 2006.
10
 
11
U. Halekoh and W. Vach. A bayesian approach to seriation problems in archeology. Computational Statistics and Data Analysis, 45(33):651--673, 2004.
 
12
 
13
I. Miklos, I. Somodi, and J. Podani. Rearrangement of ecological data matrices via markov chain monte carlo simulation. Ecology, 86:3398--3410, 2005.
 
14
K. Puolamäki, M. Fortelius, and H. Mannila. Seriation in paleontological data using markov chain monte carlo methods. PLoS Computational Bioglogy, 2(2), 2006.
 
15
 
16
E. Terzi and P. Tsaparas. Efficient algorithm for sequence segmentation. In SIAM SDM, 2006.
17
 
18


Collaborative Colleagues:
Jianlin Feng: colleagues
Qiong Fang: colleagues
Wilfred Ng: colleagues