Machine learning for information architecture in a large governmental website
Source
International Conference on Digital Libraries
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Tucson, AZ, USA
SESSION: Automated techniques for managing collections
Pages: 151-159
Year of Publication: 2004
ISBN: 1-58113-832-6
ABSTRACT
This paper describes ongoing research into the application of machine learning techniques for improving access to governmental information in complex digital libraries. Under the auspices of the GovStat Project, our goal is to identify a small number of semantically valid concepts that adequately span the intellectual domain of a collection. The goal of this discovery is twofold. First, we desire a practical aid for information architects. Second, automatically derived document-concept relationships are a necessary precondition for real-world deployment of many dynamic interfaces. The current study compares concept learning strategies based on three document representations: keywords, titles, and full text. In statistical and user-based studies, human-created keywords provide significant improvements in concept learning over both title-only and full-text representations.
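The abstract does not spell out the learning procedure. As a rough illustration of what "concept learning based on different document representations" can look like, the sketch below clusters the same documents three times, once per representation, with a plain cosine k-means. Everything in it is an assumption: the four toy documents, the hypothetical names `DOCS`, `vectorize`, `kmeans_cosine` and `cluster_by`, the use of k-means itself, and the fixed k=2 (how the study chooses the number of concepts is not stated here). It is a sketch, not the GovStat implementation, and any difference between the label lists reflects only the invented documents, not the study's results.

```python
import math
from collections import Counter

# Hypothetical toy collection, invented for illustration (not GovStat data).
# Each document carries the three representations compared in the abstract.
DOCS = [
    {"keywords": "employment wages labor",
     "title": "Monthly wages report",
     "fulltext": "the report covers wages and employment in the labor market"},
    {"keywords": "prices inflation consumer",
     "title": "Consumer price index",
     "fulltext": "the index tracks consumer prices and inflation in the market"},
    {"keywords": "labor employment unemployment",
     "title": "Unemployment rates by state",
     "fulltext": "the state tables show unemployment and employment in the labor market"},
    {"keywords": "inflation prices producer",
     "title": "Producer price report",
     "fulltext": "the report tracks producer prices and inflation in the market"},
]


def vectorize(text):
    """Bag-of-words term counts: lowercase, split on whitespace."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_cosine(vectors, k, iters=10):
    """k-means that assigns by cosine similarity. Seeds are the first k
    vectors so runs are deterministic (a simplification, not a recommended
    initialization). Ties go to the lower-numbered cluster."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


def cluster_by(field, k=2):
    """Assign each document to one of k 'concepts' using one representation."""
    return kmeans_cosine([vectorize(d[field]) for d in DOCS], k)


if __name__ == "__main__":
    for field in ("keywords", "title", "fulltext"):
        print(field, cluster_by(field))
```

Running the script prints one label list per representation. Comparing those lists is only a toy analogue of comparing representations; judging whether the resulting groups are semantically valid concepts would still need statistical and user-based evaluation of the kind the abstract describes.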
CITED BY 3
Robert Capra, Gary Marchionini, Jung Sun Oh, Fred Stutzman, Yan Zhang. Effects of structure and interaction style on distinct search tasks. Proceedings of the 2007 conference on Digital libraries, June 18-23, 2007, Vancouver, BC, Canada.