Discovering interesting usage patterns in text collections: integrating text mining with visualization
Source
Proceedings of the sixteenth ACM conference on Information and knowledge management, Lisbon, Portugal
Session: Explanation, knowledge provenance and synthesis (KM)
Pages: 213-222
Year of Publication: 2007
ISBN: 978-1-59593-803-9
Authors
Anthony Don, University of Maryland, College Park, MD
Elena Zheleva, University of Maryland, College Park, MD
Machon Gregory, University of Maryland, College Park, MD
Sureyya Tarkan, University of Maryland, College Park, MD
Loretta Auvil, University of Illinois, Urbana, IL
Tanya Clement, University of Maryland, College Park, MD
Ben Shneiderman, University of Maryland, College Park, MD
Catherine Plaisant, University of Maryland, College Park, MD
ABSTRACT
This paper addresses the problem of making text mining results more comprehensible to humanities scholars, journalists, intelligence analysts, and other researchers, in order to support the analysis of text collections. Our system, FeatureLens, visualizes a text collection at several levels of granularity and lets users explore interesting text patterns. The current implementation focuses on frequent itemsets of n-grams, because they capture the repetition of exact or similar expressions in the collection. By visualizing these patterns within and across documents, users can find meaningful co-occurrences. Users can also trace how usage evolves over time, for example patterns that increase, decrease, or appear suddenly. The interface could be used to explore other text features as well. Initial studies suggest that FeatureLens helped a literary scholar and eight users generate new hypotheses and insights from two text collections.
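The abstract builds FeatureLens on frequent itemsets of n-grams, and the keywords specify frequent closed itemsets. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. Each text unit (for example a paragraph or chapter) is treated as a transaction whose items are its word bigrams. Closed itemsets are then found by exhaustive enumeration. The function names `ngrams` and `frequent_closed_itemsets`, the `min_support` and `max_size` parameters, and the toy paragraphs are all hypothetical. A dedicated closed-itemset miner such as CLOSET would avoid the brute-force search used here.

```python
from itertools import combinations


def ngrams(text, n):
    """Return the set of lowercase word n-grams (space-joined) in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_closed_itemsets(transactions, min_support, max_size=3):
    """Hypothetical brute-force miner for closed itemsets.

    transactions: list of sets of items (here, the n-grams of each text unit).
    Returns a dict mapping frozenset(itemset) -> support count.
    Closedness is checked only against itemsets of at most max_size items,
    so this is an illustration, not an exact closed-itemset miner.
    """
    # Count single items; only individually frequent items can be
    # part of a frequent itemset (anti-monotonicity of support).
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    items = sorted(i for i, c in counts.items() if c >= min_support)

    # Enumerate every candidate itemset up to max_size and keep the frequent ones.
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                frequent[candidate] = support

    # An itemset is closed if no proper superset has the same support.
    closed = {}
    for s, sup in frequent.items():
        if not any(s < other and sup == osup for other, osup in frequent.items()):
            closed[s] = sup
    return closed
```

Here is an example with bigrams from "the state of the union is strong", "the union is strong and growing" and "our economy is strong", using `min_support=2`. The sketch returns {"is strong"} with support 3 and {"is strong", "the union", "union is"} with support 2. The second itemset captures the expression "the union is strong", which is repeated in two units. Its smaller subsets are dropped because they have the same support and are therefore not closed.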
CITED BY 5
Ying Liu, Lucian V. Lita, R. Stefan Niculescu, Kun Bai, Prasenjit Mitra, C. Lee Giles. Real-time data pre-processing technique for efficient feature extraction in large scale datasets. Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA.

Monica Berti, Matteo Romanello, Alison Babeu, Gregory Crane. Collecting fragmentary authors in a digital library. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, June 15-19, 2009, Austin, TX, USA.

Shixia Liu, Michelle X. Zhou, Shimei Pan, Weihong Qian, Weijia Cai, Xiaoxiao Lian. Interactive, topic-based visual text summarization and analysis. Proceeding of the 18th ACM conference on Information and knowledge management, November 02-06, 2009, Hong Kong, China.

INDEX TERMS
Primary Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Graphical user interfaces (GUI)
Additional Classification:
H. Information Systems
H.2 DATABASE MANAGEMENT
H.2.8 Database applications
Subjects: Data mining
General Terms: Algorithms, Design, Experimentation, Human Factors, Measurement
Keywords: digital humanities, frequent closed itemsets, n-grams, text mining, user interface