Discovering interesting usage patterns in text collections: integrating text mining with visualization
Source
Conference on Information and Knowledge Management
Proceedings of the sixteenth ACM conference on information and knowledge management
Lisbon, Portugal
SESSION: Explanation, knowledge provenance and synthesis (KM)
Pages: 213-222
Year of Publication: 2007
ISBN: 978-1-59593-803-9
Authors
Anthony Don  University of Maryland, College Park, MD
Elena Zheleva  University of Maryland, College Park, MD
Machon Gregory  University of Maryland, College Park, MD
Sureyya Tarkan  University of Maryland, College Park, MD
Loretta Auvil  University of Illinois, Urbana, IL
Tanya Clement  University of Maryland, College Park, MD
Ben Shneiderman  University of Maryland, College Park, MD
Catherine Plaisant  University of Maryland, College Park, MD
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 35, Downloads (12 Months): 210, Citation Count: 4
DOI: http://doi.acm.org/10.1145/1321440.1321473

ABSTRACT

This paper addresses the problem of making text mining results more comprehensible to humanities scholars, journalists, intelligence analysts, and other researchers, in order to support the analysis of text collections. Our system, FeatureLens, visualizes a text collection at several levels of granularity and enables users to explore interesting text patterns. The current implementation focuses on frequent itemsets of n-grams, as they capture the repetition of exact or similar expressions in the collection. Users can find meaningful co-occurrences of text patterns by visualizing them within and across documents in the collection. This also lets users identify the temporal evolution of usage, such as an increase, a decrease, or the sudden appearance of text patterns. The interface could be used to explore other text features as well. Initial studies suggest that FeatureLens helped a literary scholar and eight users generate new hypotheses and interesting insights using two text collections.
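
To make "frequent itemsets of n-grams" concrete, the following is a minimal sketch, not the FeatureLens implementation. It assumes each text unit (for example, a paragraph or a speech) is treated as a transaction whose items are its word trigrams, and it counts trigram pairs that co-occur in at least `min_support` units using a brute-force, Apriori-style two-pass count. The function names, the whitespace tokenization, the pair-only itemset size, and the sample sentences are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def word_ngrams(text, n=3):
    """Return the set of word n-grams in one text unit (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_ngram_pairs(units, n=3, min_support=2):
    """Return {(ngram_a, ngram_b): support} for pairs found in >= min_support units."""
    # Each unit becomes a transaction: the set of n-grams it contains.
    transactions = [word_ngrams(u, n) for u in units]

    # Pass 1: support of individual n-grams (number of units containing each).
    single = Counter(g for t in transactions for g in t)
    frequent = {g for g, c in single.items() if c >= min_support}

    # Pass 2: count pairs built only from frequent n-grams (Apriori pruning).
    pairs = Counter()
    for t in transactions:
        kept = sorted(t & frequent)  # sorted so each pair has one canonical order
        pairs.update(combinations(kept, 2))

    return {p: c for p, c in pairs.items() if c >= min_support}


# Hypothetical sample units, made up for illustration only.
units = [
    "we will not fail and we will not falter",
    "we will not tire and we will not fail",
    "the state of the union is strong",
]
result = frequent_ngram_pairs(units, n=3, min_support=2)
for pair, support in sorted(result.items()):
    print(pair, support)
```

On these sample units the sketch reports three trigram pairs, each with support 2, all built from "and we will", "we will not", and "will not fail". Recording which units contain each pair, in document order, would be one way to look at how usage changes over time across a collection.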

