Visual analysis of documents with semantic graphs
Full text: PDF (550 KB)
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration
Paris, France
Pages: 66-73
Year of Publication: 2009
ISBN: 978-1-60558-670-0
Authors
Delia Rusu, Jožef Stefan Institute, Ljubljana, Slovenia
Blaž Fortuna, Jožef Stefan Institute, Ljubljana, Slovenia
Dunja Mladenić, Jožef Stefan Institute, Ljubljana, Slovenia
Marko Grobelnik, Jožef Stefan Institute, Ljubljana, Slovenia
Ruben Sipoš, Jožef Stefan Institute, Ljubljana, Slovenia
Sponsors
PASCAL2 - Pattern Analysis, Statistical Modelling and Computational Learning
Helsinki Institute for Information Technology HIIT
VisMaster, a European FP7 Coordination Action Project focused on Visual Analytics
Danube University Krems, Department of Information and Knowledge Engineering (DUK)
National Visualization and Analytics Center (NVAC)
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16, Downloads (12 Months): 93, Citation Count: 0
DOI: http://doi.acm.org/10.1145/1562849.1562857

ABSTRACT

In this paper, we present a technique for visual analysis of documents based on a semantic representation of text in the form of a directed graph, referred to as a semantic graph. This approach can aid data mining tasks such as exploratory data analysis, data description and summarization. To derive the semantic graph, we use natural language processing and carry out a pipeline of operations, as follows. First, named entities are identified and co-reference resolution is performed; in addition, pronominal anaphors are resolved for a subset of pronouns. Second, subject-predicate-object triplets are automatically extracted from the Penn Treebank parse tree obtained for each sentence in the document. The triplets are further enriched by linking them to their corresponding co-referenced named entities and by attaching the associated WordNet synset, where available. The result is a semantic directed graph composed of connected triplets. The document's semantic graph is the starting point for automatically generating the document summary. The summary generation model is obtained by machine learning, with features extracted from the structure and content of the semantic graph. The summary also has an associated semantic representation. Both the size of the semantic graph and the summary length can be adjusted manually for enhanced visual analysis. We also show how to employ the proposed technique for the Visual Analytics challenge.
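The triplet-extraction step can be illustrated with a small rule-based sketch over a bracketed Penn Treebank parse. The rules used here are an assumption for illustration, not the authors' exact method: the subject is the first noun found breadth-first in the sentence's subject NP, the predicate is the deepest verb in the VP, and the object is the first noun (or adjective, for an ADJP) in an NP, PP or ADJP sibling of that verb. The parse string is hard-coded instead of coming from a parser, and the names `Node`, `parse_tree`, `bfs_first`, `deepest_verb` and `extract_triplet` are hypothetical. Named-entity linking, anaphora resolution and WordNet synsets are not modelled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                                   # phrase tag (S, NP, VP, ...) or POS tag (NN, VBD, ...)
    children: List["Node"] = field(default_factory=list)
    word: str = ""                               # non-empty only for POS (pre-terminal) nodes


def parse_tree(text: str) -> Node:
    """Parse a bracketed Penn Treebank string with a labelled root, e.g. '(ROOT (S ...))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        pos += 1                                  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())
            else:
                node.word = tokens[pos]           # the word under a POS tag
                pos += 1
        pos += 1                                  # skip ")"
        return node

    return parse()


def bfs_first(node: Node, tag_prefix: str) -> Optional[str]:
    """Breadth-first search for the first word whose POS tag starts with tag_prefix."""
    queue = deque([node])
    while queue:
        n = queue.popleft()
        if n.word and n.label.startswith(tag_prefix):
            return n.word
        queue.extend(n.children)
    return None


def deepest_verb(vp: Node) -> Tuple[Optional[str], Optional[Node]]:
    """Return the deepest verb in the VP subtree and the phrase node that directly contains it."""
    best: Tuple[Optional[str], Optional[Node], int] = (None, None, -1)

    def walk(n: Node, parent: Optional[Node], depth: int) -> None:
        nonlocal best
        if n.word and n.label.startswith("VB") and depth > best[2]:
            best = (n.word, parent, depth)
        for child in n.children:
            walk(child, n, depth + 1)

    walk(vp, None, 0)
    return best[0], best[1]


def extract_triplet(tree: Node) -> Optional[Tuple[str, str, Optional[str]]]:
    """Hypothetical rule set: (subject, predicate, object) for the first S clause in the tree."""
    queue, s = deque([tree]), None
    while queue:                                  # locate the first clause node (S)
        n = queue.popleft()
        if n.label == "S":
            s = n
            break
        queue.extend(n.children)
    if s is None:
        return None
    np_ = next((c for c in s.children if c.label == "NP"), None)
    vp = next((c for c in s.children if c.label == "VP"), None)
    if np_ is None or vp is None:
        return None
    subject = bfs_first(np_, "NN")
    predicate, verb_parent = deepest_verb(vp)
    if subject is None or predicate is None or verb_parent is None:
        return None
    obj = None
    for sibling in verb_parent.children:          # object candidates sit next to the verb
        if sibling.label in ("NP", "PP"):
            obj = bfs_first(sibling, "NN")
        elif sibling.label == "ADJP":
            obj = bfs_first(sibling, "JJ")
        if obj:
            break
    return subject, predicate, obj


tree = parse_tree("(ROOT (S (NP (DT The) (NN committee)) (VP (VBZ has) "
                  "(VP (VBN approved) (NP (DT the) (JJ new) (NN budget)))) (. .)))")
print(extract_triplet(tree))
```

Choosing the deepest verb skips auxiliaries such as "has" and keeps the main verb "approved" as the predicate.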

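The abstract describes merging triplets through co-referenced named entities into one directed graph, then generating a summary from graph features, with both graph size and summary length adjustable. A minimal sketch of that flow follows, under stated assumptions: the triplets and the co-reference map are hypothetical toy input, graph size is controlled by keeping the highest-degree nodes, and the paper's learned summary model is replaced by a fixed degree-based sentence score. That score is a stand-in chosen for illustration, not the paper's model or feature set. All names (`build_graph`, `prune_graph`, `summarize`, `TRIPLETS`, `COREF`) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input: (sentence index, (subject, predicate, object)) per sentence,
# plus a co-reference map from mentions to a canonical named entity.
TRIPLETS = [
    (0, ("Tom Smith", "founded", "company")),
    (1, ("he", "hired", "engineers")),
    (2, ("Smith", "sold", "company")),
    (3, ("weather", "was", "cold")),
]
COREF = {"he": "Tom Smith", "Smith": "Tom Smith"}


def build_graph(triplets, coref):
    """Directed edges subject -> object, labelled by predicate, after co-reference merging."""
    edges = []
    for sent_id, (subj, pred, obj) in triplets:
        edges.append((coref.get(subj, subj), pred, coref.get(obj, obj), sent_id))
    return edges


def node_degrees(edges):
    """Total (in + out) degree of every node in the graph."""
    deg = Counter()
    for s, _, o, _ in edges:
        deg[s] += 1
        deg[o] += 1
    return deg


def prune_graph(edges, max_nodes):
    """Shrink the graph: keep edges whose endpoints are among the max_nodes highest-degree nodes."""
    keep = {n for n, _ in node_degrees(edges).most_common(max_nodes)}
    return [e for e in edges if e[0] in keep and e[2] in keep]


def summarize(edges, length):
    """Pick `length` sentences by summed endpoint degree (stand-in for a learned model)."""
    deg = node_degrees(edges)
    score = defaultdict(int)
    for s, _, o, sent_id in edges:
        score[sent_id] += deg[s] + deg[o]
    ranked = sorted(score, key=lambda i: (-score[i], i))
    return sorted(ranked[:length])                # restore original sentence order


edges = build_graph(TRIPLETS, COREF)
print(prune_graph(edges, 2))
print(summarize(edges, 2))
```

Because "he" and "Smith" are mapped to "Tom Smith", three sentences share one node, which raises its degree and pulls sentences 0 and 2 into the two-sentence summary; sentence 3, disconnected from the main entity, is ranked last.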

