Visual analysis of documents with semantic graphs
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration
Paris, France
Pages 66-73
Year of Publication: 2009
ISBN: 978-1-60558-670-0
Authors
Delia Rusu, Jožef Stefan Institute, Ljubljana, Slovenia
Blaž Fortuna, Jožef Stefan Institute, Ljubljana, Slovenia
Dunja Mladenić, Jožef Stefan Institute, Ljubljana, Slovenia
Marko Grobelnik, Jožef Stefan Institute, Ljubljana, Slovenia
Ruben Sipoš, Jožef Stefan Institute, Ljubljana, Slovenia
ABSTRACT
In this paper, we present a technique for the visual analysis of documents based on a semantic representation of text in the form of a directed graph, referred to as a semantic graph. This approach can aid data mining tasks such as exploratory data analysis, data description, and summarization. To derive the semantic graph, we use natural language processing and apply a pipeline of operations. First, named entities are identified and co-reference resolution is performed; in addition, pronominal anaphors are resolved for a subset of pronouns. Second, subject-predicate-object triplets are automatically extracted from the Penn Treebank parse tree obtained for each sentence in the document. The triplets are then enriched by linking them to the corresponding co-referenced named entities and by attaching the associated WordNet synsets, where available. The result is a directed semantic graph composed of connected triplets. The document's semantic graph is the starting point for automatically generating a document summary. The summary generation model is obtained by machine learning, with features extracted from the structure and content of the semantic graph. The summary also has an associated semantic representation. Both the size of the semantic graph and the summary length can be adjusted manually to support visual analysis. We also show how to apply the proposed technique to the Visual Analytics challenge.
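The triplet-extraction and graph-building steps can be illustrated with a short Python sketch. It is simplified, makes several assumptions, and is not the authors' implementation. It starts from a Penn Treebank parse tree that has already been produced in bracketed form, so no parser is run. The subject is taken as the first noun of the sentence-level NP, and the predicate as the deepest verb in the VP. The object is the first noun or adjective among the predicate's NP, PP, or ADJP siblings.

All names are hypothetical: `Node`, `parse_tree`, `extract_triplet`, `build_semantic_graph`, `node_degrees`, the `canonical` map, and the example `SENTENCE`. The `canonical` map only stands in for named-entity recognition and co-reference resolution. WordNet linking and the learned summary model are omitted. Node degree is included only as an example of a structural feature that could be computed from the graph.

```python
from collections import Counter, defaultdict, deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                           # phrase label ("NP") or POS tag ("NN")
    children: list = field(default_factory=list)
    word: Optional[str] = None           # set only on pre-terminal (POS-tag) nodes


def parse_tree(text: str) -> Node:
    """Parse a bracketed Penn Treebank string into a Node tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        node = Node(tokens[pos + 1])      # tokens[pos] is "(", followed by the label
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())
            else:                         # a word directly under a POS tag
                node.word = tokens[pos]
                pos += 1
        pos += 1                          # consume ")"
        return node

    return parse()


def bfs_find(node: Node, wanted) -> Optional[Node]:
    """Breadth-first search for the first node whose label satisfies `wanted`."""
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if wanted(current.label):
            return current
        queue.extend(current.children)
    return None


def extract_triplet(tree: Node) -> Optional[tuple]:
    """Simplified subject-predicate-object heuristic for one parsed sentence."""
    sentence = bfs_find(tree, lambda label: label == "S")
    if sentence is None:
        return None
    np_ = next((c for c in sentence.children if c.label == "NP"), None)
    vp = next((c for c in sentence.children if c.label == "VP"), None)
    if np_ is None or vp is None:
        return None
    subject = bfs_find(np_, lambda label: label.startswith("NN"))  # first noun

    # Predicate: the deepest verb in the VP subtree (leftmost one on ties).
    best_depth, verb, verb_parent = -1, None, None
    stack = [(vp, None, 0)]
    while stack:
        node, parent, depth = stack.pop()
        if node.label.startswith("VB") and depth > best_depth:
            best_depth, verb, verb_parent = depth, node, parent
        stack.extend((child, node, depth + 1) for child in reversed(node.children))

    # Object: first noun in an NP/PP sibling, or first adjective in an ADJP sibling.
    obj = None
    for sibling in verb_parent.children if verb_parent else []:
        if sibling.label in ("NP", "PP"):
            obj = bfs_find(sibling, lambda label: label.startswith("NN"))
        elif sibling.label == "ADJP":
            obj = bfs_find(sibling, lambda label: label.startswith("JJ"))
        if obj is not None:
            break
    if subject is None or verb is None or obj is None:
        return None
    return subject.word, verb.word, obj.word


def build_semantic_graph(triplets, canonical=None):
    """Adjacency lists subject -> [(predicate, object), ...]; `canonical` is a
    hypothetical mention -> entity map standing in for NER and co-reference."""
    canonical = canonical or {}
    edges = defaultdict(list)
    for subj, pred, obj in triplets:
        edges[canonical.get(subj, subj)].append((pred, canonical.get(obj, obj)))
    return edges


def node_degrees(edges):
    """In-degree plus out-degree per node: one simple structural graph feature."""
    return Counter(n for s, out in edges.items() for _, o in out for n in (s, o))


SENTENCE = ("(ROOT (S (NP (DT A) (JJ rare) (JJ black) (NN squirrel)) "
            "(VP (VBZ has) (VP (VBN become) (NP (DT a) (JJ regular) (NN visitor)) "
            "(PP (TO to) (NP (DT a) (JJ suburban) (NN garden))))) (. .)))")
```

For example, `extract_triplet(parse_tree(SENTENCE))` returns `('squirrel', 'become', 'visitor')`. Passing `{"animal": "squirrel"}` as `canonical` maps a second mention onto the same graph node, so triplets from different sentences become connected. The graph is kept as plain adjacency lists to avoid a third-party dependency. A fuller pipeline would likely use a dedicated graph library.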