Graph-based text representation and knowledge discovery
Source: Symposium on Applied Computing
Proceedings of the 2007 ACM Symposium on Applied Computing
Seoul, Korea
Session: Information access and retrieval
Pages: 807 - 811  
Year of Publication: 2007
ISBN:1-59593-480-4
Authors
Wei Jin  State University of New York at Buffalo, NY
Rohini K. Srihari  State University of New York at Buffalo, NY
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1244002.1244182

ABSTRACT

Information retrieval and text mining require a robust, scalable framework that represents the information extracted from documents and supports visualizing and querying it. One very widely used model is the vector space model, which is based on the bag-of-words approach. However, it discards important information about the original text, such as the order of the terms or the boundaries between sentences and paragraphs. In this paper, we propose a graph-based text representation that captures (i) term order, (ii) term frequency, (iii) term co-occurrence, and (iv) term context in documents. We also apply the graph model to our text mining task, which is to discover non-obvious associations between two or more concepts (e.g. individuals) in a large text corpus. A counterterrorism corpus is used to evaluate the performance of various retrieval models; the evaluation demonstrates the feasibility and effectiveness of graph-based text representation in information retrieval and text mining.
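
The abstract does not specify how the graph is built, so the following is only a minimal sketch of one plausible graph-based representation, not the authors' model. It assumes terms are nodes weighted by frequency, directed edges link a term to the terms that follow it within a small window of the same sentence (term order and co-occurrence), and each edge records the sentences in which it was seen (term context). The function name `build_term_graph`, the `window` parameter, and the sample sentences are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def build_term_graph(text, window=2):
    """Build a directed term graph from raw text (illustrative sketch only)."""
    node_freq = Counter()            # term -> frequency
    edge_weight = Counter()          # (earlier term, later term) -> co-occurrence count
    edge_context = defaultdict(set)  # edge -> indices of sentences containing it

    # Naive sentence splitting; edges never cross a sentence boundary.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    for idx, sentence in enumerate(sentences):
        terms = re.findall(r"[a-z0-9]+", sentence.lower())
        node_freq.update(terms)
        for i, src in enumerate(terms):
            # Link each term to the next `window` terms, preserving direction.
            for dst in terms[i + 1 : i + 1 + window]:
                edge_weight[(src, dst)] += 1
                edge_context[(src, dst)].add(idx)
    return node_freq, edge_weight, edge_context


# Two texts with identical bags of words but different term order.
freq_a, edges_a, ctx_a = build_term_graph("Alice met Bob. Bob called Carol.")
freq_b, edges_b, ctx_b = build_term_graph("Bob met Alice. Carol called Bob.")

same_bag = freq_a == freq_b        # the bag-of-words view cannot tell them apart
same_graph = edges_a == edges_b    # the directed edges can
```

Under these assumptions, the two sample texts have the same term frequencies but different edge sets, which is the kind of order information the bag-of-words approach loses.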


