Information extraction challenges in managing unstructured data
Source
ACM SIGMOD Record archive
Volume 37, Issue 4 (December 2008)
COLUMN: Special section on managing information extraction
Pages 14-20  
Year of Publication: 2009
ISSN:0163-5808
Authors
AnHai Doan  University of Wisconsin-Madison
Jeffrey F. Naughton  University of Wisconsin-Madison
Raghu Ramakrishnan  University of Wisconsin-Madison
Akanksha Baid  University of Wisconsin-Madison
Xiaoyong Chai  University of Wisconsin-Madison
Fei Chen  University of Wisconsin-Madison
Ting Chen  University of Wisconsin-Madison
Eric Chu  University of Wisconsin-Madison
Pedro DeRose  University of Wisconsin-Madison
Byron Gao  University of Wisconsin-Madison
Chaitanya Gokhale  University of Wisconsin-Madison
Jiansheng Huang  University of Wisconsin-Madison
Warren Shen  University of Wisconsin-Madison
Ba-Quy Vuong  University of Wisconsin-Madison
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1519103.1519106

ABSTRACT

Over the past few years, we have been trying to build an end-to-end system at Wisconsin to manage unstructured data, using extraction, integration, and user interaction. This paper describes the key information extraction (IE) challenges that we have run into, and sketches our solutions. We discuss in particular developing a declarative IE language, optimizing for this language, generating IE provenance, incorporating user feedback into the IE process, developing a novel wiki-based user interface for feedback, best-effort IE, pushing IE into RDBMSs, and more. Our work suggests that IE in managing unstructured data can open up many interesting research challenges, and that these challenges can greatly benefit from the wealth of work on managing structured data that has been carried out by the database community.

