Purple SOX extraction management system
Source
ACM SIGMOD Record archive
Volume 37, Issue 4 (December 2008)
COLUMN: Special section on managing information extraction
Pages 21-27  
Year of Publication: 2009
ISSN:0163-5808
Authors
Philip Bohannon  Yahoo! Research
Srujana Merugu  Yahoo! Research
Cong Yu  Yahoo! Research
Vipul Agarwal  Yahoo! Research
Pedro DeRose  University of Wisconsin Madison
Arun Iyer  Yahoo! Research
Ankur Jain  Yahoo! Research
Vinay Kakade  Yahoo! Research
Mridul Muralidharan  Yahoo! Research
Raghu Ramakrishnan  Yahoo! Research
Warren Shen  Yahoo! Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1519103.1519107

ABSTRACT

We describe the Purple SOX (PSOX) EMS, a prototype Extraction Management System currently being built at Yahoo!. The goal of the PSOX EMS is to manage a large number of sophisticated extraction pipelines across different application domains, at the web scale and with minimum human involvement. Three key value propositions are described: extensibility, the ability to swap in and out extraction operators; explainability, the ability to track the provenance of extraction results; and social feedback support, the facility for gathering and reconciling multiple, potentially conflicting sources.



