ABSTRACT
We describe the Purple SOX (PSOX) EMS, a prototype Extraction Management System currently being built at Yahoo!. The goal of the PSOX EMS is to manage a large number of sophisticated extraction pipelines across different application domains, at web scale and with minimal human involvement. We describe three key value propositions: extensibility, the ability to swap extraction operators in and out; explainability, the ability to track the provenance of extraction results; and social feedback support, the facility for gathering and reconciling multiple, potentially conflicting sources.
CITED BY
Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, Jeffrey F. Naughton, Efficiently incorporating user feedback into information extraction and integration programs, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA