ABSTRACT
We address the problem of academic conference homepage understanding for the Semantic Web. The problem consists of three labeling tasks: labeling conference function pages, function blocks, and attributes. Unlike traditional information extraction tasks, the data in academic conference homepages has complex structural dependencies that span multiple Web pages, and it is also subject to logical constraints. In this paper, we propose a unified approach, Constrained Hierarchical Conditional Random Fields, that performs the three labeling tasks simultaneously. The approach describes the complex structural dependencies well, and the constrained Viterbi algorithm used during inference avoids logical errors. Experimental results on real-world conference data show that the approach outperforms cascaded labeling methods by 3.6% in F1-measure and that the constrained inference process improves accuracy by 14.3%. Based on the proposed approach, we develop a prototype system, a use-oriented semantic academic conference calendar. The user simply specifies the conferences of interest; the system then finds, extracts, and updates the semantic information from the Web and builds a calendar for the user automatically. The semantic conference data can also be used in other applications, such as finding sponsors and finding experts, and the proposed approach can be applied to other information extraction tasks as well.
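To make the idea of constrained inference concrete, the following is a minimal sketch of constrained Viterbi decoding. It is not the paper's algorithm: it assumes a plain linear chain rather than a hierarchical model, and it encodes constraints in two simple, assumed forms (labels allowed at a given position, and forbidden label transitions). The function name, the label names, and all scores are hypothetical and chosen only for illustration.

```python
NEG_INF = float("-inf")


def constrained_viterbi(emissions, transitions, allowed=None, forbidden=None):
    """Best label sequence for a linear chain under hard constraints.

    emissions:   list of dicts, one per position, mapping label -> log score
    transitions: dict mapping (prev_label, label) -> log score (default 0.0)
    allowed:     optional dict mapping position -> set of permitted labels
    forbidden:   optional set of (prev_label, label) pairs that may not occur
    """
    allowed = allowed or {}
    forbidden = forbidden or set()
    labels = list(emissions[0])

    def emit(t, y):
        # A label excluded at position t gets score -inf, so it is never chosen.
        if t in allowed and y not in allowed[t]:
            return NEG_INF
        return emissions[t][y]

    def trans(p, y):
        # A forbidden transition is likewise removed from the search space.
        if (p, y) in forbidden:
            return NEG_INF
        return transitions.get((p, y), 0.0)

    score = {y: emit(0, y) for y in labels}
    back = []  # back[t - 1][y] is the best predecessor of label y at position t
    for t in range(1, len(emissions)):
        new_score, pointers = {}, {}
        for y in labels:
            best_prev = max(labels, key=lambda p: score[p] + trans(p, y))
            new_score[y] = score[best_prev] + trans(best_prev, y) + emit(t, y)
            pointers[y] = best_prev
        score = new_score
        back.append(pointers)

    last = max(labels, key=lambda y: score[y])
    if score[last] == NEG_INF:
        raise ValueError("no labeling satisfies the constraints")

    # Follow the back-pointers from the best final label to recover the path.
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical two-token example: local scores alone prefer the order
# (notification, deadline), which is assumed here to be a logical error.
emissions = [
    {"deadline": -1.0, "notification": -0.5, "other": -3.0},
    {"deadline": -0.5, "notification": -1.0, "other": -3.0},
]
forbidden = {
    ("notification", "deadline"),
    ("deadline", "deadline"),
    ("notification", "notification"),
}
unconstrained = constrained_viterbi(emissions, {})
constrained = constrained_viterbi(emissions, {}, forbidden=forbidden)
```

In this toy case the unconstrained decoder picks the locally best but logically inconsistent order, while the constrained decoder returns the best sequence among those that satisfy the constraints. Setting excluded options to a score of negative infinity keeps the dynamic program unchanged, which is one common way to add hard constraints to Viterbi decoding.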