Extracting discriminative concepts for domain adaptation in text mining
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Research track papers
Pages 179-188  
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Bo Chen  The Chinese University of Hong Kong, Hong Kong, Hong Kong
Wai Lam  The Chinese University of Hong Kong, Hong Kong, Hong Kong
Ivor Tsang  Nanyang Technological University, Singapore, Singapore
Tak-Lam Wong  The Chinese University of Hong Kong, Hong Kong, Hong Kong
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 64,   Downloads (12 Months): 216,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1557019.1557045

ABSTRACT

A common predictive modeling challenge in text mining is that the training data and the operational (testing) data are drawn from different underlying distributions. This poses great difficulty for many statistical learning methods. However, when the distributions in the source domain and the target domain are not identical but related, there may exist a shared concept space that preserves the relation. Consequently, a good feature representation can encode this concept space and minimize the distribution gap. To formalize this intuition, we propose a domain adaptation method that parameterizes this concept space by a linear transformation under which we explicitly minimize the distribution difference between the source domain, which has sufficient labeled data, and the target domains, which have only unlabeled data, while at the same time minimizing the empirical loss on the labeled data in the source domain. Another characteristic of our method is its capability for considering multiple classes and their interactions simultaneously. We have conducted extensive experiments on two common text mining problems, namely information extraction and document classification, to demonstrate the effectiveness of our proposed method.
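
The kind of objective the abstract describes can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions, not the authors' formulation: the abstract does not say which distribution distance or loss is used, so the squared gap between projected domain means (a linear-kernel mean discrepancy) and a softmax cross-entropy loss are stand-ins, and the names `W`, `V`, `lam`, `mean_discrepancy`, `softmax_loss`, and `objective` are hypothetical.

```python
import numpy as np

def mean_discrepancy(Xs, Xt, W):
    """Squared distance between the projected source and target means.

    Assumption: a linear-kernel mean discrepancy stands in for the
    unspecified distribution distance mentioned in the abstract.
    """
    gap = (Xs.mean(axis=0) - Xt.mean(axis=0)) @ W
    return float(gap @ gap)

def softmax_loss(Z, y, V):
    """Mean multiclass cross-entropy of a linear classifier V on features Z."""
    scores = Z @ V
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())

def objective(W, V, Xs, ys, Xt, lam=1.0):
    """Source-domain empirical loss plus lam times the distribution gap."""
    return softmax_loss(Xs @ W, ys, V) + lam * mean_discrepancy(Xs, Xt, W)

# Toy data (assumption): random features in place of real text features.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(50, 6))         # labeled source examples
ys = rng.integers(0, 3, size=50)      # three classes, handled jointly
Xt = rng.normal(size=(40, 6)) + 0.5   # unlabeled target examples, shifted
W = rng.normal(size=(6, 2))           # linear map into a 2-d concept space
V = rng.normal(size=(2, 3))           # classifier on the concept space

print(objective(W, V, Xs, ys, Xt, lam=0.1))
```

In this sketch a learner would search for `W` and `V` that make `objective` small; `lam` trades off fitting the labeled source data against closing the gap between the two domains.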

