ABSTRACT
To discover translation knowledge in diverse data resources on the Web, this article proposes an approach to finding translation equivalents of query terms and constructing multilingual lexicons by mining Web anchor texts and link structures. Although Web anchor texts are wide-scoped hypertext resources, not every pair of languages has enough anchor texts for effective extraction of translations for Web queries. To make the approach more generally applicable, it is built on a transitive translation model: the translation equivalents of a query term can be extracted via its translation in an intermediate language. To reduce interference from translation errors, the approach further integrates a competitive linking algorithm into the process of determining the most probable translation. A series of experiments has been conducted, including performance tests on term translation extraction, cross-language information retrieval, and translation suggestions for practical Web search services. The experimental results show that the proposed approach is effective in extracting translations of unknown queries, is easy to combine with the probabilistic retrieval model to improve cross-language retrieval performance, and is very useful when the considered language pairs lack a sufficient number of anchor texts. Based on the approach, an experimental system called LiveTrans has been developed for English-Chinese cross-language Web search.
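The abstract names two mechanisms without giving formulas, so the following is only a minimal sketch under stated assumptions, not the article's actual model. It assumes that a transitive score can be approximated by summing, over intermediate-language terms, the product of a source-to-intermediate score and an intermediate-to-target score, and that competitive linking works as it is commonly described: greedily accept the highest-scoring pairs while allowing each term to be linked at most once. The function names, the placeholder terms (`s1`, `m1`, `t1`, ...), and all scores are hypothetical.

```python
from collections import defaultdict


def transitive_scores(src_to_mid, mid_to_tgt):
    """Score target candidates for one source term via intermediate terms.

    src_to_mid: {intermediate_term: score} for the source term (assumed given).
    mid_to_tgt: {intermediate_term: {target_term: score}} (assumed given).
    Assumption: scores combine as a sum of products over intermediate terms.
    """
    scores = defaultdict(float)
    for mid, p_src_mid in src_to_mid.items():
        for tgt, p_mid_tgt in mid_to_tgt.get(mid, {}).items():
            scores[tgt] += p_src_mid * p_mid_tgt
    return dict(scores)


def competitive_linking(score_table):
    """Greedy one-to-one linking over {source_term: {target_term: score}}.

    Pairs are visited from highest to lowest score; a pair is accepted only
    if neither its source nor its target term has been linked already.
    """
    pairs = [
        (score, src, tgt)
        for src, candidates in score_table.items()
        for tgt, score in candidates.items()
    ]
    pairs.sort(key=lambda item: item[0], reverse=True)

    links = {}
    used_targets = set()
    for score, src, tgt in pairs:
        if score <= 0:
            break  # remaining pairs carry no evidence
        if src in links or tgt in used_targets:
            continue
        links[src] = tgt
        used_targets.add(tgt)
    return links


# Toy data (hypothetical): source term s1 reaches targets t1 and t2 via m1 and m2.
s1_scores = transitive_scores(
    {"m1": 0.8, "m2": 0.2},
    {"m1": {"t1": 0.9, "t2": 0.1}, "m2": {"t2": 1.0}},
)

# A second source term s2 with made-up scores competes for the same targets.
table = {"s1": s1_scores, "s2": {"t1": 0.6, "t2": 0.1}}
links = competitive_linking(table)
print(links)
```

In this toy table, choosing the best target for each source term independently would assign `t1` to both `s1` and `s2`. The greedy one-to-one constraint gives `t1` to the stronger pair and forces `s2` onto its next candidate, which is the kind of error reduction the abstract attributes to competitive linking.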