The infocious web search engine: improving web searching through linguistic analysis
Source International World Wide Web Conference archive
Special interest tracks and posters of the 14th international conference on World Wide Web
Chiba, Japan
SESSION: Industrial and practical experience track paper session 2
Pages: 840-849
Year of Publication: 2005
ISBN: 1-59593-051-5
Authors
Alexandros Ntoulas, Infocious Inc.
Gerald Chao, Infocious Inc.
Junghoo Cho, University of California at Los Angeles
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1062745.1062765

ABSTRACT

In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities present in natural language text. This is achieved by performing linguistic analysis on the content of the Web pages we index, which is a departure from existing Web search engines that return results mainly based on keyword matching. This additional step of linguistic processing gives Infocious two main advantages. First, Infocious gains a deeper understanding of the content of Web pages so it can better match users' queries with indexed documents and therefore can improve relevancy of the returned results. Second, based on its linguistic processing, Infocious can organize and present the results to the user in more intuitive ways. In this paper we present the linguistic processing technologies that we incorporated in Infocious and how they are applied in helping users find information on the Web more efficiently. We discuss the various components in the architecture of Infocious and how each of them benefits from the added linguistic processing. Finally, we experimentally evaluate the performance of a component which leverages linguistic information in order to categorize Web pages.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
Altavista Inc. http://www.altavista.com.
 
3
Ask Jeeves Inc. http://www.ask.com.
 
4
Autonomy Inc. http://www.autonomy.com.
 
5
 
6
Brainboost. http://www.brainboost.com.
 
7
 
8
 
9
 
10
C. Chekuri, M. Goldwasser, P. Raghavan, and E. Upfal. Web search using automatic classification. In Proceedings of WWW-96, 6th International Conference on the World Wide Web, San Jose, US, 1996.
 
11
 
12
 
13
 
14
J. Cho and A. Ntoulas. Effective change detection using sampling. In Proceedings of the Twenty-eighth International Conference on Very Large Databases (VLDB), August 2002.
 
15
 
16
S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391--407, September 1990.
 
17
The open directory project. http://www.dmoz.org.
 
18
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
 
19
S. T. Dumais. Latent semantic indexing (LSI) and TREC-2. In The Second Text Retrieval Conference (TREC-2), 1994.
 
20
Excite Inc. http://www.excite.com.
 
21
Google Incorporated. http://www.google.com.
 
22
 
23
Infocious Incorporated. http://www.infocious.com.
 
24
Inquira Inc. http://www.inquira.com.
 
25
Inxight Inc. http://www.inxight.com.
 
26
iPhrase Inc. http://www.iphrase.com.
 
27
 
28
B. Katz, J. Lin, D. Loreto, W. Hildebrandt, M. Bilotti, S. Felshin, A. Fernandes, G. Marton, and F. Mora. Integrating web-based and corpus-based techniques for question answering, November 2003.
 
29
C. Li, J.-R. Wen, and H. Li. Text classification using stochastic keyword generation. In Twentieth International Conference on Machine Learning (ICML), pages 464--471, 2003.
 
30
Lycos Inc. http://www.lycos.com.
 
31
 
32
O. A. McBryan. GENVL and WWWW: Tools for taming the web. In First International Conference on the World Wide Web, CERN, Geneva, Switzerland, May 1994.
 
33
R. Mihalcea. Bootstrapping large sense tagged corpora. In Proceedings of the 3rd International Conference on Language Resources and Evaluations (LREC), Las Palmas, Spain, May 2002.
 
34
MSNSearch. http://www.msnsearch.com.
 
35
 
36
A. Ntoulas, P. Zerfos, and J. Cho. Downloading hidden web content. Technical report, UCLA, 2004. Available at http://oak.cs.ucla.edu/~ntoulas/pubs/ntoulas_hidden_web_extended.pdf.
 
37
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report 1999-66, Database Group, Computer Science Department, Stanford University, November 1999. http://dbpubs.stanford.edu/pub/1999-66.
 
38
A. Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In Proceedings of the First Conference on Empirical Methods in Natural Language Processing, pages 133--142, 1996.
 
39
 
40
START natural language question answering system. http://www.ai.mit.edu/projects/infolab/.
 
41
Teoma. http://www.teoma.com.
 
42
 
43
 
44
A. J. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, IT-13:260--267, 1967.
 
45
 
46
Yahoo! Inc. http://www.yahoo.com.

