Shallow NLP techniques for internet search
Full text: PDF (422 KB)
Source: ACM International Conference Proceeding Series, Vol. 171
Proceedings of the 29th Australasian Computer Science Conference - Volume 48
Hobart, Australia
Pages: 167-176
Year of Publication: 2006
ISSN: 1445-1336; ISBN: 1-920682-30-9
Authors
Alex Penev  National ICT Australia and School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
Raymond Wong  National ICT Australia and School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
Publisher
Australian Computer Society, Inc., Darlinghurst, Australia
ABSTRACT

Information Retrieval (IR) is a major component of many of our daily activities, with perhaps its most prominent role being in search engines. Today's most advanced engines use the keyword-based ("bag of words") paradigm, which has some inherent disadvantages. We believe that natural language (NL) is a more user-oriented, context-preserving and intuitive mechanism for web search.

In this paper, we explore shallow NLP techniques to support a range of NL queries over an existing keyword-based engine. We present JASE, a web application that wraps the Google search engine. JASE performs web searches by decomposing input NL queries and generating new queries that are better suited to the search engine. By using some of Google's syntactic operators and filters, it creates "clever" queries to improve precision.

A preliminary evaluation was conducted to test JASE's accuracy, and the results have been encouraging. We conclude that the NL model has the potential not only to rival the keyword-based paradigm but to substantially surpass it.
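The abstract does not spell out JASE's decomposition rules. The Python sketch below is only a rough illustration of what a shallow, rule-based rewrite of this kind might look like. It turns an NL question into a keyword query that uses two real Google operators: double quotes for an exact phrase, and `site:` for a domain restriction. The `rewrite_query` function, its stopword list and its three rules are all hypothetical. They are not taken from the paper and are not JASE's actual method.

```python
import re

# Hypothetical, illustrative stopword list (not the list used by JASE).
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "of", "for", "in", "on",
    "at", "to", "from", "and", "what", "which", "who", "where", "when",
    "how", "do", "does", "did", "i", "me", "find", "show", "about", "by",
}

# Matches "on <domain>" or "from <domain>", e.g. "from acm.org".
DOMAIN_RE = re.compile(
    r"\b(?:on|from)\s+((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE
)


def rewrite_query(nl_query: str) -> str:
    """Rewrite an NL question as a keyword query (hypothetical sketch).

    Rules, chosen purely for illustration:
      1. "on/from <domain>" becomes Google's site: operator.
      2. A run of two or more capitalised words (ignoring the first
         token of the query) becomes an exact-phrase query in quotes.
      3. Remaining stopwords are dropped; other words are lowercased.
    """
    parts = []

    # Rule 1: extract a site restriction and remove it from the text.
    match = DOMAIN_RE.search(nl_query)
    if match:
        parts.append("site:" + match.group(1).lower())
        nl_query = nl_query[:match.start()] + nl_query[match.end():]

    tokens = re.findall(r"[A-Za-z0-9']+", nl_query)
    i = 0
    while i < len(tokens):
        # Rule 2: scan a run of capitalised tokens. The first token is
        # skipped because its capital usually just marks sentence start.
        j = i
        if i > 0:
            while j < len(tokens) and tokens[j][0].isupper():
                j += 1
        if j - i >= 2:
            parts.append('"' + " ".join(tokens[i:j]) + '"')
            i = j
            continue
        # Rule 3: keep content words only.
        word = tokens[i].lower()
        if word not in STOPWORDS:
            parts.append(word)
        i += 1

    return " ".join(parts)
```

For example, `rewrite_query("Who is the Prime Minister of Australia?")` returns `"Prime Minister" australia`. `rewrite_query("Find papers about query expansion from acm.org")` returns `site:acm.org papers query expansion`. Because the rules are purely lexical (regular expressions and a fixed word list), they need no parser, which is what makes a technique "shallow". The trade-off is that such rules misfire easily, for instance on a query that begins with a multi-word proper noun.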


