Identifying similarities, periodicities and bursts for online search queries
Source: International Conference on Management of Data
Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Paris, France
SESSION: Research sessions: data mining applications
Pages: 131-142
Year of Publication: 2004
ISBN: 1-58113-859-8
Authors
Michail Vlachos, UC Riverside
Christopher Meek
Zografoula Vagena, UC Riverside
Dimitrios Gunopulos, UC Riverside
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1007568.1007586

ABSTRACT

We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts'), where each element of the time series is the number of times the query is issued on a given day. All of the methods we describe use sequences of this form and can be applied to time-series data generally. Our primary goal is the discovery of semantically similar queries, which we find by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state of the art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time series. Finally, we propose a simple but effective method for identification of bursts (long or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time series. We conclude the presentation with the description of a tool that uses the described methods and serves as an interactive exploratory data discovery tool for the MSN query database.
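
The abstract names its building blocks without giving their details, so the following is only a minimal Python sketch of how they might look, not the authors' implementation. The function names, the log format of (day, query) pairs, the unitary DFT normalisation, and the burst rule (trailing moving average above mean plus a multiple of the standard deviation) are all assumptions made for illustration.

```python
import cmath
import math
from collections import Counter
from datetime import date, timedelta


def daily_counts(log, start, days):
    """Turn (day, query) log records into one daily-count series per query.

    The (day, query) record format is hypothetical.
    """
    counts = Counter(log)
    queries = {query for _, query in log}
    return {
        q: [counts[(start + timedelta(days=d), q)] for d in range(days)]
        for q in queries
    }


def dft(x):
    """Naive O(n^2) DFT, scaled by 1/sqrt(n) so that signal energy is preserved."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        / math.sqrt(n)
        for k in range(n)
    ]


def best_coefficients(x, k):
    """Keep the k largest-magnitude coefficients and the energy of the omitted ones."""
    coeffs = dft(x)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    omitted_energy = sum(abs(coeffs[i]) ** 2 for i in order[k:])
    return kept, omitted_energy


def bursts(x, window=3, cutoff=1.5):
    """Flag days whose trailing moving average exceeds mean + cutoff * std.

    This threshold rule is an assumed stand-in for the burst detector.
    """
    ma = []
    for t in range(len(x)):
        recent = x[max(0, t - window + 1): t + 1]
        ma.append(sum(recent) / len(recent))
    mean = sum(ma) / len(ma)
    std = math.sqrt(sum((v - mean) ** 2 for v in ma) / len(ma))
    return [t for t, v in enumerate(ma) if v > mean + cutoff * std]
```

Under these assumptions, a query's series would be summarised by calling `best_coefficients(series, k)` and storing the kept coefficients together with the omitted energy, while `bursts(series)` would return the day indices on which demand is unusually high. A series that is a pure weekly cycle over 28 days, for example, puts all of its energy into two coefficients, so the omitted energy for `k=2` is essentially zero.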


