|
ABSTRACT
We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts') where the elements of the time series are the number of times that a query is issued on a day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state-of-the-art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time-series. Finally we propose a simple but effective method for identification of bursts (long or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time-series. We conclude the presentation with the description of a tool that uses the described methods, and serves as an interactive exploratory data discovery tool for the MSN query database.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
 |
8
|
Joseph M. Hellerstein , Elias Koutsoupias , Christos H. Papadimitriou, On the analysis of indexing schemes, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.249-256, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263688]
|
| |
9
|
E. Keogh. Exact indexing of dynamic time warping. In Proc. of VLDB, 2002.
|
 |
10
|
Eamonn Keogh , Kaushik Chakrabarti , Michael Pazzani , Sharad Mehrotra, Locally adaptive dimensionality reduction for indexing large time series databases, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.151-162, May 21-24, 2001, Santa Barbara, California, United States
|
 |
11
|
|
| |
12
|
|
| |
13
|
D. Rafiei and A. Mendelzon. Efficient retrieval of similar time sequences using dft. In Proc. of FODO, 1998.
|
| |
14
|
C. Wang and X. S. Wang. Multilevel filtering for high dimensional nearest neighbor search. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000.
|
 |
15
|
Daniel Wu , Ambuj Singh , Divyakant Agrawal , Amr El Abbadi , Terence R. Smith, Efficient retrieval for browsing large image databases, Proceedings of the fifth international conference on Information and knowledge management, p.11-18, November 12-16, 1996, Rockville, Maryland, United States
[doi> 10.1145/238355.238365]
|
| |
16
|
P. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proc. of 3rd SIAM on Discrete Algorithms, 1992.
|
 |
17
|
|
CITED BY 31
|
|
|
|
|
|
|
|
Qiankun Zhao , Steven C. H. Hoi , Tie-Yan Liu , Sourav S. Bhowmick , Michael R. Lyu , Wei-Ying Ma, Time-dependent semantic similarity measure of queries using historical click-through data, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
|
|
|
Micah Dubinko , Ravi Kumar , Joseph Magnani , Jasmine Novak , Prabhakar Raghavan , Andrew Tomkins, Visualizing tags over time, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
|
|
|
|
|
|
|
|
|
Eytan Adar , Daniel S. Weld , Brian N. Bershad , Steven S. Gribble, Why we search: visualizing and predicting user behavior, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
Wen Zhang , Jun Yan , Shuicheng Yan , Ning Liu , Zheng Chen, Temporal query substitution for ad search, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
|
|
|
|
|
|
Xuanhui Wang , ChengXiang Zhai , Xiao Hu , Richard Sproat, Mining correlated bursty topic patterns from coordinated text streams, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
Gabriel Pui Cheong Fung , Jeffrey Xu Yu , Huan Liu , Philip S. Yu, Time-dependent event hierarchy construction, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
Micah Dubinko , Ravi Kumar , Joseph Magnani , Jasmine Novak , Prabhakar Raghavan , Andrew Tomkins, Visualizing tags over time, ACM Transactions on the Web (TWEB), v.1 n.2, p.7-es, August 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Marcel Karnstedt , Daniel Klan , Christian Pölitz , Kai-Uwe Sattler , Conny Franke, Adaptive burst detection in a stream engine, Proceedings of the 2009 ACM symposium on Applied Computing, March 08-12, 2009, Honolulu, Hawaii
|
|
|
|
|
|
Theodoros Lappas , Benjamin Arai , Manolis Platakis , Dimitrios Kotsakos , Dimitrios Gunopulos, On burstiness-aware search for document sequences, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
|
|