| Peta-scale data warehousing at Yahoo! |
Source
|
International Conference on Management of Data
Proceedings of the 35th SIGMOD international conference on Management of data
Providence, Rhode Island, USA
SESSION: Industrial session 1: data warehousing
Pages 855-862
Year of Publication: 2009
ISBN:978-1-60558-551-2
|
|
| Authors |
| Mona Ahuja | Yahoo!, Bellevue, WA, USA |
| Cheng Che Chen | Yahoo!, Bellevue, WA, USA |
| Ravi Gottapu | Yahoo!, Bellevue, WA, USA |
| Jörg Hallmann | Yahoo!, Bellevue, WA, USA |
| Waqar Hasan | Yahoo!, Bellevue, WA, USA |
| Richard Johnson | Yahoo!, Bellevue, WA, USA |
| Maciek Kozyrczak | Yahoo!, Bellevue, WA, USA |
| Ramesh Pabbati | Yahoo!, Bellevue, WA, USA |
| Neeta Pandit | Yahoo!, Bellevue, WA, USA |
| Sreenivasulu Pokuri | Yahoo!, Sunnyvale, CA, USA |
| Krishna Uppala | Yahoo!, Sunnyvale, CA, USA |
| Bibliometrics |
Downloads (6 Weeks): 70, Downloads (12 Months): 315, Citation Count: 0
|
|
|
ABSTRACT
Insights based on detailed data on consumer behavior, product performance, and marketplace behavior are driving innovation and competition in the internet space. We introduce Everest, a SQL-compliant data warehousing engine based on a column architecture, which we have built and deployed at Yahoo!. In contrast to commercially available engines, this massively parallel engine, based on commodity hardware, offers scale, flexibility, specialized analytic operations, and lower administrative and hardware costs. In this paper, we describe the business motivation and the software and deployment architecture of Everest. The engine has been in production at Yahoo! since 2007 and currently manages over six petabytes of data.
INDEX TERMS
Primary Classification: H. Information Systems; H.3 INFORMATION STORAGE AND RETRIEVAL; H.3.4 Systems and Software
Subjects: Distributed systems
General Terms: Algorithms, Languages, Performance, Reliability, Standardization
Keywords: analytics, business intelligence, column database, column storage, data warehousing, mpp database, vector query processing
|