ACM Home Page
Please provide us with feedback. Feedback
Scheduling to minimize staleness and stretch in real-time data warehouses
Full text PdfPdf (408 KB)
Source
ACM Symposium on Parallel Algorithms and Architectures archive
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures table of contents
Calgary, AB, Canada
SESSION: Multiprocessor scheduling table of contents
Pages 29-38  
Year of Publication: 2009
ISBN:978-1-60558-606-9
Authors
Mohammad Hossein Bateni  Princeton University, Princeton, NJ, USA
Lukasz Golab  AT&T Labs - Research, Florham Park, NJ, USA
Mohammad Taghi Hajiaghayi  AT&T Labs - Research, Florham Park, NJ, USA
Howard Karloff  AT&T Labs - Research, Florham Park, NJ, USA
Sponsors
SIGOPS: ACM Special Interest Group on Operating Systems
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 36,   Downloads (12 Months): 68,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1583991.1583998
What is a DOI?

ABSTRACT

We study scheduling algorithms for loading data feeds into real time data warehouses, which are used in applications such as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a large number of streaming data feeds that are generated by external sources and arrive asynchronously. Data for each table are generated at a constant rate, different tables possibly at different rates. For each data feed, the arrival of new data triggers an update that seeks to append the new data to the corresponding table; if multiple updates are pending for the same table, they are batched together before being loaded. At time τ, if a table has been updated with information up to time r≤τ, its staleness is defined as τ--r.

Our first objective is to schedule the updates on one or more processors in a way that minimizes the total staleness. In order to ensure fairness, our second objective is to limit the maximum "stretch", which we define (roughly) as the ratio between the duration of time an update waits till it is finished being processed, and the length of the update.

In contrast to earlier work proving the nonexistence of constant-competitive algorithms for related scheduling problems, we prove that any online nonpreemptive algorithm, no processor of which is ever voluntarily idle, incurs a staleness at most a constant factor larger than an obvious lower bound on total staleness (provided that the processors are sufficiently fast). We give a constant-stretch algorithm, provided that the processors are sufficiently fast, for the quasiperiodic model, in which tables can be clustered into a few groups such that the update frequencies within each group vary by at most a constant factor. Finally, we show that our constant-stretch algorithm is also constant-competitive (subject to the same proviso on processor speed) in the quasiperiodic model with respect to total weighted staleness, where tables are assigned weights that reflect their priorities.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
 
5
6
7
 
8
 
9
 
10
11
 
12
 
13
N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simitsis, and N.-E. Frantzell, Supporting Streaming Updates in an Active Data Warehouse, ICDE 2007, 476--485.
14

Collaborative Colleagues:
Mohammad Hossein Bateni: colleagues
Lukasz Golab: colleagues
Mohammad Taghi Hajiaghayi: colleagues
Howard Karloff: colleagues