ACM Home Page
Please provide us with feedback. Feedback
Monoids for rapid data flow analysis
Full text PdfPdf (1.27 MB)
Source Annual Symposium on Principles of Programming Languages archive
Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on Principles of programming languages table of contents
Tucson, Arizona
Pages: 47 - 59  
Year of Publication: 1978
Author
Barry K. Rosen  IBM Thomas J. Watson Research Center, Yorktown Heights, New York
Sponsors
SIGPLAN: ACM Special Interest Group on Programming Languages
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 34,   Citation Count: 5
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/512760.512767
What is a DOI?

ABSTRACT

The earliest data flow analysis research dealt with concrete problems (such as detection of available expressions) and with low level representations of control flow (with one large graph, each of whose nodes represents a basic block). Several recent papers have introduced an abstract approach, dealing with any problem expressible in terms of a semilattice L and a monoid M of isotone maps from L to L, under various algebraic constraints. Examples include [CC77; GW76; KU76; Ki73; Ta75; Ta76; We75]. Several other recent papers have introduced a high level representation with many small graphs, each of which represents a small portion of the control flow information in a program. The hierarchy of small graphs is explicit in [Ro77a; Ro77b] and implicit in papers that deal with syntax directed analysis of programs written within the confines of classical structured programming [DDH72, Sec. 1.7]. Examples include [TK76; ZB74]. The abstract papers have retained the low level representations while the high level papers have retained the concrete problems of the earliest work. This paper studies abstract conditions on L and M that lead to rapid data flow analysis, with emphasis on high level representations. Unlike some analysis methods oriented toward structured programming [TK76; Wu75; ZB74], our method retains the ability to cope with arbitrary escape and jump statements while it exploits the control flow information implicit in the parse tree.

The general algebraic framework for data flow analysis with semilattices is presented in Section 2, along with some preliminary lemmas. Our "rapid" monoids properly include the "fast" monoids of [GW76]. Section 3 relates data flow problems to the hierarchies of small graphs introduced in [Ro77a; Ro77b]. High level analysis begins with local information expressed by mapping the arcs of a large graph into the monoid M, much as in low level analysis. But each arc in our small graphs represents a set (often an infinite set) of paths in the underlying large graph. Appropriate members of M are associated with these arcs. This "globalized" local information is used to solve global flow problems in Section 4. The fundamental theorem of Section 4 is applied to programs with the control structures of classical structured programming in Section 5. For a given rapid monoid M, the time required to solve any global data flow problem is linear in the number of statements in the program. (For varying M, the time is linear in the product of this number by t@, where t@ is a parameter of M introduced in the definition of rapidity.) For reasons sketched at the beginning of Section 6, we feel obliged to cope with source level escape and jump statements as well as with classical structured programming. Section 6 shows how to apply the fundamental theorem of Section 4 to programs with arbitrary escapes and jumps. The explicit time bound is only derived for programs without jumps. A comparison between the results obtained by our method and those obtained by [GW76] is in Section 7, which also contains examples of rapid monoids in the full paper. Finally, Section 8 lists conclusions and open problems. Proofs of lemmas are omitted to save space. The full paper will resubmitted to a journal.

We proceed from the general to the particular, except in some places where bending the rule a little makes a significant improvement in the expository flow. Common mathematical notation is used. To avoid excessive parentheses, the value of a function f at an argument x is fx rather than f(x). If fx is itself a function then (fx)y is the result of applying fx to y. The usual ¡Ü and ¡Ý symbols are used for arbitrary partial orders as well as for the usual order among integers. A function from a partially ordered set (poset) to a poset is isotone iff x ¡Ü y implies fx ¡Ü fy. (Isotone maps are sometimes called "monotonic" in the literature.) A meet semilattice is a poset with a binary operation ¡Ä such that x ¡Ä y is the greatest lower bound of the set {x, y}. A meet semilattice wherein every subset has a greatest lower bound is complete. In particular, the empty subset has a greatest lower bound T, so a complete meet semilattice has a maximum element. A monoid is a set together with an associative binary operation ∘ that has a unit element 1 : 1 ∘ m = m ∘ 1 = m for all m. In all our examples the monoid M will be a monoid of functions: every member of M is a function (from a set into itself), the operation ∘ is the usual composition (f ∘ g)x = f(gx), and the unit 1 is the identity function with 1X = x for all x. Two considerations governed the notational choices. First, we speak in ways that are common in mathematics and are convenient here. Second, we try to facilitate comparisons with [GW76; KU76; Ro77b], to the extent that the disparities among these works permit. One disparity is between the meet semilattices of [GW76; KU76; Ki73] and the join semilattices of [Ro77b; Ta75; We75], where least upper bounds are considered instead of greatest lower bounds. To speak of meets is more natural in applications that are intuitively stated in terms of "what must happen on all paths" in some class of paths in a program, while to speak of joins is more natural in applications that are intuitively stated in terms of "what can happen on some paths." By checking whether there are any paths in the relevant class and by using the rule that 3 is equivalent to ⌍V⌍, join oriented applications can be reduced to meet oriented ones (and vice versa). A general theory should speak in one way or the other, and we have chosen meets. For us, strong assertions about a program's data flow are high in the semilattice.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
5
6
 
7
Ha77b. Harrison, W. H. Compiler analysis of the value ranges for variables. IEEE Trans. on Software Engineering3 ( 1977), 243-250.
 
8
KU77. Kam, J. B., and Ullman J. D. Monotone data flow analysis framework. Acta Informatica7 (1977), 305-317.
9
10
11
 
12
La77. Lamport, L. Proving the correctness of multiprocess programs. IEEE Trans. on Software Engineering3 (1977), 125-143.
13
14
 
15
Ro76. Rosen, B. K. Correctness of parallel programs: the Church-Rosser approach. Theoretical Computer Science2 (1976), 183-207.
16
17
 
18
TK76. Taniguchi, K., and Kasami, T. An O(n) algorithm for computing the set of available expressions of D-charts. Acta Informatica6 (1976), 361-364.
 
19
 
20
 
21
Ul73. Ullman, J. D. Fast algorithms for the elimination of common subexpressions. Acta Informatica2 (1973), 191-213.
 
22
We75. Wegbreit, B. Property extraction in well founded property sets. IEEE Trans. on Software Engineering1 (1975), 270-285.
 
23
 
24
 
25
ZB74. Zelkowitz, M. V., and Bail, W. G. Optimization of structured programs. Software Practice and Experience4 (1974), 51-57.