ABSTRACT
The traditional permutation multiplication algorithm is now limited by memory latency and not by CPU speed. A new cache-aware permutation algorithm speeds up permutation multiplication by a factor of 3.4 on current CPUs. The new algorithm is limited by memory bandwidth, but not by memory latency. Current trends indicate improving memory bandwidth and stagnant memory latency. This makes the new algorithm especially important for future computer architectures. In addition, we believe this "memory wall" will soon force a redesign of other common algorithms of symbolic algebra.
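
To make the latency-versus-bandwidth distinction concrete, the following is a minimal sketch, not the authors' implementation. It assumes permutations are stored as arrays and that the product is defined by z[i] = y[x[i]]; the abstract fixes neither convention. The function names, the `bucket_size` parameter, and the three-phase bucketing scheme are hypothetical illustrations of one way to trade random accesses for sequential passes. Python only shows the access pattern; any real speedup would need a compiled language and buckets sized to the cache.

```python
def multiply_naive(x, y):
    """Traditional product: z[i] = y[x[i]] (assumed convention).

    Each lookup y[x[i]] lands at an unpredictable address, so for large
    arrays nearly every access can miss the cache and pay full latency.
    """
    return [y[xi] for xi in x]


def multiply_bucketed(x, y, bucket_size=4):
    """Illustrative cache-aware variant (hypothetical scheme).

    bucket_size stands in for "a slice of y that fits in cache".
    """
    n = len(x)
    num_buckets = (n + bucket_size - 1) // bucket_size

    # Phase 1: stream through x once, appending each value to the bucket
    # that covers its index range in y. Writes to each bucket are sequential.
    buckets = [[] for _ in range(num_buckets)]
    for xi in x:
        buckets[xi // bucket_size].append(xi)

    # Phase 2: process one bucket at a time. All lookups in a bucket touch
    # the same small slice of y, which can stay resident in cache.
    for bucket in buckets:
        for k, xi in enumerate(bucket):
            bucket[k] = y[xi]

    # Phase 3: stream through x again. Values were appended in index order,
    # so reading each bucket front to back restores the original order.
    cursor = [0] * num_buckets
    z = [0] * n
    for i, xi in enumerate(x):
        j = xi // bucket_size
        z[i] = buckets[j][cursor[j]]
        cursor[j] += 1
    return z


if __name__ == "__main__":
    x = [2, 0, 3, 1]
    y = [1, 2, 3, 0]
    assert multiply_bucketed(x, y, bucket_size=2) == multiply_naive(x, y)
```

The bucketed version reads and writes more data in total than the naive loop, but every pass is either sequential or confined to a small region, which is the kind of trade that favors bandwidth over latency.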