UTS: An Unbalanced Tree Search Benchmark

Stephen Olivier 1, Jun Huan 1, Jinze Liu 1, Jan Prins 1, James Dinan 2, P.
of North Carolina at Chapel Hill, 3 Dept.

Abstract

This paper presents an unbalanced tree search (UTS) benchmark designed to evaluate the performance and ease of programming for parallel applications requiring dynamic load balancing. We describe algorithms for building a variety of unbalanced search trees to simulate different forms of load imbalance. We created versions of UTS in two parallel languages, OpenMP and Unified Parallel C (UPC), using work stealing as the mechanism for reducing load imbalance. We benchmarked the performance of UTS on various parallel architectures, including shared-memory systems and PC clusters. We found it simple to implement UTS in both UPC and OpenMP, due to UPC's shared-memory abstractions. Results show that both UPC and OpenMP can support efficient dynamic load balancing on shared-memory architectures. However, UPC cannot alleviate the underlying communication costs of distributed-memory systems. Since dynamic load balancing requires intensive communication, performance portability remains difficult for applications such as UTS, and performance degrades on PC clusters. By varying key work stealing parameters, we expose important tradeoffs between the granularity of load balance, the degree of parallelism, and communication costs.

1 Introduction

From multicore microprocessors to large clusters of powerful yet inexpensive PCs, parallelism is becoming increasingly available. In turn, exploiting the power of parallel processing is becoming essential to solving many computationally challenging problems. However, the wide variety of parallel programming paradigms (e.g., OpenMP, MPI, UPC) and parallel architectures (e.g., SMPs, PC clusters, IBM BlueGene) makes choosing an appropriate parallelization approach difficult.