Max Grossman

Georgia Institute of Technology

Publishes on Parallel Computing and Optimization Techniques, Distributed and Parallel Computing Systems, Cloud Computing and Resource Management. 58 papers and 1k citations.

Top publications by citations

Integrating Asynchronous Task Parallelism with MPI
Cited by 83

Effective combination of inter-node and intra-node parallelism is recognized to be a major challenge for future extreme-scale systems. Many researchers have demonstrated the potential benefits of combining both levels of parallelism, including increased communication-computation overlap, improved memory utilization, and effective use of accelerators. However, current “hybrid programming” approaches often require significant rewrites of application code and assume a high level of programmer expertise. Dynamic task parallelism has been widely regarded as a programming model that combines the best of performance and programmability for shared-memory programs. For distributed-memory programs, most users rely on efficient implementations of MPI. In this paper, we propose HCMPI (Habanero-C MPI), an integration of the Habanero-C dynamic task-parallel programming model with the widely used MPI message-passing interface. All MPI calls are treated as asynchronous tasks in this model, thereby enabling unified handling of messages and tasking constructs. For programmers unfamiliar with MPI, we introduce distributed data-driven futures (DDDFs), a new data-flow programming model that seamlessly integrates intra-node and inter-node data-flow parallelism without requiring any knowledge of MPI. Our novel runtime design for HCMPI and DDDFs uses a combination of dedicated communication-specific and computation-specific worker threads. We evaluate our approach on a set of micro-benchmarks as well as larger applications and demonstrate better scalability than the most efficient MPI implementations, while offering a unified programming model that integrates asynchronous task parallelism with distributed-memory parallelism.
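To make the data-flow idea behind DDDFs concrete, here is a minimal single-process sketch in Python. It assumes that a data-driven future is a single-assignment value whose `put` releases any task waiting on it. The class `DataDrivenFuture` and the helper `async_await` are hypothetical names chosen for illustration. They are not the Habanero-C, HCMPI, or DDDF API, and the sketch leaves out the inter-node communication that DDDFs handle.

```python
import threading
from typing import Any, Callable, List


class DataDrivenFuture:
    """Hypothetical single-assignment cell: put() once, get() afterwards.

    Illustrates the data-driven future idea in one process only; this is not
    the Habanero-C, HCMPI, or DDDF API.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Any = None
        self._full = False
        self._waiters: List[Callable[[], None]] = []

    def put(self, value: Any) -> None:
        with self._lock:
            if self._full:
                raise RuntimeError("a data-driven future can be put only once")
            self._value = value
            self._full = True
            waiters, self._waiters = self._waiters, []
        # Notify outside the lock so that callbacks may read this future.
        for notify in waiters:
            notify()

    def get(self) -> Any:
        with self._lock:
            if not self._full:
                raise RuntimeError("get() called before put()")
            return self._value

    def on_ready(self, callback: Callable[[], None]) -> None:
        # Run callback now if the value is present, otherwise after put().
        with self._lock:
            if not self._full:
                self._waiters.append(callback)
                return
        callback()

    def is_ready(self) -> bool:
        with self._lock:
            return self._full


def async_await(inputs: List[DataDrivenFuture], body: Callable[[], None]) -> None:
    """Hypothetical helper: run body() once every input future has been put.

    For simplicity body() runs inline on the thread that satisfies the last
    input; a task-parallel runtime would enqueue it for a worker instead.
    """
    if not inputs:
        body()
        return
    remaining = len(inputs)
    lock = threading.Lock()

    def one_ready() -> None:
        nonlocal remaining
        with lock:
            remaining -= 1
            fire = remaining == 0
        if fire:
            body()

    for ddf in inputs:
        ddf.on_ready(one_ready)


if __name__ == "__main__":
    a, b, total = DataDrivenFuture(), DataDrivenFuture(), DataDrivenFuture()
    async_await([a, b], lambda: total.put(a.get() + b.get()))
    b.put(40)  # total is not ready yet: a is still empty
    a.put(2)   # last input arrives, so the task runs
    print(total.get())  # 42
```

The task declares its inputs up front and never blocks. It simply becomes runnable when the last input is put, regardless of the order in which inputs arrive. Running the body inline is a simplification for brevity.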

HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL
Cited by 50

As the scale of high-performance computing systems grows, three main challenges arise: the programmability, reliability, and energy efficiency of those systems. Accomplishing all three without sacrificing performance requires a rethinking of legacy distributed programming models and homogeneous clusters. In this work, we integrate Hadoop MapReduce with OpenCL to enable the use of heterogeneous processors in a distributed system. We do this by exploiting the implicit data parallelism of mappers and reducers in a MapReduce system. Combining Hadoop and OpenCL provides 1) an easy-to-learn and flexible application programming interface in a popular high-level programming language, 2) the reliability guarantees and distributed file system of Hadoop, and 3) the low power consumption and performance acceleration of heterogeneous processors. This paper presents HadoopCL, an extension to Hadoop that supports execution of user-written Java kernels on heterogeneous devices, optimizes communication through asynchronous transfers and dedicated I/O threads, automatically generates OpenCL kernels from Java bytecode using the open-source tool APARAPI, and achieves nearly 3x overall speedup and more than 55x speedup of the computational sections for sample MapReduce applications, relative to Hadoop.
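The implicit data parallelism that HadoopCL exploits comes from the fact that each mapper call depends only on its own input record, and each reducer call depends only on its own key group. The word-count sketch below, in Python, shows that structure with a thread pool. It is an illustration under that assumption, not HadoopCL's API, and it does not use OpenCL, APARAPI, or any accelerator. The function names `mapper`, `reducer`, and `map_reduce` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def mapper(line: str) -> List[Tuple[str, int]]:
    # Reads only its own record, so calls are independent of one another.
    return [(word, 1) for word in line.split()]


def reducer(key: str, values: Iterable[int]) -> int:
    # Reads only the values grouped under its own key.
    return sum(values)


def map_reduce(lines: List[str], workers: int = 4) -> Dict[str, int]:
    # Map phase: one independent mapper call per input record.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, lines))

    # Shuffle: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: one independent reducer call per key.
    keys = list(groups)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        totals = list(pool.map(lambda k: reducer(k, groups[k]), keys))
    return dict(zip(keys, totals))


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"]))
```

For example, `map_reduce(["a b a", "b c"])` yields counts of 2, 2, and 1 for `a`, `b`, and `c`. Under CPython's global interpreter lock these threads will not speed up pure-Python work. The point is only that neither phase needs coordination between calls, which is the property that makes it possible to run the map and reduce bodies on data-parallel hardware.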