Abstract from
Illinois-Intel Multithreading Library (IML): Multithreading Support for iA-based Multiprocessor Systems

Powerful desktop multiprocessor systems based on the Intel Architecture (iA) offer scientific, engineering, and commercial application developers a formidable alternative at an attractive cost-performance ratio. However, the lack of adequate compiler and runtime library support for multithreading and parallel processing on Windows NT makes it difficult or impossible to fully exploit the performance advantage of these multiprocessor systems. In this paper we describe the design, development, and initial performance results of the Illinois-Intel Multithreading Library (IML), which aims to provide multithreaded application developers with an API that is both efficient and powerful in terms of the types of parallelism it supports. IML implements a parallel execution environment that creates, enqueues and dequeues, binds, and schedules user-level threads on Windows NT threads and fibers. A novel feature of IML is its support for both loop-level (data) parallelism and task-level (functional) parallelism, as well as nested parallel threads. Although loop-level parallelism is most useful in scientific and engineering applications, functional parallelism is often the norm in multimedia, Internet, and Java applications. IML raises the multithreading support available on iA-based Windows NT platforms to levels comparable or superior to those found on high-end parallel computers and supercomputers. Multithreading a number of diverse benchmarks (ranging from POV-Ray to SPECfp95 and the BLAS routines) using IML resulted in significant speedups on a quad-Pentium Pro system.
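The distinction between loop-level and task-level parallelism can be illustrated with a minimal sketch. It is written in Python on top of the standard `concurrent.futures` thread pool rather than on Windows NT threads and fibers, and it does not reproduce the IML API: the names `parallel_for`, `parallel_tasks`, and `demo` are hypothetical and chosen only for this illustration. Nested parallelism is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(pool, body, n, workers):
    """Loop-level (data) parallelism: the same body runs over disjoint
    contiguous blocks of the iteration space 0..n-1.
    Hypothetical helper, not an IML call."""
    if n <= 0:
        return
    block = (n + workers - 1) // workers  # ceiling division

    def run_block(start):
        for i in range(start, min(start + block, n)):
            body(i)

    futures = [pool.submit(run_block, s) for s in range(0, n, block)]
    for f in futures:
        f.result()  # wait for the block and re-raise any exception


def parallel_tasks(pool, *tasks):
    """Task-level (functional) parallelism: unrelated callables are
    submitted together and their results are returned in order.
    Hypothetical helper, not an IML call."""
    futures = [pool.submit(t) for t in tasks]
    return [f.result() for f in futures]


def demo(n=1000, workers=4):
    squares = [0] * n

    def body(i):
        squares[i] = i * i  # each iteration writes a distinct element

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Data parallelism: one loop, many iterations.
        parallel_for(pool, body, n, workers)
        # Functional parallelism: two different computations at once.
        total, largest = parallel_tasks(
            pool, lambda: sum(squares), lambda: max(squares)
        )
    return total, largest
```

In this sketch the loop iterations are independent, so the blocks need no synchronization beyond the final wait; the two tasks only read the array after the loop has completed.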

Future releases of IML will support several loop scheduling schemes as well as controlled thread migration for the purpose of dynamic load balancing. The programmer or the compiler will thus be able to customize scheduling on a per-loop basis, taking into consideration performance-sensitive characteristics such as branches inside loops and data locality. The Intel FORTRAN compiler and the Parafrase-2 experimental parallelizing compiler are being enhanced to generate calls to the IML API automatically, thereby facilitating the development of multithreaded application codes that fully exploit the performance potential of iA-based multiprocessor servers and desktops.
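To make the idea of per-loop scheduling concrete, the following sketch contrasts two well-known schemes: static block scheduling and guided self-scheduling. The abstract does not name the schemes IML will provide, so these two are assumptions chosen for illustration, and the function names `static_chunks`, `guided_chunks`, and `run_dynamic` are hypothetical.

```python
import queue
import threading


def static_chunks(n, workers):
    """Static scheduling: one contiguous block per worker, fixed before
    the loop starts. Cheap and locality-friendly, but unbalanced when
    iteration costs vary (for example, because of branches)."""
    base, extra = divmod(n, workers)
    chunks, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        if size:
            chunks.append((start, start + size))
        start += size
    return chunks


def guided_chunks(n, workers, min_chunk=1):
    """Guided self-scheduling: each chunk is ceil(remaining / workers)
    iterations, so chunks shrink as the loop nears completion."""
    chunks, start = [], 0
    while start < n:
        remaining = n - start
        size = max(min_chunk, -(-remaining // workers))  # ceiling division
        size = min(size, remaining)
        chunks.append((start, start + size))
        start += size
    return chunks


def run_dynamic(chunks, body, workers):
    """Workers repeatedly take the next chunk from a shared queue until
    it is empty, which balances load at run time."""
    work = queue.Queue()
    for c in chunks:
        work.put(c)

    def worker():
        while True:
            try:
                lo, hi = work.get_nowait()
            except queue.Empty:
                return
            for i in range(lo, hi):
                body(i)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For a loop of 10 iterations on 4 workers, `static_chunks(10, 4)` yields four blocks of sizes 3, 3, 2, 2, whereas `guided_chunks(10, 4)` yields six chunks of sizes 3, 2, 2, 1, 1, 1. The smaller trailing chunks give idle workers something to pick up when earlier chunks turn out to be expensive, at the cost of more queue operations and weaker data locality.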