IML provides a set of easy-to-use library functions. Calls to these functions can be inserted by hand by application programmers or high-performance library developers, or automatically by compilers. CSRD and MRL have been using IML for in-house application development. The Parafrase-2 automatic parallelizing compiler, developed at CSRD, now has a source code generator that interfaces with IML and thus automates the whole parallelization process. A prototype automatic parallelizing compiler based on the Intel C/C++ and FORTRAN compilers has been developed at MRL.
IML exploits general kinds of parallelism, from loop (DOALL/DOACROSS) and COBEGIN parallelism to DAG-functional parallelism, and from single-level parallelism to arbitrary nesting of parallelism (such as a DOALL inside a COBEGIN).
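Nested parallelism can be made concrete with a sketch in plain C and POSIX threads. This is not IML, and it is not how IML schedules work. All struct and function names are hypothetical. The sketch runs a COBEGIN of two tasks in which the first task itself runs a DOALL split across two threads. If a thread cannot be created, its work simply runs on the current thread.

```c
#include <assert.h>
#include <pthread.h>

#define N 64
#define LOOP_THREADS 2

/* Start fn(arg) on a new thread; if thread creation fails, run it inline.
   Returns 1 if the caller must later join *th, 0 if fn already ran. */
static int spawn_or_run(pthread_t *th, void *(*fn)(void *), void *arg)
{
    if (pthread_create(th, NULL, fn, arg) == 0)
        return 1;
    fn(arg);
    return 0;
}

/* Inner DOALL worker: copies src[lo..hi) into dst[lo..hi). */
struct slice { int *dst; const int *src; int lo, hi; };

static void *copy_slice(void *p)
{
    struct slice *s = p;
    for (int i = s->lo; i < s->hi; i++)
        s->dst[i] = s->src[i];
    return NULL;
}

/* COBEGIN task 1: itself runs a DOALL over N iterations (nested parallelism). */
struct copy_args { int *dst; const int *src; };

static void *task_doall_copy(void *p)
{
    struct copy_args *c = p;
    pthread_t th[LOOP_THREADS];
    struct slice sl[LOOP_THREADS];
    int joinable[LOOP_THREADS];
    for (int k = 0; k < LOOP_THREADS; k++) {
        sl[k] = (struct slice){ c->dst, c->src,
                                k * N / LOOP_THREADS, (k + 1) * N / LOOP_THREADS };
        joinable[k] = spawn_or_run(&th[k], copy_slice, &sl[k]);
    }
    for (int k = 0; k < LOOP_THREADS; k++)
        if (joinable[k])
            pthread_join(th[k], NULL);
    return NULL;
}

/* COBEGIN task 2: an independent reduction over another array. */
struct sum_args { const int *v; long *sum; };

static void *task_sum(void *p)
{
    struct sum_args *s = p;
    long total = 0;
    for (int i = 0; i < N; i++)
        total += s->v[i];
    *s->sum = total;
    return NULL;
}

/* Hypothetical COBEGIN of the two tasks: task 1 on a new thread, task 2 on
   the calling thread, then wait for both before returning (COEND). */
static void cobegin_with_nested_doall(int *dst, const int *src,
                                      const int *other, long *sum)
{
    struct copy_args c = { dst, src };
    struct sum_args s = { other, sum };
    pthread_t t1;
    int joinable = spawn_or_run(&t1, task_doall_copy, &c);
    task_sum(&s);
    if (joinable)
        pthread_join(t1, NULL);
}
```

A production runtime would more likely reuse a pool of worker threads than create threads per task. This sketch creates them directly only to keep it short.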
IML provides the following API functions (among others):
A complete list of API functions is also available. Because the IML API relies on no platform-specific functionality other than a shared address space between threads, it should be straightforward to port IML to other operating systems and system architectures.
Support for the OpenMP standard is under development.
Compatible Compilers and Systems
IML is compatible with the following systems and compilers. Typical command lines for compiling and linking C and FORTRAN programs with IML are:
cl -MT foo.c bar.c libiml.lib
fl32 -MT -4Ya foo.f bar.f libiml.lib
You need to link against a multithreaded run-time library such as libcmt.lib rather than the single-threaded libc.lib (via the -MT compiler option or an equivalent method). Otherwise, the program may not run correctly. In FORTRAN, at least the functions and subroutines that run in parallel, and any routines they call, need to allocate their local variables on the stack rather than in static storage (via the -4Ya option or equivalent). This is because multiple instances of a subroutine passed to DOALL, as well as of any function or subroutine called from it, can be active at the same time. With static local variables, as traditionally used by FORTRAN 77 compilers, all such instances share the same local variable storage, leading to incorrect results.
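The hazard with static locals can be seen without threads. Whenever two activations of the same routine are live at once, they share a single copy of each static local. In the C sketch below, the function names are hypothetical. A nested call starts a second activation while the first is still running, which mimics a second thread entering the same DOALL body. For the input {1, 2, 3}, the version with a static accumulator returns 11 instead of 6. The version with an automatic (stack) local returns 6.

```c
#include <assert.h>

/* Hypothetical body that keeps its running sum in a STATIC local:
   every activation shares the single copy of `acc`. */
static int sum_static(const int *v, int n, int reenter)
{
    static int acc;                 /* one storage location for all activations */
    acc = 0;
    for (int i = 0; i < n; i++) {
        acc += v[i];
        /* Start a second activation mid-loop, as another thread might. */
        if (reenter && i == 0)
            (void)sum_static(v, n, 0);
    }
    return acc;
}

/* Same body with an AUTOMATIC local: each activation has its own `acc`. */
static int sum_auto(const int *v, int n, int reenter)
{
    int acc = 0;                    /* lives in this activation's stack frame */
    for (int i = 0; i < n; i++) {
        acc += v[i];
        if (reenter && i == 0)
            (void)sum_auto(v, n, 0);
    }
    return acc;
}
```

In C, locals are automatic unless they are declared static. For that reason the problem mainly arises with FORTRAN compilers that allocate locals statically by default.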
With some applications (such as tomcatv.f), you may also have to increase the stack size with the linker option /STACK:size1,size2, where size1 and size2 are the stack sizes to be reserved and committed, respectively. We used the following command line to compile tomcatv.f:
fl32 -MT -4Ya tomcatv.f libiml.lib -link /STACK:20000000,20000000
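Alternatively, you can compile the main routine of tomcatv with static local variables and all other routines (i.e., the loop body functions) with automatic local variables. This is probably a better solution than increasing the stack size. Only the routines that can run concurrently need their locals on the stack, so any large arrays declared in the main routine can stay in static storage.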
Original Code:
#define N 100
int main(){
int i, A[N], B[N];
for(i = 0; i < N; i++){ // This loop is a DOALL loop.
A[i] = B[i];
}
}
DOALL Version:
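/* Include the IML header that declares iml_DOALL and iANY_SCHEDULING. */
#define N 100
void body1(int *start, int *iters, int A[], int B[]);
void body2(int *start, int A[], int B[]);
int main(){
int A[N], B[N];
int iters, sched, chunk, params;
iters = N;               /* total number of loop iterations */
sched = iANY_SCHEDULING; /* scheduling policy */
chunk = 8;               /* iterations per chunk (body2 assumes 8) */
params = 2;              /* number of arguments after &params: A and B */
iml_DOALL((void (*)(void))body1, (void (*)(void))body2, &iters, &sched, &chunk, &params, A, B);
}
/* Loop body for *iters iterations beginning at *start. */
void body1(int *start, int *iters, int A[], int B[]){
int i;
for(i = *start; i < *start + *iters; i++){
A[i] = B[i];
}
}
/* Loop body for exactly chunk (= 8) iterations beginning at *start. */
void body2(int *start, int A[], int B[]){
int i;
for(i = *start; i < *start + 8; i++){
A[i] = B[i];
}
}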
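One plausible reading of the two body functions is that body2 handles a full chunk of `chunk` iterations (hence the hard-coded 8), while body1 handles a shorter final chunk. The sketch below makes that assumption. `run_doall_sequential` is a hypothetical, purely sequential stand-in that only illustrates how the iteration space would be cut up. It is not part of IML, and IML itself runs chunks on multiple threads in whatever order its scheduler chooses. With N = 100 and chunk = 8, the sketch makes 12 full-chunk calls to body2 and one 4-iteration call to body1.

```c
#include <assert.h>

#define N 100
#define CHUNK 8

/* The DOALL example's loop bodies, with the hard-coded 8 written as CHUNK. */
static void body1(int *start, int *iters, int A[], int B[])
{
    for (int i = *start; i < *start + *iters; i++)
        A[i] = B[i];
}

static void body2(int *start, int A[], int B[])
{
    for (int i = *start; i < *start + CHUNK; i++)
        A[i] = B[i];
}

/* Hypothetical sequential stand-in for a DOALL (NOT the IML API):
   full chunks go to body2, a short final chunk goes to body1.
   Returns the number of chunks dispatched. */
static int run_doall_sequential(int iters, int chunk, int A[], int B[])
{
    int chunks = 0;
    assert(chunk == CHUNK);                   /* body2 is hard-wired to CHUNK */
    for (int start = 0; start < iters; start += chunk) {
        int remaining = iters - start;
        if (remaining >= chunk)
            body2(&start, A, B);              /* full chunk */
        else
            body1(&start, &remaining, A, B);  /* short final chunk */
        chunks++;
    }
    return chunks;
}
```

Passing a specialized body for full chunks lets a compiler unroll or vectorize a loop with a constant trip count. The general body then only has to handle the remainder.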
Original Code:
#define N 100
int main(){
int i, A[N], B[N], a, b;
a = 0;
b = 0;
// Two loops can run in parallel.
for(i = 0; i < N; i++){
a += A[i];
}
for(i = 0; i < N; i++){
b += B[i];
}
}
COBEGIN Version:
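/* Include the IML header that declares iml_COBEGIN. */
#define N 100
void body1(int *ap, int *bp, int A[], int B[]);
void body2(int *ap, int *bp, int A[], int B[]);
int main(){
int A[N], B[N], a, b;
int tasks, params;
a = 0;                   /* accumulators must start at zero */
b = 0;
tasks = 2;               /* number of task functions: body1 and body2 */
params = 4;              /* number of arguments after the task functions: &a, &b, A, B */
iml_COBEGIN(&tasks, &params, (void (*)(void))body1, (void (*)(void))body2, &a, &b, A, B);
}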
void body1(int *ap, int *bp, int A[], int B[]){
int i;
for(i = 0; i < N; i++){
*ap += A[i];
}
}
void body2(int *ap, int *bp, int A[], int B[]){
int i;
for(i = 0; i < N; i++){
*bp += B[i];
}
}
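A COBEGIN starts every task and continues only after all of them have finished. Without IML, these semantics can be mimicked with POSIX threads. The sketch below uses plain pthreads and is not IML's implementation. `struct sum_task` and `cobegin_two_sums` are hypothetical names. As in the IML version, the caller must zero the accumulators a and b.

```c
#include <assert.h>
#include <pthread.h>

#define N 100

/* Hypothetical per-task arguments: the array a task reads and the
   accumulator it alone writes, so the two tasks share no writable data. */
struct sum_task {
    const int *v;
    int *acc;
};

/* Task body, analogous to body1/body2: *acc += v[0] + ... + v[N-1]. */
static void *sum_task_body(void *arg)
{
    struct sum_task *t = arg;
    for (int i = 0; i < N; i++)
        *t->acc += t->v[i];
    return NULL;
}

/* COBEGIN ... COEND over two tasks: start both, then join both.
   A task whose thread cannot be created runs inline instead. */
static void cobegin_two_sums(const int A[], const int B[], int *a, int *b)
{
    struct sum_task tasks[2] = { { A, a }, { B, b } };
    pthread_t th[2];
    int started[2];
    for (int k = 0; k < 2; k++) {
        started[k] = pthread_create(&th[k], NULL, sum_task_body, &tasks[k]) == 0;
        if (!started[k])
            sum_task_body(&tasks[k]);
    }
    for (int k = 0; k < 2; k++)
        if (started[k])
            pthread_join(th[k], NULL);
}
```

Each task writes only its own accumulator, so no locking is needed. That independence is exactly what makes the two loops safe to run as a COBEGIN.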
Original Code:
      PROGRAM test
      INTEGER N, I
      PARAMETER (N=100)
      INTEGER A(N), B(N)
      DO I=1, N
        A(I) = B(I)
      ENDDO
      END
DOALL Version:
      PROGRAM test
      include 'rts.fi'
      INTEGER N
      PARAMETER (N=100)
      INTEGER A(N), B(N)
      INTEGER iters, sched, chunk, params
      EXTERNAL body1, body2
      iters = N
      sched = 1
      chunk = 8
      params = 2
      call iml_DOALL(body1, body2, iters, sched, chunk, params, A, B)
      END

      SUBROUTINE body1(start, iters, A, B)
C     Executes iterations start+1 .. start+iters (start is zero-based)
      INTEGER N, I
      PARAMETER (N=100)
      INTEGER start, iters
      INTEGER A(N), B(N)
      DO I = start + 1, start + iters
        A(I) = B(I)
      ENDDO
      END

      SUBROUTINE body2(start, A, B)
C     Executes a full chunk of 8 iterations: start+1 .. start+8
      INTEGER N, I
      PARAMETER (N=100)
      INTEGER start
      INTEGER A(N), B(N)
      DO I = start + 1, start + 8
        A(I) = B(I)
      ENDDO
      END
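A FORTRAN DO loop includes its upper bound. A body that covers the zero-based chunk [start, start + iters) with 1-based indices should therefore loop from start + 1 through start + iters. The C sketch below checks this by counting how often each index 1..N is visited under both choices of upper bound. It assumes `start` is zero-based, as in the C DOALL version, and `covers_exactly_once` is a hypothetical name. An extra + 1 on the upper bound makes neighboring chunks overlap and makes the last chunk touch index N + 1.

```c
#include <assert.h>
#include <string.h>

#define N 100
#define CHUNK 8

/* Emulates DO I = start + 1, start + iters + extra (inclusive upper bound,
   as in FORTRAN) for every chunk of the iteration space, with start
   zero-based. Returns 1 if each index 1..N is visited exactly once and
   nothing past N is touched, else 0. Only extra = 0 or 1 is supported. */
static int covers_exactly_once(int extra)
{
    int visits[N + 2];                          /* indices 0..N+1 */
    assert(extra == 0 || extra == 1);
    memset(visits, 0, sizeof visits);
    for (int start = 0; start < N; start += CHUNK) {
        int iters = (N - start < CHUNK) ? N - start : CHUNK; /* last chunk may be short */
        for (int i = start + 1; i <= start + iters + extra; i++)
            visits[i]++;
    }
    for (int i = 1; i <= N; i++)
        if (visits[i] != 1)
            return 0;
    return visits[N + 1] == 0;
}
```
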