ROSE Compiler Framework/LoopProcessor
Where is the tool
Source file
Binary: not built or installed by default. You have to build it yourself:
- cd rose_buildtree/tutorial
- make loopProcessor
Documentation
See more at
- Chapter 38 of http://rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf
Command line options
    ..[buildtree/tutorial]./loopProcessor --help
    loopProcessor <options> <program name>
    -gobj: generate object file
    -orig: copy non-modified statements from original file

    # split loop
    #----------------------------------
    -splitloop: applying loop splitting to remove conditionals inside loops
    -annot <filename>
    -pre: apply partial redundancy elimination
    -fd: apply finite differencing to array index expressions

    # Debugging options
    #----------------------------------
    -debugloop: print debugging information for loop transformations;
    -debugdep: print debugging information for dependence analysis;
    -tmloop: print timing information for loop transformations;

    # Use special function to denote array access (the special function can be replaced
    # with macros after transformation). This option is for circumventing complex
    # subscript expressions for linearized multi-dimensional arrays.
    -arracc <funcname>: use function <funcname> to denote multi-dimensional array access;
    opt <level=0>: the level of loop optimizations to apply; by default, only the outermost level is optimized;

    # unroll loop:
    #----------------------------------
    -unroll [-locond] [-nvar] [poet] <-unrollsize> : unrolling innermost loops at <unrollsize>

    # break up statements in loops
    #----------------------------------
    -bs <stmtsize> : break up statements in loops at <stmtsize>
    -bk_poet <blocksize> : parameterize the blocking transformation
    -par_poet <blocksize> : paralleization transformation using POET

    # loop blocking
    #----------------------------------
    -bk1 <blocksize> :block outer loops
    -bk2 <blocksize> :block inner loops
    -bk3 <blocksize> :block all loops

    # copy array
    #----------------------------------
    -cp <copydim> :copy array regions with dimensions <= <copydim>
    -cp_poet<copydim> :parameterize array copy array regions; to be applied together with blocking.

    # loop interchange
    #----------------------------------
    -ic1 :loop interchange for more reuses
    // ***

    # loop fission
    #----------------------------------
    -fs0 : maximum distribution at all loops
    -fs01 : maximum distribution at inner-most loops

    # loop fusing
    #----------------------------------
    -fs1 :single-level loop fusion for more reuses
    -fs2 :multi-level loop fusion for more reuses

    # Max number of nodes to split for transitive dependence analysis (to limit the overhead of transitive dep. analysis)
    -ta <int> :split limit for transitive dep. analysis

    # set cache line size in evaluating spatial locality (affect decisions in applying loop optimizations)
    -clsize <int> :set cache line size

    # set maximum distance of reuse that can exploit cache (used to evaluate temporal locality of loops)
    -reuse_dist <int> :set reuse distance
    -dt :perform dynamic tuning
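The blocking options take a `<blocksize>` argument. As a rough illustration of what loop blocking (tiling) means in general, the following hand-written sketch compares a plain matrix-multiply loop nest with a blocked one. It is not output of loopProcessor, and it does not claim to match what any particular `-bk` option emits; the names `matmul_plain`, `matmul_blocked`, `blocking_preserves_results`, and the values of `N` and `BS` are all hypothetical.

```c
#include <assert.h>

#define N  64   /* hypothetical problem size */
#define BS 16   /* hypothetical block size, in the role of <blocksize> */

/* Reference: plain triple loop computing c += a * b. */
void matmul_plain(double c[N][N], double a[N][N], double b[N][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Hand-blocked variant: the three outer loops step over BS-sized tiles,
 * the three inner loops do the work inside one tile. */
void matmul_blocked(double c[N][N], double a[N][N], double b[N][N])
{
    int ii, jj, kk, i, j, k;
    for (ii = 0; ii < N; ii += BS)
        for (jj = 0; jj < N; jj += BS)
            for (kk = 0; kk < N; kk += BS)
                for (i = ii; i < ii + BS && i < N; i++)
                    for (j = jj; j < jj + BS && j < N; j++)
                        for (k = kk; k < kk + BS && k < N; k++)
                            c[i][j] += a[i][k] * b[k][j];
}

/* Returns 1 if both loop nests produce the same result matrix. */
int blocking_preserves_results(void)
{
    static double a[N][N], b[N][N], c1[N][N], c2[N][N];
    int i, j;

    /* Small integer inputs keep every partial sum exactly representable. */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = (double)((i + j) % 7);
            b[i][j] = (double)((i * 3 + j) % 5);
            c1[i][j] = 0.0;
            c2[i][j] = 0.0;
        }

    matmul_plain(c1, a, b);
    matmul_blocked(c2, a, b);

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (c1[i][j] != c2[i][j])
                return 0;
    return 1;
}
```

Calling `assert(blocking_preserves_results());` from a `main` function checks that the two versions agree. Blocking only changes the order in which iterations run, with the aim of keeping each tile's data in cache while it is reused.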
Example use
Loop fusion
    // -----------test loop fusion input.c ---------------
    #define N 1024
    void foo(double a[N], double b[N], double c[N])
    {
      int i,j;
      for (i = 0; i < N; i++)
        a[i - 1] = b[i];
      for (j = 0; j < N; j++)
        c[j] = a[j];
    }

    // command line
    [..buildtree/tutorial]./loopProcessor -fs2 input.c

    //------------------------ output---------------
    // test loop fusion
    #define N 1024
    void foo(double a[1024],double b[1024],double c[1024])
    {
      int i;
      int j;
      for (i = 0; i <= 1024; i += 1) {
        if (i <= 1023) {
          a[i - 1] = b[i];
        }
        else {
        }
        if (i >= 1) {
          c[-1 + i] = a[-1 + i];
        }
        else {
        }
      }
    }
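The fused loop in the output above runs one extra iteration (`i <= 1024`) and shifts the second statement by one, so that `c[i - 1]` is read only after `a[i - 1]` has been written. A minimal sketch for checking that the two versions compute the same values follows. The harness is an assumption, not part of ROSE: `foo_original`, `foo_fused`, and `fusion_preserves_results` are hypothetical names, and the empty `else` branches of the tool output are dropped. Because the example input writes `a[-1]` on its first iteration, the harness passes a pointer one element into a larger buffer so that this access stays in bounds.

```c
#include <assert.h>
#include <string.h>

#define N 1024

/* Original two-loop version from the example input. */
static void foo_original(double *a, double *b, double *c)
{
    int i, j;
    for (i = 0; i < N; i++)
        a[i - 1] = b[i];
    for (j = 0; j < N; j++)
        c[j] = a[j];
}

/* Fused version, transcribed from the tool output (empty else branches dropped). */
static void foo_fused(double *a, double *b, double *c)
{
    int i;
    for (i = 0; i <= 1024; i += 1) {
        if (i <= 1023)
            a[i - 1] = b[i];
        if (i >= 1)
            c[-1 + i] = a[-1 + i];
    }
}

/* Returns 1 if both versions leave a and c in the same state. */
int fusion_preserves_results(void)
{
    /* One extra leading element so that a[-1] is a valid access. */
    static double abuf1[N + 1], abuf2[N + 1];
    static double b[N], c1[N], c2[N];
    int k;

    for (k = 0; k < N; k++) {
        b[k] = 0.5 * k + 1.0;
        c1[k] = 0.0;
        c2[k] = 0.0;
    }
    for (k = 0; k <= N; k++) {
        abuf1[k] = -1.0;
        abuf2[k] = -1.0;
    }

    foo_original(abuf1 + 1, b, c1);
    foo_fused(abuf2 + 1, b, c2);

    return memcmp(abuf1, abuf2, sizeof abuf1) == 0
        && memcmp(c1, c2, sizeof c1) == 0;
}
```

Calling `assert(fusion_preserves_results());` from a `main` function performs the check. Note that `a[N - 1]` is never written by either version, so `c[N - 1]` receives whatever value the caller left there; the harness initializes it to `-1.0` in both buffers.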