SPRINT
Latest revision as of 11:43, 4 November 2013
SPRINT [http://www.r-sprint.org/] is a parallel framework for R that allows certain functions to be run across several cores at the same time.
The full manual for SPRINT can be found here: [http://www.ed.ac.uk/polopoly_fs/1.113414!/fileManager/External_User_Guide_1.0.4.1.pdf]
Below are some instructions on using SPRINT with the IBERS HPC.
Available Parallel Functions
The following functions can be run in parallel using SPRINT. More information on each can be found in the manual linked above.
- papply() - parallel version of apply() or lapply()
- pboot() - parallel bootstrapping
- pcor() - parallel Pearson's Correlation
- pmaxT() - parallel version of mt.maxT (from multtest package)
- ppam() - parallel version of pam() (from cluster package)
- prandomForest() - parallel implementation of randomForest (from randomForest package)
- pRP() - parallel rank product analysis algorithm (comparable to RP() from RankProd package)
Using Sprint
SPRINT requires two HPC modules to be loaded:
module load R/R-3.0.2
module load openmpi/gcc
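Before submitting a job, it can help to confirm that loading the two modules actually put R and mpiexec on the PATH. The following is a minimal sketch; the helper name check_tools is hypothetical and is not part of SPRINT or the HPC setup.

```shell
#!/bin/sh
# check_tools is a hypothetical helper (not part of SPRINT): it reports
# whether each named command can be found on the current PATH and
# returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# R and mpiexec are the two commands used by the job script on this page.
check_tools R mpiexec || echo "Load the R and openmpi modules first."
```

Running this in the same shell session after the module load commands gives a quick sanity check without waiting for a queued job to fail.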
To make use of SPRINT, import the sprint library into your R script using the command:
library(sprint)
You can then load data and carry on with your R processing as normal. To use a parallel function, simply call one of the functions listed above; when the R interpreter reaches the call, SPRINT automatically handles the parallel processing. For example:
papply(my_data, some_function)
When you have finished working in parallel, you need to call a function to tell R so. The following call does that:
pterminate()
Example Sprint Script
To use the parallel implementations of these functions supplied by the SPRINT framework, we also need the MPI framework, which allows SPRINT to execute R code across multiple cores at the same time. The following is a test script for SPRINT.
First, the R script, saved as sprint_test.R:
#Load the sprint library
library(sprint)
#Function provided by SPRINT to test functionality
ptest()
#Needed to end MPI calls and return to serial processing
pterminate()
#End R Script
quit()
Next, the Sun Grid Engine script, saved as sprint_test_run.sge:
#$ -S /bin/sh
#$ -cwd
#$ -q amd.q
#Remember to add hard/soft limits for memory if required.
#$ -l h_vmem=6G
#Specify number of cores required (in this case five)
#$ -pe mpich 5
#Load our required modules
module load R/R-3.0.2
module load openmpi/gcc
# The following is the command that tells the HPC that the script needs to be run in parallel
# using the MPI framework, you also here need to tell MPI how many cores it can use, this number
# needs to match the number given to the Grid engine above. MPI is started using the following command
# mpiexec -n num_of_cores command_to_run
# R -f filename is just the command to tell R to run the commands within the given file.
mpiexec -n 5 R -f sprint_test.R
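Because the core count appears twice in the script above (once for Grid Engine and once for mpiexec), the two numbers can drift apart. Grid Engine normally exports the number of granted slots to the job as the NSLOTS environment variable, so one option is to read that value rather than repeat the number. The following is a minimal sketch, assuming the mpich parallel environment on this cluster sets NSLOTS in the usual way; the function name build_mpi_command is hypothetical.

```shell
#!/bin/sh
# build_mpi_command is a hypothetical helper: it prints the mpiexec line for
# a given R script, taking the core count from NSLOTS (normally set by Grid
# Engine inside a parallel job) and falling back to 1 when NSLOTS is unset.
build_mpi_command() {
    script="$1"
    cores="${NSLOTS:-1}"
    echo "mpiexec -n ${cores} R -f ${script}"
}

# Print the command. Inside the .sge script it could be run directly as:
#   mpiexec -n "$NSLOTS" R -f sprint_test.R
build_mpi_command sprint_test.R
```

With this approach only the "#$ -pe mpich" line needs changing when a different number of cores is requested.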
Then, from the command line, we submit sprint_test_run.sge:
qsub -N sprint_test sprint_test_run.sge
The output file should then contain:
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
.... Omitted Output To Reduce Wiki Page Size ....
> library(sprint)
> library(sprint)
> library(sprint)
> library(sprint)
> library(sprint)
>
> ptest()
[1] "HELLO, FROM PROCESSOR: 0" "HELLO, FROM PROCESSOR: 4"
[3] "HELLO, FROM PROCESSOR: 2" "HELLO, FROM PROCESSOR: 3"
[5] "HELLO, FROM PROCESSOR: 1"
> pterminate()
> quit()
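A quick way to confirm that every requested core took part is to count the distinct greetings printed by ptest() in the job output. The following is a minimal sketch; count_ranks is a hypothetical helper, and the output file name is an assumption (Grid Engine typically writes standard output to a file named after the job name and job id).

```shell
#!/bin/sh
# count_ranks is a hypothetical helper: it counts the distinct
# "HELLO, FROM PROCESSOR: N" greetings found in a job output file.
count_ranks() {
    grep -o 'HELLO, FROM PROCESSOR: [0-9][0-9]*' "$1" | sort -u | wc -l | tr -d ' '
}

# Example usage (the file name pattern is an assumption; substitute the
# real output file produced by the job):
#   count_ranks sprint_test.o<job id>
```

For the five-core test job above, the count should equal the number given to "-pe mpich" and "mpiexec -n".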