This tutorial describes how to run R scripts on the OSG. We'll first run the program locally as a test. After that we'll create a submit file, submit it to OSG using OSG Connect, and look at the results when the jobs finish.
Run R scripts on OSG
Access R on the submit host
First we'll need to create a working directory. You can either run
$ tutorial R
or type the following:
$ mkdir tutorial-R; cd tutorial-R
R is installed using modules on OSG. To load this module and access R, enter:
$ module load r
Now, we can try to run R:
$ R

R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
Great! R works. You can quit out with:
> q()
Save workspace image? [y/n/c]: n
$
Run R code
Now that we can run R, let's create a small script. Create the file hello_world.R that contains the following:
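A minimal sketch of such a script, assuming it does nothing more than print the greeting that appears in the output shown in this tutorial. The print() call is an assumption; the here-document simply writes the file from the shell:

```shell
#!/bin/bash
# Write a one-line R script into the current directory.
# Assumption: the script only prints a greeting with print().
cat > hello_world.R <<'EOF'
print("Hello World!")
EOF
```

With print(), R labels the output line with the index [1] before the quoted string.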
R normally runs as an interactive shell, but it is easy to run in batch mode too.
$ Rscript --no-save hello_world.R
"Hello World!"
Notice here that we're using Rscript (similar in purpose to R CMD BATCH), which accepts the script as a command-line argument. This approach makes R much less verbose, and the output is easier to parse later. If you run it at the command line, you should get output similar to the above.
Build the HTCondor job
To prepare our R job to run on OSG, we need to create a wrapper for our R environment, based on the setup we did in the previous sections. Create the file R-wrapper.sh:
#!/bin/bash
module load r
Rscript --no-save hello_world.R
Change the permissions on the wrapper script so it is executable and then test it for correct output:
$ chmod +x R-wrapper.sh
$ ./R-wrapper.sh
"Hello World!"
Now that we've created a wrapper, let's build an HTCondor submit file around it. We'll call this one R.submit:
universe = vanilla
log = R.log.$(Cluster).$(Process)
error = R.err.$(Cluster).$(Process)
output = R.out.$(Cluster).$(Process)
executable = R-wrapper.sh
transfer_input_files = hello_world.R
request_cpus = 1
request_memory = 1GB
request_disk = 1GB
requirements = OSGVO_OS_STRING == "RHEL 7" && Arch == "X86_64" && HAS_MODULES == True
queue 1
The R.submit file may include a few lines that you are unfamiliar with. For example, $(Cluster) and $(Process) are variables that will be replaced with the job's cluster and process ID. This is useful when you have many jobs submitted from the same file. Any output and errors will be placed in a separate file for each job.
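To make the substitution concrete, here is a small sketch that prints the output file names a submission would produce. The function name job_output_names is hypothetical and not part of HTCondor; the cluster ID is assigned by HTCondor at submit time, and process IDs start at 0.

```shell
#!/bin/bash
# job_output_names CLUSTER NJOBS
# Hypothetical helper: prints the name each job's output file would get from
#   output = R.out.$(Cluster).$(Process)
job_output_names() {
    cluster=$1
    njobs=$2
    process=0
    while [ "$process" -lt "$njobs" ]; do
        echo "R.out.${cluster}.${process}"
        process=$((process + 1))
    done
}

# Example: the same cluster ID as in this tutorial, but as if the submit
# file ended with "queue 3" instead of "queue 1".
job_output_names 3796250 3
```

Each job in the cluster gets its own file, so output from one job never overwrites output from another.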
Notice the requirements line? You'll need to put HAS_MODULES == True any time you need software that is loaded via modules.
Also, did you see the transfer_input_files line? This tells HTCondor which files to transfer with the job to the worker node. You don't have to tell it to transfer the executable; HTCondor is smart enough to know that the job will need that. But any extra files, such as our R script file, must be explicitly listed to be transferred with the job. You can also use transfer_input_files for input data to the job, as shown in Transferring data with HTCondor.
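Because a file missing from the submit directory is a common cause of failed transfers, a quick pre-flight check can help. The sketch below is a hypothetical helper, not an HTCondor tool. It assumes the submit file spells the command in lowercase as transfer_input_files, that the list is comma-separated, and that file names contain no spaces or wildcard characters.

```shell
#!/bin/bash
# check_inputs SUBMIT_FILE
# Hypothetical helper: prints every file named on the transfer_input_files
# line that does not exist, and returns non-zero if any file is missing.
check_inputs() {
    submit_file=$1
    missing=0
    # Take the text after "transfer_input_files =" and turn commas into newlines.
    list=$(sed -n 's/^[[:space:]]*transfer_input_files[[:space:]]*=[[:space:]]*//p' "$submit_file" | tr ',' '\n')
    # Word splitting on $list is intentional: it also trims stray whitespace.
    for file in $list; do
        if [ ! -e "$file" ]; then
            echo "missing input file: $file"
            missing=1
        fi
    done
    return $missing
}
```

Running check_inputs R.submit in the working directory before condor_submit R.submit would report any listed file that is not present.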
Submit and analyze the output
Finally, submit the job to OSG Connect!
$ condor_submit R.submit
Submitting job(s)..........
1 job(s) submitted to cluster 3796250.

$ condor_q user
$ condor_q
-- Schedd: login03.osgconnect.net : <126.96.36.199:9618?... @ 05/13/19 09:51:04
OWNER  BATCH_NAME     SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
user   ID: 3796250   5/13 09:50      _      _      1      1 3796250.0
...
You can follow the status of your job cluster with the connect watch command, which shows condor_q output that refreshes every 5 seconds. Press Control-C to stop watching.
Since our job prints to standard output, we can check the output files. Let's see what one looks like:
$ cat R.out.3796250.0
"Hello World!"
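When a cluster contains many jobs, it can also be worth scanning the error files rather than opening each one. The following sketch is a hypothetical helper that relies only on the R.err.$(Cluster).$(Process) naming from the submit file. A non-empty error file does not always indicate a failure, since some tools write informational messages to standard error.

```shell
#!/bin/bash
# report_job_errors
# Hypothetical helper: lists every non-empty R.err.* file in the current
# directory and returns 1 if at least one was found, 0 otherwise.
report_job_errors() {
    found=0
    for f in R.err.*; do
        # Skip the literal pattern when no file matches.
        [ -e "$f" ] || continue
        if [ -s "$f" ]; then
            echo "non-empty error file: $f"
            found=1
        fi
    done
    return $found
}
```

Calling report_job_errors in the submit directory after the jobs finish prints one line per error file that deserves a closer look.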
We recommend you read about how to steer your jobs with HTCondor job requirements; this will allow you to select good resources for your workload. Please see this page.
This page was updated on Dec 22, 2020 at 22:54 from tutorials/tutorial-R/README.md.