This guide lays out the steps needed to go from logging in to the OSG Connect login node to running a full-scale high throughput computing (HTC) workload on the OSG. The steps listed here apply to any new workload submission, whether you are a long-time OSG user or just getting started, so this guide can serve as an overview of what is needed to get up and running.
For new users, this guide also includes links to our documentation pages, providing information and instruction about how to perform each step of developing a new workload submission.
This guide assumes that you have applied for an OSG Connect account and been approved. If you don't yet have an account, you can apply for one or contact us with any questions you have.
1. Introduction to the OSG
The OSG is a nationally funded consortium of computing resources at more than one hundred institutional partners that, together, offer a strategic advantage for computing work that can be run as numerous short tasks.
The OSG is best-suited for computing work that can be run as many, independent tasks, in an approach called "high throughput computing." For more information on what kind of work is a good fit for the OSG, see Is the OSG for You?.
Learn more about the services provided by the OSG that can support your HTC workload:
2. Get On OSG Connect
After your OSG account has been approved, go through the following guides to complete your access to the login node and to enable your account to submit jobs.
3. Explore HTCondor
Computational work is run on the OSG by submitting it as “jobs” to the HTCondor scheduler. Jobs submitted to HTCondor are then scheduled and run on different resources that are part of the OSG’s Open Science Pool. Before submitting your own computational work, it is important to understand how HTCondor job submission works. The following guides show how to submit basic HTCondor jobs. The second example allows you to see where in the OSG your jobs ran.
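As a rough illustration of what a basic job submission involves, the sketch below uses Python to assemble the text of a minimal HTCondor submit file. The function name, the file names (`hello.sh`, `job.log`, and so on), and the resource requests are assumptions made for this example, not values prescribed by OSG Connect; the linked guides remain the reference for real jobs.

```python
def basic_submit_description(executable, arguments=""):
    """Return the text of a minimal HTCondor submit file for a single job.

    All file names and resource values below are illustrative placeholders.
    """
    lines = [
        f"executable = {executable}",   # program or script the job runs
        f"arguments = {arguments}",     # command-line arguments passed to it
        "log = job.log",                # HTCondor's record of job events
        "output = job.out",             # captured standard output
        "error = job.err",              # captured standard error
        "request_cpus = 1",
        "request_memory = 1GB",
        "request_disk = 1GB",
        "queue 1",                      # submit exactly one job
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the submit description; saved to a file such as test.sub, it
    # could then be submitted from the login node with: condor_submit test.sub
    print(basic_submit_description("hello.sh", "world"))
```

Once submitted, `condor_q` shows the state of the job in the queue.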
4. Test a Job
After learning the basics of HTCondor job submission, you will need to create your own HTCondor job, including the software the job needs and an appropriate mechanism for handling its data. We recommend starting with a single test job.
Prepare your software
Software is an integral part of your HTC workflow. Whether you’ve written it yourself, inherited it from your research group, or use common open-source packages, any required executables and libraries will need to be made available to your jobs if they are to run on OSG.
Read through this overview of Using Software in OSG Connect to help you determine the best way to provide your software. Once you know which method you would like to try, select and complete one of the following guides/tutorials:
- To install your own software, begin with the guide on Compiling Software for OSG Connect and then complete the Example Software Compilation tutorial.
- To use precompiled binaries, try the example presented in the AutoDock Vina tutorial and/or the Julia tutorial.
- To use Docker containers for your jobs, start with the Docker and Singularity Containers guide, and (optionally) work through the TensorFlow tutorial (which uses Docker/Singularity).
- To use Distributed Environment Modules for your jobs, start with this Modules guide and then complete the Module example in this R tutorial.
This is not a complete list. Feel free to search for your software in our Knowledge Base.
Move your data
The data for your jobs originates on the OSG Connect login node but needs to be copied to other nodes in the OSG before jobs can use it. Jobs can move their input and output data in different ways, and the right choice depends on the size of the data. The following guides summarize which options exist and when to use them.
- Read about Data: Data Management Policies
- Pick a tutorial?
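One of the available options is HTCondor's built-in file transfer, which copies listed input files from the login node to the node where the job runs. The sketch below assembles a submit file that uses it. The function name and the file names are hypothetical, and whether this mechanism suits your data depends on the size limits described in the Data Management Policies, which this example does not encode.

```python
def submit_with_file_transfer(executable, input_files):
    """Return submit-file text that uses HTCondor's built-in file transfer.

    input_files is a list of paths on the login node (placeholders here).
    """
    lines = [
        f"executable = {executable}",
        "should_transfer_files = YES",            # use HTCondor file transfer
        "when_to_transfer_output = ON_EXIT",      # copy outputs back when the job ends
        "transfer_input_files = " + ", ".join(input_files),
        "log = job.log",
        "output = job.out",
        "error = job.err",
        "queue 1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(submit_with_file_transfer("analyze.sh", ["input.dat", "params.txt"]))
```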
5. Scale Up
After you have a sample job running successfully, you’ll want to scale up. HTCondor has many useful features that make it easy to submit multiple jobs. First, read the guide that describes a testing process and follow it, scaling up gradually. Then see the second guide to learn how to submit multiple jobs.
- Things to think about: Scaling up after success with test jobs
- Scale up to multiple jobs: Easily submit multiple jobs
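To make the idea of gradual scaling concrete, here is a small sketch. It builds a submit file that queues several jobs at once, using HTCondor's `$(Cluster)` and `$(Process)` macros to keep each job's files separate, and it lists a sequence of batch sizes to try. The tenfold growth between batches, the function names, and the file names are assumptions chosen for illustration; the guides above describe the recommended testing process.

```python
def multi_job_submit(executable, n_jobs):
    """Return submit-file text that queues n_jobs copies of one job.

    $(Process) runs from 0 to n_jobs - 1, so each job gets its own
    argument and its own output and error files.
    """
    lines = [
        f"executable = {executable}",
        "arguments = $(Process)",
        "log = jobs.$(Cluster).log",
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


def scale_up_steps(target, factor=10):
    """Yield growing batch sizes (1, 10, 100, ...) that end at target."""
    size = 1
    while size < target:
        yield size
        size *= factor
    yield target


if __name__ == "__main__":
    for batch in scale_up_steps(1000):
        print(f"--- batch of {batch} job(s) ---")
        print(multi_job_submit("analyze.sh", batch))
```

Checking that each batch finishes cleanly before moving to the next size keeps problems small and easy to diagnose.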
6. Special Use Cases
If you think any of the below applies to you, please get in touch and our facilitation team will be happy to discuss your individual case.
- How to run sequential workflows of jobs: Workflows with HTCondor's DAGMan
- How to handle jobs that are longer than our recommended 12 hours: [Checkpointing Guide (forthcoming)]
- How to build your own Docker container: Creating a Docker Container Image
- How to safely submit more than 10,000 jobs: FAQ, search for max_idle
- Larger or speciality resource requests:
This page was updated on Oct 15, 2021 at 21:43 from start/roadmap.md.