Overview

This guide lays out the steps needed to go from logging in to an OSG Connect login node to running a full-scale high throughput computing (HTC) workload on OSG's Open Science Pool (OSPool). The steps apply to any new workload submission, whether you are a long-time OSG user or just getting started with your first workload, and each step points to the relevant pages in our documentation.

This guide assumes that you have applied for an account on the OSG Connect service and have been approved after meeting with an OSG Research Computing Facilitator. If you don't yet have an account, you can apply for one or contact us with any questions you have.

1. Introduction to the OSPool and OSG Connect

The OSG's Open Science Pool is best-suited for computing work that can be run as many, independent tasks, in an approach called "high throughput computing." For more information on what kind of work is a good fit for the OSG, see Is the Open Science Pool for You?.

Learn more about the services provided by the OSG that can support your HTC workload:

OSG Introduction

2. Get on OSG Connect

After your OSG account has been approved, go through the following guides to complete your access to the login node and to enable your account to submit jobs.

3. Learn to Submit HTCondor Jobs

Computational work is run on the OSPool by submitting it as “jobs” to the HTCondor scheduler. Jobs submitted to HTCondor are then scheduled and run on different resources that are part of the Open Science Pool. Before submitting your own computational work, it is important to understand how HTCondor job submission works. The following guides show how to submit basic HTCondor jobs. The second example allows you to see where in the OSPool your jobs run.
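As a rough illustration of what a basic HTCondor job description contains, the Python sketch below builds the text of a minimal submit file. The script name `hello.sh`, the output file names, and the resource requests are placeholders chosen for this example, not OSG recommendations; the guides listed in this step remain the authoritative reference.

```python
def minimal_submit_file(executable: str, job_name: str = "test") -> str:
    """Return the text of a minimal HTCondor submit file.

    All values here are illustrative placeholders.
    """
    lines = [
        f"executable = {executable}",
        # HTCondor writes job events (submitted, running, done) to the log file.
        f"log = {job_name}.log",
        # $(Cluster) and $(Process) are filled in by HTCondor for each job.
        f"output = {job_name}.$(Cluster).$(Process).out",
        f"error = {job_name}.$(Cluster).$(Process).err",
        # Placeholder resource requests; adjust to what the job really needs.
        "request_cpus = 1",
        "request_memory = 1GB",
        "request_disk = 1GB",
        "queue 1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Save this text as, for example, test.sub and run: condor_submit test.sub
    print(minimal_submit_file("hello.sh"))
```

Once a job is submitted, `condor_q` shows its status in the queue.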

4. Test a First Job

After learning about the basics of HTCondor job submission, you will need to generate your own HTCondor job -- including the software needed by the job and the appropriate mechanism to handle the data. We recommend doing this using a single test job.

Prepare your software

Software is an integral part of your HTC workflow. Whether you’ve written it yourself, inherited it from your research group, or use common open-source packages, any required executables and libraries will need to be made available to your jobs if they are to run on the OSPool.

Read through this overview of Using Software in OSG Connect to help you determine the best way to provide your software. We also have the following guides/tutorials for each major software portability approach:

Finally, here are some additional guides specific to some of the most common scripting languages and software tools used on OSG**:

**This is not a complete list. Feel free to search for your software in our Knowledge base.

Manage your data

The data for your jobs will need to be transferred to each job that runs in the OSPool, and HTCondor has built-in features for getting data to jobs. Our Data Management Policies guide discusses the relevant approaches, when to use them, and where to stage data for each.

Assign the Appropriate Job Duration Category

Jobs running in the OSPool may be interrupted at any time, and will be re-run by HTCondor, unless a single execution of a job exceeds the allowed duration. Jobs expected to take longer than 10 hours will need to identify themselves as 'Long' according to our Job Duration policies. Remember that jobs expected to take longer than 20 hours are not a good fit for the OSPool (see Is the Open Science Pool for You?) without implementing self-checkpointing (further below).
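The thresholds above can be sketched as a small Python helper that decides which line, if any, to add to a submit file. The 10-hour and 20-hour limits come from the paragraph above; the attribute name `+JobDurationCategory` is an assumption in this sketch and should be confirmed against the Job Duration policies before use.

```python
def duration_submit_line(expected_hours: float) -> str:
    """Return the submit-file line to add for a job of the given expected length.

    Returns an empty string when no 'Long' label is needed.
    """
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    if expected_hours > 20:
        # Per the guide, such jobs are not a good fit without self-checkpointing.
        raise ValueError(
            "jobs expected to exceed 20 hours need self-checkpointing "
            "or should be split into shorter jobs"
        )
    if expected_hours > 10:
        # Assumed attribute name; verify against the Job Duration policies.
        return '+JobDurationCategory = "Long"'
    return ""


if __name__ == "__main__":
    for hours in (4, 15):
        print(hours, repr(duration_submit_line(hours)))
```

Raising an error for jobs over 20 hours is a design choice for this sketch: it forces a decision about checkpointing or splitting the work before anything is submitted.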

5. Scale Up

After you have a sample job running successfully, you’ll want to scale up in one or two steps (first run several jobs, before running ALL of them). HTCondor has many useful features that make it easy to submit multiple jobs with the same submit file.
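One way to stage the scale-up is to split the full list of inputs into a small pilot batch and the remainder, then queue one job per input from a list file. The Python sketch below assumes a hypothetical executable `analyze.sh` that takes one input file name as its argument; the file names, batch size, and resource requests are placeholders.

```python
from typing import List, Tuple


def split_for_scale_up(inputs: List[str], pilot_size: int = 10) -> Tuple[List[str], List[str]]:
    """Split inputs into a small pilot batch and the remaining inputs."""
    if pilot_size < 1:
        raise ValueError("pilot_size must be at least 1")
    return inputs[:pilot_size], inputs[pilot_size:]


def submit_file_for(list_file: str, executable: str = "analyze.sh") -> str:
    """Return submit-file text that queues one job per line of list_file."""
    lines = [
        f"executable = {executable}",
        # $(infile) takes a different value for each job, read from list_file.
        "arguments = $(infile)",
        "transfer_input_files = $(infile)",
        "log = scale.$(Cluster).log",
        "output = scale.$(Cluster).$(Process).out",
        "error = scale.$(Cluster).$(Process).err",
        # Placeholder resource requests; base real values on the test job.
        "request_cpus = 1",
        "request_memory = 1GB",
        "request_disk = 1GB",
        f"queue infile from {list_file}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    all_inputs = [f"sample_{i:03d}.dat" for i in range(100)]
    pilot, rest = split_for_scale_up(all_inputs, pilot_size=10)
    print(f"pilot batch: {len(pilot)} jobs, remainder: {len(rest)} jobs")
    # Write the pilot names to pilot_inputs.txt (one per line) before submitting.
    print(submit_file_for("pilot_inputs.txt"))
```

After the pilot batch finishes cleanly, the same submit file can be reused with a list file containing the remaining inputs.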

6. Special Use Cases

If you think any of the below applies to you, please get in touch and our facilitation team will be happy to discuss your individual case.

Getting Help

For assistance or questions, please email the OSG Facilitation team at support@opensciencegrid.org or visit the help desk and community forums.

This page was updated on Nov 30, 2021 at 19:04 from start/roadmap.md.