Overview

If some of the data or software your jobs depend on is available via the web, you can have such files transferred by HTCondor using the appropriate HTTP address!

Important Considerations

While our Overview of Data Mangement on OSG Connect describes how you can stage data, files, or even software in OSG Connect data locations, any web-accessible file in a non-OSG Connect location can be transferred directly to your jobs IF:

  • the file is accessible via an HTTP/HTTPS address
  • the file is less than 1GB in size (if larger, you'll need to pre-stage them for stash-based transfer
  • the server or website they're on can handle large numbers of your jobs accessing them simultaneously

Importantly, you'll also want to make sure your job executable knows how to handle the file (un-tar, etc.) from within the working directory of the job, just like it would for any other input file.

Transfer Files via HTTP

Use an HTTP URL in combination with the transfer_input_files statement in your HTCondor submit file.

For example:

# submit file example

log = my_job.$(Cluster).$(Process).log
error = my_job.$(Cluster).$(Process).err
output = my_job.$(Cluster).$(Process).out

# transfer software tarball from public via http
transfer_input_files = http://www.website.com/path/file.tar.gz

...other submit file details...

Multiple URLs may be specified in any order, using a comma-separated list, and a combination of URLs and files from other locations (e.g. within /home) can be provided in the list. For example,

# transfer software tarball from public via http
# transfer input data from home via htcondor file transfer
transfer_input_files = http://www.website.com/path/file1.tar.gz, http://www.website.com/path/file2.tar.gz, my_data.csv

Get Help

For assistance or questions, please email the OSG Research Facilitation team at support@opensciencegrid.org or visit the help desk and community forums.

 

This page was updated on Jul 20, 2022 at 21:24 from start/data/file-transfer-via-http.md.