Jobs

JuliaHub.jl can be used to both submit new jobs, and to inspect running or finished jobs.

Submitting batch jobs

A common use case for this package is to programmatically submit Julia scripts as batch jobs to JuliaHub, to start non-interactive workloads. In a nutshell, these are Julia scripts, together with an optional Julia environment, that get executed on the allocated hardware.

The easiest way to start a batch job is to submit a single Julia script, which can optionally also include a Julia environment with the job. However, for more complex jobs with multiple input files etc., appbundles are likely more suitable.

Script jobs

The simplest job one can submit is a humble Julia script, together with an optional Julia environment (i.e. Project.toml, Manifest.toml, and/or Artifacts.toml). These jobs can be created with the JuliaHub.@script_str string macro, for inline instantiation:

JuliaHub.submit_job(
    JuliaHub.script"""
    @warn "Hello World!"
    """,
)
JuliaHub.Job: jr-xf4tslavut (Completed)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
 files: 
  - code.jl (input; 3 bytes)
  - code.jl (source; 3 bytes)
  - Project.toml (project; 244 bytes)
  - Manifest.toml (project; 9056 bytes)
 outputs: "{}"

Alternatively, they can be created with the script function, which can load the Julia code from a script file:

JuliaHub.submit_job(
    JuliaHub.script("myscript.jl"),
)
JuliaHub.Job: jr-xf4tslavut (Completed)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
 files: 
  - code.jl (input; 3 bytes)
  - code.jl (source; 3 bytes)
  - Project.toml (project; 244 bytes)
  - Manifest.toml (project; 9056 bytes)
 outputs: "{}"

The string macro also picks up the currently running environment (i.e. Project.toml, Manifest.toml, and Artifacts.toml files), which then gets instantiated on JuliaHub when the script is started. If necessary, this can be disabled by appending the noenv suffix to the string macro.

JuliaHub.script"""
@warn "Hello World!"
"""noenv
JuliaHub.BatchJob:
code = """
@warn "Hello World!"
"""

With the script function, you can also specify a path to a directory containing the Julia package environment, if necessary.
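For instance, the script and its environment can live in different locations. A minimal sketch, assuming the environment directory is passed via the project_directory keyword (the "environment" path here is a hypothetical placeholder):

```julia
# Load the code from myscript.jl, but take the Project.toml/Manifest.toml
# from a separate "environment" directory rather than the current environment.
s = JuliaHub.script("myscript.jl"; project_directory = "environment")
JuliaHub.submit_job(s)
```

This is useful when a shared environment is reused across several job scripts.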

If an environment is passed with the job, it gets instantiated on the JuliaHub node, and the script is run in that environment. As such, any packages that are not available in the package registries or added via public Git URLs will not work. If that is the case, appbundles can be used instead to submit jobs that include private or local dependencies.

Appbundles

A more advanced way of submitting a batch job is as an appbundle, which "bundles up" a whole directory and submits it together with the script. The Julia environment in the directory is also immediately added into the bundle.

An appbundle can be constructed with the appbundle function, which takes as arguments the path to the directory to be bundled up, and a script within that directory. This is meant to be used for project directories where you have your Julia environment in the top level of the directory or repository.

For example, suppose you have a script at the top level of your project directory, then you can submit a bundle as follows:

JuliaHub.submit_job(
    JuliaHub.appbundle(@__DIR__, "script.jl"),
    ncpu = 4, memory = 16,
)
JuliaHub.Job: jr-xf4tslavut (Completed)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
 files: 
  - code.jl (input; 3 bytes)
  - code.jl (source; 3 bytes)
  - Project.toml (project; 244 bytes)
  - Manifest.toml (project; 9056 bytes)
 outputs: "{}"

The bundler looks for a Julia environment (i.e. Project.toml, Manifest.toml, and/or Artifacts.toml files) at the root of the directory. If the environment does not exist (i.e. the files are missing), one is created. When the job starts on JuliaHub, this environment is instantiated.

A key feature of the appbundle is that development dependencies of the environment (i.e. packages added with pkg> develop or Pkg.develop()) are also bundled up into the archive that gets submitted to JuliaHub (including any current, uncommitted changes). Registered packages are installed by the package manager during the standard environment instantiation, and their source code is not included in the bundle directly.

When the JuliaHub job starts, the working directory is set to the root of the unpacked appbundle directory. This should be kept in mind especially when launching a script that is not at the root itself, and trying to open other files from the appbundle in that script (e.g. with open). You can still use @__DIR__ to load files relative to the script, and include also works as expected (i.e. relative to the script file).

Finally, a .juliabundleignore file can be used to exclude certain directories, by adding the relevant globs, similar to how .gitignore files work. In addition, .git directories are also automatically excluded from the bundle.
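For example, a .juliabundleignore file at the root of the bundled directory might look like the following (the patterns are hypothetical, excluding raw data files and a scratch directory from the bundle):

```
data/*.csv
scratch/
*.tmp
```

As with .gitignore, each line is a glob that is matched against paths within the bundled directory.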

Examining job configuration

The dryrun option to submit_job can be used to inspect the full job workload configuration that would be submitted to JuliaHub.

JuliaHub.submit_job(
    JuliaHub.script"""
    println("hello world")
    """,
    ncpu = 4, memory = 8,
    env = Dict("ARG" => "value"),
    dryrun = true
)
JuliaHub.WorkloadConfig:
application:
  JuliaHub.BatchJob:
  code = """
  println("hello world")
  """
  sha256(project_toml) = 93a83d60d4a9c6a3d1438259fd506929eaad296b7e112e886b305781b85cb85b
  sha256(manifest_toml) = 9bcc174dddab3db131c98b296be804b77f2c4720ca649feb9eb25039822ec5d5
compute:
  JuliaHub.ComputeConfig
   Node: 3.5 GHz Intel Xeon Platinum 8375C
    - GPU: no
    - vCores: 4
    - Memory: 16 Gb
    - Price: 0.33 $/hr
   Process per node: true
   Number of nodes: 1
timelimit = 1 hour, 
env: 
  ARG: value

Query, extend, kill

The package has functions that can be used to interact with running and past jobs. The jobs function can be used to list jobs, returning an array of Job objects.

julia> js = JuliaHub.jobs(limit=3)
3-element Vector{JuliaHub.Job}:
 JuliaHub.job("jr-eezd3arpcj")
 JuliaHub.job("jr-novcmdtiz6")
 JuliaHub.job("jr-3eka6z321p")

julia> js[1]
JuliaHub.Job: jr-eezd3arpcj (Completed)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
 files:
  - code.jl (input; 3 bytes)
  - code.jl (source; 3 bytes)
  - Project.toml (project; 244 bytes)
  - Manifest.toml (project; 9056 bytes)
 outputs: "{}"

If you know the name of the job, you can also query the job directly with job.

julia> job = JuliaHub.job("jr-eezd3arpcj")
JuliaHub.Job: jr-eezd3arpcj (Completed)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
 files:
  - code.jl (input; 3 bytes)
  - code.jl (source; 3 bytes)
  - Project.toml (project; 244 bytes)
  - Manifest.toml (project; 9056 bytes)
  - outdir.tar.gz (result; 632143 bytes)
 outputs: "{\"result_variable\": 1234, \"another_result\": \"value\"}\n"
julia> job.status
"Completed"

julia> JuliaHub.isdone(job)
true

Similarly, the kill_job function can be used to stop a running job, and the extend_job function can be used to extend the job's time limit.
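For example, given a Job object for a running job, the time limit can be extended and the job stopped as follows (a sketch; the job ID is from the examples above, and the integer extension is assumed to be in hours — see the extend_job docstring for the exact semantics):

```julia
job = JuliaHub.job("jr-novcmdtiz6")  # look up a running job by its ID
JuliaHub.extend_job(job, 2)          # extend the job's time limit (assumed: by 2 hours)
JuliaHub.kill_job(job)               # request that the job be stopped
```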

Waiting on jobs

A common pattern in a script is to submit one or more jobs, and then wait until the jobs complete, to then process their outputs. isdone can be used to see if a job has completed.

julia> job = JuliaHub.job("jr-novcmdtiz6")
JuliaHub.Job: jr-novcmdtiz6 (Running)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
 files:
  - code.jl (input; 3 bytes)
  - code.jl (source; 3 bytes)
  - Project.toml (project; 244 bytes)
  - Manifest.toml (project; 9056 bytes)
 outputs: "{}"
julia> JuliaHub.isdone(job)
false

The wait_job function also provides a convenient way for a script to wait for a job to finish.

julia> job = JuliaHub.wait_job("jr-novcmdtiz6")
JuliaHub.Job: jr-novcmdtiz6 (Completed)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
 files:
  - code.jl (input; 3 bytes)
  - code.jl (source; 3 bytes)
  - Project.toml (project; 244 bytes)
  - Manifest.toml (project; 9056 bytes)
 outputs: "{}"
julia> JuliaHub.isdone(job)
true
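Combining submission and waiting, a script that fans out several jobs and blocks until all of them have finished might look like this. This is a sketch: the worker script is a placeholder, and it assumes wait_job also accepts the Job objects returned by submit_job (above it was called with a job ID string):

```julia
# Submit three small jobs, distinguished by an environment variable.
jobs = [
    JuliaHub.submit_job(
        JuliaHub.script"""
        @info "Hello from worker" ENV["WORKER_ID"]
        """,
        env = Dict("WORKER_ID" => string(i)),
    )
    for i in 1:3
]

# Block until every job has finished; wait_job returns an updated Job object.
finished = [JuliaHub.wait_job(j) for j in jobs]
@assert all(JuliaHub.isdone, finished)
```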

Accessing job outputs

There are two ways a JuliaHub job can store outputs that are directly related to a specific job[1]:

  1. Small, simple outputs can be stored by setting the ENV["RESULTS"] environment variable. Conventionally, this is often set to a JSON object, and will act as a dictionary of key value pairs.
  2. Files or directories can be uploaded by setting the ENV["RESULTS_FILE"] to a local file path on the job. Note that directories are combined into a single tarball when uploaded.
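Inside the job's script, both outputs are set by assigning the environment variables. A minimal sketch (the JSON string is written by hand here to keep the example dependency-free; in practice you might build it with JSON.jl, and the output file name is a hypothetical placeholder):

```julia
# Small key-value results: conventionally a JSON object string.
ENV["RESULTS"] = """{"user_param": 2, "output_value": 4}"""

# A result file: write the output, then point RESULTS_FILE at its path.
outfile = joinpath(pwd(), "output.csv")
open(outfile, "w") do io
    println(io, "x,y")
    println(io, "1,2")
end
ENV["RESULTS_FILE"] = outfile
```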

The values set via the RESULTS environment variable can be accessed with the .results field of a Job object:

julia> job.results
"{\"user_param\": 2, \"output_value\": 4}\n"

As the .results string is often a JSON object, you can use the JSON.jl or JSON3.jl packages to easily parse it. For example:

julia> import JSON
julia> JSON.parse(job.results)
Dict{String, Any} with 2 entries:
  "user_param"   => 2
  "output_value" => 4

When it comes to job result files, they can all be accessed via the .files field.

julia> job.files
4-element Vector{JuliaHub.JobFile}:
 JuliaHub.job_file(JuliaHub.job("jr-novcmdtiz6"), :input, "code.jl")
 JuliaHub.job_file(JuliaHub.job("jr-novcmdtiz6"), :source, "code.jl")
 JuliaHub.job_file(JuliaHub.job("jr-novcmdtiz6"), :project, "Project.toml")
 JuliaHub.job_file(JuliaHub.job("jr-novcmdtiz6"), :project, "Manifest.toml")

The job_files function can be used to filter down to specific file types.

julia> JuliaHub.job_files(job, :result)
JuliaHub.JobFile[]

And if you know the name of the file, you can also use job_file to get the specific JobFile object for a particular file directly.

julia> jobfile = JuliaHub.job_file(job, :result, "outdir.tar.gz")

To actually fetch the contents of a file, you can use the download_job_file function on the JobFile objects.
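For example, to fetch the result tarball from the job above into the current directory (a sketch; the local destination path is arbitrary):

```julia
jobfile = JuliaHub.job_file(job, :result, "outdir.tar.gz")
JuliaHub.download_job_file(jobfile, joinpath(pwd(), "outdir.tar.gz"))
```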

[1] You can also e.g. upload datasets. But in that case the resulting data is not, strictly speaking, related to a specific job.