Jobs
JuliaHub.jl can be used both to submit new jobs and to inspect running or finished jobs.
Submitting batch jobs
A common use case for this package is to programmatically submit Julia scripts as batch jobs to JuliaHub, i.e. to start non-interactive workloads. In a nutshell, these are Julia scripts, together with an optional Julia environment, that get executed on the allocated hardware.
The easiest way to start a batch job is to submit a single Julia script, optionally together with a Julia environment. However, for more complex jobs with multiple input files etc., appbundles are likely more suitable.
Script jobs
The simplest job one can submit is a humble Julia script, together with an optional Julia environment (i.e. Project.toml, Manifest.toml, and/or Artifacts.toml). These jobs can be created with the JuliaHub.@script_str string macro, for inline instantiation:
JuliaHub.submit_job(
    JuliaHub.script"""
    @warn "Hello World!"
    """,
)
JuliaHub.Job: jr-xf4tslavut (Completed)
submitted: 2023-03-15T07:56:50.974+00:00
started: 2023-03-15T07:56:51.251+00:00
finished: 2023-03-15T07:56:59.000+00:00
files:
- code.jl (input; 3 bytes)
- code.jl (source; 3 bytes)
- Project.toml (project; 244 bytes)
- Manifest.toml (project; 9056 bytes)
outputs: "{}"
Alternatively, they can be created with the script function, which can load the Julia code from a script file:
JuliaHub.submit_job(
    JuliaHub.script("myscript.jl"),
)
JuliaHub.Job: jr-xf4tslavut (Completed)
submitted: 2023-03-15T07:56:50.974+00:00
started: 2023-03-15T07:56:51.251+00:00
finished: 2023-03-15T07:56:59.000+00:00
files:
- code.jl (input; 3 bytes)
- code.jl (source; 3 bytes)
- Project.toml (project; 244 bytes)
- Manifest.toml (project; 9056 bytes)
outputs: "{}"
The string macro also picks up the currently running environment (i.e. the Project.toml, Manifest.toml, and Artifacts.toml files), which then gets instantiated on JuliaHub when the script is started. If necessary, this can be disabled by appending the noenv suffix to the string macro:
JuliaHub.script"""
@warn "Hello World!"
"""noenv
JuliaHub.BatchJob:
code = """
@warn "Hello World!"
"""
With the script function, you can also specify a path to a directory containing the Julia package environment, if necessary.
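For example, a submission using a script and a separately stored environment might look roughly like the following sketch. Note that the project_directory keyword name is an assumption here; consult the script docstring for the exact argument.

JuliaHub.submit_job(
    # The environment/ path and the project_directory keyword are assumptions for illustration.
    JuliaHub.script("myscript.jl"; project_directory = "environment"),
)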
If an environment is passed with the job, it gets instantiated on the JuliaHub node, and the script is run in that environment. As such, any dependencies that are not available from a package registry or via a public Git URL will not work. In that case, appbundles can be used instead to submit jobs that include private or local dependencies.
Appbundles
A more advanced way of submitting a batch job is as an appbundle, which "bundles up" a whole directory and submits it together with the script. The Julia environment in the directory is also immediately added into the bundle.
An appbundle can be constructed with the appbundle function, which takes as arguments the path to the directory to be bundled up and a script within that directory. This is meant to be used for project directories where the Julia environment sits at the top level of the directory or repository. For example, you can submit a bundle from a submission script at the top level of your project directory as follows:
JuliaHub.submit_job(
    JuliaHub.appbundle(@__DIR__, "script.jl"),
    ncpu = 4, memory = 16,
)
JuliaHub.Job: jr-xf4tslavut (Completed)
submitted: 2023-03-15T07:56:50.974+00:00
started: 2023-03-15T07:56:51.251+00:00
finished: 2023-03-15T07:56:59.000+00:00
files:
- code.jl (input; 3 bytes)
- code.jl (source; 3 bytes)
- Project.toml (project; 244 bytes)
- Manifest.toml (project; 9056 bytes)
outputs: "{}"
The bundler looks for a Julia environment (i.e. Project.toml, Manifest.toml, and/or Artifacts.toml files) at the root of the directory. If the environment does not exist (i.e. the files are missing), one is created. When the job starts on JuliaHub, this environment is instantiated.
A key feature of appbundles is that development dependencies of the environment (i.e. packages added with pkg> develop or Pkg.develop()) are also bundled up into the archive that gets submitted to JuliaHub, including any current, uncommitted changes. Registered packages are installed by the package manager during the standard environment instantiation, and their source code is not included in the bundle directly.
When the JuliaHub job starts, the bundle is unpacked into the appbundle/ directory (relative to the starting working directory). E.g. if you have a mydata.dat file in the bundled directory, you can access it in the script at joinpath("appbundle", "mydata.dat").
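For example, a script inside the bundle could load that file as follows (mydata.dat is just the placeholder name used above):

datafile = joinpath("appbundle", "mydata.dat")
# Read the bundled data file as a string; parse it however your job requires.
data = read(datafile, String)
@info "Loaded bundled data" datafile sizeof(data)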
Finally, a .juliabundleignore file can be used to exclude certain directories from the bundle, by adding the relevant globs, similar to how .gitignore files work. In addition, .git directories are automatically excluded from the bundle.
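As an illustration, a .juliabundleignore at the root of the bundled directory might contain glob patterns like the following (the directory and file patterns here are purely hypothetical, excluding e.g. large raw data and scratch files from the upload):

data/raw/
scratch/
*.tmp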
Examining job configuration
The dryrun option to submit_job can be used to inspect the full job workload configuration that would be submitted to JuliaHub.
JuliaHub.submit_job(
    JuliaHub.script"""
    println("hello world")
    """,
    ncpu = 4, memory = 8,
    env = Dict("ARG" => "value"),
    dryrun = true
)
JuliaHub.WorkloadConfig:
application:
JuliaHub.BatchJob:
code = """
println("hello world")
"""
sha256(project_toml) = ae99b7f4a9613201f5e4dabdb057f16cda7ea8e296b4c8742d7b83861b96a3b7
sha256(manifest_toml) = db72e0f2222ce42b8c42f49a6b911926d816ed43734a92cac27c6ad6233c5bea
compute:
JuliaHub.ComputeConfig
Node: 3.5 GHz Intel Xeon Platinum 8375C
- GPU: no
- vCores: 4
- Memory: 16 Gb
- Price: 0.33 $/hr
Process per node: true
Number of nodes: 1
timelimit = 1 hour,
env:
ARG: value
Query, extend, kill
The package has functions that can be used to interact with running and past jobs. The jobs function can be used to list jobs, returning an array of Job objects.
julia> js = JuliaHub.jobs(limit=3)
3-element Vector{JuliaHub.Job}:
 JuliaHub.job("jr-eezd3arpcj")
 JuliaHub.job("jr-novcmdtiz6")
 JuliaHub.job("jr-3eka6z321p")
julia> js[1]
JuliaHub.Job: jr-eezd3arpcj (Completed)
submitted: 2023-03-15T07:56:50.974+00:00
started: 2023-03-15T07:56:51.251+00:00
finished: 2023-03-15T07:56:59.000+00:00
files:
- code.jl (input; 3 bytes)
- code.jl (source; 3 bytes)
- Project.toml (project; 244 bytes)
- Manifest.toml (project; 9056 bytes)
outputs: "{}"
If you know the name of the job, you can also query the job directly with job.
julia> job = JuliaHub.job("jr-eezd3arpcj")
JuliaHub.Job: jr-eezd3arpcj (Completed)
submitted: 2023-03-15T07:56:50.974+00:00
started: 2023-03-15T07:56:51.251+00:00
finished: 2023-03-15T07:56:59.000+00:00
files:
- code.jl (input; 3 bytes)
- code.jl (source; 3 bytes)
- Project.toml (project; 244 bytes)
- Manifest.toml (project; 9056 bytes)
- outdir.tar.gz (result; 632143 bytes)
outputs: "{\"result_variable\": 1234, \"another_result\": \"value\"}\n"
julia> job.status
"Completed"
julia> JuliaHub.isdone(job)
true
Similarly, the kill_job function can be used to stop a running job, and the extend_job function can be used to extend a job's time limit.
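For illustration, extending and then stopping a job might look roughly like the sketch below. The exact form of the time extension argument is an assumption here; consult the extend_job docstring for the accepted types.

import Dates
job = JuliaHub.job("jr-novcmdtiz6")
# Request one more hour of runtime (argument type assumed; see the docstring).
JuliaHub.extend_job(job, Dates.Hour(1))
# Stop the job before it finishes on its own.
JuliaHub.kill_job(job)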
Waiting on jobs
A common pattern in a script is to submit one or more jobs, and then wait until the jobs complete before processing their outputs. isdone can be used to check whether a job has completed.
julia> job = JuliaHub.job("jr-novcmdtiz6")
JuliaHub.Job: jr-novcmdtiz6 (Running)
submitted: 2023-03-15T07:56:50.974+00:00
started: 2023-03-15T07:56:51.251+00:00
finished: 2023-03-15T07:56:59.000+00:00
files:
- code.jl (input; 3 bytes)
- code.jl (source; 3 bytes)
- Project.toml (project; 244 bytes)
- Manifest.toml (project; 9056 bytes)
outputs: "{}"
julia> JuliaHub.isdone(job)
false
The wait_job function also provides a convenient way for a script to wait for a job to finish.
julia> job = JuliaHub.wait_job("jr-novcmdtiz6")
JuliaHub.Job: jr-novcmdtiz6 (Completed)
submitted: 2023-03-15T07:56:50.974+00:00
started: 2023-03-15T07:56:51.251+00:00
finished: 2023-03-15T07:56:59.000+00:00
files:
- code.jl (input; 3 bytes)
- code.jl (source; 3 bytes)
- Project.toml (project; 244 bytes)
- Manifest.toml (project; 9056 bytes)
outputs: "{}"
julia> JuliaHub.isdone(job)
true
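Putting these together, a submission script that starts several jobs and waits for all of them might look roughly like this sketch (the script file names are placeholders, and it is assumed that wait_job also accepts the Job objects returned by submit_job):

# Submit a couple of batch jobs and block until all of them have finished.
jobs = [
    JuliaHub.submit_job(JuliaHub.script(path))
    for path in ["preprocess.jl", "train.jl"]
]
finished = [JuliaHub.wait_job(j) for j in jobs]
@assert all(JuliaHub.isdone, finished)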
Accessing job outputs
There are two ways a JuliaHub job can store outputs that are directly related to a specific job[1]:
- Small, simple outputs can be stored by setting the ENV["RESULTS"] environment variable. Conventionally, this is set to a JSON object, and will act as a dictionary of key-value pairs (see the sketch after this list).
- Files or directories can be uploaded by setting the ENV["RESULTS_FILE"] environment variable to a local file path on the job. Note that directories are combined into a single tarball when uploaded.
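For example, a job script that reports both kinds of outputs might look roughly like this sketch (the results.csv file name and the computed value are placeholders, and it is assumed that the JSON package is available in the job's environment):

import JSON

# Compute something and write a file that should be uploaded with the job.
result = sum(1:100)
open("results.csv", "w") do io
    println(io, "result,", result)
end

# Small key-value outputs: a JSON string stored in ENV["RESULTS"].
ENV["RESULTS"] = JSON.json(Dict("result_variable" => result))
# A file (or directory) to be uploaded as the job's result file.
ENV["RESULTS_FILE"] = joinpath(pwd(), "results.csv")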
The values set via the RESULTS environment variable can be accessed with the .results field of a Job object:
julia> job.results
"{\"user_param\": 2, \"output_value\": 4}\n"
As the .results string is often a JSON object, you can use the JSON.jl or JSON3.jl packages to easily parse it. For example:
julia> import JSON
julia> JSON.parse(job.results)
Dict{String, Any} with 2 entries:
  "user_param"   => 2
  "output_value" => 4
The files associated with a job, including any result files, can all be accessed via the .files field.
julia> job.files
4-element Vector{JuliaHub.JobFile}:
 JuliaHub.JobFile(:input, "code.jl", 3, ...)
 JuliaHub.JobFile(:source, "code.jl", 3, ...)
 JuliaHub.JobFile(:project, "Project.toml", 244, ...)
 JuliaHub.JobFile(:project, "Manifest.toml", 9056, ...)
The job_files function can be used to filter down to specific file types.
julia> JuliaHub.job_files(job, :result)
JuliaHub.JobFile[]
And if you know the name of the file, you can also use the job_file function to get the JobFile object for a particular file directly.
julia> jobfile = JuliaHub.job_file(job, :result, "outdir.tar.gz")
To actually fetch the contents of a file, you can use the download_job_file function on the JobFile objects.
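For example, downloading the result archive from the job above into the current working directory could look roughly like this (it is assumed here that download_job_file accepts a local destination path):

jobfile = JuliaHub.job_file(job, :result, "outdir.tar.gz")
# Download the result tarball into the current working directory.
JuliaHub.download_job_file(jobfile, joinpath(pwd(), "outdir.tar.gz"))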
[1] You can also upload datasets etc., but in that case the resulting data is not, strictly speaking, related to a specific job.