Getting Started with JuliaHub.jl

This tutorial walks you through the basic operations you can do with JuliaHub.jl, from installation to submitting simple jobs and working with datasets. If you are unfamiliar with JuliaHub.jl, this is a good place to get started.

If you already know what you wish to achieve with JuliaHub.jl, you can also skip this and jump directly into one of the more detailed how-to guides.

In particular, the tutorial will show

  • How to install JuliaHub.jl and connect it to a JuliaHub instance.
  • How to create, access and update a simple dataset.
  • How to submit a simple job.

Installation

JuliaHub.jl is a registered Julia package and can be installed using Julia's package manager. You can access the Julia package manager REPL mode by pressing ], and you can install JuliaHub.jl with

pkg> add JuliaHub

Alternatively, you can use the Pkg standard library functions to install it.

import Pkg
Pkg.add("JuliaHub")

Once it is installed, simply use import or using to load JuliaHub.jl into your current Julia session.

julia> using JuliaHub
No exported names

JuliaHub.jl does not have any exported names, so using JuliaHub does not introduce any functions or types into Main. Instead, JuliaHub.jl functions are designed to be used by prefixing them with JuliaHub. (e.g. JuliaHub.authenticate(...) or JuliaHub.submit_job(...)).

That said, there is nothing stopping you from explicitly bringing some names into your current scope, by doing e.g. using JuliaHub: submit_job, if you so wish!

Authentication

In order to communicate with a JuliaHub instance, you need a valid authentication token. If you are working in a JuliaHub Cloud IDE, you actually do not need to do anything to be authenticated, as the authentication tokens are automatically set up in the cloud environment. To verify this, you can still call authenticate, which should load the pre-configured token.

julia> JuliaHub.authenticate()
JuliaHub.Authentication("https://juliahub.com", "username", *****)

If you are working on a local computer, the easiest way to get started is to pass the URL of the JuliaHub instance to authenticate. Unless you have authenticated before, this will initiate an interactive browser-based authentication.

julia> JuliaHub.authenticate("juliahub.com")
Authentication required: please authenticate in browser.
The authentication page should open in your browser automatically, but you may need to switch to the opened window or tab. If the authentication page is not automatically opened, you can authenticate by manually opening the following URL: ...

Once you have completed the steps in the browser, the function should return a valid authentication token.

The authenticate function returns an Authentication object, which holds the authentication token. In principle, you can pass these objects directly to JuliaHub.jl functions via the auth keyword argument. In practice, this is usually not needed, because JuliaHub.jl also remembers the last authentication of the Julia session in a global variable. You can see the current globally stored authentication token with current_authentication.

julia> JuliaHub.current_authentication()
JuliaHub.Authentication("https://juliahub.com", "username", *****)
Authentication guide

There is more to authentication than this, including its relationship to the Julia package server and JULIA_PKG_SERVER environment variable. See the Authentication how-to if you want to learn more.
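As a minimal sketch of passing an Authentication object explicitly through the auth keyword argument mentioned above, rather than relying on the globally stored one, something like the following could be used. The helper name list_my_datasets is hypothetical; authenticate, datasets, and the auth keyword are the ones described in this tutorial.

```julia
import JuliaHub

# Hypothetical helper: list datasets using an explicitly supplied token
# instead of the globally stored authentication.
function list_my_datasets(auth::JuliaHub.Authentication)
    return JuliaHub.datasets(; auth=auth)
end

# Usage (needs network access and a valid account):
# auth = JuliaHub.authenticate("juliahub.com")
# list_my_datasets(auth)
```

Passing the token explicitly can be useful when a session talks to more than one JuliaHub instance, since the global variable only remembers the last authentication.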

Creating & accessing datasets

JuliaHub.jl allows you to create, access, and update the datasets that are hosted on JuliaHub. This section shows some of the basic operations you can perform with datasets.

The datasets function allows you to list the datasets you have. Optionally, you can also make it show any other datasets you have access to.

julia> JuliaHub.datasets()
JuliaHub.Dataset[]

Unless you have created datasets in the web UI or in the IDE, this list will likely be empty currently. To fix that, let us upload a simple dataset using JuliaHub.jl.

Just as an example, we'll generate a simple 5-by-5 matrix and save it in a file using the DelimitedFiles standard library.

julia> using DelimitedFiles
julia> mat = [i^2 + j^2 for i=1:5, j=1:5]
5×5 Matrix{Int64}:
  2   5  10  17  26
  5   8  13  20  29
 10  13  18  25  34
 17  20  25  32  41
 26  29  34  41  50
julia> writedlm("matrix.dat", mat)
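Before uploading, it can be worth confirming that the file reads back to the same matrix. The following is a small local sketch of that check using only the DelimitedFiles standard library; it writes to a temporary directory, so it does not touch the matrix.dat created above. The variable name roundtrip is purely illustrative.

```julia
using DelimitedFiles

mat = [i^2 + j^2 for i = 1:5, j = 1:5]

# Write to a temporary directory, read the file back, then let mktempdir
# clean everything up. The do-block's last value is returned.
roundtrip = mktempdir() do dir
    path = joinpath(dir, "matrix.dat")
    writedlm(path, mat)           # writedlm uses a tab delimiter by default
    readdlm(path, '\t', Int)      # parse every field back as an Int
end

@assert roundtrip == mat
```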

Now that the matrix has been serialized into a text file on the disk, we can upload that file to JuliaHub with upload_dataset.

julia> JuliaHub.upload_dataset("tutorial-matrix", "matrix.dat")
Transferred:       86.767 KiB / 86.767 KiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         2.1s
Dataset: tutorial-matrix (Blob)
 owner: username
 description:
 versions: 1
 size: 57 bytes
Existing dataset

If you already happen to have a dataset with the same name, the upload_dataset call will fail. It is designed to be safe by default. However, you can pass update=true or replace=true to either upload your file as a new version of the dataset, or to delete all existing versions and upload a brand new version.
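A hedged sketch of the update=true option described above: the wrapper below uploads a file as a new version of a dataset that may already exist. The function name upload_new_version is hypothetical; upload_dataset and its update keyword are taken from the text of this tutorial.

```julia
import JuliaHub

# Hypothetical helper: upload `path` as a new version of the dataset `name`.
# With update=true the call does not fail when the dataset already exists.
function upload_new_version(name::AbstractString, path::AbstractString)
    return JuliaHub.upload_dataset(name, path; update=true)
end

# Usage (needs an authenticated session):
# upload_new_version("tutorial-matrix", "matrix.dat")
```

Preferring update=true over replace=true keeps the earlier versions around, which is usually the less destructive choice.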

If we now call datasets, it should show up in the list of datasets.

julia> JuliaHub.datasets()
1-element Vector{JuliaHub.Dataset}:
 JuliaHub.dataset(("username", "tutorial-matrix"))

To see more details about the dataset, you can index into the array returned by datasets. Alternatively, you can also use the dataset function to pick out a single dataset by its name.

julia> JuliaHub.dataset("tutorial-matrix")
Dataset: tutorial-matrix (Blob)
 owner: username
 description:
 versions: 1
 size: 57 bytes

JuliaHub datasets also support basic metadata, such as tags and a description field. These can be set directly in the upload_dataset call, but we did not do so above. That is fine, since update_dataset can update the metadata at any time.

julia> JuliaHub.update_dataset("tutorial-matrix", description="An i^2 + j^2 matrix")
Dataset: tutorial-matrix (Blob)
 owner: username
 description: An i^2 + j^2 matrix
 versions: 1
 size: 57 bytes

The function also immediately queries JuliaHub for the updated dataset metadata by internally calling JuliaHub.dataset("tutorial-matrix").

Finally, JuliaHub.jl also allows you to download the datasets you have with the download_dataset function. We can also imagine doing this on a different computer or in a JuliaHub job.

julia> JuliaHub.download_dataset("tutorial-matrix", "matrix-downloaded.dat")
Transferred:       86.767 KiB / 86.767 KiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         2.1s
"/home/runner/work/JuliaHub.jl/JuliaHub.jl/docs/build/matrix-downloaded.dat"

This downloads the dataset into a local file, after which you can e.g. read it back into Julia and do operations on it.

julia> mat = readdlm("matrix-downloaded.dat", '\t', Int)
5×5 Matrix{Int64}:
  2   5  10  17  26
  5   8  13  20  29
 10  13  18  25  34
 17  20  25  32  41
 26  29  34  41  50
julia> sum(mat)
550
Directories as datasets

While this demo uploaded a single file as a dataset, JuliaHub also supports uploading whole directories as a single dataset. For that, you can simply point upload_dataset to a directory, rather than a file. See the datasets how-to for more information on how to work with datasets.
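To illustrate the directory case, here is a small sketch that prepares a directory holding two files. The file names and the dataset name tutorial-matrices are made up for this example, and the upload itself is left as a comment because it needs an authenticated session.

```julia
using DelimitedFiles

# Hypothetical layout: one file per matrix inside a single directory.
dir = mktempdir()
writedlm(joinpath(dir, "squares.dat"), [i^2 + j^2 for i = 1:5, j = 1:5])
writedlm(joinpath(dir, "products.dat"), [i * j for i = 1:5, j = 1:5])

# List what would be uploaded.
files = sort(readdir(dir))
println(files)

# With an authenticated session, the directory is passed just like a file path:
# JuliaHub.upload_dataset("tutorial-matrices", dir)
```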

Submitting a job

JuliaHub.jl allows for easy programmatic submission of JuliaHub jobs. In this example, we submit a simple script that downloads the dataset from the previous step, does a simple calculation, and then uploads the result. We then access the result locally with JuliaHub.jl.

First, we need to specify the code that we want to run in the job. There are a few options for this, but in this example we use the @script_str string macro to construct a script-type computation, that simply runs the code snippet we specify.

The following script accesses the dataset, calculates the sum of all the elements, and stores the value in the job results. You will be able to access the contents of RESULTS both in the web UI and via JuliaHub.jl.

s = JuliaHub.script"""
using JuliaHub, DelimitedFiles
@info JuliaHub.authenticate()
JuliaHub.download_dataset("tutorial-matrix", "matrix-downloaded.dat")
mat = readdlm("matrix-downloaded.dat", '\t', Int)
mat_sum = @show sum(mat)
ENV["RESULTS"] = string(mat_sum)
"""
JuliaHub.BatchJob:
code = """
using JuliaHub, DelimitedFiles
@info JuliaHub.authenticate()
JuliaHub.download_dataset("tutorial-matrix", "matrix-downloaded.dat")
mat = readdlm("matrix-downloaded.dat", '\t', Int)
mat_sum = @show sum(mat)
ENV["RESULTS"] = string(mat_sum)
"""
sha256(project_toml) = 62aca0c4b58726ab88c7beaa448e4ca3d51ba68c2d4f9c244b22e09dfe2919d1
sha256(manifest_toml) = 8a45e28aaeac067142b495ff5d7037cb795afe1c4ef7277c25359f2c45b73a1d
Job environment

In most cases, you also submit a Julia package environment (i.e. Project.toml and Manifest.toml files) together with a job. That environment then gets instantiated before the user-provided code is run.

The script"" string macro, by default, attaches the currently active environment to the job. This means that any packages that you are currently using should also be available on the job (although only registered packages added as non-development dependencies will work). You can use Base.active_project() or pkg> status to see what environment is currently active.
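As a quick sketch of the environment check mentioned above, the snippet below prints the active project file and its dependencies using Base and the Pkg standard library. The variable name project is illustrative only.

```julia
import Pkg

# Path of the project file that is currently active, or `nothing` if none is.
project = Base.active_project()
println("Active project: ", project)

# Programmatic equivalent of `pkg> status`: prints the environment's dependencies.
Pkg.status()
```

Running this just before constructing the script makes it easier to spot development dependencies that, per the note above, will not work on the job.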

To submit a job, you can simply call submit_job on it.

julia> j = JuliaHub.submit_job(s)
JuliaHub.Job: jr-xf4tslavut (Submitted)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00

The submit_job function also allows you to configure how the job gets run, such as how many CPUs or how much memory it has available. By default, though, it runs your code on a single node, picking the smallest instance that is available.
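The sketch below shows what requesting more resources might look like. The keyword names ncpu and memory (with memory assumed to be in GiB) are assumptions about submit_job's interface and are not confirmed by this tutorial, so check the jobs how-to before relying on them. The helper name submit_with_resources is hypothetical.

```julia
import JuliaHub

# Hypothetical helper: submit `script` with explicit resource requests.
# ASSUMPTION: submit_job accepts `ncpu` and `memory` keyword arguments.
function submit_with_resources(script; ncpu::Integer=2, memory::Integer=4)
    return JuliaHub.submit_job(script; ncpu=ncpu, memory=memory)
end

# Usage (needs an authenticated session and the script `s` from above):
# j = submit_with_resources(s; ncpu=4, memory=8)
```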

At this point, if you go to the "Jobs" page of the web UI, you should see the job there. It may take a few moments to actually start running. You can also call job on the returned Job object to refresh the status of the job.

julia> j = JuliaHub.job(j)
JuliaHub.Job: jr-xf4tslavut (Running)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
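Refreshing by hand works for short jobs, but the same idea can be wrapped in a loop. The following is a generic polling sketch, not part of JuliaHub.jl: poll_until is a hypothetical helper that keeps calling a refresh function until a predicate reports completion. It is exercised here with a simple counter standing in for a job.

```julia
# Repeatedly call `refresh()` until `isfinished(state)` holds or attempts run out.
function poll_until(refresh, isfinished; interval::Real=5, max_attempts::Integer=60)
    state = refresh()
    attempts = 1
    while !isfinished(state) && attempts < max_attempts
        sleep(interval)          # wait before asking again
        state = refresh()
        attempts += 1
    end
    return state
end

# Stand-in for a job: a counter that "completes" on the third refresh.
counter = Ref(0)
final = poll_until(() -> (counter[] += 1), n -> n >= 3; interval=0)
println(final)
```

With a real job this might be called as poll_until(() -> JuliaHub.job(j), JuliaHub.isdone), where JuliaHub.isdone is assumed to report whether a job has reached a final state; verify that against the jobs reference first.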

Finally, after the job has completed, refreshing the Job should show its final status and also give you access to the job's results.

julia> j = JuliaHub.job(j)
JuliaHub.Job: jr-xf4tslavut (Completed)
 submitted: 2023-03-15T07:56:50.974+00:00
 started:   2023-03-15T07:56:51.251+00:00
 finished:  2023-03-15T07:56:59.000+00:00
 outputs: "550"
julia> j.results
"550"
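Because the script stored the sum with string(mat_sum), the result comes back as text. A minimal sketch of turning it back into a number, using a literal string as a stand-in for j.results:

```julia
# `results` stands in for `j.results`, which is a String.
results = "550"

# Convert the text back into an integer; parse throws if the text is not a number.
mat_sum = parse(Int, results)
println(mat_sum + 1)
```

If the job might store something other than a plain integer, tryparse(Int, results) returns nothing instead of throwing.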

See the jobs how-to guide for more details on the different options when it comes to job submission.

Next steps

This tutorial has hopefully given an overview of basic JuliaHub.jl usage. For more advanced usage, you may want to read through the more detailed how-to guides.