Datasets
JuliaHub.jl offers a programmatic way to work with your JuliaHub datasets, and this section demonstrates a few common workflows you can use with these APIs.
See the datasets reference page for a detailed reference of the datasets-related functionality.
Accessing datasets
The datasets function can be used to list all the datasets owned by the currently authenticated user, returning an array of Dataset objects.
julia> JuliaHub.datasets()
2-element Vector{JuliaHub.Dataset}:
JuliaHub.dataset(("username", "example-dataset"))
JuliaHub.dataset(("username", "blobtree/example"))
If you know the name of the dataset, you can also access it directly with the dataset function, and you can read the dataset metadata via the properties of the Dataset object.
julia> ds = JuliaHub.dataset("example-dataset")
Dataset: example-dataset (Blob)
owner: username
description: An example dataset
size: 57 bytes
tags: tag1, tag2
julia> ds.owner
"username"
julia> ds.description
"An example dataset"
julia> ds.size
57
If you want to work with datasets that you do not own but that are shared with you in JuliaHub, you can pass shared=true to datasets, or specify the username of the owner.
julia> JuliaHub.datasets(shared=true)
3-element Vector{JuliaHub.Dataset}:
JuliaHub.dataset(("username", "example-dataset"))
JuliaHub.dataset(("anotheruser", "publicdataset"))
JuliaHub.dataset(("username", "blobtree/example"))
julia> JuliaHub.datasets("anotheruser")
1-element Vector{JuliaHub.Dataset}:
JuliaHub.dataset(("anotheruser", "publicdataset"))
julia> JuliaHub.dataset(("anotheruser", "publicdataset"))
Dataset: publicdataset (Blob)
owner: anotheruser
description: An example dataset
size: 57 bytes
tags: tag1, tag2
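Because each Dataset exposes its metadata as properties, summaries over a list of datasets are ordinary Julia collection operations. The following is a minimal sketch and not part of JuliaHub.jl: total_size_by_owner is a hypothetical helper, and the named tuples are stand-ins that mimic only the owner, name, and size properties shown above, so that the example runs without a JuliaHub connection.

```julia
# Hypothetical helper (not part of JuliaHub.jl): sums the `size` property of
# each dataset, grouped by its `owner` property.
function total_size_by_owner(datasets)
    totals = Dict{String, Int}()
    for ds in datasets
        totals[ds.owner] = get(totals, ds.owner, 0) + ds.size
    end
    return totals
end

# Stand-in objects with the same `owner` and `size` properties as the
# Dataset objects shown above; the values are made up for illustration.
example_datasets = [
    (owner = "username", name = "example-dataset", size = 57),
    (owner = "anotheruser", name = "publicdataset", size = 57),
    (owner = "username", name = "blobtree/example", size = 57),
]

totals = total_size_by_owner(example_datasets)
```

With an authenticated session, the same function should accept the result of JuliaHub.datasets(shared=true), assuming size is an integer number of bytes as in the output above.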
Finally, JuliaHub.jl can also be used to download a dataset to your local machine with the download_dataset function.
julia> JuliaHub.download_dataset("example-dataset", "mydata")
Transferred: 86.767 KiB / 86.767 KiB, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 2.1s
"/home/username/my-project/mydata"
In JuliaHub jobs and Cloud IDEs you can also use the DataSets.jl package to access and work with datasets. See the help.julialang.org section on datasets for more information.
Create, update, or replace
The upload_dataset function can be used to programmatically create new datasets on JuliaHub.
julia> JuliaHub.upload_dataset("example-dataset", "local-file")
Transferred: 86.767 KiB / 86.767 KiB, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 2.1s
Dataset: example-dataset (Blob)
owner: username
description: An example dataset
size: 57 bytes
tags: tag1, tag2
The type of the dataset (Blob or BlobTree) depends on whether the uploaded object is a file or a directory. A file will be stored as a Blob-type dataset, and a directory will be stored as a BlobTree-type dataset on JuliaHub.
julia> JuliaHub.upload_dataset("example-blobtree", "local-directory")
Transferred: 86.767 KiB / 86.767 KiB, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 2.1s
Dataset: example-blobtree (BlobTree)
owner: username
description: An example dataset
size: 57 bytes
tags: tag1, tag2
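Since the dataset type follows from the local path, it can be useful to check ahead of time which type an upload would produce. The sketch below uses only Julia's standard filesystem functions; expected_dtype is a hypothetical helper, not a JuliaHub.jl function, and it simply encodes the file-versus-directory rule described above.

```julia
# Hypothetical helper (not part of JuliaHub.jl): predicts the dataset type
# that uploading `path` would produce, based on the rule described above.
function expected_dtype(path::AbstractString)
    if isdir(path)
        return "BlobTree"
    elseif isfile(path)
        return "Blob"
    else
        throw(ArgumentError("No file or directory at: $path"))
    end
end

# Demonstrate on a temporary directory and a file created inside it.
tmpdir = mktempdir()
tmpfile = joinpath(tmpdir, "local-file")
write(tmpfile, "example contents")

dir_type = expected_dtype(tmpdir)
file_type = expected_dtype(tmpfile)
```

A check like this could be run before upload_dataset to catch the case where a path that was meant to be a directory is actually a single file.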
The create, update, and replace options control how upload_dataset behaves with respect to existing datasets. By default, the function only creates brand new datasets, and trying to upload a dataset that already exists will fail with an error.
julia> JuliaHub.upload_dataset("example-dataset", "local-file")
ERROR: InvalidRequestError: Dataset 'example-dataset' for user 'username' already exists, but update=false and replace=false.
This behavior can be overridden by setting update=true, which will then upload a new version of a dataset if it already exists. This is useful for jobs and workflows that are meant to be re-run, updating the dataset each time they run.
julia> JuliaHub.upload_dataset("example-dataset", "local-file"; update=true)
Transferred: 86.767 KiB / 86.767 KiB, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 2.1s
Dataset: example-dataset (Blob)
owner: username
description: An example dataset
size: 57 bytes
tags: tag1, tag2
The replace=true option can be used to erase earlier versions of a dataset. This deletes all information about the existing dataset and is a destructive, non-recoverable action. It may also change the dataset type, for example when a directory is uploaded in place of a file.
julia> JuliaHub.upload_dataset("example-dataset", "local-file"; replace=true)
Transferred: 86.767 KiB / 86.767 KiB, 100%, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 2.1s
Dataset: example-dataset (Blob)
owner: username
description: An example dataset
size: 57 bytes
tags: tag1, tag2
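The interaction between update and replace can be summarized as a small decision function. This is only a model of the behavior described above, not the library's implementation: upload_action and its return symbols are hypothetical, the create option is left out because the text does not describe its effect, and rejecting update=true together with replace=true is a choice made for this sketch.

```julia
# A model of the behavior described above, NOT the library's implementation.
# `exists` says whether a dataset with the given name already exists.
function upload_action(exists::Bool; update::Bool = false, replace::Bool = false)
    # Assumption of this sketch: requesting both at once is treated as a mistake.
    update && replace && throw(ArgumentError("choose either update or replace"))
    if !exists
        return :create        # brand new dataset
    elseif update
        return :new_version   # existing dataset gets a new version
    elseif replace
        return :replace       # destructive: earlier versions are erased
    else
        return :error         # default: an existing dataset is left untouched
    end
end

default_on_existing = upload_action(true)
update_on_existing = upload_action(true; update = true)
replace_on_existing = upload_action(true; replace = true)
new_dataset = upload_action(false)
```

Reading the options this way makes the trade-off explicit: update=true is the safe choice for re-runnable workflows, while replace=true should be reserved for cases where discarding the earlier versions is intended.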
Bulk updates
You can also use the package to perform bulk updates or deletions of datasets. The following example adds a new tag to all the datasets whose names match a particular pattern.
# Find all the datasets that have names that start with 'my-analysis-'
myanalysis_datasets = filter(
    dataset -> startswith(dataset.name, "my-analysis-"),
    JuliaHub.datasets()
)
# .. and now add a 'new-tag' tag to each of them
for dataset in myanalysis_datasets
    @info "Updating" dataset
    # Note: tags = ... overrides the whole list, so you need to manually retain
    # old tags.
    new_tags = [dataset.tags..., "new-tag"]
    JuliaHub.update_dataset(dataset, tags = new_tags)
end
While this example uses the update_dataset function, other functions, such as delete_dataset, could be used in the same way.
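Because the tags argument overrides the whole list, re-running the loop above would append "new-tag" a second time. One way to keep the update safe to repeat is to de-duplicate when building the new list. The sketch below is plain Julia; merge_tags is a hypothetical helper, not a JuliaHub.jl function.

```julia
# Hypothetical helper (not part of JuliaHub.jl): appends `new_tags` to
# `existing`, keeping the original order and dropping duplicates.
merge_tags(existing, new_tags) = unique(vcat(existing, new_tags))

merged = merge_tags(["tag1", "tag2"], ["new-tag"])
# Merging the same tag again leaves the list unchanged.
rerun = merge_tags(merged, ["new-tag"])
```

In the loop above, new_tags = merge_tags(dataset.tags, ["new-tag"]) could then take the place of the splatted array literal.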