
DART Runner

Because DART uses the Electron framework, it requires the presence of a graphical user interface and a windowing system, even when it isn't going to use a GUI. That means it can't even start in command-line mode unless it's running in a desktop environment. This limitation is inherent in Electron and makes DART unsuitable for running on a Linux server.

For this reason, APTrust built dart-runner, which is a lightweight command-line version of DART that can run in server environments without a GUI. Dart-runner is intended to run workflows that were created and tested in DART.

The general process is:

  1. Create a Workflow in DART and test it out locally to ensure it does what you want.
  2. Export the workflow as described on the Workflows page.
  3. Run dart-runner on the server with the exported workflow file and a list of items you want to run through that workflow. If you're running a batch job, the list must conform to the workflow CSV format used for batch jobs. If you're running one-off jobs using Job Params, make sure your JSON conforms to the format described below.

Note

When scripting jobs and workflows on Mac and Windows, you should stick with the DART CLI, which is more mature as of late 2021. Because Windows and Mac include a GUI environment by default, DART and the DART CLI will always work there.

Downloads

The latest version is v0.95-beta, released August 11, 2022.

Download the 0.95 beta version of dart-runner for Linux.

There's also a Mac version of the beta if you want to experiment, but for now, APTrust suggests using the DART CLI on Mac.

Because it's a single binary with no dependencies, dart-runner has no installation process. Simply copy the binary onto your computer and run it.

If you're interested, the source code is available on GitHub at https://github.com/APTrust/dart-runner.

Features

Dart-runner:

  • is a single, lightweight binary, less than 10MB in size, with no external dependencies. To install, just copy the binary to your server.
  • is much lighter on CPU and memory than DART.
  • can run huge workflows unattended.
  • can run multiple jobs in parallel.
  • can, like the DART CLI, be scripted from other languages, though its syntax differs somewhat from the CLI's.
  • outputs clean, machine-readable JSON describing the outcome of every job. The output format is JSON Lines, with each line describing the outcome of a single job in the workflow.
  • provides meaningful error messages when errors occur.
  • supports piping, so you can send its output to any file you want for later analysis.
  • provides meaningful return codes, so your script knows whether a job succeeded or failed.

Limitations

Dart-runner is currently in beta and has the following limitations:

  • It supports only the BagIt packaging format.
  • It supports only S3 uploads (no SFTP).
  • It's intended primarily for use on Linux servers.

Differences in Job Params JSON

The Job Params JSON format for dart-runner differs slightly from the JSON used in the DART CLI. Specifically:

  • tags use "value" instead of "userValue", and
  • you don't need to specify the workflow name or ID in the JSON, because the workflow is specified as a file name on the command line.
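If you already have Job Params JSON written for the DART CLI, converting it is mostly mechanical. Below is a minimal Python sketch, assuming Python 3.9+. `to_dart_runner_params` is a hypothetical helper, and because this page doesn't name the keys the DART CLI uses for the workflow name or ID, you pass them in through `drop_keys` rather than having them hard-coded.

```python
import copy
from typing import Any, Iterable


def to_dart_runner_params(cli_params: dict[str, Any],
                          drop_keys: Iterable[str] = ()) -> dict[str, Any]:
    """Return a copy of DART CLI job params adjusted for dart-runner.

    - Each tag's "userValue" key is renamed to "value".
    - Keys listed in drop_keys (whatever your DART CLI JSON uses for the
      workflow name or ID) are removed, since dart-runner takes the
      workflow from the command line instead.
    The input dict is left unmodified.
    """
    params = copy.deepcopy(cli_params)
    for key in drop_keys:
        params.pop(key, None)
    for tag in params.get("tags", []):
        if "userValue" in tag:
            tag["value"] = tag.pop("userValue")
    return params
```

For example, `to_dart_runner_params(cli_params, drop_keys=["workflowName"])`, where "workflowName" is a placeholder for whatever key your file actually uses.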

The Sample Job Params JSON section below shows an example of valid Job Params JSON for dart-runner.

Usage

Options

  --workflow     Path to workflow json file. Use this option if you are running
                 a workflow against a batch of files. If you specify a workflow
                 file, you must also specify --batch. Workflows can be exported
                 from the DART UI.

  --batch        Path to CSV batch file. Use this option with --workflow to
                 specify a set of files or directories to run through a
                 workflow. The batch file format is described at
                 https://aptrust.github.io/dart-docs/users/workflows/batch_jobs/

  --output-dir   Path to package output directory. Jobs and workflows will
                 create bags in this directory. This option is always REQUIRED.

  --delete       Delete bags after job completes? Set this to true or false.
                 The default is true for jobs and workflows that include
                 uploads: the bags will be deleted after successful uploads.
                 Default is false for jobs and workflows that do not include
                 uploads because you probably want to do something with the bag
                 after it's created.

  --concurrency  Number of jobs to run concurrently. Default is 1. Max value
                 for this param should be less than or equal to the number of
                 processors on your machine. You may get diminishing returns
                 when setting this above 2 because most of the DART runner's
                 work is reading from and writing to disk. This setting only
                 makes sense for workflows. For a single job, you can omit
                 this.

  --help         Show this help document.
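The options above map directly onto a command line. Below is a minimal Python sketch, assuming Python 3.9+. The hypothetical helper `build_dart_runner_args` assembles a dart-runner argument list and enforces two rules the help text states outright: --output-dir is always required, and --batch is used together with --workflow. When --concurrency exceeds the processor count, it issues a warning instead of failing, since the help text phrases that limit as a recommendation.

```python
import os
import warnings
from typing import Optional


def build_dart_runner_args(output_dir: str,
                           workflow: Optional[str] = None,
                           batch: Optional[str] = None,
                           delete: Optional[bool] = None,
                           concurrency: Optional[int] = None,
                           executable: str = "dart-runner") -> list[str]:
    """Assemble a dart-runner argument list from the documented options.

    build_dart_runner_args is a hypothetical helper, not part of dart-runner.
    """
    if not output_dir:
        raise ValueError("--output-dir is always required")
    if batch and not workflow:
        raise ValueError("--batch must be used with --workflow")

    args = [executable]
    if workflow:
        args.append(f"--workflow={workflow}")
    if batch:
        args.append(f"--batch={batch}")
    args.append(f"--output-dir={output_dir}")
    if delete is not None:
        # --delete takes the literal value true or false.
        args.append("--delete=" + ("true" if delete else "false"))
    if concurrency is not None:
        if concurrency < 1:
            raise ValueError("--concurrency must be at least 1")
        cpus = os.cpu_count() or 1
        if concurrency > cpus:
            # The help text recommends not exceeding the processor count.
            warnings.warn(f"--concurrency={concurrency} exceeds {cpus} processors")
        args.append(f"--concurrency={concurrency}")
    return args
```

The resulting list can be passed straight to `subprocess.run`. Building a list rather than a single string avoids shell quoting problems when paths contain spaces.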

Examples

If you're running a single job, you can send job params to dart-runner through STDIN, like this:

echo '{ json }' | dart-runner --workflow=workflow.json --output-dir=/dir

The job params json tells dart-runner which files to bag and what tag values to set. The workflow tells dart-runner which BagIt profile to use and where to send the bag.

You can also feed the contents of a Job Params file to STDIN, as in either example below:

dart-runner --workflow=workflow.json --output-dir=/dir < job_params.json

cat job_params.json | dart-runner --workflow=workflow.json --output-dir=/dir

This runs the job described in the job_params.json file, writing the bag to the specified output directory.
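From within a script, the STDIN examples above amount to passing the JSON as the child process's input. Below is a minimal Python sketch, assuming Python 3.9+ and dart-runner on your PATH. `run_single_job` and its `runner` parameter are hypothetical. `runner` exists only so you can point the function at a different path to the binary.

```python
import json
import subprocess
from typing import Any, Sequence


def run_single_job(job_params: dict[str, Any],
                   workflow: str,
                   output_dir: str,
                   runner: Sequence[str] = ("dart-runner",)) -> subprocess.CompletedProcess:
    """Send job params on STDIN, like `cat job_params.json | dart-runner ...`."""
    cmd = [*runner, f"--workflow={workflow}", f"--output-dir={output_dir}"]
    return subprocess.run(
        cmd,
        input=json.dumps(job_params),  # the JSON dart-runner reads from STDIN
        capture_output=True,           # collect stdout (JSON) and stderr (messages)
        text=True,
        check=False,                   # inspect returncode ourselves
    )
```

After the call, `result.returncode` holds the exit code and `json.loads(result.stdout)` gives the single job's result line.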

To run a workflow:

dart-runner --workflow=path/to/workflow.json  \
            --batch=path/to/batch.csv         \
            --output-dir=path/to/directory    \
            --concurrency=2                   \
            --delete=false

The command above runs all of the items listed in the --batch CSV file through the workflow described in the --workflow json file. Bags are written to the output directory. Setting the delete flag to false means the bags will not be deleted from the output directory after successful upload.

The --concurrency flag above tells DART runner to work on 2 bags at a time (instead of the default 1 at a time) when bagging and uploading.

Setting --delete to true (or omitting --delete) will cause bags to be deleted after successful upload.
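For a long workflow, a script may want to handle each result as it arrives instead of waiting for the whole batch. Below is a minimal Python sketch, assuming Python 3.9+ and assuming dart-runner writes each job's JSON line as that job completes. `stream_workflow_results` is a hypothetical helper. It reads stdout line by line, lets stderr pass through to the terminal, and raises `subprocess.CalledProcessError` on a non-zero exit code only after it has yielded every result.

```python
import json
import subprocess
from typing import Iterator, Sequence


def stream_workflow_results(cmd: Sequence[str]) -> Iterator[dict]:
    """Yield one parsed result per job from dart-runner's JSON Lines stdout.

    stderr is not captured, so summary messages still reach the terminal.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            line = line.strip()
            if line:
                yield json.loads(line)
    # Leaving the with-block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, list(cmd))
```

For the workflow example above, `cmd` would be `["dart-runner", "--workflow=path/to/workflow.json", "--batch=path/to/batch.csv", "--output-dir=path/to/directory", "--concurrency=2", "--delete=false"]`.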

Exit Codes

    0 - Normal exit. This means there were no errors and all tasks
        succeeded.

    1 - Runtime error. This means dart runner was able to start the
        job, but encountered one or more errors along the way.

        In a single job, this usually means at least one step of the job
        failed: bagging, validation, or upload. Check the JSON output
        for more details about where the failure occurred and what
        happened.

        When running a batch of jobs through a workflow, this exit code
        means that one or more of the jobs in the batch failed. In this
        case, you should find an error message on stderr saying something
        like "2 Job(s) failed".

    2 - Usage error. This means dart runner didn't even attempt to start
        the job because something was wrong with the parameters. This
        usually means you've forgotten to provide a necessary parameter
        such as --workflow, or that the parameter points to a nonexistent
        or unreadable file. This error also occurs when a parameter contains
        invalid or unparsable data (bad JSON or bad CSV format).

        You should see a message on stderr describing the problem.
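A script can branch on these three codes. Below is a minimal Python sketch of that mapping. The `DartRunnerExit` enum and the `explain_exit` function are hypothetical names, and the messages simply restate the descriptions above.

```python
from enum import IntEnum


class DartRunnerExit(IntEnum):
    """dart-runner exit codes as documented on this page."""
    OK = 0             # no errors, all tasks succeeded
    RUNTIME_ERROR = 1  # started, but one or more jobs or steps failed
    USAGE_ERROR = 2    # never started: missing/bad params, bad JSON or CSV


def explain_exit(code: int) -> str:
    """Return a short explanation of a dart-runner exit code."""
    try:
        status = DartRunnerExit(code)
    except ValueError:
        return f"unexpected exit code {code}"
    if status is DartRunnerExit.OK:
        return "all tasks succeeded"
    if status is DartRunnerExit.RUNTIME_ERROR:
        return "one or more jobs failed; check the JSON on stdout"
    return "usage error; check the message on stderr"
```

Treating unknown codes as unexpected, rather than assuming failure type, keeps the script honest if a future release adds codes.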

Sample Job Params JSON

The following job params tell dart-runner to bag all of the files in /home/linus/documents and /home/linus/photos. They also tell the bagger to write two tags with the assigned values into the bag-info.txt file and one tag into the aptrust-info.txt file.

These job params would be combined with a workflow JSON file that would tell dart runner which BagIt profile to use when creating the bag, and where to send the bag when it's done.

{
    "packageName": "TestBag.tar",
    "files": [
        "/home/linus/documents",
        "/home/linus/photos"
    ],
    "tags": [
        {
            "tagFile": "bag-info.txt",
            "tagName": "Tag-One",
            "value":   "Value One"
        },
        {
            "tagFile": "bag-info.txt",
            "tagName": "Tag-Two",
            "value":   "Value Two"
        },
        {
            "tagFile": "aptrust-info.txt",
            "tagName": "Tag-Three",
            "value":   "Value Three"
        }
    ]
}
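If your script generates job params instead of reading them from a file, it can build the same structure programmatically. Below is a minimal Python sketch, assuming Python 3.9+. `make_job_params` is a hypothetical helper that emits only the three top-level fields shown in the sample above (packageName, files, tags).

```python
import json
from typing import Iterable


def make_job_params(package_name: str,
                    files: Iterable[str],
                    tags: Iterable[tuple[str, str, str]]) -> str:
    """Build dart-runner Job Params JSON.

    tags is a sequence of (tagFile, tagName, value) tuples. dart-runner
    uses "value" for tag values, not the DART CLI's "userValue".
    """
    params = {
        "packageName": package_name,
        "files": list(files),
        "tags": [
            {"tagFile": tag_file, "tagName": tag_name, "value": value}
            for tag_file, tag_name, value in tags
        ],
    }
    return json.dumps(params, indent=4)
```

The returned string can be written to job_params.json and fed to `dart-runner --workflow=workflow.json --output-dir=/dir < job_params.json`, or passed directly as the process's STDIN.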

Output Format

For each completed job, DART Runner prints one line of JSON to stdout (standard output, which is usually a terminal). This means that when running a single job, you'll get one line of output. When running a batch job, you'll get one line for each entry in the CSV batch file.

DART Runner also prints summary messages to stderr (standard error) when errors occur, though the JSON output on stdout will have more details about what actually went wrong.

Output from a successful job looks like this (pretty-printed here for readability; dart-runner writes each result on a single line):

{
    "jobName": "TestBag.tar",
    "payloadByteCount": 170742,
    "payloadFileCount": 55,
    "succeeded": true,
    "packageResult": {
        "attempt": 1,
        "completed": "2021-12-14T14:50:50.684655-05:00",
        "errors": {},
        "fileMtime": "2021-12-14T14:50:50.684637052-05:00",
        "filepath": "/home/linustmp/bags/TestBag.tar",
        "filesize": 224256,
        "info": "",
        "operation": "package",
        "provider": "Bagger - DART Runner v0.91-beta-1-g70b06da for Darwin x86_64 (Build 70b06da 2021-12-14)",
        "remoteChecksum": "",
        "remoteURL": "",
        "started": "2021-12-14T14:50:50.66122-05:00",
        "warning": ""
    },
    "validationResult": {
        "attempt": 1,
        "completed": "2021-12-14T14:50:50.684665-05:00",
        "errors": {},
        "fileMtime": "2021-12-14T14:50:50.684637052-05:00",
        "filepath": "/home/linustmp/bags/TestBag.tar",
        "filesize": 224256,
        "info": "",
        "operation": "validation",
        "provider": "Validator - DART Runner v0.91-beta-1-g70b06da for Darwin x86_64 (Build 70b06da 2021-12-14)",
        "remoteChecksum": "",
        "remoteURL": "",
        "started": "2021-12-14T14:50:50.684656-05:00",
        "warning": ""
    },
    "uploadResults": [{
        "attempt": 1,
        "completed": "2021-12-14T14:50:50.733585-05:00",
        "errors": {},
        "fileMtime": "2021-12-14T14:50:50.684637052-05:00",
        "filepath": "/home/linustmp/bags/TestBag.tar",
        "filesize": 224256,
        "info": "Output file at /home/linustmp/bags/TestBag.tar was deleted at 2021-12-14T14:50:50-05:00",
        "operation": "upload",
        "provider": "Uploader - DART Runner v0.91-beta-1-g70b06da for Darwin x86_64 (Build 70b06da 2021-12-14)",
        "remoteChecksum": "09f4a6d159e07cd11c82ead5d2a3e95c-1",
        "remoteURL": "s3://localhost:9899/dart-runner.test/TestBag.tar",
        "started": "2021-12-14T14:50:50.684666-05:00",
        "warning": ""
    }],
    "validationErrors": null
}

The most important elements in this output are:

  succeeded       - true or false, indicating whether all steps of the job
                    succeeded

  errors          - Within each result, this will contain a set of name-value
                    pairs indicating what went wrong. This element will be
                    empty if the operation succeeded.

  remoteURL       - The URL to which the package was uploaded. (Applies
                    only to uploadResults.)

  remoteChecksum  - The ETag returned by the remote S3 server after a
                    successful upload. (Applies only to uploadResults for
                    S3 uploads.)
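A script can reduce each output line to these key elements. Below is a minimal Python sketch, assuming Python 3.9+. The hypothetical helper `summarize_results` parses JSON Lines output and reports the following for each job:

  • the succeeded value
  • any remoteURL values from uploadResults
  • every non-empty errors map, tagged with that step's operation

The sketch relies on the field names in the sample output above. It also treats a missing or null step result as having no errors, which is an assumption on the sketch's part, not documented behavior.

```python
import json
from typing import Any, Iterable


def summarize_results(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Summarize dart-runner JSON Lines output, one summary per job."""
    summaries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        result = json.loads(line)
        uploads = result.get("uploadResults") or []
        steps = [result.get("packageResult"), result.get("validationResult"), *uploads]
        # Keep only steps whose "errors" map is non-empty.
        errors = [
            {"operation": step.get("operation", "unknown"), "errors": step["errors"]}
            for step in steps
            if step and step.get("errors")
        ]
        summaries.append({
            "jobName": result.get("jobName"),
            "succeeded": result.get("succeeded", False),
            "remoteURLs": [u["remoteURL"] for u in uploads if u.get("remoteURL")],
            "errors": errors,
        })
    return summaries
```

Passing an open file object works too, since iterating a file yields lines: `summarize_results(open("output_log.json"))`.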

Scripting

Most scripting languages provide ways of capturing the stdout and stderr output of external programs called from within the script. Most languages also let you capture the external process's exit code.

When scripting DART Runner, you should generally do the following:

  1. Capture the exit code. If it's not zero, log a message or perform some other kind of error handling.
  2. Redirect stdout to a file if you want to save all the JSON instead of capturing it.

You can redirect stdout to a file like this:

dart-runner [args] > output_log.json
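The two recommendations above can be combined in a single call. Below is a minimal Python sketch, assuming Python 3.9+. `run_and_log` is a hypothetical helper that writes stdout to a file (the equivalent of `> output_log.json`), captures stderr for the log, and returns the exit code so the caller can decide how to handle failures.

```python
import logging
import subprocess
from typing import Sequence


def run_and_log(cmd: Sequence[str], output_path: str) -> int:
    """Run dart-runner, saving its JSON Lines stdout to output_path.

    Returns the process exit code; logs stderr when the code is non-zero.
    """
    with open(output_path, "w", encoding="utf-8") as out:
        proc = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE,
                              text=True, check=False)
    if proc.returncode != 0:
        # stderr carries the summary message, e.g. "2 Job(s) failed".
        logging.error("dart-runner exited with code %d: %s",
                      proc.returncode, proc.stderr.strip())
    return proc.returncode
```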

Additional Resources