Ingest

Notice

Before sending materials to APTrust for ingest, you'll need to get AWS keys that allow you to upload materials to your receiving bucket. If you don't already have these, contact help@aptrust.org to get them. Also keep in mind that you'll have separate AWS keys for the demo and production environments.

You'll also need to know how to produce a valid APTrust bag. If you don't know how to do that yet, see the bagging page for details, or use DART to get going quickly.

Uploading for Ingest

Assuming you have a valid bag, you can send it to APTrust by uploading it to your receiving bucket: aptrust.receiving.test.<institution.domain> for the demo system, or aptrust.receiving.<institution.domain> for the production system. In each case, replace <institution.domain> with your institution's domain name.
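The naming rule above can be sketched as a small helper. This is only an illustration: the function name receiving_bucket and the "demo"/"production" labels are hypothetical and not part of any APTrust tool.

```python
def receiving_bucket(institution_domain: str, environment: str = "production") -> str:
    """Build the receiving bucket name described above.

    `environment` is a hypothetical label used only in this sketch.
    """
    # S3 bucket names are lowercase, so normalize the domain.
    domain = institution_domain.strip().lower()
    if not domain:
        raise ValueError("institution domain must not be empty")
    if environment == "demo":
        return f"aptrust.receiving.test.{domain}"
    if environment == "production":
        return f"aptrust.receiving.{domain}"
    raise ValueError(f"unknown environment: {environment!r}")


print(receiving_bucket("example.edu", "demo"))        # aptrust.receiving.test.example.edu
print(receiving_bucket("example.edu", "production"))  # aptrust.receiving.example.edu
```

Here example.edu is a placeholder; substitute your own institution's domain.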

Several tools can upload files to your receiving bucket, including DART, the APTrust Partner Tools, and the Minio Client.

If you plan on interacting frequently with S3, the Minio Client provides the best combination of rich features, ease of installation, and ease of use.

For most of what you'll be doing with APTrust, DART and the APTrust Partner Tools should be sufficient.

The Ingest Process

Warning

There can be a delay of up to 15 minutes before the tarred bag shows up in the work item list.

After you upload a tarred bag to your receiving bucket, APTrust's ingest process adds it to a list of items waiting to be processed. You can check the status of your bag in the list of Registry Work Items, through the REST API, or with the apt_check_ingest program from the partner tools. Once your bag is successfully ingested, it is automatically deleted from your receiving bucket. If the ingest fails, you can see the details in Registry.

Notice

Failed bags stay in your receiving bucket for your review: 30 days on the demo system, 60 days on the production system. After that period, the bag is automatically deleted.
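As a rough aid for planning your review, the retention periods can be turned into a date calculation. This sketch reads the notice above as 30 days for demo and 60 days for production, and assumes the clock starts on the upload date; the source does not state the exact starting point. The function name is hypothetical.

```python
from datetime import date, timedelta

# Retention periods for failed bags, in days, as read from the notice above.
FAILED_BAG_RETENTION_DAYS = {"demo": 30, "production": 60}


def failed_bag_deletion_date(uploaded_on: date, environment: str) -> date:
    """Estimate when a failed bag is removed from the receiving bucket.

    Assumption: the retention period is counted from the upload date.
    """
    return uploaded_on + timedelta(days=FAILED_BAG_RETENTION_DAYS[environment])


print(failed_bag_deletion_date(date(2024, 1, 15), "demo"))        # 2024-02-14
print(failed_bag_deletion_date(date(2024, 1, 15), "production"))  # 2024-03-15
```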

Ingest process on the backend

DART's dashboard also shows the status of items recently ingested and pending ingest.

Smaller bags (those under about 5 GB) tend to ingest quickly. Larger bags can take longer, with multi-terabyte bags sometimes taking a few days. This is because the ingest process calculates checksums on every byte of data in the bag and then typically copies each of the bag's files to two distinct regions of the country.

Ingest Restrictions

  • Materials must be sent in tarred bags.
  • Bag names, and the names of files within bags, may not include control characters (such as backspace or delete).
  • Maximum bag size on our demo system is 5 GB.
  • Maximum bag size on our production system is 5 TB.
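A local pre-flight check along these lines can catch violations before you spend time uploading. This is a minimal sketch, not APTrust's own validation: the names check_bag and has_control_chars are hypothetical, "control characters" is interpreted as Unicode category Cc, the limits are assumed to apply to the tar file size, and 1 GB is assumed to mean 1000**3 bytes.

```python
import os
import tarfile
import unicodedata

# Size limits from the list above. Assumption: decimal units (1 GB = 1000**3
# bytes), applied to the size of the tar file itself.
MAX_BAG_BYTES = {"demo": 5 * 1000**3, "production": 5 * 1000**4}


def has_control_chars(name: str) -> bool:
    """True if the name contains a Unicode control character (category Cc)."""
    return any(unicodedata.category(ch) == "Cc" for ch in name)


def check_bag(tar_path: str, environment: str = "production") -> list:
    """Return a list of problems found; an empty list means none were found."""
    if not tarfile.is_tarfile(tar_path):
        return [f"{tar_path} is not a tar file"]
    problems = []
    if os.path.getsize(tar_path) > MAX_BAG_BYTES[environment]:
        problems.append(f"bag exceeds the {environment} size limit")
    if has_control_chars(os.path.basename(tar_path)):
        problems.append("bag name contains control characters")
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if has_control_chars(member.name):
                problems.append(
                    f"file name contains control characters: {member.name!r}"
                )
    return problems
```

Calling check_bag("my_bag.tar", "demo") on a hypothetical file returns an empty list when none of the checked restrictions are violated. It does not check whether the bag is otherwise valid.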

You can get around the 5 TB bag size limit by using bag groups and the Bag-Group-Identifier tag. See the bagging page for more info.
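One way to plan a bag group is to partition your files so that each bag's payload stays under the limit, then give every bag in the group the same Bag-Group-Identifier value. The sketch below shows only the partitioning step with a simple greedy pass. The function name split_into_bags is hypothetical, and the calculation ignores tag files and tar overhead, so leave some headroom below the real limit. Consult the bagging page for how the tag itself should be written.

```python
def split_into_bags(file_sizes: dict, max_bytes: int) -> list:
    """Greedily assign files to bags so no bag's payload exceeds max_bytes.

    `file_sizes` maps file names to sizes in bytes. Files are taken in
    sorted name order; this is not an optimal packing.
    """
    bags, current, current_size = [], [], 0
    for name, size in sorted(file_sizes.items()):
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the size limit")
        # Start a new bag when adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            bags.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bags.append(current)
    return bags


# Toy sizes for illustration; a real call would pass byte counts and the
# byte limit for the target system.
print(split_into_bags({"a.bin": 3, "b.bin": 3, "c.bin": 2}, max_bytes=5))
# [['a.bin'], ['b.bin', 'c.bin']]
```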

Reingesting Existing Bags

You can re-upload a bag to your receiving bucket at any time, but be sure to read the page on updates first so you understand how APTrust processes bag updates.