
Glossary

This glossary defines APTrust and Preservation Terms, Registry Terms, and Bagging Terms.

APTrust and Preservation Terms

Term Definition
Ingest A set of operations performed by the APTrust cloud infrastructure that occur when a depositor uploads a bag. Includes a series of verifications and copying that results in the uploaded bag being stored in preservation buckets.
Updates A set of operations performed by the APTrust cloud infrastructure that occur when a depositor uploads a bag with the same name as an existing bag. These operations overwrite the current file in storage with the new version.
Restoration A set of operations performed by the APTrust cloud infrastructure that occur when a depositor requests to restore an object or file in the Registry API. These operations gather the object from preservation buckets and move it to the depositor’s restoration buckets.
Deletion A set of operations performed by the APTrust cloud infrastructure that occur when a depositor requests to delete an object or file in the Registry API. These operations begin by contacting admins at the depositor’s institution to verify the request and finish by deleting the object or file from preservation.
Fixity Preserv checks fixity on all files in S3 and Wasabi storage every 90 days. We do not run fixity checks on files in Glacier or Glacier Deep Archive storage.
Bag File structure format used by depositors to upload objects and files to the APTrust.
DART The Digital Archivist’s Resource Tool. A drag-and-drop tool used by depositors to package and upload files into remote repositories. It can also be used from the command line and on headless servers. See also DART Docs: Getting Started.
Partner Tools A set of tools, including DART, that APTrust provides for members to perform actions and checks on their APTrust store. See also Partner Tools.
Minio An object storage solution that can be used to upload files to AWS receiving buckets.
Amazon CLI The AWS Command Line Interface (AWS CLI) is a unified tool to manage AWS services. With this tool one can control multiple AWS services from the command line and automate them through scripts.
Checksum A checksum is a digital fingerprint of a file created through a mathematical algorithm that represents its content in a unique way. It can be used to verify the integrity of a file by comparing the checksum of the original file with the checksum of a copy or download.
MD5 Message-Digest Algorithm 5. A widely used cryptographic hash function that generates a fixed-size, unique representation of an input message. It operates by manipulating blocks of the input message using mathematical operations until a final hash value is produced. The resulting hash value is commonly used to verify the integrity of digital content.
SHA-256 Secure Hash Algorithm 256-bit. A widely used cryptographic hash function that generates a fixed-size, unique representation of an input message. It operates by manipulating blocks of the input message using a series of complex mathematical operations until a final hash value is produced. The resulting hash value is commonly used for verifying the integrity and authenticity of digital content.
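
The Checksum, MD5, and SHA-256 entries above can be illustrated with a minimal sketch that uses Python's standard hashlib module. The function names and the chunk size are illustrative assumptions, not part of any APTrust tool.

```python
import hashlib


def file_digests(path, chunk_size=1024 * 1024):
    """Return the md5 and sha256 hex digests of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def same_content(original, copy):
    """Compare two files by their sha256 digests."""
    return file_digests(original)["sha256"] == file_digests(copy)["sha256"]
```

Comparing the digest of a downloaded copy with the digest recorded for the original is the integrity check the Checksum entry describes.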
Receiving Bucket A receiving bucket is an Amazon AWS S3 bucket to which you upload materials for ingest into APTrust.

APTrust provides two receiving buckets for each depositor: one for the demo environment and one for the production environment. Upload bags to your demo receiving bucket to ingest them into the demo system, and to the production bucket to ingest them into the production system.

Receiving bucket names follow this pattern:
  • Demo System: aptrust.receiving.test.<your-domain.xyz>
  • Production System: aptrust.receiving.<your-domain.xyz>

For example, if your domain name is example.org, your receiving buckets will be aptrust.receiving.test.example.org for demo and aptrust.receiving.example.org for production.
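
As a small sketch of the naming pattern above, the helper below builds both receiving bucket names from a domain name. The function name is a hypothetical one chosen for illustration.

```python
def receiving_bucket_names(domain):
    """Build the demo and production receiving bucket names for a domain."""
    return {
        "demo": f"aptrust.receiving.test.{domain}",
        "production": f"aptrust.receiving.{domain}",
    }


names = receiving_bucket_names("example.org")
print(names["demo"])
print(names["production"])
```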
Restoration Bucket A restoration bucket is an Amazon AWS S3 bucket from which you download materials restored from APTrust.

When you ask APTrust to restore an object, our system reassembles all of the object's files into a BagIt bag and copies it to your restoration bucket. (Individual files can be restored the same way, though they are not bagged.) The restoration process may take anywhere from a few seconds to several hours, depending on the amount of data restored and where it is stored. Once the restoration is complete, you can download the restored object or files from your restoration bucket.
Preservation Bucket A preservation bucket is an Amazon AWS S3 bucket in which APTrust depositor objects and files are stored long-term.
Staging Bucket A staging bucket is an Amazon AWS S3 bucket that acts as an intermediary stage between receiving and preservation buckets during the ingest process.
Wasabi Wasabi is a cloud-based storage solution designed for businesses of all sizes that need to store large amounts of data. It provides secure, high-performance, and cost-effective storage with no egress fees or data retrieval charges. Wasabi’s storage service is compatible with the S3 API, which makes it easy for customers to integrate their existing applications and workflows with the service.
API In computer programming, an application programming interface (API) is a set of subroutine definitions, protocols, and tools for building application software. In general terms, it is a set of clearly defined methods of communication between various software components.
API Key An application programming interface key (API key) is a code passed in by computer programs calling an application programming interface (API) to identify the calling program, its developer, or its user to the web site. API keys are used to track and control how the API is being used, for example to prevent malicious use or abuse of the API (as defined perhaps by terms of service).

Usage: The API key often acts as both a unique identifier and a secret token for authentication, and will generally have a set of access rights on the API associated with it.

API keys can be based on the universally unique identifier (UUID) system to ensure they will be unique to each user.
Bagging Date A field in the bag-info.txt tag file. The date, in ISO 8601 format (YYYY-MM-DD), on which the bag content was prepared for delivery.
Payload-Oxum The “octetstream sum” of the payload, namely, a two-part number of the form “OctetCount.StreamCount”, where OctetCount is the total number of octets (8-bit bytes) across all payload file content and StreamCount is the total number of payload files. Payload-Oxum should be included in “bag-info.txt” if at all possible. Compared to the Bag-Size tag, Payload-Oxum is intended for machine consumption.
ETag The entity tag is a hash of the object. The ETag reflects changes only to the contents of an object, not its metadata.
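
The Payload-Oxum definition above can be sketched as follows. This assumes the bag has already been unpacked to a local directory and that the payload files live under its data/ directory, as in the BagIt layout. The function name is hypothetical.

```python
from pathlib import Path


def payload_oxum(bag_dir):
    """Return 'OctetCount.StreamCount' for the payload files under <bag_dir>/data."""
    payload_files = [p for p in (Path(bag_dir) / "data").rglob("*") if p.is_file()]
    # OctetCount: total size in bytes of all payload files.
    octet_count = sum(p.stat().st_size for p in payload_files)
    # StreamCount: number of payload files.
    return f"{octet_count}.{len(payload_files)}"
```

A bag whose payload consists of a 5-byte file and a 3-byte file would have a Payload-Oxum of 8.2.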
Infrastructure-as-code Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.
Registry Registry is the searchable online registry that keeps track of everything you've deposited in APTrust. It includes a web UI and a REST API that enable you to do the following:
  • See what items you've deposited
  • Search for items by object name, file name, and a number of other attributes
  • View audit data (PREMIS events) for items you've deposited
  • Request that objects and/or files be restored
  • Request that objects and/or files be deleted (web UI only)
  • Query the status of ingest and restoration requests in progress
  • Add and remove user accounts for your institution
Registry includes both a demo and production system. The demo system is for depositors to test new workflows and to get familiar with the system's general features. The production system is for long-term preservation.
Pharos (DEPRECATED) Pharos was APTrust’s web interface for managing deposits and inspecting deposit outcomes. Pharos is deprecated and has been replaced by Registry.

API Keys and Separate Systems

Once you have a valid Registry login, you can generate your own API key to use the REST API. Keep in mind that while your login email may be the same for both the demo and production Registry systems, your API keys will be different.

You Need AWS Access Keys to Use Your Buckets!

You will need to get AWS credentials from APTrust to access your receiving and restoration buckets. APTrust sends them as part of the onboarding process when you first set up your account. If you need credentials, contact help@aptrust.org.

Registry Terms

Term Definition
Intellectual Object A collection of generic files logically grouped into a single unit. An intellectual object typically consists of the payload files and tag files submitted in a bag.
Alternate Identifiers Alternate identifiers come from the Internal-Sender-Identifier tag in the bag-info.txt file. This usually identifies an object using your internal identifier scheme. This field is optional and may be blank.
Bag Group Identifiers The optional bag group identifier comes from the Bag-Group-Identifier field in the bag-info.txt file. This is used to logically group a number of distinct intellectual objects. For example, some organizations prefer to break large collections into a series of smaller bags, each of which becomes a distinct intellectual object upon ingest.
Generic Files A single file or bitstream that makes up part of an intellectual object. For example, in a collection of jpeg photos that includes an XML metadata file, each jpeg and the XML file is a generic file.
PREMIS Events PREMIS stands for Preservation Metadata: Implementation Strategies, a metadata standard used for digital preservation. A PREMIS Event is an occurrence that involves an action or series of actions that affect a digital object and that can be documented for preservation purposes.
Object-Level Events Ingestion: The ingest process for the object completed. This means the object record was created and all of the object’s files were copied to preservation storage.

Creation: The object record was created.

Access Assignment: Obsolete. The object was assigned an access setting of Consortia, Institution, or Restricted. See access values for definitions.

Identifier Assignment: The object was assigned an identifier. APTrust object identifiers use the pattern <institution.domain>/<object_name>, where object_name is the name of the tarred bag, minus the .tar extension. For example, test.edu/bag_of_photos.

Deletion: The object was deleted. This means that all of its component files were deleted. Registry keeps a record of the object and all of its files after deletion, though it does not retain the files themselves.
File-Level Events Ingestion: The file has been copied to long-term preservation storage and its metadata has been saved in Registry.

Identifier Assignment: The file has been assigned an APTrust identifier, in the form <object_identifier>/<file_relative_path>, where file_relative_path is the location of the file in the bag in which it was submitted. For example, if test.edu submits a bag called bag_of_photos, the payload file data/photo1.jpg would have the identifier test.edu/bag_of_photos/data/photo1.jpg.

Message Digest Calculation: The APTrust ingest process has calculated a message digest for the file. On ingest, there should be two of these events for each file, one with an md5 digest and one with a sha256 digest.

Replication: The file has been copied to replication storage. This applies only to files using the Standard storage option, in which the primary copy exists in S3 in Northern Virginia and the secondary (replicated) copy is in Glacier in Oregon. See storage options for more information.

Fixity Check: This records the outcome of a fixity check on this file. APTrust performs fixity checks on items in Standard storage every 90 days, but does not perform fixity checks on items in Glacier-Only or Glacier Deep Archive storage. See storage options for more information. Also note that APTrust checks the sha256 fixity only, even though we also know the md5 fixity value.

Deletion: Records when a file was deleted and at whose request.
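
The identifier patterns described in the Identifier Assignment events above can be sketched as simple string construction. The function names are hypothetical and the sketch does no validation of its inputs.

```python
def object_identifier(institution_domain, tar_file_name):
    """Build <institution.domain>/<object_name> from the name of a tarred bag."""
    # object_name is the tar file name minus its .tar extension.
    if tar_file_name.endswith(".tar"):
        object_name = tar_file_name[: -len(".tar")]
    else:
        object_name = tar_file_name
    return f"{institution_domain}/{object_name}"


def file_identifier(object_id, file_relative_path):
    """Build <object_identifier>/<file_relative_path> for a file inside the bag."""
    return f"{object_id}/{file_relative_path}"


obj_id = object_identifier("test.edu", "bag_of_photos.tar")
print(obj_id)
print(file_identifier(obj_id, "data/photo1.jpg"))
```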
Work Items Work items are tasks that APTrust systems perform in response to depositor requests.
Types of Work Items Delete: Deleting files or objects from preservation storage.

Glacier Restore: This is the first step in restoring objects and files from Glacier.

Ingest: Getting new or updated objects into the system.

Restore File: Restoring an individual file to a depositor’s restoration bucket.

Restore Object: Restoring an intellectual object to a depositor’s restoration bucket.
Stages of Work Items Available in S3: A restored file or object is available in the depositor’s S3 restoration bucket.

Cleanup: The task has completed and the system has cleaned up temporary files. This applies only to ingest and restoration actions. This stage is more meaningful to APTrust internal operations than to depositors.

Copy to Staging: Files are being copied to a staging bucket as part of the ingest process.

Format Identification: Files being ingested are undergoing format identification against a PRONOM database.

Fetch: The system is retrieving a bag from the receiving bucket for ingest, or is retrieving files from preservation storage for restoration.

Package: An intellectual object’s files are being repackaged into a new bag for restoration.

Receive: APTrust has noticed a new bag in a receiving bucket, but has not yet begun to process it for ingest.

Record: The system has finished copying a bag’s files to preservation storage and is now recording metadata in Registry.

Reingest Check: Files being ingested are checked against known files in the Registry to see if they’ve been ingested before. The system will re-ingest files only if they’ve changed since the last ingest.

Requested: A depositor has requested a restoration or deletion, but the system has not yet begun to process it.

Resolve: A task has completed. See Statuses below for the outcome.

Restoring: Files are in process of being restored to a depositor’s restoration bucket.

Storage Validation: The ingest process is verifying that files copied into preservation storage were copied successfully.

Store: Files are being copied to long-term preservation. This applies only to ingest.

Unpack: Obsolete. This stage remains because it was used in some early ingests between 2015 and 2016.

Validate: The system is validating a bag before ingest, or it’s validating a bag it has just assembled for restoration.
Statuses of Work Items Cancelled: The task was cancelled by an APTrust administrator. You can find an explanation of the cancellation by clicking on the item and reading the Note/Error field.

Failed: The task failed. Virtually all failures result from one of the following causes:
  • A depositor submitted an invalid bag for ingest. This is a permanent error and the system will not retry the task until the depositor submits a new version of the bag.
  • The system has run into a temporary problem such as lack of disk space or poor network connection. The system will automatically retry the task. If the task fails repeatedly, an APTrust admin will try to solve the underlying problem and then tell the system to retry the task again.
Pending: The system is waiting for available resources to start the next stage of the task. See Stages above.

Started: The system has started the current Stage of the task and is still working on it. The Fetch, Validate, Store, Record, and Package stages can take several hours for very large bags.

Success: The task completed successfully.

Suspended: The task was suspended due to a potential conflict. These items are flagged for administrator review, and an APTrust administrator will determine whether to resume or cancel the item.
Multi-Factor Authentication Options Authy: Obsolete. Service that sends push notifications to a user device to authenticate.

SMS: Sends SMS text to user device in order to authenticate.

Backup Code: Emergency option. Codes generated at login, each of which a depositor can use at most once to authenticate.
Restoration Spot Test Restoration spot tests periodically restore a single, random intellectual object to your institution’s restoration bucket.
Reports The Registry includes two reports that show deposit statistics: deposits by institution and deposits over time.

Bagging Terms

Term Definition
BagIt profile A set of hierarchical file layout conventions for storage and transfer of arbitrary digital content. A “bag” has just enough structure to enclose descriptive metadata “tags” and a file “payload” but does not require knowledge of the payload’s internal semantics.
BTR BagIt profile A BagIt profile created by the Beyond the Repository research project.
Bag Access Values Obsolete.

Restricted: Metadata about this object is accessible to the institutional administrator (at the depositing institution) and to the APTrust admin. No one else can even see that this object exists in the repository.

Institution: All users at the depositing institution can see metadata about this object.

Consortia: All APTrust members can see this object’s metadata.
Bag Allowed Storage-Option Values Standard: Also called High Assurance. The bag’s contents will be stored in S3 in Northern Virginia and Glacier in Oregon. APTrust will perform fixity checks on the S3 files every 90 days.

Glacier-OH: Files will be stored ONLY in Glacier, in AWS’s Ohio region, and will be encrypted during storage. APTrust will not perform any fixity checks on these files.

Glacier-OR: Files will be stored ONLY in Glacier, in AWS’s Oregon region, and will be encrypted during storage. APTrust will not perform any fixity checks on these files.

Glacier-VA: Files will be stored ONLY in Glacier, in AWS’s Northern Virginia region, and will be encrypted during storage. APTrust will not perform any fixity checks on these files.

Glacier-Deep-OH: Files will be stored ONLY in Glacier Deep Archive, in AWS’s Ohio region, and will be encrypted during storage. APTrust will not perform any fixity checks on these files.

Glacier-Deep-OR: Files will be stored ONLY in Glacier Deep Archive, in AWS’s Oregon region, and will be encrypted during storage. APTrust will not perform any fixity checks on these files.

Glacier-Deep-VA: Files will be stored ONLY in Glacier Deep Archive, in AWS’s Northern Virginia region, and will be encrypted during storage. APTrust will not perform any fixity checks on these files.
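
The storage option values listed above can be summarized as a lookup table. This is only a sketch of the descriptions in this glossary: the class, the field names, and the helper function are assumptions made for illustration, and the table covers only the options listed here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Fixity checks run every 90 days where they apply.
FIXITY_INTERVAL = timedelta(days=90)


@dataclass(frozen=True)
class StorageOption:
    storage: str          # where the files are kept
    region: str           # AWS region(s) named in the glossary
    fixity_checked: bool  # whether APTrust runs periodic fixity checks


REGIONS = {"OH": "Ohio", "OR": "Oregon", "VA": "Northern Virginia"}

# For Standard, fixity checks apply to the S3 copy only.
STORAGE_OPTIONS = {
    "Standard": StorageOption("S3 and Glacier", "Northern Virginia and Oregon", True),
}
for code, region in REGIONS.items():
    STORAGE_OPTIONS[f"Glacier-{code}"] = StorageOption("Glacier", region, False)
    STORAGE_OPTIONS[f"Glacier-Deep-{code}"] = StorageOption(
        "Glacier Deep Archive", region, False
    )


def next_fixity_check(option_name: str, last_check: date) -> Optional[date]:
    """Return the date of the next fixity check, or None if none is performed."""
    option = STORAGE_OPTIONS[option_name]
    if not option.fixity_checked:
        return None
    return last_check + FIXITY_INTERVAL
```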
Multipart Bags (deprecated) Older method of splitting bags: You may split a single large bag into a number of smaller bags by using the naming convention institution_identifier.bag_identifier.b###.of###. That is, you append .b01.of16, .b02.of16, etc. to the end of the bag name of each bag in the group.
Multipart Bags (current) Current state of separating bags: Please use the Bag-Group-Identifier tag in the bag-info.txt file to indicate that multiple bags are part of the same group. Bag-Group-Identifier is part of the BagIt standard described in RFC 8493.
Python BagIt User-defined tool for bagging files. References for implementations are the Library of Congress, University of Miami, and North Carolina State University.
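
Several entries in this glossary refer to tags in the bag-info.txt file (Bagging Date, Payload-Oxum, Internal-Sender-Identifier, and Bag-Group-Identifier). The sketch below renders such tags as "Name: value" lines. The tag values are made-up examples, the function name is hypothetical, and the sketch does not handle long-line folding or repeated tags. Bagging-Date is the tag name RFC 8493 uses for the bagging date.

```python
from datetime import date


def render_bag_info(tags):
    """Render tag name/value pairs as 'Name: value' lines for bag-info.txt."""
    return "".join(f"{name}: {value}\n" for name, value in tags.items())


# Hypothetical values, for illustration only.
example_tags = {
    "Bagging-Date": date(2024, 1, 15).isoformat(),    # YYYY-MM-DD
    "Payload-Oxum": "8.2",                            # OctetCount.StreamCount
    "Internal-Sender-Identifier": "photos-1998-0042",
    "Bag-Group-Identifier": "photos-1998",            # shared by every bag in a group
}

print(render_bag_info(example_tags), end="")
```

Giving every bag in a group the same Bag-Group-Identifier value is what ties the resulting intellectual objects together.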