Validator

Validator validates BagIt packages (tarred or in directory format) according to a BagIt profile.

See the validate() method for a list of events.

Constructor

new Validator(pathToBag, profile)

Constructs a new BagIt validator.

Parameters:
  • pathToBag (string): the absolute path to the bag, whether it's a directory or a tar file.
  • profile (BagItProfile): the BagItProfile that describes what constitutes a valid bag.

Members

_filesChecked :number

This is a private internal variable that keeps track of the number of files whose checksums we have compared against checksums in the manifest.

Default Value:
  • 0

_hashesInProgress :number

This is a private internal variable that keeps track of the number of checksum digests currently being calculated. This is part of a lovely hack.

Default Value:
  • 0

_initialFileCount :number

This is a private internal variable that registers a preliminary count of files found in a bag. This count includes tag files and manifests as well as payload files.

Default Value:
  • 0

bagName :string

bagName is the calculated name of the bag, which will be either the name of the directory that contains the bag files, or the name of the tar file, minus the .tar extension. You can override this by setting it explicitly.

bagRoot :string

bagRoot is the name of the top-level folder to which a tarred bag untars. The folder name should match the bag name.

E.g. "bag123.tar" should untar to bagRoot "bag123"

For non-tarred bags, this property will be null.

disableSerializationCheck :boolean

When set to true, this flag tells the validator not to validate the bag's serialization format. Set it to true when you want to validate an unserialized bag (one that is not tarred, zipped, or otherwise packaged) from a directory against a profile that says the bag must be tarred, zipped, etc. This is useful when you build a bag and want to validate it before you zip it, or after you untar it.

Default Value:
  • false

errors :Array.<string>

errors is a list of error messages describing problems encountered while trying to validate the bag or specific violations of the BagItProfile that render this bag invalid.

files :object.<string, BagItFile>

files is a hash of BagItFiles, where the file's path within the bag (relPath) is the key, and the BagItFile object is the value. The hash makes it easy to get files by relative path within the archive (e.g. data/photos/img.jpg).

manifestAlgorithmsFoundInBag :Array.<string>

This is a list of manifest algorithms found during an initial scan of the bag. While some BagItProfiles specify that a manifest with algorithm X must be present, the profile does not preclude other manifests with different algorithms from also being present. It's common among APTrust bags, for example, to find both md5 and sha256 manifests. The validator does an initial scan of the bag to find extra manifests so it knows which checksums to run on payload and tag files.

If an initial scan reveals manifest-md5.txt and manifest-sha256.txt, the manifestAlgorithmsFoundInBag will contain ["md5", "sha256"].

See also _scanBag.

pathToBag :string

pathToBag is the path to the directory or tar file that contains the bag you want to validate.

profile :BagItProfile

profile is the BagItProfile against which we will validate the bag.

tagManifestAlgorithmsFoundInBag :Array.<string>

This is a list of tag manifest algorithms found during an initial scan of the bag.

If an initial scan reveals tagmanifest-md5.txt and tagmanifest-sha256.txt, tagManifestAlgorithmsFoundInBag will contain ["md5", "sha256"].

See also _scanBag.

Methods

_addBagItFile(entry) → {BagItFile}

_addBagItFile adds a BagItFile to the Validator.files hash, based on the entry it receives from the reader. At this point, the newly created BagItFile will have its path and stats info, but no parsed data or checksums.

Parameters:
  • entry (object): an entry returned by a TarReader or FileSystemReader, with these properties:
      • relPath (string): the relative path of the file inside the bag.
      • fileStat (FileStat | fs.Stats): an object containing stats info about the file.

_cleanEntryRelPath(relPath) → {string}

_cleanEntryRelPath removes trailing slashes from relPath. When the validator is reading from a tar file, this also removes the leading bag name from the path. Since tarred bags must untar to a directory whose name matches the bag, relative paths within tar files will always be prefixed with the bag name. To get a true relative path, we have to change "bagname/data/file.txt" to "data/file.txt".

Parameters:
  • relPath (string): the relative path, as received from the TarReader or FileSystemReader.

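A minimal sketch of the cleanup described above, written as a standalone function. The name cleanEntryRelPath and its bagName and fromTar parameters are hypothetical; the real method reads that state from the Validator instance.

```javascript
// Hypothetical standalone version of the cleanup _cleanEntryRelPath performs.
// bagName and fromTar stand in for state the real Validator keeps on itself.
function cleanEntryRelPath(relPath, bagName, fromTar) {
  // Strip trailing slashes (directory entries often end with "/").
  let cleaned = relPath.replace(/\/+$/, '');
  if (fromTar) {
    // Tarred bags untar into a directory named after the bag, so
    // "bagname/data/file.txt" becomes "data/file.txt".
    const prefix = bagName + '/';
    if (cleaned.startsWith(prefix)) {
      cleaned = cleaned.slice(prefix.length);
    }
  }
  return cleaned;
}

console.log(cleanEntryRelPath('bag123/data/file.txt', 'bag123', true)); // data/file.txt
console.log(cleanEntryRelPath('data/photos/', 'bag123', false)); // data/photos
```

Note that the sketch leaves the tar's root entry ("bag123/") as "bag123", since it has no further path to strip.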

_getCryptoHashes(bagItFile) → {Array.<crypto.Hash>}

_getCryptoHashes returns a list of prepared cryptographic hashes that are ready to have bits streamed through them. Each hash includes a pre-wired 'end' event handler that assigns the computed checksum to the BagItFile's checksums hash. For example, once the bits have been pushed through a sha256 hash, its 'end' handler makes the assignment shown in the Example below.

Parameters:
  • bagItFile (BagItFile): a file inside the directory or tarball. This is the file whose checksums will be computed.

This method is private, and its internal operations are subject to change without notice.

Example
bagItFile.checksums['sha256'] = "[computed hex value]";
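The sketch below shows the underlying idea of running one set of bytes through several digests in a single pass. It is hypothetical: computeChecksums is not part of the Validator, and it swaps the stream 'end' event used by the real method for synchronous update() and digest() calls from Node's crypto module.

```javascript
const crypto = require('crypto');

// Hypothetical sketch: one crypto.Hash per algorithm, each fed every chunk,
// so a file only has to be read once no matter how many manifests exist.
function computeChecksums(chunks, algorithms) {
  const hashes = algorithms.map((alg) => ({ alg, hash: crypto.createHash(alg) }));
  for (const chunk of chunks) {
    for (const { hash } of hashes) {
      hash.update(chunk);
    }
  }
  // Equivalent of bagItFile.checksums[alg] = "[computed hex value]".
  const checksums = {};
  for (const { alg, hash } of hashes) {
    checksums[alg] = hash.digest('hex');
  }
  return checksums;
}

const sums = computeChecksums([Buffer.from('ab'), Buffer.from('c')], ['md5', 'sha256']);
console.log(sums.md5); // 900150983cd24fb0d6963f7d28e17f72
```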

_readBag()

This method reads the contents of the bag. The actual work is done in the callbacks. When reading is complete, this calls _validateFormatAndContents().

_readEntry(entry)

_readEntry reads a single entry from a TarReader or FileSystemReader. An entry represents one file within the bag (any type of file: payload, manifest, tag manifest, or tag file). This method adds the file's metadata to the Validator.files hash, computes the file's checksums, and parses the file's contents if necessary.

Parameters:
  • entry (object): an entry returned by a TarReader or FileSystemReader.

_readFile(bagItFile, readStream)

_readFile pushes the file's bits through the checksum algorithms the validator needs: those the BagItProfile requires, plus any others found by _scanBag. It also parses the contents of the file if it happens to be a manifest, tag manifest, or text-based tag file.

Note that the read() method of TarReader and FileSystemReader will not advance until we've read the entire stream (or until it closes or errors out).

Parameters:
  • bagItFile (BagItFile): a file inside the directory or tarball.
  • readStream (ReadStream): a stream from which we can read the file's contents.

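When _readFile encounters a manifest, the parsed result is essentially a map from relative path to checksum. The following is a minimal, hypothetical parser (parseManifest is not a Validator method), assuming the BagIt manifest layout of one "checksum whitespace path" entry per line; it ignores the spec's percent-encoding of special characters in paths.

```javascript
// Hypothetical sketch of parsing a BagIt manifest such as manifest-sha256.txt.
// Each non-empty line holds a checksum, whitespace, then the file's relative path.
function parseManifest(text) {
  const entries = new Map();
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue;
    // Split only on the first run of whitespace so paths containing spaces survive.
    const match = line.match(/^(\S+)\s+(.+)$/);
    if (match) {
      entries.set(match[2], match[1]);
    }
  }
  return entries;
}

const manifest = parseManifest(
  '900150983cd24fb0d6963f7d28e17f72  data/a.txt\n' +
  'd41d8cd98f00b204e9800998ecf8427e  data/empty file.txt\n'
);
console.log(manifest.get('data/empty file.txt')); // d41d8cd98f00b204e9800998ecf8427e
```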

_scanBag()

This method does an initial scan of the bag to see what manifests are present. While some BagItProfiles specify that a manifest with algorithm X must be present, the profile does not preclude other manifests with different algorithms from also being present. It's common among APTrust bags, for example, to find both md5 and sha256 manifests. The validator does an initial scan of the bag to find extra manifests so it knows which checksums to run on payload and tag files.

If an initial scan reveals manifest-md5.txt and manifest-sha256.txt, the manifestAlgorithmsFoundInBag will contain ["md5", "sha256"].

The most common reason for finding multiple manifests in a bag is that some institutions internally require one checksum algorithm but must produce bags whose profile requires a different one.

The validator will later validate ALL checksums, even those found in manifests that are not part of the BagItProfile.

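The initial scan amounts to pulling algorithm names out of file names. This is a hypothetical sketch (findManifestAlgorithms is not a Validator method), assuming manifests sit at the top level of the bag and follow the BagIt naming pattern manifest-<algorithm>.txt and tagmanifest-<algorithm>.txt.

```javascript
// Hypothetical sketch of the initial scan: collect algorithm names from
// top-level manifest and tag manifest file names.
function findManifestAlgorithms(relPaths) {
  const manifestAlgs = [];
  const tagManifestAlgs = [];
  for (const relPath of relPaths) {
    // Anchored at ^ so files inside data/ are never mistaken for manifests.
    let m = relPath.match(/^manifest-(\w+)\.txt$/);
    if (m) {
      manifestAlgs.push(m[1]);
      continue;
    }
    m = relPath.match(/^tagmanifest-(\w+)\.txt$/);
    if (m) {
      tagManifestAlgs.push(m[1]);
    }
  }
  return { manifestAlgs, tagManifestAlgs };
}

const found = findManifestAlgorithms([
  'bagit.txt', 'manifest-md5.txt', 'manifest-sha256.txt',
  'tagmanifest-sha256.txt', 'data/manifest-sha1.txt',
]);
console.log(found.manifestAlgs); // [ 'md5', 'sha256' ]
```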

_validateAllowedManifests(manifestType)

_validateAllowedManifests checks to see if the bag contains manifests not listed in the manifestsAllowed or tagManifestsAllowed list of the BagItProfile. This records illegal manifests in the Validator.errors array.

This method is private, and its internal operations are subject to change without notice.

Parameters:
  • manifestType (string): the type of manifest to look for. This should be either Constants.PAYLOAD_MANIFEST or Constants.TAG_MANIFEST.

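A hypothetical sketch of the allowed-manifest check: report each algorithm found in the bag that the profile's allowed list does not include. The function name, its parameters, and the rule that an empty allowed list permits everything are assumptions of this sketch, not documented behavior of the Validator.

```javascript
// Hypothetical sketch: "allowed" stands in for the profile's manifestsAllowed
// or tagManifestsAllowed list; "prefix" is "manifest" or "tagmanifest".
// ASSUMPTION: an empty allowed list means any algorithm is acceptable.
function findDisallowedManifests(foundAlgs, allowed, prefix) {
  const errors = [];
  if (allowed.length === 0) return errors;
  for (const alg of foundAlgs) {
    if (!allowed.includes(alg)) {
      errors.push(`Bag contains ${prefix}-${alg}.txt, which the profile does not allow.`);
    }
  }
  return errors;
}

console.log(findDisallowedManifests(['md5', 'sha256'], ['sha256', 'sha512'], 'manifest'));
```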

_validateAllowedTagFiles()

_validateAllowedTagFiles checks to see if the bag contains tag files not listed in the tagFilesAllowed list of the BagItProfile. This records illegal tag files in the Validator.errors array.

This method is private, and its internal operations are subject to change without notice.

_validateFormatAndContents()

_validateFormatAndContents is called internally by the public validate() method. While validate() reads the contents of the bag and parses manifests and tag files, this method compares the information in the bag with what the BagItProfile says is valid.

This method is private, and its internal operations are subject to change without notice.

_validateManifestEntries(manifestType)

_validateManifestEntries checks to see that the checksum entries in a payload manifest or tag manifest match the actual computed digests of the files in the bag. It records mismatches in the Validator.errors array.

This method is private, and its internal operations are subject to change without notice.

Parameters:
  • manifestType (string): the type of manifest to look for. This should be either Constants.PAYLOAD_MANIFEST or Constants.TAG_MANIFEST.

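The comparison itself is a walk over the manifest entries. In this hypothetical sketch (findChecksumMismatches is not a Validator method), both inputs are plain objects keyed by relPath; comparing hex digests case-insensitively is a design choice of the sketch.

```javascript
// Hypothetical sketch: compare each manifest entry with the digest computed
// while reading the bag. Both inputs are plain objects keyed by relPath.
function findChecksumMismatches(manifestEntries, computed, alg) {
  const errors = [];
  for (const [relPath, expected] of Object.entries(manifestEntries)) {
    const actual = computed[relPath];
    if (actual === undefined) {
      errors.push(`${relPath} is listed in the ${alg} manifest but was not found in the bag.`);
    } else if (actual.toLowerCase() !== expected.toLowerCase()) {
      errors.push(`Bad ${alg} digest for ${relPath}: manifest says ${expected}, file digest is ${actual}.`);
    }
  }
  return errors;
}

console.log(findChecksumMismatches(
  { 'data/a.txt': 'aa', 'data/b.txt': 'bb', 'data/c.txt': 'cc' },
  { 'data/a.txt': 'AA', 'data/b.txt': 'bx' },
  'md5'
));
```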

_validateNoExtraneousPayloadFiles()

_validateNoExtraneousPayloadFiles checks for files in the data directory that are not listed in the payload manifest(s). It records offending files in the Validator.errors array.

This method is private, and its internal operations are subject to change without notice.

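A hypothetical sketch of the extraneous-file check. It follows the BagIt rule that every payload file must appear in every payload manifest, so it checks each manifest separately; the function name and the shape of the manifests argument ({ algorithm: { relPath: checksum } }) are assumptions.

```javascript
// Hypothetical sketch: report every file under data/ that is missing from
// any payload manifest. manifests looks like { md5: { relPath: checksum } }.
function findExtraneousPayloadFiles(relPaths, manifests) {
  const errors = [];
  const payload = relPaths.filter((p) => p.startsWith('data/'));
  for (const [alg, entries] of Object.entries(manifests)) {
    for (const relPath of payload) {
      if (!Object.prototype.hasOwnProperty.call(entries, relPath)) {
        errors.push(`Payload file ${relPath} is not in manifest-${alg}.txt.`);
      }
    }
  }
  return errors;
}

console.log(findExtraneousPayloadFiles(
  ['bagit.txt', 'data/a.txt', 'data/b.txt'],
  { md5: { 'data/a.txt': 'x', 'data/b.txt': 'y' }, sha256: { 'data/a.txt': 'z' } }
));
```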

_validatePayloadOxum()

Validates the Payload-Oxum tag, if present, by checking that the number of files and bytes in the bag's payload directory matches the values in the tag.

This method is private, and its internal operations are subject to change without notice.

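In BagIt, the Payload-Oxum value has the form "OctetCount.StreamCount": the total number of payload bytes, then the number of payload files. The following hypothetical sketch (checkPayloadOxum is not a Validator method) takes the tag value and the sizes of the payload files and returns error messages.

```javascript
// Hypothetical sketch of the Payload-Oxum check.
// oxum is the tag value; payloadSizes holds one byte count per payload file.
function checkPayloadOxum(oxum, payloadSizes) {
  const match = /^(\d+)\.(\d+)$/.exec(oxum.trim());
  if (!match) {
    return [`Payload-Oxum "${oxum}" is not in the form OctetCount.StreamCount.`];
  }
  const expectedBytes = Number(match[1]);
  const expectedFiles = Number(match[2]);
  const actualBytes = payloadSizes.reduce((sum, size) => sum + size, 0);
  const actualFiles = payloadSizes.length;
  const errors = [];
  if (actualBytes !== expectedBytes) {
    errors.push(`Payload-Oxum says ${expectedBytes} bytes, but payload has ${actualBytes}.`);
  }
  if (actualFiles !== expectedFiles) {
    errors.push(`Payload-Oxum says ${expectedFiles} files, but payload has ${actualFiles}.`);
  }
  return errors;
}

console.log(checkPayloadOxum('300.2', [100, 200])); // []
```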

_validateProfile() → {boolean}

_validateProfile validates the BagItProfile that will be used to validate the bag. If the profile itself is not valid, we can't proceed.

Errors in the BagItProfile will be copied into the validator.errors list.

_validateRequiredManifests(manifestType)

_validateRequiredManifests checks to see if the manifests required by the BagItProfile are actually present in the bag. If they're not, it records the error in the Validator.errors array.

This method is private, and its internal operations are subject to change without notice.

Parameters:
  • manifestType (string): the type of manifest to look for. This should be either Constants.PAYLOAD_MANIFEST or Constants.TAG_MANIFEST.

_validateSerialization() → {boolean}

_validateSerialization checks to see whether or not the bag is in a format that adheres to the profile's serialization rules.

For example, if the profile's serialization attribute is "required" and acceptSerialization is "application/tar", then this bag MUST be a tar file.

You can disable this check by setting Validator.disableSerializationCheck to true. You would want to do that in cases where you've built a bag and want to validate it before you tar or zip it.

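The serialization rules can be sketched as a small decision function. Everything here is hypothetical: checkSerialization is not a Validator method, the isDirectory flag replaces the real readingFromDir() check, and the extension-to-MIME table is an assumption that covers only a few formats ("application/tar" matches the example above).

```javascript
const path = require('path');

// ASSUMPTION: a minimal extension-to-MIME table for this sketch only.
const MIME_BY_EXT = {
  '.tar': 'application/tar',
  '.zip': 'application/zip',
  '.gz': 'application/gzip',
  '.tgz': 'application/gzip',
};

// Hypothetical sketch: serialization is "required", "optional", or "forbidden";
// acceptSerialization lists the MIME types the profile accepts.
function checkSerialization(pathToBag, isDirectory, serialization, acceptSerialization) {
  if (isDirectory) {
    return serialization === 'required'
      ? ['Profile requires a serialized bag, but this bag is a directory.']
      : [];
  }
  if (serialization === 'forbidden') {
    return ['Profile forbids serialization, but this bag is a single file.'];
  }
  const mime = MIME_BY_EXT[path.extname(pathToBag).toLowerCase()];
  if (!acceptSerialization.includes(mime)) {
    return [`Serialization format of ${pathToBag} is not in acceptSerialization.`];
  }
  return [];
}

console.log(checkSerialization('/bags/bag123.tar', false, 'required', ['application/tar'])); // []
```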

_validateSerializationFormat() → {boolean}

_validateSerializationFormat checks to see if the bag is in an allowed serialized format. This is called only if necessary.

_validateTags()

_validateTags ensures that all required tag files are present, that all required tags are present, and that all tags have valid values if valid values were defined in the BagItProfile. This method records all the problems it finds in the Validator.errors array.

This method is private, and its internal operations are subject to change without notice.

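A hypothetical sketch of the three tag checks. The tag definition shape ({ tagFile, tagName, required, values }) and the parsedTags layout (tag file name mapped to { tagName: [values] }) are assumptions for illustration; the real BagItProfile may model tags differently.

```javascript
// Hypothetical sketch: required tag files, required tags, and legal values.
function checkTags(tagDefs, parsedTags) {
  const errors = [];
  for (const def of tagDefs) {
    const file = parsedTags[def.tagFile];
    if (!file) {
      if (def.required) errors.push(`Required tag file ${def.tagFile} is missing.`);
      continue;
    }
    const values = file[def.tagName] || [];
    if (def.required && values.length === 0) {
      errors.push(`Required tag ${def.tagName} is missing from ${def.tagFile}.`);
    }
    // Only check values when the profile defines a list of legal ones.
    if (def.values && def.values.length > 0) {
      for (const value of values) {
        if (!def.values.includes(value)) {
          errors.push(`Tag ${def.tagName} in ${def.tagFile} has illegal value "${value}".`);
        }
      }
    }
  }
  return errors;
}
```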

_validateUntarDirectory() → {boolean}

_validateUntarDirectory is for tarred bags only. It checks to see whether the tar file extracts to a directory whose name matches the bag name, minus the ".tar" extension. If it doesn't untar there, this method adds an error to the validation results.

E.g. "myBag.tar" should untar to a directory called "myBag"

This rule only applies to BagItProfiles where tarDirMustMatchName is true.

This check is stricter than the official BagIt 1.0 spec, which at https://tools.ietf.org/html/draft-kunze-bagit-17#section-2 says:

The base directory can have any name.

This method is private, and its internal operations are subject to change without notice.

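A hypothetical sketch of the untar-directory rule: every tar entry must sit under a top-level directory named after the tar file minus ".tar". The function name and the entryPaths argument are assumptions; the sketch does not handle entries that begin with "./".

```javascript
const path = require('path');

// Hypothetical sketch: path.basename(p, '.tar') turns "/bags/myBag.tar" into "myBag".
function checkUntarDirectory(pathToBag, entryPaths) {
  const expected = path.basename(pathToBag, '.tar');
  const errors = [];
  for (const entry of entryPaths) {
    const topLevel = entry.split('/')[0];
    if (topLevel !== expected) {
      errors.push(`Entry ${entry} should be inside ${expected}/.`);
    }
  }
  return errors;
}

console.log(checkUntarDirectory('/bags/myBag.tar', ['myBag/', 'myBag/bagit.txt'])); // []
```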

fileExtension() → {string}

This returns the file extension of the bag in this.pathToBag. If the bag is a directory, this returns an empty string, but you should still check on your own to see whether pathToBag points to a directory. In the special (and common) case of '.tar.gz' files, this returns '.tar.gz'.

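The '.tar.gz' special case exists because Node's path.extname() only returns the last extension. This hypothetical standalone version takes the path as an argument instead of reading this.pathToBag. It also shows why the caller should still check for directories: a directory named "bag.v2" would yield ".v2".

```javascript
const path = require('path');

// Hypothetical sketch of fileExtension(): path.extname('bag.tar.gz') is '.gz',
// so the two-part extension is checked first.
function fileExtension(pathToBag) {
  if (pathToBag.endsWith('.tar.gz')) {
    return '.tar.gz';
  }
  return path.extname(pathToBag);
}

console.log(fileExtension('/bags/bag123.tar.gz')); // .tar.gz
console.log(fileExtension('/bags/bag123')); // (empty string)
```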

getNewReader() → {Plugin}

Returns a reader plugin that is capable of reading the bag we want to validate. Note that this always returns a new reader, so if you call it 20 times, you're going to get 20 individual reader objects.

payloadFiles() → {Array.<BagItFile>}

Returns an array of BagItFile objects that represent payload files.

payloadManifests() → {Array.<BagItFile>}

Returns an array of BagItFile objects that represent payload manifests.

readingFromDir() → {boolean}

readingFromDir returns true if the bag being validated is unserialized. That is, it is a directory on a file system, and not a tar, zip, gzip, or other single-file format.

readingFromTar() → {boolean}

readingFromTar returns true if the bag being validated is in tar format.

Deprecated:
  • Will be removed soon.

tagFiles() → {Array.<BagItFile>}

Returns an array of BagItFile objects that represent tag files.

tagManifests() → {Array.<BagItFile>}

Returns an array of BagItFile objects that represent tag manifests.

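The four accessors above partition the bag's files by type. The sketch below shows one plausible way to classify a relative path, assuming the standard BagIt layout (payload under data/, manifests named manifest-<alg>.txt and tagmanifest-<alg>.txt at the top level). The classify function is hypothetical and is not how the Validator is documented to decide.

```javascript
// Hypothetical sketch: classify a bag file by its relative path.
function classify(relPath) {
  if (relPath.startsWith('data/')) return 'payload';
  if (/^manifest-\w+\.txt$/.test(relPath)) return 'manifest';
  if (/^tagmanifest-\w+\.txt$/.test(relPath)) return 'tagmanifest';
  return 'tagfile';
}

const paths = ['bagit.txt', 'bag-info.txt', 'manifest-sha256.txt', 'tagmanifest-md5.txt', 'data/img.jpg'];
const payloadFiles = paths.filter((p) => classify(p) === 'payload');
console.log(payloadFiles); // [ 'data/img.jpg' ]
```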

validate()

validate runs all validation operations on the bag specified in the validator's pathToBag property. This includes:

  • making sure the untarred bag name matches the tarred bag name
  • validating checksums for all payload files, and for tag files that have checksums
  • ensuring there are no extra or missing payload files
  • ensuring that required tag files and manifests are present and valid
  • ensuring that required tags are present and, where applicable, have legal values

This method emits events "start", "task", "end", and "error".

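Because validate() reports progress through events, callers typically attach listeners before calling it. The sketch uses a hypothetical FakeValidator built on Node's EventEmitter so it runs on its own; the event names come from the description above, but the payloads passed to each event are placeholders, not the real Validator's arguments. Always attach an "error" listener: Node's EventEmitter throws if "error" is emitted with no listener.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for Validator that emits the same event names.
class FakeValidator extends EventEmitter {
  validate() {
    this.emit('start', 'Validation started');
    this.emit('task', 'Checking manifests');
    this.emit('end', 'Validation finished');
  }
}

const log = [];
const validator = new FakeValidator();
validator.on('start', (msg) => log.push(`start: ${msg}`));
validator.on('task', (msg) => log.push(`task: ${msg}`));
validator.on('error', (err) => log.push(`error: ${err}`));
validator.on('end', () => log.push('end'));
validator.validate();
console.log(log); // [ 'start: Validation started', 'task: Checking manifests', 'end' ]
```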