The Preservation Verifier
The preservation verifier checks that all of the files copied by the preservation uploader are actually present in the preservation storage buckets. It issues a HEAD request for each file and ensures that the size matches. When possible, it also ensures that the etag matches the file's md5 checksum. (This is only possible for smaller files, not for large, mutli-part uploads.)
If any check fails, the verifier marks the ingest as failed, sets a note about the missing/incorrect file in the note field of the WorkItem, and sets the WorkItem's NeedsAdminReview
flag to true.
In case you're wondering why this component exists, see Why Does the Preservation Verifier Exist?
Resources
Though this worker may issue a number of S3 requests, it does not use much network bandwidth because HEAD requests tend to return about 1 kb of data. The worker uses little CPU and memory, and tends to finish quickly.
External Services
Service | Function |
---|---|
Preservation Buckets | Worker verifies that files were successfully copied to perservation storage, as described in the storage records retrieved from Redis. |
Redis | Worker updates file records to indicate that files have been verified in preservation storage. |
Registry | Source of WorkItem record describing work to be done. |
NSQ | Distributes WorkItem IDs to workers and tracks their status. |
Source Files
Worker | Service | Files | Definition |
---|---|---|---|
Preservation Verifier | Ingest | Task Worker App |
Verifies that files copied to preservation storage are actually there. |