Settings
Settings Files
For developer and test setups, we store settings in simple .env files that each worker reads at startup. You'll notice files like .env.test in Preserv's source repo. You can tell workers which .env file to read by specifying APT_ENV on the command line.
For example, APT_ENV=test apt_fixity
would run the apt_fixity worker using the configuration settings in .env.test. If you want to read settings from .env.local, run APT_ENV=local apt_fixity
.
Settings Injection Through AWS Parameter Store
For our staging, demo, and production environments, we ship an empty .env file inside the Docker container. Before Amazon's ECS service starts a new container instance, it runs a "sidecar" app that pulls variables from AWS parameter store into environment variables inside each container.
The code in config.go uses Viper's AutomaticEnv
method to pull in environment variables. Any variables not defined in the .env file will be fetched from the environment, if present.
This allows us to use simple .env config files on developer machines and in CI tests, without exposing sensitive information in our public Git repos. For staging, demo, and production systems, all config settings are centralized in Parameter Store.
Note that variable names in Parameter Store follow the pattern ENV/PRESERVE/VAR_NAME
, where ENV is the name of the environment (staging, demo, production) and VAR_NAME is the setting name. For example, the name of the variable containing the demo NSQ url is /DEMO/PRESERV/NSQ_URL
.
Definitions
The definitions below pertain to all Preserv workers. In addition to these, each worker has specific settings describing number of workers and buffer size. Those are described on worker-specific pages.
In general, settings ending in WORKERS
describe how many go routines (concurrent processes) a worker should run. Settings ending in BUFFER_SIZE
describe the desired size of internal queue buffers. On a macro level, the buffer size settings tell the work how many items to accept from NSQ in each batch.
If a worker has two WORKERS
and a BUFFER_SIZE
of 20, each go routine will accept up to 20 items at a time from NSQ. The Docker container then can be working on up to 40 NSQ items at a time.
Variable Name | Definition |
---|---|
BUCKET_GLACIER_DEEP_OH | The name of the Glacier Deep preservation bucket in Ohio. This is for storage option Glacier-Deep-OH. |
BUCKET_GLACIER_DEEP_OR | The name of the Glacier Deep preservation bucket in Oregon. This is for storage option Glacier-Deep-OR. |
BUCKET_GLACIER_DEEP_VA | The name of the Glacier Deep preservation bucket in Virginia. This is for storage option Glacier-Deep-VA. |
BUCKET_GLACIER_OH | The name of the Glacier preservation bucket in Ohio. This is for storage option Glacier-OH. |
BUCKET_GLACIER_OR | The name of the Glacier preservation bucket in Oregon. This is for storage option Glacier-OR and Standard. |
BUCKET_GLACIER_VA | The name of the Glacier preservation bucket in Virginia. This is for storage option Glacier-VA. |
BUCKET_STANDARD_VA | The name of the S3 bucket for standard storage in Virginia. This is for storage option Standard. |
BUCKET_WASABI_OR | The name of the Wasabi preservation bucket in Oregon. |
BUCKET_WASABI_VA | The name of the Wasabi preservation bucket in Virginia. |
MAX_DAYS_SINCE_LAST_FIXITY | The interval at which the fixity checker should check files. In production and demo, this is set to 90, so that we run fixity checks every 90 days. In staging, you can set this down to one day if you want to force fixity checks to run. Typical staging setting is 14. |
MAX_FIXITY_ITEMS_PER_RUN | The maximum number of files that the queue_fixity worker should queue on each run. The default value is 2500. |
NSQ_LOOKUP | The hostname and port of the NSQ lookup daemon. This usually has a format like hostname:port or ip_addr:port. The lookup daemon tells NSQ clients (our workers) how to connect to any and all available NSQ instances. |
NSQ_URL | The hostname and port of our NSQ service. |
PRESERV_REGISTRY_API_KEY | The API key workers use to access the Registry. |
PRESERV_REGISTRY_API_USER | The API user email address that our workers use to access the Registry. This account must have the APTrust admin role, as it accesses the admin API. |
PRESERV_REGISTRY_URL | The URL of the Registry. Workers read and write WorkItems and other data in this Registry. |
QUEUE_FIXITY_INTERVAL | This describes how often, in minutes, the queue fixity worker should check for files requiring fixity checks. We usually set this to 30. |
REDIS_URL | The Redis (or Elasitiche) URL. Ingest workers connect to this service to store, retrieve and update interim processing data. On dev and CI machines, we use Redis. In AWS environments, we use Elasticache. |
S3_AWS_HOST | The generic hostname for AWS S3: s3.amazonsws.com. |
S3_AWS_KEY | The Access Key ID used to access items in S3 buckets and in Glacier. This account should have full privileges in S3 and Glacier. |
S3_AWS_SECRET | The AWS Secret Access Key used to interact with S3 and Glacier. |
S3_WASABI_HOST_OR | The hostname of Wasabi's Oregon S3 service. s3.us-west-1.wasabisys.com |
S3_WASABI_HOST_VA | The hostname of Wasabi's Virginia S3 service. s3.us-east-1.wasabisys.com |
S3_WASABI_KEY | The Access Key ID used to access Wasabi buckets. |
S3_WASABI_SECRET | The Secret Access Key used to access Wasabi buckets. |
STAGING_BUCKET | The name of the AWS S3 bucket into which the staging uploader copies the files it unpacks from tarred bags in the receiving buckets. The format identifier and other workers will access files in this bucket during later stages of ingest. |