Format Writers

Format writers write output in formats such as tar, zip, and OCFL. They provide methods for writing contents into a directory or a serialized file. The initial release of DART 2.0 includes two format writers: a FileSystemWriter and a TarWriter. Developers can use these two as reference implementations when writing their own format writer.

Format writers may be synchronous or asynchronous under the hood. Certain writers, such as the built-in TarWriter, MUST be synchronous internally because the tar format requires files to be written one at a time, in order. To assist with this, the BaseWriter implements a queue that executes requests sequentially, in the order they were received, with each write request beginning only after the previous write has completed.
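The one-at-a-time behavior described above can be sketched in plain Node.js. This is only an illustration of the idea, not the BaseWriter's actual code: the SequentialQueue class, its push() and drain() methods, and the item fields are all hypothetical names invented for this example.

```javascript
// Hypothetical stand-in for a one-at-a-time write queue (not DART's code).
class SequentialQueue {
  constructor(worker) {
    this.worker = worker;            // async function called once per item
    this.tail = Promise.resolve();   // settles when the last queued item is done
  }

  push(item) {
    // Chain onto the tail so this item starts only after the previous one settles.
    const result = this.tail.then(() => this.worker(item));
    // Swallow rejections on the chain itself so one failure does not block later items.
    this.tail = result.catch(() => {});
    return result;
  }

  drain() {
    return this.tail;
  }
}

const order = [];
const queue = new SequentialQueue(async (item) => {
  order.push(`start ${item.name}`);
  await new Promise((resolve) => setTimeout(resolve, item.delayMs));
  order.push(`end ${item.name}`);
});

// The first write is slower, yet the second still waits for it to finish.
queue.push({ name: 'a.txt', delayMs: 20 });
queue.push({ name: 'b.txt', delayMs: 1 });
const done = queue.drain().then(() => order);
```

Because each item is chained onto the previous one, the writes never interleave, which is the property an ordered format such as tar depends on.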

API and Events

Format writers should extend the DART BaseWriter and provide a description() method that returns meaningful PluginDescription information.

Format writers must implement the following methods:

  • A constructor that takes a single parameter, which is the path to the file or directory that the writer will write.

  • A static description method that takes no parameters and returns a description of the plugin (as described in The Base Plugin).

  • An add method that takes two parameters: a BagItFile and an optional list of cryptographic hash algorithm names (such as 'md5', 'sha256', etc.). This method must emit a fileAdded event each time it writes a file into the target directory (or tar archive, zip archive, etc.). Note that a BagItFile is a simple object with properties to denote the source path from which the file is copied, the destination path to which it should be copied, and some basic stats information that includes the file size. In the process of copying the file, the format writer should calculate the requested checksums and store them in the checksums hash of the BagItFile.
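The three required members listed above might look roughly like the following. This is a simplified sketch under several assumptions: SketchBaseWriter is a stand-in for DART's BaseWriter (which is not imported here), the fields returned by description() are illustrative rather than the real PluginDescription schema, and the BagItFile property names absSourcePath and relDestPath are placeholders. It also copies synchronously and reads each file fully into memory, which the built-in writers do not do.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { EventEmitter } = require('node:events');

// Stand-in for DART's BaseWriter (an assumption for this sketch; the real
// class also manages the write queue and the error and finish events).
class SketchBaseWriter extends EventEmitter {
  constructor(pathToOutput) {
    super();
    this.pathToOutput = pathToOutput;
    this.filesWritten = 0;
  }
}

class DirectoryWriter extends SketchBaseWriter {
  // Takes a single parameter: the directory this writer will write into.
  constructor(pathToOutputDir) {
    super(pathToOutputDir);
  }

  // Takes no parameters. Field names are illustrative only.
  static description() {
    return { name: 'DirectoryWriter', description: 'Copies files into a directory.' };
  }

  // Copies one file, records the requested checksums, then emits fileAdded.
  add(bagItFile, cryptoHashes = []) {
    const contents = fs.readFileSync(bagItFile.absSourcePath);
    const target = path.join(this.pathToOutput, bagItFile.relDestPath);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, contents);
    for (const alg of cryptoHashes) {
      bagItFile.checksums[alg] = crypto.createHash(alg).update(contents).digest('hex');
    }
    this.filesWritten += 1;
    this.emit('fileAdded', bagItFile);
  }
}

// Usage: write one small file into a temporary directory.
const workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'writer-sketch-'));
const sourcePath = path.join(workDir, 'source.txt');
fs.writeFileSync(sourcePath, 'hello');

const file = { absSourcePath: sourcePath, relDestPath: 'data/source.txt', size: 5, checksums: {} };
const events = [];
const writer = new DirectoryWriter(path.join(workDir, 'bag'));
writer.on('fileAdded', (f) => events.push(f.relDestPath));
writer.add(file, ['md5', 'sha256']);
```

A real format writer would hand the copy to the BaseWriter queue instead of writing inline, so the FileSystemWriter and TarWriter sources remain the authoritative reference.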

Format writers should fire the fileAdded event each time a file has been written. See the add method of FileSystemWriter or TarWriter to see when and how this occurs. The data passed by the fileAdded event includes the BagItFile that was just added and the percent complete of the overall write operation (i.e. number of bytes written divided by number of total bytes to be written).
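The percent-complete figure is simple arithmetic: bytes written so far divided by the total bytes to be written. The sketch below shows one way to track it. ProgressReporter and its fileWritten() method are hypothetical names, and the exact shape of the arguments DART passes with fileAdded is an assumption here; check the built-in writers for the real payload.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper, not part of DART: tracks bytes written so far and
// reports progress with each fileAdded event.
class ProgressReporter extends EventEmitter {
  constructor(totalBytes) {
    super();
    this.totalBytes = totalBytes;
    this.bytesWritten = 0;
  }

  // Call once a file has been fully written.
  fileWritten(bagItFile) {
    this.bytesWritten += bagItFile.size;
    // Guard against division by zero when every file is empty.
    const percentComplete = this.totalBytes > 0
      ? (this.bytesWritten / this.totalBytes) * 100
      : 100;
    this.emit('fileAdded', bagItFile, percentComplete);
  }
}

const progress = [];
const reporter = new ProgressReporter(400);
reporter.on('fileAdded', (file, percent) => progress.push([file.relDestPath, percent]));
reporter.fileWritten({ relDestPath: 'data/a.txt', size: 100 });
reporter.fileWritten({ relDestPath: 'data/b.txt', size: 300 });
// progress is now [['data/a.txt', 25], ['data/b.txt', 100]]
```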

Format writers won't need to emit the error event on their own, unless some special circumstance warrants it. The BaseWriter emits this event, passing back a string showing the name of the file being written and the error message.

Note

The BaseWriter always emits the finish event immediately after emitting the error event. This is done on the assumption that there is no sense in continuing to write a failed, incomplete, or corrupt output file.

Format writers don't need to emit the finish event. The BaseWriter takes care of that internally. This event does not pass any parameters.

Note

After the finish event has been emitted, you should be able to check the filesWritten property to get the number of files written. That number is updated by the BaseWriter's onFileWritten method. If you override that method, be sure your override calls the parent method through super.
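The reason for calling the parent method is easiest to see in a small sketch. CountingBaseWriter below is a stand-in for the BaseWriter, and it assumes onFileWritten takes no arguments and simply increments filesWritten; the real method may receive parameters and do more.

```javascript
// Stand-in for the BaseWriter's bookkeeping (an assumption for this sketch).
class CountingBaseWriter {
  constructor() {
    this.filesWritten = 0;
  }

  onFileWritten() {
    this.filesWritten += 1;
  }
}

class LoggingWriter extends CountingBaseWriter {
  constructor() {
    super();
    this.log = [];
  }

  onFileWritten() {
    // Without this call, filesWritten would stay at zero.
    super.onFileWritten();
    this.log.push(`files written so far: ${this.filesWritten}`);
  }
}

const loggingWriter = new LoggingWriter();
loggingWriter.onFileWritten();
loggingWriter.onFileWritten();
// loggingWriter.filesWritten is 2 and loggingWriter.log has two entries.
```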

Format Writer Queue Functions

DART's BaseWriter uses a one-at-a-time queue to force sequential, synchronous writes. The queue is an instance of async.queue, and it requires a function to be run on each item. The writeIntoArchive() functions in the FileSystemWriter and TarWriter sources provide examples of how to write a function for the queue.

Notice that in both format writers, the add() method sets up a hash containing data and functions, then passes that hash into the queue by calling this._queue.push(data). The queue eventually hands that data structure off to writeIntoArchive(), which does some piping internally to calculate multiple checksums in a single write.
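To illustrate calculating multiple checksums in a single write, here is a minimal sketch of a queue worker. It is not the code from either built-in writer: the shape of the data object (source, dest, algorithms, bagItFile) is assumed for this example, and in-memory streams stand in for the file and archive streams a real writer would use.

```javascript
const crypto = require('node:crypto');
const { Readable, Writable } = require('node:stream');

// Hypothetical queue worker: copies data.source into data.dest and computes
// every requested digest during that single pass over the bytes.
function writeIntoArchive(data, done) {
  const hashes = data.algorithms.map((alg) => ({ alg, hash: crypto.createHash(alg) }));

  // Every chunk that flows to the destination also updates each hash.
  data.source.on('data', (chunk) => {
    for (const h of hashes) h.hash.update(chunk);
  });
  data.source.on('error', done);
  data.dest.on('error', done);

  // Once the destination has flushed, store the digests on the BagItFile.
  data.dest.on('finish', () => {
    for (const h of hashes) data.bagItFile.checksums[h.alg] = h.hash.digest('hex');
    done(null, data.bagItFile);
  });

  data.source.pipe(data.dest);
}

// Usage with in-memory streams standing in for real files.
const written = [];
const dest = new Writable({
  write(chunk, encoding, callback) {
    written.push(chunk);
    callback();
  },
});
const source = Readable.from([Buffer.from('hel'), Buffer.from('lo')], { objectMode: false });

const finished = new Promise((resolve, reject) => {
  writeIntoArchive(
    { source, dest, algorithms: ['md5', 'sha256'], bagItFile: { checksums: {} } },
    (err, file) => (err ? reject(err) : resolve(file))
  );
});
```

In a writer built on the BaseWriter, add() would assemble an object like this and pass it to this._queue.push(data), so the worker runs only after the previous write has completed.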