BlackPearl and Checksums

Spectra Logic’s highest priority is protecting our customers’ data. BlackPearl and the tape libraries that sit behind it have a number of features to ensure that your data stays protected. We start data integrity as soon as data is ingested by BlackPearl. Client applications sending data to BlackPearl can pass in a checksum to ensure that the data arrives safely. A checksum acts as a fingerprint for the file and can be used to make sure that the file received by BlackPearl is the same file that the client thought it sent. BlackPearl accepts MD-5, CRC-32, CRC-32C, SHA-256, or SHA-512 checksums. The client provides the checksum type and value in the header of the API PUT operation:

Content-MD5: 37r//gvw/aB3GmilbcUJpg==

A checksum can also be performed using our C, C#/.NET, and Java Software Development Kits.

BlackPearl uses the checksum provided for each file to ensure that the file it received is the same as the file the client sent. If the checksum does not match correctly with the file that was received, BlackPearl will return an error (400 – bad digest) to the client. If the client does not pass in a checksum with the file, BlackPearl will automatically create a checksum for the file. By default this checksum type will be MD-5, though the checksum type can be changed on the data policy settings on the bucket (a bucket is a top-level container for objects/files in BlackPearl). BlackPearl then stores the checksum, whether generated by the client or BlackPearl itself, both in its object database and with the file on tape.

Once stored by BlackPearl, the checksum can be used internally to make sure the file is still valid when it is recalled back from tape. BlackPearl will automatically do a checksum if the file is requested by a client (GET) and the file must come from tape. The checksum is done both as the file is coming from tape and after it has landed on the BlackPearl cache. BlackPearl will also provide the checksum value to the client so that the client can verify that it successfully received the file as well.

In some cases, when using the Bulk PUT operations, the client may be required to break the object into multiple “blobs” to upload it to BlackPearl. In this case, a checksum will be used for each blob uploaded to BlackPearl. The same is true when using Multi-Part Upload – each part of the file will have its own checksum. BlackPearl also supports Partial-File Restore, which is the ability to restore (GET) part of a file. With Partial-File Restore, a client can specify, for example, that it wants to retrieve the first 2GB of a 10GB file. In order to perform a checksum in this case, BlackPearl must first retrieve the entire file (or chunk) from tape. Once BlackPearl has completed the checksum on the file or chunk, it can send the partial file to the client.

Because BlackPearl may not always calculate the checksum for an entire file (because it may be broken up into multiple pieces), developers may want to have their clients calculate the entire file checksum itself. This value could then be stored in the file’s metadata when uploaded to BlackPearl. When the file is later retrieved, the client could calculate the checksum again and compare the values.