BlackPearl “Chunks” and “Blobs”

BlackPearl uses the Spectra S3 protocol to manage files. We receive many questions on how files interact with BlackPearl using this protocol, and particularly how files are broken up for moving them to and from BlackPearl. Hopefully this post will help clarify the topic.

Jobs, Chunks and Blobs

A job is a management container for file input/output operations with BlackPearl. A job must target a single bucket. So, for example, a client might start a job to send 20 files to a bucket in BlackPearl.

A chunk is a unit into which jobs are broken by BlackPearl. A chunk consists of one or more blobs (see below). The default chunk size is dynamic based on the storage target.

  • If the storage target is tape, then the chunk size will be 2% of the uncompressed tape capacity. So for example, an LTO-7 uncompressed tape’s capacity is 6 trillion bytes, so the chunk size for LTO-7 would be 2% of this or approximately 112GB.
  • If the storage target is disk (e.g. ArcticBlue or Online Disk), then the preferred blob size (64GB, see below) is used as the chunk size.

If the total job size is less than or equal to the chunk size, there will only be one chunk for the job. Note that the BlackPearl cache is managed by allocating space for each chunk on a PUT or GET request.

A blob is a file or file part sent to or received from BlackPearl in one PUT or GET operation. A blob will never consists of more than one file. One or more blobs will make up a chunk. The preferred blob size is 64GB, but this size can vary to match the file size or fit in a chunk, and the size can also be changed by the client.

Example -- Let’s assume you have a 500GB job to upload to BlackPearl with a number of files (1 250GB file, 1 125GB file, 5 25GB files). The storage target is a tape library using LTO-7 tapes. Let’s also assume the chunk size is 112GB and the blob size is 64GB. The job will be broken up into five chunks that look like this:

  • Chunk 1 -- Size: 112GB
    • Blob 1 -- Part 1 of 250GB file -- Size: 64GB
    • Blob 2 -- Part 2 of 250GB file -- Size: 48GB
  • Chunk 2 -- Size: 112GB
    • Blob 1 -- Part 3 of 250GB file -- Size: 64GB
    • Blob 2 -- Part 4 of 250GB file -- Size: 48GB
  • Chunk 3 -- Size: 112GB
    • Blob 1 -- Part 5 of 250GB file -- Size: 26GB
    • Blob 2 -- Part 1 of 125 GB file -- Size: 86GB
  • Chunk 4 -- Size: 112GB
    • Blob 1 -- Part 2 of 125GB file -- Size: 39GB
    • Blob 2 -- 25GB file -- Size: 25GB
    • Blob 3 -- 25GB file -- Size: 25GB
    • Blob 4 -- Part 1 of 25GB file -- Size: 23GB
  • Chunk 5 -- Size: 52GB
    • Blob 1 -- Part 2 of 25GB file -- Size: 2GB
    • Blob 2 -- 25GB file -- Size: 25GB
    • Blob 3 -- 25GB file -- Size: 25GB

How the SDK Helper Functions Work with Chunks and Blobs

The Java and .NET/C# SDKs include “Helper” functions that make it easier to move files to and from BlackPearl. These Helper function provide a layer of abstraction over the concept of BlackPearl chunks and blobs. The client code does not need to know how to manage chunks and blobs, and therefore greatly reduces the effort to integrate a client with BlackPearl. We therefore recommend that client developers use the Helper functions whenever possible.

These Helper functions can transfer blobs in parallel threads concurrently. The number of threads can be set by the client. For example, the Java CLI, which uses the Java SDK and its Helper functions, is set to transfer 10 parallel threads. Note that there are some limitations on the ability to transfer blobs in parallel that are made up of the same file. An individual mounted file on a file system path can have multiple blobs sent to BlackPearl in parallel. But an individual file received via a file stream/channel cannot have multiple blobs sent to BlackPearl in parallel.

Blobbing and Checksums

BlackPearl performs a checksum on all files sent to it. If a file must be broken up into multiple blobs, it will perform a checksum on each individual blob. It will not perform a checksum across the entire file if the file is broken into multiple blobs. If a client application is tracking checksums for files and comparing them to BlackPearl’s recorded value for verification, it must be aware that it should compare the checksum of the blobs rather than the checksum of the entire file. A client could store the checksum of the entire file in the meta data of one of the blobs that it uploads to BlackPearl. However, BlackPearl would not use this checksum value itself. Read More About BlackPearl and Checksums

Leave a Reply

You must be logged in to post a comment.