Java CLI Updates – Symbolic Link Support, Performance Test

We have released an update to the Java CLI BlackPearl client. You can download this latest release (1.2.4) from the Java CLI Releases page on GitHub. There are several updates in the new release, including two major new features detailed below.

Symbolic Link Support

Support was added for Symbolic Links in Unix/Linux. This is relevant in the case where the put_bulk command is used to move a directory of files to BlackPearl. If the Java CLI sees that the directory contains a symbolic link, it will attempt to also include the file to which the symbolic link points (even if it is not in that directory). Here are a few important notes regarding symbolic link support:

  • The user account under which the Java CLI is being run must have access to the file to which the symbolic link is pointing.
  • If there are several symbolic links pointing to the same file, but from different folder paths, BlackPearl will include a copy of the file for each symbolic link because each “file” is in a different folder path.
  • When the CLI restores files from BlackPearl, it restores the actual files, not the symbolic links. This is because BlackPearl is not aware that the original entry was a symbolic link.
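The behavior described in these notes can be illustrated with a short Python sketch. This is not the Java CLI's implementation; the function name `collect_files` and the way object names are derived are assumptions made purely for illustration. It shows why two symbolic links in different folders produce two separate objects even though they point at the same file, and why unreadable targets are left out.

```python
import os
from pathlib import Path


def collect_files(root):
    """Return sorted (object_name, real_path) pairs for readable files under root.

    Hypothetical helper: the object name comes from where the entry sits
    inside root, while the data would be read from the resolved target.
    """
    root = Path(root)
    entries = []
    # followlinks=True also descends into linked directories; a real tool
    # would need to guard against link cycles.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            path = Path(dirpath) / name
            # os.access follows the link, so broken links and targets the
            # current user cannot read are both skipped here.
            if not os.access(path, os.R_OK):
                continue
            object_name = path.relative_to(root).as_posix()
            entries.append((object_name, Path(os.path.realpath(path))))
    return sorted(entries)
```

If `a/link.txt` and `b/link.txt` both point to the same target, the result contains two entries with different object names and the same real path, which matches the "one copy per symbolic link" note above.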

Performance Test

We added a new performance command to the CLI. We hope this new performance test will make it easier to troubleshoot BlackPearl performance. The command tests network performance as well as transfer performance to the BlackPearl cache. It does not test the back-end performance of BlackPearl (the speed between BlackPearl and a tape library or permanent disk target).

Users can specify the number and size of files in the test. The files are automatically generated on the fly by the CLI and not written to disk (this eliminates the disk read/write speed from the test). When the command is executed, the CLI will send these files to BlackPearl in a PUT operation. Once the PUT operation is complete, the CLI will then issue a GET operation to retrieve those same files from BlackPearl.
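The idea of generating test data on the fly, so that local disk speed never enters the measurement, can be sketched in Python as follows. This is only an illustration and not the CLI's code: the names `generated_blocks` and `measure_throughput`, the 1 MiB block size, and the use of zero-filled data are all assumptions.

```python
import time


def generated_blocks(total_bytes, block_size=1024 * 1024):
    """Yield in-memory blocks totalling total_bytes; nothing is read from disk."""
    block = bytes(block_size)  # zero-filled buffer, reused for every block
    remaining = total_bytes
    while remaining > 0:
        n = min(block_size, remaining)
        yield block[:n]
        remaining -= n


def measure_throughput(blocks, send):
    """Pass every block to send() and return (bytes_sent, elapsed_seconds)."""
    start = time.perf_counter()
    sent = 0
    for block in blocks:
        send(block)
        sent += len(block)
    return sent, time.perf_counter() - start


# Usage: a do-nothing sink stands in for the network PUT in this sketch.
sent, seconds = measure_throughput(generated_blocks(5 * 1024 * 1024), lambda b: None)
```

In a real test, `send` would write to the network connection, and dividing `sent` by `seconds` would give the transfer rate.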

Here is an example of the performance command being run. In this case there were 50 files, each 1000MB, using a bucket called “performance_testing”:

ds3_java_cli -e 10.85.41.86 -a Z3JlZ2o= -k 4QJFANLi -c performance -b performance_testing -n 50 -s 1000 --http

Here is the output from the CLI (results highlighted in yellow):

[Image: cliPerfTest]

Here is the corresponding performance graph from the BlackPearl web management interface.

[Image: cliPerResultBP]

We hope these new features will make it easier to use BlackPearl. Please give us your feedback or ask questions on our Forums.

 


BlackPearl “Chunks” and “Blobs”

BlackPearl uses the Spectra S3 protocol to manage files. We receive many questions on how files interact with BlackPearl using this protocol, and particularly how files are broken up for moving them to and from BlackPearl. Hopefully this post will help clarify the topic.

Jobs, Chunks and Blobs

A job is a management container for file input/output operations with BlackPearl. A job must target a single bucket. So, for example, a client might start a job to send 20 files to a bucket in BlackPearl.

A chunk is a unit into which BlackPearl breaks a job. A chunk consists of one or more blobs (see below). The default chunk size depends on the storage target:

  • If the storage target is tape, then the chunk size will be 2% of the uncompressed tape capacity. For example, the uncompressed capacity of an LTO-7 tape is 6 trillion bytes, so the chunk size for LTO-7 would be 2% of this: 120 billion bytes, or approximately 112GB when a gigabyte is counted as 2^30 bytes.
  • If the storage target is disk (e.g. ArcticBlue or Online Disk), then the preferred blob size (64GB, see below) is used as the chunk size.
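The tape chunk-size arithmetic can be checked with a few lines of Python. The function name is hypothetical, and the reading that the 112GB figure is expressed in binary gigabytes (2^30 bytes) is an inference from the numbers in the text rather than something the text states.

```python
LTO7_CAPACITY_BYTES = 6 * 10**12  # uncompressed LTO-7 capacity quoted in the text
GIB = 2**30                       # bytes in one binary gigabyte


def tape_chunk_size_bytes(capacity_bytes):
    """Return 2% of the uncompressed tape capacity, in bytes."""
    return capacity_bytes * 2 // 100


chunk = tape_chunk_size_bytes(LTO7_CAPACITY_BYTES)
print(chunk)                  # 120000000000
print(round(chunk / GIB, 1))  # roughly 111.8, which rounds to the 112GB quoted above
```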

If the total job size is less than or equal to the chunk size, there will only be one chunk for the job. Note that the BlackPearl cache is managed by allocating space for each chunk on a PUT or GET request.

A blob is a file or file part sent to or received from BlackPearl in one PUT or GET operation. A blob will never consist of more than one file. One or more blobs will make up a chunk. The preferred blob size is 64GB, but this size can vary to match the file size or to fit in a chunk, and the size can also be changed by the client.

Example -- Let’s assume you have a 500GB job to upload to BlackPearl consisting of seven files (one 250GB file, one 125GB file, and five 25GB files). The storage target is a tape library using LTO-7 tapes. Let’s also assume the chunk size is 112GB and the blob size is 64GB. The job will be broken up into five chunks that look like this:

  • Chunk 1 -- Size: 112GB
    • Blob 1 -- Part 1 of 250GB file -- Size: 64GB
    • Blob 2 -- Part 2 of 250GB file -- Size: 48GB
  • Chunk 2 -- Size: 112GB
    • Blob 1 -- Part 3 of 250GB file -- Size: 64GB
    • Blob 2 -- Part 4 of 250GB file -- Size: 48GB
  • Chunk 3 -- Size: 112GB
    • Blob 1 -- Part 5 of 250GB file -- Size: 26GB
    • Blob 2 -- Part 1 of 125GB file -- Size: 86GB
  • Chunk 4 -- Size: 112GB
    • Blob 1 -- Part 2 of 125GB file -- Size: 39GB
    • Blob 2 -- 25GB file -- Size: 25GB
    • Blob 3 -- 25GB file -- Size: 25GB
    • Blob 4 -- Part 1 of 25GB file -- Size: 23GB
  • Chunk 5 -- Size: 52GB
    • Blob 1 -- Part 2 of 25GB file -- Size: 2GB
    • Blob 2 -- 25GB file -- Size: 25GB
    • Blob 3 -- 25GB file -- Size: 25GB
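As a sanity check, the breakdown above can be written down as data and summed in Python. This sketch only verifies the arithmetic of the example; it does not reproduce BlackPearl's actual chunking logic. The file labels (such as "25GB-c" for the one 25GB file that is split across chunks) are made up for the illustration.

```python
import math
from collections import defaultdict

CHUNK_GB = 112
JOB_GB = 500

# Each chunk is a list of (file_label, blob_size_in_GB), copied from the example.
chunks = [
    [("250GB", 64), ("250GB", 48)],
    [("250GB", 64), ("250GB", 48)],
    [("250GB", 26), ("125GB", 86)],
    [("125GB", 39), ("25GB-a", 25), ("25GB-b", 25), ("25GB-c", 23)],
    [("25GB-c", 2), ("25GB-d", 25), ("25GB-e", 25)],
]


def chunk_sizes(chunks):
    """Return the total size of each chunk in GB."""
    return [sum(size for _label, size in chunk) for chunk in chunks]


def file_totals(chunks):
    """Return the total number of GB transferred for each file label."""
    totals = defaultdict(int)
    for chunk in chunks:
        for label, size in chunk:
            totals[label] += size
    return dict(totals)


print(chunk_sizes(chunks))              # [112, 112, 112, 112, 52]
print(sum(chunk_sizes(chunks)))         # 500
print(math.ceil(JOB_GB / CHUNK_GB))     # 5
```

Every file's blobs add up to its full size, no blob spans two files, and the chunk count equals the job size divided by the chunk size, rounded up.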

How the SDK Helper Functions Work with Chunks and Blobs

The Java and .NET/C# SDKs include “Helper” functions that make it easier to move files to and from BlackPearl. These Helper functions provide a layer of abstraction over the concept of BlackPearl chunks and blobs. The client code does not need to know how to manage chunks and blobs, which greatly reduces the effort required to integrate a client with BlackPearl. We therefore recommend that client developers use the Helper functions whenever possible.

These Helper functions can transfer blobs concurrently in parallel threads. The number of threads can be set by the client. For example, the Java CLI, which uses the Java SDK and its Helper functions, is set to use 10 parallel threads. Note that there are some limitations on the ability to transfer blobs in parallel when they are parts of the same file. An individual file mounted on a file system path can have multiple blobs sent to BlackPearl in parallel. But an individual file received via a file stream/channel cannot have multiple blobs sent to BlackPearl in parallel.
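The difference between a file on a path and a stream can be illustrated with a small Python sketch. It is not the SDK's Helper code: `blob_ranges`, `read_blob`, `transfer_file`, and the `send_blob` callback are hypothetical names, and the default of 10 workers simply mirrors the Java CLI setting mentioned above. A file on a path allows each worker to open its own handle and seek to its blob's offset, which a one-pass stream does not.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def blob_ranges(file_size, blob_size):
    """Return (offset, length) pairs that together cover file_size bytes."""
    return [(off, min(blob_size, file_size - off))
            for off in range(0, file_size, blob_size)]


def read_blob(path, offset, length):
    """Read one blob; each call opens its own handle so workers never share one."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def transfer_file(path, blob_size, send_blob, max_workers=10):
    """Send every blob of the file through send_blob(offset, data) in parallel."""
    ranges = blob_ranges(os.path.getsize(path), blob_size)

    def work(rng):
        offset, length = rng
        send_blob(offset, read_blob(path, offset, length))
        return offset

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order even if blobs finish out of order.
        return list(pool.map(work, ranges))
```

A stream can only be consumed front to back, so its blobs would have to be read, and therefore sent, one after another.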

Blobbing and Checksums

BlackPearl performs a checksum on all files sent to it. If a file must be broken up into multiple blobs, it will perform a checksum on each individual blob. It will not perform a checksum across the entire file if the file is broken into multiple blobs. If a client application is tracking checksums for files and comparing them to BlackPearl’s recorded value for verification, it must be aware that it should compare the checksums of the blobs rather than the checksum of the entire file. A client could store the checksum of the entire file in the metadata of one of the blobs that it uploads to BlackPearl. However, BlackPearl would not use this checksum value itself.

Read More About BlackPearl and Checksums
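To make the per-blob point concrete, here is a minimal Python sketch of how a client might compute checksums blob by blob. The function names are hypothetical, and SHA-256 is used only as an example algorithm; the checksum type actually configured on a given BlackPearl may differ.

```python
import hashlib


def blob_checksums(data, blob_size):
    """Return one hex digest per blob, where blobs are consecutive slices of data."""
    return [hashlib.sha256(data[i:i + blob_size]).hexdigest()
            for i in range(0, len(data), blob_size)]


def whole_file_checksum(data):
    """Return the hex digest of the entire file."""
    return hashlib.sha256(data).hexdigest()


data = b"abc" * 1000                     # a 3000-byte stand-in for a file
per_blob = blob_checksums(data, 1024)    # three blobs: 1024, 1024, and 952 bytes
whole = whole_file_checksum(data)

# For a multi-blob file the whole-file digest matches none of the blob digests,
# so verification has to compare blob by blob.
print(whole in per_blob)
```

When a file fits in a single blob, the two values coincide, which is why the distinction only matters for files that have been split.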