BlackPearl Object Naming

As with most cloud storage systems, when a file is uploaded to BlackPearl we call it an “object”. There are some constraints around how objects are named in BlackPearl due to it often being connected to a tape system that stores data using the open LTFS format.

When you configure BlackPearl, you set up Storage Domain(s), which are collections of tape and/or disk partitions. If the Storage Domain includes tape partition(s), you must specify the “LTFS File Name” option for the Storage Domain. This option specifies how the file is named when it is placed on tape. There are two options for the LTFS File Name:

  • Object Name -- LTFS file names use the format {bucket name}/{object name}, for example bucket1/video1.mov. Object names must comply with the file naming rules of the LTFS 2.4 specification, and the file name must be 255 characters or less. If the tapes are ejected from the BlackPearl gateway and loaded into a non-BlackPearl tape partition, the file names match the object names. The colon character (:) is not allowed in LTFS file names and is therefore not allowed in BlackPearl object names. The slash character (/) is also technically not allowed in LTFS file names; however, BlackPearl allows a slash in the object name and translates it into a directory in the LTFS file system (e.g. directory1/directory2/video1.mov). The following characters are not recommended in LTFS file names or BlackPearl object names for reasons of cross-platform compatibility: control characters such as carriage return (CR) and line feed (LF), double quotation mark ("), asterisk (*), question mark (?), less than sign (<), greater than sign (>), backslash (\), and vertical line (|). Note that per the LTFS specification, “Implementations which claim compliance with version 2.4.0 or later of this specification shall support the percent-encoding of names … in order to avoid issues with the characters listed [above].”
  • Object ID -- LTFS file names use the format {bucket name}/{object id}, for example bucket1/1fc6f09c-dd72-41ea-8043-0491ab8a6d82. Object names do not need to comply with LTFS file naming rules. The object names are saved as LTFS extended attributes, allowing any third-party application to reconstruct all the data, including the object names.

When building a BlackPearl client, developers should be prepared for customers to choose either LTFS File Name option, and thus should ensure that object names are LTFS compliant as described in the first option above.
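
A client could check names before upload along the lines of the sketch below. It is only an illustration of the rules described above, not part of any BlackPearl SDK: the function name is hypothetical, and applying the 255-character limit to each slash-separated component is an assumption.

```python
# Characters the post says are not allowed, or not recommended, in names.
FORBIDDEN = {":"}
NOT_RECOMMENDED = set('"*?<>\\|')
MAX_NAME_LENGTH = 255  # characters


def check_object_name(name):
    """Return (errors, warnings) for a proposed BlackPearl object name."""
    errors, warnings = [], []
    if any(ch in FORBIDDEN for ch in name):
        errors.append("colon (:) is not allowed")
    # Assumption: the 255-character limit applies to each slash-separated
    # component, since slashes become directories in the LTFS file system.
    for part in name.split("/"):
        if len(part) > MAX_NAME_LENGTH:
            errors.append("name component longer than 255 characters")
            break
    # Control characters (such as CR and LF) have code points below 32.
    flagged = sorted(
        ch for ch in set(name) if ch in NOT_RECOMMENDED or ord(ch) < 32
    )
    if flagged:
        warnings.append("not recommended characters: " + repr(flagged))
    return errors, warnings
```

Errors would block the upload, while warnings could be shown to the user or handled by renaming the object.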


Java CLI Updates – Symbolic Link Support, Performance Test

We have released an update to the Java CLI BlackPearl client. You can download this latest release (1.2.4) from the Java CLI Releases page on GitHub. There are several updates in the new release, including two major new features detailed below.

Symbolic Link Support

Support was added for Symbolic Links in Unix/Linux. This is relevant in the case where the put_bulk command is used to move a directory of files to BlackPearl. If the Java CLI sees that the directory contains a symbolic link, it will attempt to also include the file to which the symbolic link points (even if it is not in that directory). Here are a few important notes regarding symbolic link support:

  • The user account under which the Java CLI is being run must have access to the file to which the symbolic link is pointing.
  • If there are several symbolic links pointing to the same file, but from different folder paths, BlackPearl will include a copy of the file for each symbolic link because each “file” is in a different folder path.
  • When the CLI restores the file from BlackPearl, it will restore the actual files, not the symbolic links. This is because BlackPearl is not aware that it is a symbolic link.
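
The behavior in these notes can be pictured with a small sketch. This is not the Java CLI's code; it is a Python illustration, with a hypothetical function name, of how each link location becomes its own object while the data is read from the file the link points to.

```python
import os


def plan_uploads(root):
    """Map each object name under root to the real file that would be read."""
    plan = {}
    # followlinks=True also descends into symbolic links to directories.
    for folder, _subdirs, files in os.walk(root, followlinks=True):
        for filename in files:
            path = os.path.join(folder, filename)
            # The object name comes from the link's own location...
            object_name = os.path.relpath(path, root).replace(os.sep, "/")
            # ...while the data comes from the file the link points to.
            plan[object_name] = os.path.realpath(path)
    return plan
```

Two links in different folders that point to the same file produce two entries with the same source path, which matches the "one copy per symbolic link" note above.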

Performance Test

We added a new performance command to the CLI. We hope this new performance test will make it easier to troubleshoot BlackPearl performance.  This command tests network performance as well as the performance to the BlackPearl cache. This does not test the back-end performance of the BlackPearl (speed between BlackPearl and tape library or permanent disk target).

Users can specify the number and size of files in the test. The files are automatically generated on the fly by the CLI and not written to disk (this eliminates the disk read/write speed from the test). When the command is executed, the CLI will send these files to BlackPearl in a PUT operation. Once the PUT operation is complete, the CLI will then issue a GET operation to retrieve those same files from BlackPearl.

Here is an example of the performance command being run. In this case there were 50 files, each 1000MB, using a bucket called “performance_testing”:

ds3_java_cli -e 10.85.41.86 -a Z3JlZ2o= -k 4QJFANLi -c performance -b performance_testing -n 50 -s 1000 --http

Here is the output from the CLI (results highlighted in yellow):

[Image: CLI performance test output]

Here is the corresponding performance graph from the BlackPearl web management interface.

[Image: performance graph from the BlackPearl web management interface]

We hope these new features will make it easier to use BlackPearl. Please give us your feedback or ask questions on our Forums.

 


BlackPearl “Chunks” and “Blobs”

BlackPearl uses the Spectra S3 protocol to manage files. We receive many questions on how files interact with BlackPearl using this protocol, and particularly how files are broken up for moving them to and from BlackPearl. Hopefully this post will help clarify the topic.

Jobs, Chunks and Blobs

A job is a management container for file input/output operations with BlackPearl. A job must target a single bucket. So, for example, a client might start a job to send 20 files to a bucket in BlackPearl.

A chunk is a unit into which jobs are broken by BlackPearl. A chunk consists of one or more blobs (see below). The default chunk size is dynamic based on the storage target.

  • If the storage target is tape, then the chunk size will be 2% of the uncompressed tape capacity. For example, the uncompressed capacity of an LTO-7 tape is 6 trillion bytes, so the chunk size for LTO-7 is 2% of this: 120 billion bytes, or approximately 112GB when a gigabyte is counted as 2^30 bytes.
  • If the storage target is disk (e.g. ArcticBlue or Online Disk), then the preferred blob size (64GB, see below) is used as the chunk size.

If the total job size is less than or equal to the chunk size, there will only be one chunk for the job. Note that the BlackPearl cache is managed by allocating space for each chunk on a PUT or GET request.
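
The arithmetic behind the default chunk size can be sketched as follows. The function names are hypothetical, and treating the post's "GB" as binary gigabytes (2^30 bytes) is an assumption made because it reproduces the 112GB figure.

```python
GIB = 2**30  # assumption: the post's "GB" figures are binary gigabytes
PREFERRED_BLOB_SIZE = 64 * GIB


def default_chunk_size(target, tape_capacity_bytes=None):
    """Default chunk size in bytes for a 'tape' or 'disk' storage target."""
    if target == "tape":
        return tape_capacity_bytes * 2 // 100  # 2% of uncompressed capacity
    return PREFERRED_BLOB_SIZE


def chunk_count(job_size_bytes, chunk_size_bytes):
    """Number of chunks needed for a job (always at least one)."""
    return max(1, (job_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes)


lto7_chunk = default_chunk_size("tape", 6 * 10**12)
print(round(lto7_chunk / GIB, 1))         # 111.8
print(chunk_count(500 * GIB, 112 * GIB))  # 5
print(chunk_count(100 * GIB, 112 * GIB))  # 1
```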

A blob is a file or file part sent to or received from BlackPearl in one PUT or GET operation. A blob will never consist of more than one file. One or more blobs will make up a chunk. The preferred blob size is 64GB, but this size can vary to match the file size or fit in a chunk, and the size can also be changed by the client.

Example -- Let’s assume you have a 500GB job to upload to BlackPearl with a number of files (1 250GB file, 1 125GB file, 5 25GB files). The storage target is a tape library using LTO-7 tapes. Let’s also assume the chunk size is 112GB and the blob size is 64GB. The job will be broken up into five chunks that look like this:

  • Chunk 1 -- Size: 112GB
    • Blob 1 -- Part 1 of 250GB file -- Size: 64GB
    • Blob 2 -- Part 2 of 250GB file -- Size: 48GB
  • Chunk 2 -- Size: 112GB
    • Blob 1 -- Part 3 of 250GB file -- Size: 64GB
    • Blob 2 -- Part 4 of 250GB file -- Size: 48GB
  • Chunk 3 -- Size: 112GB
    • Blob 1 -- Part 5 of 250GB file -- Size: 26GB
    • Blob 2 -- Part 1 of 125GB file -- Size: 86GB
  • Chunk 4 -- Size: 112GB
    • Blob 1 -- Part 2 of 125GB file -- Size: 39GB
    • Blob 2 -- 25GB file -- Size: 25GB
    • Blob 3 -- 25GB file -- Size: 25GB
    • Blob 4 -- Part 1 of 25GB file -- Size: 23GB
  • Chunk 5 -- Size: 52GB
    • Blob 1 -- Part 2 of 25GB file -- Size: 2GB
    • Blob 2 -- 25GB file -- Size: 25GB
    • Blob 3 -- 25GB file -- Size: 25GB
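
As a quick check of the layout above, the sketch below adds up the blob sizes per chunk and per file. The blob sizes are copied from the list; the numbering of the five 25GB files (#1 to #5) is only a labeling assumption for the illustration.

```python
from collections import defaultdict

# (chunk number, file label, blob size in GB), copied from the list above.
LAYOUT = [
    (1, "250GB file", 64), (1, "250GB file", 48),
    (2, "250GB file", 64), (2, "250GB file", 48),
    (3, "250GB file", 26), (3, "125GB file", 86),
    (4, "125GB file", 39), (4, "25GB file #1", 25),
    (4, "25GB file #2", 25), (4, "25GB file #3", 23),
    (5, "25GB file #3", 2), (5, "25GB file #4", 25),
    (5, "25GB file #5", 25),
]

chunk_totals = defaultdict(int)
file_totals = defaultdict(int)
for chunk, label, size_gb in LAYOUT:
    chunk_totals[chunk] += size_gb
    file_totals[label] += size_gb

print(dict(chunk_totals))          # {1: 112, 2: 112, 3: 112, 4: 112, 5: 52}
print(sum(chunk_totals.values()))  # 500
print(file_totals["250GB file"], file_totals["125GB file"])  # 250 125
```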

How the SDK Helper Functions Work with Chunks and Blobs

The Java and .NET/C# SDKs include “Helper” functions that make it easier to move files to and from BlackPearl. These Helper functions provide a layer of abstraction over the concept of BlackPearl chunks and blobs. The client code does not need to know how to manage chunks and blobs, which greatly reduces the effort to integrate a client with BlackPearl. We therefore recommend that client developers use the Helper functions whenever possible.

These Helper functions can transfer blobs concurrently in parallel threads. The number of threads can be set by the client. For example, the Java CLI, which uses the Java SDK and its Helper functions, is set to use 10 parallel threads. Note that there are some limitations on transferring blobs of the same file in parallel. An individual file on a mounted file system path can have multiple blobs sent to BlackPearl in parallel, but an individual file received via a file stream/channel cannot.
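
The idea of a client-configurable number of transfer threads can be illustrated with a generic thread pool. This is a conceptual Python sketch, not the Helper API of the Java or .NET SDKs; transfer_blob is a hypothetical stand-in that only returns the blob size.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_blob(blob):
    """Hypothetical stand-in for sending one blob; returns the size 'sent'."""
    _name, size_gb = blob
    return size_gb


def transfer_all(blobs, max_threads=10):
    """Transfer blobs using up to max_threads worker threads."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() returns results in the same order as the input blobs.
        return list(pool.map(transfer_blob, blobs))


blobs = [("video1.mov part 1", 64), ("video1.mov part 2", 48), ("video2.mov", 25)]
print(sum(transfer_all(blobs)))  # 137
```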

Blobbing and Checksums

BlackPearl performs a checksum on all files sent to it. If a file must be broken up into multiple blobs, it will perform a checksum on each individual blob. It will not perform a checksum across the entire file if the file is broken into multiple blobs. If a client application is tracking checksums for files and comparing them to BlackPearl’s recorded value for verification, it must compare the checksums of the blobs rather than the checksum of the entire file. A client could store the checksum of the entire file in the metadata of one of the blobs that it uploads to BlackPearl; however, BlackPearl would not use this checksum value itself. Read more in the post BlackPearl and Checksums.
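
The difference between per-blob and whole-file checksums can be seen in a small sketch. It uses MD5 and a tiny blob size purely for illustration; the function name is hypothetical and the slicing is not BlackPearl's actual blobbing logic.

```python
import hashlib


def blob_checksums(data, blob_size):
    """Return one MD5 hex digest per blob_size-byte slice of data."""
    return [
        hashlib.md5(data[offset:offset + blob_size]).hexdigest()
        for offset in range(0, len(data), blob_size)
    ]


data = b"a" * 10 + b"b" * 10 + b"c" * 5
per_blob = blob_checksums(data, blob_size=10)
whole_file = hashlib.md5(data).hexdigest()

print(len(per_blob))           # 3
print(whole_file in per_blob)  # False
```

A client that verifies against BlackPearl's recorded values would therefore keep the per-blob digests, not only the whole-file digest.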


BlackPearl Performance, File Size, and Job Size

BlackPearl uses Bulk PUT and GET commands to move files to and from its storage targets -- tape and disk. We call them “Bulk” commands because they transfer multiple files in one “job”. The bulk commands provide several advantages over traditional, single-file S3 PUT and GET operations, including: 1) ensuring that the BlackPearl cache is ready to receive the files; and 2) providing adequate data to continuously write or read data on to the storage target at a high level of performance.

Many factors affect the file transfer performance to and from BlackPearl, including network configuration and equipment specifications, transfer rate capabilities from the primary storage medium, file transfer software architecture and file sizes/job size. To maximize file transfer performance (typically measured in megabytes per second) on a bulk PUT or GET job, the size of both the individual files in the job and the total job must be considered. Spectra Logic has performed considerable testing and analysis of various file sizes and has determined ideal file sizes for maximizing performance in bulk PUT and GET jobs. The results of this testing are below.

We recommend developers build BlackPearl clients that target at least the “Good” numbers below. Based on this recommendation, developers will want individual file sizes in the tens of megabytes and total job sizes in the hundreds of gigabytes. Sizes smaller than these can result in significant performance degradation.

For more tips on building a BlackPearl client, see our Guidance and Tips page.

Individual Objects/Files in a Bulk PUT or GET Job

Performance    File Size
Poor           >= 2 MB
Fair           >= 5 MB
Good           >= 50 MB
Great          >= 1 GB

Total Bulk Job Size

Performance    LTO-6 Tape Drives    LTO-7 or TS1150 Tape Drives
Poor           >= 10 GB             >= 20 GB
Fair           >= 50 GB             >= 100 GB
Good           >= 200 GB            >= 400 GB
Great          >= 1 TB              >= 2 TB

Performance descriptions:  
Poor — Performance will have degraded by no more than one order of magnitude
Fair — Performance will have degraded by around half
Good — Performance will have degraded by around 10%
Great — Maximum/optimal performance
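
A client could use the thresholds above to warn users about undersized files or jobs. The sketch below is one way to encode the two tables; the names are hypothetical, decimal units are an assumption, and sizes below the smallest listed threshold simply return None because the tables do not rate them.

```python
MB, GB, TB = 10**6, 10**9, 10**12  # assumption: decimal units

# (rating, minimum size in bytes), best rating first.
FILE_SIZE_TIERS = [
    ("Great", 1 * GB), ("Good", 50 * MB), ("Fair", 5 * MB), ("Poor", 2 * MB),
]
JOB_SIZE_TIERS = {
    "LTO-6": [
        ("Great", 1 * TB), ("Good", 200 * GB), ("Fair", 50 * GB), ("Poor", 10 * GB),
    ],
    "LTO-7/TS1150": [
        ("Great", 2 * TB), ("Good", 400 * GB), ("Fair", 100 * GB), ("Poor", 20 * GB),
    ],
}


def rate(size_bytes, tiers):
    """Return the best rating whose minimum size is met, else None."""
    for rating, minimum in tiers:
        if size_bytes >= minimum:
            return rating
    return None  # smaller than the lowest size in the table


print(rate(60 * MB, FILE_SIZE_TIERS))                  # Good
print(rate(150 * GB, JOB_SIZE_TIERS["LTO-7/TS1150"]))  # Fair
```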


Preparing for the BlackPearl 3.0 Release

As mentioned in our last two blog posts (see Part 1 and Part 2), BlackPearl 3.0 will be released soon and will include some exciting new features. Shortly after BlackPearl 3.0 is released, we will also be releasing associated updates to our Software Development Kits (SDKs), which are available in Java, C#/.NET, C and Python. BlackPearl client developers should understand how to prepare for this new release.

For the most part, BlackPearl clients using the existing 1.x SDKs should function normally with BlackPearl 3.0. The one exception to this is the 1.x Java SDK, which will require a small patch in order to work with BlackPearl 3.0. We will be releasing this patch, which will be Java SDK Release 1.2.1, soon and will announce it on this Blog once it becomes available. Also, if your client programmatically creates buckets, you should ensure that the user account the client uses to create the bucket has a default data policy (see screen image below).

[Image: default data policy setting on a user account]

When BlackPearl 3.0 is released, we will also be releasing a new version of the BlackPearl Simulator so that developers can test their client code against the new release before upgrading any actual BlackPearl systems. We are recommending this because there have been significant changes and additions to the 3.0 Application Program Interface (API) and, while it is our goal to be fully backward compatible with 1.x clients, client integration testing to ensure your client’s full compatibility with BlackPearl 3.0 would be wise.

The SDKs provide a layer of abstraction over the HTTP-based, RESTful API commands of the BlackPearl. The current 1.x SDKs provide access to only a subset of the most popular BlackPearl API commands. This is because each API command had to be manually programmed in the SDK by our Engineering Team, and it was too time intensive to write SDK commands for all API commands. So while the 1.x SDKs have commands for common actions such as moving files to and from BlackPearl and creating buckets, they do not have commands for less common actions such as inspecting and ejecting tapes.

With our 3.0 SDKs, we developed a technique to automatically generate most of the SDK code needed for each associated API command. This means that almost every BlackPearl API command will have an associated command in each of the SDKs. So developers using the SDKs will now be able to access nearly all API commands via the SDKs, including new features such as Advanced Bucket Management and Access Control Lists. We will also continue to make the SDKs easier to use as we have already done, such as adding the “helper” functions (currently available in the Java and .NET SDKs) that further simplify the archive and restore process.

There are three areas of change that developers should be aware of when upgrading their client from 1.x to 3.0:

  1. Due to the nature of the 3.0 SDKs, and the fact that they support nearly all BlackPearl API commands, we had to restructure the SDK code base compared to the 1.x SDKs. The BlackPearl API supports Standard S3 as well as Spectra S3 commands, and in some cases there are separate commands for each protocol (Standard S3 versus Spectra S3) that do essentially the same thing. So for example, there is a Standard S3 command to create a bucket and a separate Spectra S3 command to create a bucket, each with different input parameters. In order to make both of these commands available in the SDKs, we had to create separate namespaces for each protocol. Therefore, if you are updating your code from 1.x to 3.0, you will likely have to append the appropriate namespace in your code to each command. We will provide specific examples once the 3.0 SDKs are released.
  2. Some method names will have to be changed between 1.x and 3.0. We will provide a full list of those name changes.
  3. There will be a few methods where the ordering of arguments will have to be changed. We will provide a list of those methods.

The new 3.0 SDKs will also include a new set of documentation and code examples.

Developers should make sure that they are planning for the BlackPearl 3.0 release. Those upgrading from 1.x to 3.0 and who are not changing client functionality should expect it to be straightforward. For those wanting to enhance their clients to take advantage of new 3.0 features, our Developer website will be available for help. Look for more information on this Blog once BlackPearl 3.0 is released.


BlackPearl 3.0 New Features Part 2: Access Control Lists

In Part 1 of my BlackPearl 3.0 blog post, I discussed the new storage mediums and data policies that can be used with BlackPearl. Today I focus on another new feature of BlackPearl 3.0, Access Control Lists (ACLs). As the name implies, with ACLs you will be able to control what type of access users and applications have to the data in BlackPearl.

The concept of ACLs is not new. Amazon has been using ACLs with its public cloud storage for some time. BlackPearl’s ACL features are very similar, but not identical, to Amazon’s ACLs. These differences are due to the nature of a private cloud (BlackPearl) versus a public cloud system.

ACLs and Buckets

In BlackPearl, ACLs are primarily used to control permissions on objects in buckets, which are the top-level containers in BlackPearl. Bucket permissions can be granted to a user or a group.

Groups are a new feature in BlackPearl 3.0, and can consist of users and other groups. Groups can be managed on the Users page in the web management interface. BlackPearl 3.0 ships with two default groups, “Everyone” and “Administrators”, and more groups can be created.

The following permission(s) can be granted to a user or group on a bucket:

  • List – List all objects in a bucket
  • Read – Download (GET) objects from a bucket
  • Write – Upload (PUT) objects to a bucket
  • Delete – Delete objects from a bucket
  • Job – Modify or cancel jobs associated with the bucket, even if the job wasn’t created by that user or group
  • Owner – Full control of bucket. Includes all permissions above. By default the user that creates the bucket is given Owner permissions on that bucket

When a bucket is created or edited in the web management interface, ACLs can be set on the bucket as shown in the first image below. A typical use case might be that you want to give full access to a bucket to certain users and only read access to other users. In this case you could create a “Full Access” group and set an ACL giving that group “Owner” permissions on the bucket. You could then create another group called “Read Only Access” and set an ACL giving that group “List” and “Read” permissions.
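
The use case above can be modeled in a few lines to show how nested groups and the Owner permission combine. This is a conceptual sketch with hypothetical names and data structures, not the BlackPearl API; the user names and the "Interns" group are invented for the example.

```python
ALL_PERMISSIONS = {"List", "Read", "Write", "Delete", "Job"}


def effective_members(group, groups, seen=None):
    """Expand a group into the set of users it contains, following nesting."""
    seen = set() if seen is None else seen
    if group in seen:  # guard against groups that contain each other
        return set()
    seen.add(group)
    users = set()
    for member in groups.get(group, ()):
        if member in groups:
            users |= effective_members(member, groups, seen)
        else:
            users.add(member)
    return users


def permissions_for(user, bucket_acl, groups):
    """Return the set of permissions a user holds on one bucket."""
    granted = set()
    for grantee, perms in bucket_acl.items():
        if grantee == user or user in effective_members(grantee, groups):
            granted |= set(perms)
    if "Owner" in granted:
        granted |= ALL_PERMISSIONS  # Owner includes every other permission
    return granted


groups = {
    "Full Access": ["alice"],
    "Read Only Access": ["bob", "Interns"],
    "Interns": ["carol"],
}
bucket_acl = {"Full Access": ["Owner"], "Read Only Access": ["List", "Read"]}

print(sorted(permissions_for("carol", bucket_acl, groups)))  # ['List', 'Read']
print("Delete" in permissions_for("alice", bucket_acl, groups))  # True
```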

[Image: ACL settings on the edit bucket screen]

Global Permissions

Users and groups can be granted “Global” ACLs on all buckets. So for example, a user or group could be given “Read” permission on all objects in all buckets in BlackPearl. These settings are controlled in the “Global Bucket Access Control List” in the User and Group settings as shown in the two images below.

[Image: Global Bucket Access Control List in the User settings]

[Image: Global Bucket Access Control List in the Group settings]

ACLs and Data Policies

As mentioned in Part 1 of this blog post, data policies can be set on buckets that control how many copies of the objects are kept on each storage medium and for how long. In some cases, you may only want certain users to have access to use certain data policies when creating a bucket. BlackPearl allows you to set ACLs on data policies to control this access. BlackPearl 3.0 includes a built-in “Everyone” group, and members of this group by default have global permissions to use all data policies as shown by the checked box in the image above. This means that by default all users can access all data policies. However, administrators can remove this setting and only let certain users access certain data policies, as shown below.

[Image: ACL settings on a data policy]

ACLs in the API and SDKs

All the ACL settings that are available in the web management interface, as shown in the screens above, can also be controlled using the BlackPearl API and SDKs. So for example, not only can a bucket be programmatically created, but the ACLs on the bucket can also be programmatically set.

Coming Up Next

In my next blog post, I’ll talk about how developers will be able to use the new BlackPearl 3.0 features via the API and SDKs. I’ll also explain how developers who have already built clients, or are in the process of building them, can prepare to migrate to BlackPearl 3.0.


BlackPearl 3.0 New Features Part 1: ArcticBlue and Advanced Bucket Management

On October 15, 2015, Spectra Logic announced ArcticBlue, a new nearline disk solution that sits behind the BlackPearl Deep Storage Gateway. BlackPearl now provides an S3 object private cloud interface to the following storage products:

  • Spectra Logic tape libraries – BlackPearl has supported archive to tape libraries since its original release
  • ArcticBlue – ArcticBlue is a new nearline storage target for BlackPearl. Read more about ArcticBlue
  • Spectra Online Disk – Spectra Online Disk with Enterprise SAS drives is also a new storage target for BlackPearl

As part of the ArcticBlue release in December, we will also be releasing the next major software version of BlackPearl, Version 3.0 (we are skipping 2.0 to get BlackPearl and Verde on a common code base release). This new version will not only include support for ArcticBlue and Spectra Online Disk, but will also include two other major new features:

  • Advanced Bucket Management – Allows data policies to be set on buckets to control how many copies and for how long objects are stored on each storage product listed above. Advanced Bucket Management is covered below.
  • Access Control Lists – Provides sophisticated permission control on objects and buckets. Access Control Lists will be covered in Part 2 of this blog post.

Advanced Bucket Management

Advanced Bucket Management (ABM) is an extremely powerful new feature provided at no additional cost in BlackPearl Version 3.0. Policies are set on buckets that determine which storage type each object in the bucket will be stored on and for how long each object will be stored on each storage type. You can see an example scenario in the diagram below. Though this is probably not a realistic scenario, it does show all the different policy options.

[Diagram: Advanced Bucket Management example]

In the diagram above, Bucket 3 has a 4-copy data policy. When objects are moved to Bucket 3, a copy of the object is immediately placed in each of the four storage domains:

  • A copy will be placed on online disk for 30 days for very fast object retrieval.
  • A copy will be placed on ArcticBlue nearline disk for 2 years for fairly fast object retrieval (ArcticBlue is “power down” disk so it takes a bit longer to respond than online disk).
  • A copy will be placed in a Spectra T950 tape library with TS1150 tape drives. This copy has no expiration.
  • A copy will be placed in a Spectra T200 tape library with LTO-7 tape drives. The tapes on which this object is stored will be ejected from the library for offsite storage.

When an object stored in Bucket 3 is requested by an application, the BlackPearl knows to retrieve it from the fastest available storage domain. So if the object is being requested within the first 30 days, it will be retrieved from online staging disk. Between 31 days and 2 years, the object will be retrieved from ArcticBlue nearline disk. And after two years the object will be retrieved from tape.
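
The retrieval rule for Bucket 3 can be sketched as a lookup over the copies ordered from fastest to slowest. This models only the example in this post, with hypothetical names; the two tape copies are collapsed into a single "tape" entry, and two years is approximated as 730 days.

```python
# Copies from the Bucket 3 example, fastest storage domain first.
# A retention of None means the copy does not expire.
BUCKET_3_POLICY = [
    ("online disk", 30),
    ("ArcticBlue nearline disk", 2 * 365),
    ("tape", None),
]


def retrieval_source(object_age_days, policy=BUCKET_3_POLICY):
    """Return the fastest storage domain that still holds a copy."""
    for domain, retention_days in policy:
        if retention_days is None or object_age_days <= retention_days:
            return domain
    raise LookupError("no copy of the object remains")


print(retrieval_source(10))    # online disk
print(retrieval_source(400))   # ArcticBlue nearline disk
print(retrieval_source(1000))  # tape
```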

When a bucket is created, it must now be assigned a data policy. In the web management interface you will be forced to choose a data policy (see below). If you create a bucket via the API/SDK, you can also assign a data policy. But if you don’t assign a data policy, the user’s default data policy will be assigned to the bucket.

NOTE: If you are upgrading from 1.x to 3.0, you will need to assign a default data policy for each user.

[Image: new bucket screen with data policy selection]

BlackPearl will ship with a number of common data policies, as shown on the screen image above. These policies are automatically created based on the hardware attached. If only tape is attached then two tape policies will be auto generated and will work for most users. However, users can create their own data policies as well. Developers will be able to manipulate nearly all aspects of data policy management via the API and SDKs. We will be providing documentation on how to do this as we get closer to the release date of BlackPearl 3.0 in December.

To support Advanced Bucket Management at the most basic level in a BlackPearl client, the client should support the ability to use multiple buckets. Having multiple buckets, as shown above, will allow the user to choose different data policies. One policy for frequently accessed data could have one copy in Spectra Online Disk for 120 days and one copy in ArcticBlue Nearline Disk forever. A second policy could be the “One Copy Tape, One Copy Nearline Disk” policy, which keeps two permanent copies and is great for warm data that needs parallel access, while providing genetic diversity with an extremely high level of durability. This would provide users two different types of storage profiles within one platform.

You can learn more about the new features of BlackPearl by viewing the recording of our inaugural Developer Summit.

In Part 2 of this blog post, I will describe the new Access Control Lists (ACLs) feature.


New Python, C SDK Releases

We have recently posted new releases of our Python and C Software Development Kits on GitHub. Get them from our Downloads page. We have also posted new Python code samples, documentation, and installation instructions. Check out this additional Python information on our Documentation page.


BlackPearl and Checksums

Spectra Logic’s highest priority is protecting our customers’ data. BlackPearl and the tape libraries that sit behind it have a number of features to ensure that your data stays protected. We start data integrity as soon as data is ingested by BlackPearl. Client applications sending data to BlackPearl can pass in a checksum to ensure that the data arrives safely. A checksum acts as a fingerprint for the file and can be used to make sure that the file received by BlackPearl is the same file that the client thought it sent. BlackPearl accepts MD-5, CRC-32, CRC-32C, SHA-256, or SHA-512 checksums. The client provides the checksum type and value in the header of the API PUT operation:

Content-MD5: 37r//gvw/aB3GmilbcUJpg==
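
A Content-MD5 value is the Base64 encoding of the raw 16-byte MD5 digest, not of its hexadecimal string. A minimal sketch of producing such a header value in Python (the helper name and sample data are just for illustration):

```python
import base64
import hashlib


def content_md5(data):
    """Base64-encode the raw 16-byte MD5 digest for a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


headers = {"Content-MD5": content_md5(b"hello world")}
print(headers)
```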

A checksum can also be performed using our C, C#/.NET, and Java Software Development Kits.

BlackPearl uses the checksum provided for each file to ensure that the file it received is the same as the file the client sent. If the checksum does not match correctly with the file that was received, BlackPearl will return an error (400 – bad digest) to the client. If the client does not pass in a checksum with the file, BlackPearl will automatically create a checksum for the file. By default this checksum type will be MD-5, though the checksum type can be changed on the data policy settings on the bucket (a bucket is a top-level container for objects/files in BlackPearl). BlackPearl then stores the checksum, whether generated by the client or BlackPearl itself, both in its object database and with the file on tape.

Once stored by BlackPearl, the checksum can be used internally to make sure the file is still valid when it is recalled back from tape. BlackPearl will automatically do a checksum if the file is requested by a client (GET) and the file must come from tape. The checksum is done both as the file is coming from tape and after it has landed on the BlackPearl cache. BlackPearl will also provide the checksum value to the client so that the client can verify that it successfully received the file as well.

In some cases, when using the Bulk PUT operations, the client may be required to break the object into multiple “blobs” to upload it to BlackPearl. In this case, a checksum will be used for each blob uploaded to BlackPearl. The same is true when using Multi-Part Upload – each part of the file will have its own checksum. BlackPearl also supports Partial-File Restore, which is the ability to restore (GET) part of a file. With Partial-File Restore, a client can specify, for example, that it wants to retrieve the first 2GB of a 10GB file. In order to perform a checksum in this case, BlackPearl must first retrieve the entire file (or chunk) from tape. Once BlackPearl has completed the checksum on the file or chunk, it can send the partial file to the client.

Because BlackPearl may not always calculate the checksum for an entire file (because it may be broken up into multiple pieces), developers may want their clients to calculate the checksum of the entire file themselves. This value could then be stored in the file’s metadata when uploaded to BlackPearl. When the file is later retrieved, the client could calculate the checksum again and compare the values.
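
A client-side whole-file check of this kind might look like the sketch below. The in-memory streams stand in for the original and restored files, SHA-256 is just one of the supported checksum types, and the metadata key is hypothetical since this post does not prescribe one.

```python
import hashlib
import io


def stream_sha256(stream, block_size=1024 * 1024):
    """Hash a file-like object in blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


original = io.BytesIO(b"example file contents")
# Hypothetical metadata key chosen for this illustration.
metadata = {"whole-file-sha256": stream_sha256(original)}

restored = io.BytesIO(b"example file contents")
print(stream_sha256(restored) == metadata["whole-file-sha256"])  # True
```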