SDK Design Best Practices


The purpose of this page is to provide BlackPearl client developers with guidance and tips on building an integration to BlackPearl using our Software Development Kits (SDKs). We strongly recommend using our SDKs instead of connecting directly to BlackPearl’s HTTP-based Application Programming Interface (API). For additional assistance, please post questions to our Google Group. The intended audience for this page is anyone developing code with the BlackPearl SDKs or APIs.

Do NOT Use Traditional Amazon S3 PUT and GET Commands

A BlackPearl client should not use the traditional Amazon S3 PUT Object and GET Object commands to move files to and from BlackPearl. The client should use the Spectra S3 (formerly DS3) Bulk Commands. The BlackPearl SDKs use these bulk commands. Use of traditional S3 PUT and GET commands will result in poor performance and/or errors when BlackPearl is used with a tape library. We will only certify integrations using the Spectra S3 commands, and we will only support certified integrations.

Group Files into “Jobs” When Archiving or Restoring

The Spectra S3 bulk commands described above group files into “jobs” when they are transferred. Grouping the files significantly increases the performance of the BlackPearl environment, especially when archiving or restoring with tape. Every effort should be made to put as many files into a job as possible to maximize performance. Single-file jobs should be avoided.

When transferring files in a job, multiple files or file parts must be uploaded in parallel to achieve maximum performance. See the “Performance and Bandwidth” section below.

Determine SDK Approach — Helper Classes

The architecture and workflows of the existing software being integrated to BlackPearl will help determine how best to use the SDKs. Some of the SDKs include “Helper” classes that simplify file movement and optimize performance. The client should use the Helper classes if they are available and will fit in the architecture/workflow.

To determine if the Helper classes can be used, first identify which SDK will be used. The SDKs are available in five languages – Java, .NET/C#, Python, C, and Go.

Java and .NET/C# SDKs

If the Java or .NET/C# SDKs will be used, Helper classes are available. Read on to determine if they can be used.

For the files that will be archived to BlackPearl, will the SDK have access to them on disk? In other words, can the SDK access them via some path (e.g. n:\videos or /User/john/files)? If not, can the files be placed in a temporary staging area so that the SDK can access them?

Python, C, and Go SDKs

Helper classes are not available in the Python, C, or Go SDKs at this time. See the Python SDK Bulk PUT Example without Helper Classes, the C SDK Bulk Put Example without Helper Classes, or the Go SDK Bulk Put Example without Helper Classes.

Not Using Helper Classes? Make Sure to Use “Get Job Chunks Ready for Processing”

If the Helper classes cannot be used, then it is very important for developers to understand the process of how files are archived and restored. Be sure to carefully read Spectra S3 Bulk Operations Overview. It is critical to use the Get Job Chunks Ready for Processing command, which ensures that the BlackPearl cache is ready to receive the files being archived or ready to serve the files being restored. Not using this command will result in a BlackPearl integration that fails under load.
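The overall archive flow without Helper classes can be sketched as follows. This is a minimal Python outline, not a working integration: `create_bulk_job`, `get_chunks_ready`, and `put_part` are hypothetical wrappers around the Spectra S3 Bulk PUT, Get Job Chunks Ready for Processing, and object PUT requests, and a real client must also guard against re-sending a chunk the API reports more than once.

```python
import time

def run_bulk_put(create_bulk_job, get_chunks_ready, put_part, file_list,
                 poll_interval_s=60):
    """Outline of a bulk PUT without Helper classes.

    create_bulk_job  -- hypothetical: registers the job, returns (job_id, part_count)
    get_chunks_ready -- hypothetical: wraps Get Job Chunks Ready for Processing;
                        returns a (possibly empty) list of ready chunks
    put_part         -- hypothetical: uploads one object part of one chunk
    """
    job_id, part_count = create_bulk_job(file_list)
    transferred = 0
    while transferred < part_count:
        chunks = get_chunks_ready(job_id)
        if not chunks:
            time.sleep(poll_interval_s)  # cache not ready yet; poll again
            continue
        for chunk in chunks:             # only chunks the cache has allocated
            for part in chunk:
                put_part(job_id, part)
                transferred += 1
    return transferred
```

The key point the sketch illustrates is that the client never pushes data on its own schedule; it only transfers the parts that Get Job Chunks Ready for Processing says the cache can accept.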

Files Greater than the BlackPearl Chunk Size Must be “Chunked”

Files greater than the BlackPearl chunk size (default size is 64GB) must be broken up into multiple chunks for sending to BlackPearl. Read our blog post that explains chunking. In the Helper classes, this is done automatically. If the Helper classes are not being used, this chunking must be done manually. See the Python SDK Bulk PUT Example without Helper Classes for an example of chunking.
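The arithmetic behind manual chunking is straightforward: walk the file in fixed-size steps and record an (offset, length) pair for each part, with only the final part being short. A minimal sketch, assuming the default 64 GB chunk size:

```python
DEFAULT_CHUNK_SIZE = 64 * 1024**3  # BlackPearl default chunk size (64 GiB)

def chunk_ranges(file_size, chunk_size=DEFAULT_CHUNK_SIZE):
    """Split a file of file_size bytes into (offset, length) parts."""
    if file_size == 0:
        return [(0, 0)]  # a zero-byte object is still one (empty) part
    return [(offset, min(chunk_size, file_size - offset))
            for offset in range(0, file_size, chunk_size)]
```

For example, a 150 GiB file becomes three parts: two full 64 GiB chunks and a final 22 GiB chunk.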

Exceptions and Unexpected Conditions

A BlackPearl client will experience exceptions and unexpected conditions. Make sure you have reviewed our information on Managing Unexpected Conditions in BlackPearl Applications.

Performance and Bandwidth

Make sure the available bandwidth is saturated and file transfer performance to BlackPearl is maximized. Ensure that the client is saturating either the network connection or the BlackPearl system, whichever has the lower throughput. A full-capacity standard BlackPearl can process files at a sustained rate of about 1000MB/s. To reach full throughput, the client should send many large (50MB or greater) files in a single Bulk PUT job. If the files are smaller than 50MB, they should be aggregated in a tar or zip file before being sent to BlackPearl. The job size should also be large, at least 400 GB, for good performance (see our blog post about file size and job size considerations). Additionally, the client needs to provide multiple simultaneous threads/streams of file transfer to meet the saturation rates of the network connection or BlackPearl; the Helper classes do this automatically.
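As a sketch of the small-file aggregation step, the following Python bundles several in-memory files into a single tar archive before upload. This is illustrative only: a real client would stream files from disk rather than holding them in memory, and would choose its own archive naming and unpacking scheme.

```python
import io
import tarfile

def aggregate(files):
    """Bundle many small files into one tar archive, returned as bytes.

    files -- dict mapping archive member name to file contents (bytes)
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)          # tar needs the size up front
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

The resulting single large object transfers far more efficiently than thousands of sub-50MB objects, at the cost of restoring the whole bundle when any one member is needed.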

In order to maximize performance to BlackPearl, files or file parts must be sent in parallel. When maximizing performance, consider the speed of the network connection and how many files the client is going to be able to transfer at the same time. With a 10Gb or 20Gb network connection, 10 threads is usually adequate. With a 1Gb network connection to BlackPearl, using 3 parallel threads will likely be adequate. Using more than 3 threads with a 1Gb connection could cause network problems. The Helper classes default to 10 threads (see withMaxParallelRequests), but this value can be changed. Test performance for the environment and adjust this thread count as needed. The BlackPearl web management interface provides performance information.

If the Helper classes are not being used, the creation of multiple threads will need to be coded manually. Use a “producer/consumer” model, with a producer handing out files and consumer threads performing the transfers. Our current examples (like the Python Bulk PUT) are single threaded.
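A minimal producer/consumer sketch in Python is shown below. `put_object` is a stand-in for whatever single-object upload call the client makes; the main thread acts as the producer, filling a queue, and a configurable number of consumer threads drain it in parallel.

```python
import queue
import threading

def transfer_in_parallel(files, put_object, thread_count=10):
    """Upload files using thread_count consumer threads.

    put_object -- hypothetical callable that uploads one named file
    """
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def consumer():
        while True:
            name = work.get()
            if name is None:            # sentinel: no more work
                work.task_done()
                return
            put_object(name)
            with results_lock:
                results.append(name)
            work.task_done()

    threads = [threading.Thread(target=consumer) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for name in files:                  # producer hands out files
        work.put(name)
    for _ in threads:                   # one sentinel per consumer
        work.put(None)
    work.join()
    for t in threads:
        t.join()
    return results
```

Note that `thread_count` should be tuned to the network as described above: around 10 threads for a 10Gb connection, around 3 for a 1Gb connection.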

Provide Configurable Timeout Setting

When a client wants to send files to BlackPearl or retrieve files from BlackPearl, there is often a delay before the files can be transferred. In the case of sending files to BlackPearl, the BlackPearl cache may not be ready to receive the files due to other transfer activities. In the case of retrieving files, there may be a long queue of other retrieval jobs, or the files might reside on something like Amazon Glacier cloud storage which can potentially take hours before the files are available. Because of these file transfer delays, the BlackPearl client must be designed to periodically poll BlackPearl using the Get Job Chunks Ready for Processing API call. Typically polling will be repeated every minute until all files have been transferred. Client applications must have a user-configurable timeout setting to let the user choose how long to continue attempting to transfer the files. This setting is required to receive certification from Spectra Logic. The setting should control the number of minutes that the client should continue to try to transfer the files. We recommend a range of 0 to 1000 minutes.
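The polling loop with a user-configurable timeout can be sketched as follows. `get_job_chunks_ready` is a hypothetical wrapper around the Get Job Chunks Ready for Processing call; the timeout is expressed in minutes, matching the recommended 0 to 1000 minute range.

```python
import time

def poll_until_ready(get_job_chunks_ready, job_id, timeout_minutes,
                     poll_interval_s=60):
    """Poll until chunks are ready or the user-configured timeout expires.

    get_job_chunks_ready -- hypothetical wrapper around the API call;
                            returns a (possibly empty) list of ready chunks
    timeout_minutes      -- user-configurable, e.g. 0 to 1000 minutes

    Returns the ready chunks, or None on timeout so the caller can
    report the failure to the user.
    """
    deadline = time.monotonic() + timeout_minutes * 60
    while True:
        chunks = get_job_chunks_ready(job_id)
        if chunks:
            return chunks
        if time.monotonic() >= deadline:
            return None                 # give up; surface the timeout
        # sleep until the next poll, but never past the deadline
        time.sleep(min(poll_interval_s, max(0, deadline - time.monotonic())))
```

Exposing `timeout_minutes` (and ideally `poll_interval_s`) in the client's configuration satisfies the certification requirement described above.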

Ensure Files are Safely Archived to BlackPearl before Deleting Them

If the client will delete the files from their original source location after archiving them to BlackPearl, the client should first ensure that the files have landed successfully on the BlackPearl cache. Once a file or file chunk is PUT to BlackPearl, a 200/OK response will be returned, indicating that the file has successfully made it to BlackPearl cache.
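The safe ordering is: upload, confirm the success response, and only then delete the source. A minimal sketch, where `put_object` is a hypothetical upload callable returning the HTTP status code:

```python
import os

def archive_then_delete(path, put_object):
    """Delete a source file only after BlackPearl confirms the PUT.

    put_object -- hypothetical callable that uploads the file at `path`
                  and returns the HTTP status code of the response
    """
    status = put_object(path)
    if status == 200:
        os.remove(path)     # safe: the data has landed in BlackPearl cache
        return True
    return False            # keep the source and surface the error
```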

It is possible to track archive jobs to total “completion”, meaning that all data has been written to the final storage targets (tape, disk, cloud, replicated sites). However, this is not recommended because it can slow the archive operation. If total job completion must be tracked, it should be done in a manner that is independent of the next transfer. For example, if your client is going to transfer multiple batches of files, it should not wait for the previous batch to reach total job completion before transferring the next batch.

Displaying Job Status and File Locations to Users

Developers may want to display job status information to the user within their application. For example, an application could show a user when an archive job has made it to BlackPearl cache and when it has made it to the final storage target(s). There are several API calls to get the status of jobs, including the Get Job call.

Users may also want to know where their files are located within the BlackPearl archive. They may want to know if the file is on tape, disk, or cloud, and they may want to know specific locations such as the exact tape on which a file resides. The Get Physical Placement call can be used to display the locations of files to users.

Object Naming

As with most cloud storage systems, when a file is uploaded to BlackPearl we call it an “object”. There are some constraints on how objects are named in BlackPearl because BlackPearl is typically connected to a tape system that stores data using the open LTFS format.

When you configure BlackPearl, you set up Storage Domain(s), which are collections of tape and/or disk partitions. If the Storage Domain includes tape partition(s), you must specify the “LTFS File Name” option for the Storage Domain. This option specifies how the file is named when it is placed on tape. There are two options for the LTFS File Name:

  • Object Name — LTFS file names use the format {bucket name}/{object name}, for example bucket1/. Object names must comply with the LTFS specification’s file naming rules, and each file name must be 255 characters or less. If the tapes are ejected from the BlackPearl gateway and loaded into a non‐BlackPearl tape partition, the file names match the object names. The colon character (:) is not allowed in LTFS file names and is therefore not allowed in BlackPearl object names. The slash character (/) is also technically not allowed in LTFS file names; however, BlackPearl allows a slash in the object name and translates it into a directory in the LTFS file system (e.g. directory1/directory2/). The following characters are not recommended in LTFS file names or BlackPearl object names for reasons of cross-platform compatibility: control characters such as carriage return (CR) and line feed (LF), double quotation mark (“), asterisk (*), question mark (?), less than sign (<), greater than sign (>), backslash (\), and vertical line (|).
  • Object ID — LTFS file names use the format {bucket name}/{object id}, for example bucket1/1fc6f09c‐dd72‐41ea‐8043‐0491ab8a6d82. Object names do not need to comply with LTFS file naming rules. The object names are saved as LTFS extended attributes allowing any third party application to reconstruct all the data including the object names.

When building a BlackPearl client, developers should be prepared for customers to choose either LTFS File Name option, and thus should ensure that object names are LTFS compliant as described in the first option above.
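A simple pre-flight validator for object names can be sketched as follows. It assumes the 255-character limit applies to each path segment, since each segment between slashes becomes an LTFS file or directory name; confirm that interpretation against the LTFS specification before relying on it.

```python
# Never allowed in LTFS file names (and thus in BlackPearl object names).
# '/' is excluded because BlackPearl translates it into LTFS directories.
FORBIDDEN = {":"}

# Discouraged for cross-platform compatibility, per the list above:
# control characters (CR, LF), " * ? < > \ |
DISCOURAGED = set('"*?<>\\|') | {"\r", "\n"}

def check_object_name(name):
    """Return (ok, warnings) for a candidate BlackPearl object name."""
    warnings = []
    for segment in name.split("/"):
        if len(segment) > 255:
            return False, ["path segment longer than 255 characters"]
        if any(c in FORBIDDEN for c in segment):
            return False, ["the colon character ':' is not allowed"]
        warnings += [f"discouraged character {c!r}"
                     for c in segment if c in DISCOURAGED]
    return True, warnings
```

Rejecting invalid names (and warning on discouraged ones) at ingest time keeps the client safe regardless of which LTFS File Name option the customer selects.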

Checksum Options

There are a number of options for using checksums with files in BlackPearl. Read the Checksums Blog Post for more information. On a bucket it is possible to force or enable end-to-end CRC, which will fail any job if it does not include a checksum or if the checksum verification fails. Files that are chunked must have a checksum for each chunk.

Because BlackPearl doesn’t always calculate the checksum for an entire file (because the file may be broken up into multiple pieces), developers may choose to have their client determine the checksum on its own and then upload the checksum value as a metadata value on the file. Then when the file is later downloaded/restored, the checksum can be calculated again and compared to the original value.
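A sketch of that client-side pattern is below. The choice of algorithm and metadata key is up to the client (MD5 is used here purely as an illustration), and a real client would stream the file from disk in blocks rather than holding it in memory.

```python
import hashlib

def file_checksum(data, algorithm="md5"):
    """Compute a whole-file checksum to attach as object metadata."""
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()

def verify_after_restore(original_digest, restored_data, algorithm="md5"):
    """On restore, recompute the checksum and compare to the stored value."""
    return file_checksum(restored_data, algorithm) == original_digest
```

Storing the digest as metadata at archive time gives an end-to-end integrity check that is independent of any per-chunk checksums BlackPearl computes internally.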

As an alternative to using checksums, you may want to consider using https/SSL encryption between your client and BlackPearl, because SSL also uses checksums to confirm that the file was successfully received.

Working with Ejected Tapes

At some point in your application, a user is going to request data that is only available on tapes that have been ejected from the tape library. In this case, the client application must know how to communicate this information to the user and help them determine which tapes to place back in the tape library. Read our blog post about Managing Ejected Tapes in BlackPearl.

Back Up Client Database

Most applications that integrate with BlackPearl include a database as a core part of the application. Typically these databases must be backed up on a regular basis. Spectra Logic strongly encourages application developers to send these database backups to BlackPearl as a core part of the integration. Doing the backups through BlackPearl simplifies the application for the customer and provides a highly-reliable target on which to store the backups.

The database backups should use their own bucket in BlackPearl with their own data policy. Tape should be used as the backup target if it is available; otherwise the backups can be sent to ArcticBlue disk or online disk. If tape is used, the data policy should include an isolation level of “Bucket Isolation”, which means that the bucket will not share tapes with other buckets. Developers can set the backup frequency and the number of backups to retain, or allow users to choose these values.

Installation and Operation Instructions

Applications that integrate with BlackPearl should include instructions or user guide information on how to install and operate the integration. The instructions should include any specific BlackPearl settings that are needed for the integration to work properly. The instructions should also include troubleshooting assistance.

Importing Foreign LTFS Tapes

When BlackPearl writes data to tape, it uses the open Linear Tape File System (LTFS) file format. Because of this LTFS support, BlackPearl also has the ability to import non-BlackPearl or “foreign” LTFS tapes to BlackPearl. This is useful for any customer that receives LTFS-formatted tapes from another source and wishes to read those same tapes in the BlackPearl environment. This workflow is particularly common in the Media and Entertainment industry as a way to transfer video files from one group to another. If you want to add the ability to import foreign LTFS tapes into your application, read our blog post about Importing Foreign LTFS Tapes into BlackPearl.

Partial File Restore

In the Media & Entertainment world, data files have reached very large sizes, particularly in the case of high resolution video that can exceed one terabyte in size. To work efficiently with very large files, media file processing is done in sections, with the end-user requesting content “snippets” based on timecodes. Object storage devices that are used to store very large files are typically not aware of the timecode-to-byte relationship and lack the content awareness necessary to extract and create partial media files. In order to bridge the gap between time and bytes, BlackPearl has added a Partial File Restore (PFR) feature to enable the media processing application to efficiently retrieve a complete media file based on timecode offsets. If you want to add the ability to do PFR in your application, please read our blog post called Getting Started with BlackPearl Partial File Restore Integration.

Staging Objects/Rehydration

Many customers use BlackPearl to archive data to tape. Often these customers wish to pre-stage this tape-based data to BlackPearl disk so that when it is later needed it can be retrieved quickly. The Stage Objects command allows customers to do just that. Read more about this in our blog post titled Using the New BlackPearl Staging Objects Feature.