Spectra Logic’s BlackPearl 5.0 API includes a new Stage Objects command. Most BlackPearl customers use tape storage, and many use tape as the final place where data will live. Over time, the data ages out of both the cache and the temporary disk tier, leaving the only copy or copies on tape. Because tape is secure, offline storage, a restore job that mounts a tape cartridge will always have some latency, since it takes time to retrieve the cartridge from a shelf inside the tape library. This latency can compound into a long wait when a single job, with a large set of objects and partial objects, spans many tape cartridges.
This type of latency is common in environments whose data sets are large enough to span tens or hundreds of tapes. For instance, in weather forecasting, weather data is gathered over time and archived daily to new tape. When a new climate model is run on the supercomputer, the job requires a subset of data from each day in the archived time period (potentially the last 10 years), which is stored across hundreds of tapes.
If a large restore job requires mounting many tapes, it can take a very long time to restore the data to the processing server or supercomputer, especially if drive availability is limited. This waiting wastes time and money, because the data that has already arrived sits idle on expensive compute storage until the rest of the data set arrives and the analysis can begin.
With the Stage Objects command, the same job can be given to BlackPearl, which instead copies the data internally to a disk tier if one is available, or to cache otherwise. Later, when the window of processing time is available, a normal bulk get job can restore the data from BlackPearl disk more quickly and efficiently, since the data is pulled in parallel without the added latency of mounting tapes.
Once the data is restored to disk, the staged objects stay there as dictated by the disk policy. For example, if the data is restored to ArcticBlue disk that has a 3-month retention policy, the data will stay on the disk for at least 3 months. If it’s restored to disk cache, it will stay in cache until the cache has to make room for other files.
Creating a Stage Objects job works just like creating a bulk get job, except that the operation is “START_BULK_STAGE” instead of “START_BULK_GET”.
Here is the HTTP request syntax:
PUT http[s]://{datapathDNSname}/_rest_/bucket/{bucket UUID or name}?operation=START_BULK_STAGE[&name={string}][&priority=URGENT|HIGH|NORMAL|LOW]
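As a rough illustration of the syntax above, the request URL could be assembled like this in Python. The helper name `build_stage_url`, the host `blackpearl.example.com`, and the bucket and job names are hypothetical. The sketch only builds the URL; it does not send the request, and any authentication headers BlackPearl requires are left out.

```python
from urllib.parse import quote, urlencode

# Priorities listed in the request syntax above.
VALID_PRIORITIES = {"URGENT", "HIGH", "NORMAL", "LOW"}


def build_stage_url(endpoint, bucket, name=None, priority=None, secure=True):
    """Build the URL for a START_BULK_STAGE request (hypothetical helper)."""
    if priority is not None and priority not in VALID_PRIORITIES:
        raise ValueError(f"unsupported priority: {priority}")

    # The operation parameter is mandatory; name and priority are optional.
    params = {"operation": "START_BULK_STAGE"}
    if name is not None:
        params["name"] = name
    if priority is not None:
        params["priority"] = priority

    scheme = "https" if secure else "http"
    # Percent-encode the bucket so unusual characters survive in the path.
    path = f"/_rest_/bucket/{quote(bucket, safe='')}"
    return f"{scheme}://{endpoint}{path}?{urlencode(params)}"


# Example values are assumptions, not taken from a real system.
print(build_stage_url("blackpearl.example.com", "weather-archive",
                      name="climate-run", priority="HIGH"))
```

The resulting string would be used as the target of an HTTP PUT, with the object list described next as the request body.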
The payload will include some number of objects:
<Objects>
<Object Name="{string}" Length="{64-bit integer}"
Offset="{64-bit integer}" Version_Id="{string}"/>
…
</Objects>
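The payload can be generated rather than written by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `build_objects_payload` and the sample object names are hypothetical, and the version attribute is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_objects_payload(objects):
    """Build the <Objects> request body from (name, offset, length) tuples.

    Hypothetical helper: emits Name, Length and Offset for every object.
    """
    root = ET.Element("Objects")
    for name, offset, length in objects:
        ET.SubElement(root, "Object", {
            "Name": name,
            "Length": str(length),
            "Offset": str(offset),
        })
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


# Sample object names are made up for illustration.
payload = build_objects_payload([
    ("2015/01/01/obs.nc", 0, 1048576),    # first 1 MiB of one day's file
    ("2015/01/02/obs.nc", 4096, 524288),  # a partial object at an offset
])
print(payload)
```

Building the body with an XML library has the side benefit that object names containing characters such as `&` or `"` are escaped correctly.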
The response is the same as for other bulk jobs:
<MasterObjectList
Aggregating="TRUE|FALSE"
BucketName="{string}"
CachedSizeInBytes="{64-bit integer}"
ChunkClientProcessingOrderGuarantee="IN_ORDER|NONE"
CompletedSizeInBytes="{64-bit integer}"
EntirelyInCache="TRUE|FALSE"
JobId="{string}"
Naked="TRUE|FALSE"
Name="{string}"
OriginalSizeInBytes="{64-bit integer}"
Priority="CRITICAL|URGENT|HIGH|NORMAL|LOW|BACKGROUND"
RequestType="GET"
StartDate="YYYY-MM-DDThh:mm:ss.xxxZ"
Status="IN_PROGRESS|COMPLETED|CANCELED"
UserId="{string}"
UserName="{string}">
<Nodes>
<Node EndPoint="{string}" Id="{string}"/>
</Nodes>
<Objects
ChunkId="{string}"
ChunkNumber="{32-bit integer}">
<Object Id="{string}" InCache="TRUE|FALSE"
Latest="TRUE|FALSE" Length="{64-bit integer}"
Name="{string}" Offset="{64-bit integer}"
VersionId="{string}"/>
…
</Objects>
…
</MasterObjectList>
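A client will typically want only a few fields from this response. The sketch below parses a small, made-up MasterObjectList and reports the job ID, status, and the objects not yet flagged as in cache. The sample XML and the helper name `summarize_job` are hypothetical. Whether InCache also reflects objects staged to a disk tier is an assumption to verify against your system.

```python
import xml.etree.ElementTree as ET

# Made-up response, trimmed to the attributes this sketch reads.
SAMPLE_RESPONSE = """
<MasterObjectList BucketName="weather-archive" JobId="job-1"
                  Status="IN_PROGRESS" OriginalSizeInBytes="300">
  <Nodes><Node EndPoint="10.0.0.1" Id="node-1"/></Nodes>
  <Objects ChunkId="chunk-1" ChunkNumber="0">
    <Object Name="2015/01/01/obs.nc" InCache="TRUE" Length="100" Offset="0"/>
    <Object Name="2015/01/02/obs.nc" InCache="FALSE" Length="200" Offset="0"/>
  </Objects>
</MasterObjectList>
"""


def summarize_job(xml_text):
    """Return job ID, status, and names of objects not marked InCache."""
    root = ET.fromstring(xml_text.strip())
    pending = [
        obj.get("Name")
        for obj in root.iter("Object")
        # Compare case-insensitively in case the server uses lowercase.
        if (obj.get("InCache") or "").upper() != "TRUE"
    ]
    return {
        "job_id": root.get("JobId"),
        "status": root.get("Status"),
        "original_size": int(root.get("OriginalSizeInBytes")),
        "pending": pending,
    }


summary = summarize_job(SAMPLE_RESPONSE)
print(summary)
```

The JobId returned here is the value a client would keep in order to track the stage job afterwards.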
The Stage Objects operation is available in all 5.0+ SDKs for integration into your application. We encourage developers to use this new feature to pre-stage data for users of their applications. Contact the Developer Program Team if you have any questions or need assistance.