Spectra S3 Bulk Operations Overview
To work with deep storage more efficiently and to accommodate very large objects on a single storage device, Spectra S3 provides an alternative method for object GET and PUT operations, as well as VERIFY operations. This section describes how the bulk transfer strategy works.
Processing a Bulk GET Job
The first step in creating a bulk GET job is to issue a Create Bulk GET request (see page 149). The Create Bulk GET request specifies the bucket name and a list of object names.
When the objects were written to deep storage, the BlackPearl system organized them for efficient transfer both when writing them and when reading them back later. The Create Bulk GET response provides a job ID and a list of chunks, where each chunk has a list of objects or object parts. For each object part, the response provides the object name, offset, and length.
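As a conceptual illustration, the following Python sketch issues a Create Bulk GET request and walks the chunk list in the response. The endpoint path, query parameter, and XML element and attribute names are assumptions for illustration only, and authentication headers are omitted; see the Create Bulk GET request (see page 149) for the exact format.
    # Sketch only: the path, parameter, and XML shapes below are assumed.
    import requests
    import xml.etree.ElementTree as ET

    ENDPOINT = "http://blackpearl.example.com"   # hypothetical BlackPearl host
    BUCKET = "my_bucket"

    # The body lists the objects to retrieve; the system replies with a
    # job ID and the chunk/object-part layout it chose for efficient reads.
    body = "<Objects><Object Name='test.aaf'/><Object Name='movie.mov'/></Objects>"
    resp = requests.put(f"{ENDPOINT}/_rest_/bucket/{BUCKET}",
                        params={"operation": "start_bulk_get"},
                        data=body)
    root = ET.fromstring(resp.text)
    job_id = root.get("JobId")                   # assumed response attribute
    for chunk in root.iter("Objects"):           # assumed: one element per chunk
        for part in chunk.iter("Object"):
            print(part.get("Name"), part.get("Offset"), part.get("Length"))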
The client should then issue a Get Job Chunks Ready for Processing request (see page 204) using the job ID. The response is in the same format as the Create Bulk GET response, but it only lists chunks that are ready for the client to retrieve. If the list is empty, then the BlackPearl system provides an HTTP Retry-After header with the number of seconds the client should wait before issuing the request again.
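The polling step might look like the following Python sketch, which honors the Retry-After header described above. The job_chunk path and query parameter name are assumptions for illustration; see the Get Job Chunks Ready for Processing request (see page 204) for the exact format.
    # Sketch only: the job_chunk path and query parameter are assumed.
    import time
    import requests
    import xml.etree.ElementTree as ET

    def get_ready_chunks(endpoint, job_id):
        # Poll until at least one chunk is ready, honoring Retry-After.
        while True:
            resp = requests.get(f"{endpoint}/_rest_/job_chunk",
                                params={"job": job_id})
            root = ET.fromstring(resp.text)
            chunks = list(root.iter("Objects"))  # assumed: one element per ready chunk
            if chunks:
                return chunks
            # Empty list: wait the number of seconds the server asked for.
            time.sleep(int(resp.headers.get("Retry-After", "60")))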
Finally, the client should GET all of the object parts in the available chunks and repeat this process until all chunks are transferred. The client can use any level of concurrency when transferring object parts within a chunk.
The GET object operation used in the Spectra S3 API is an extended version of the Amazon S3 GET object operation. While the Amazon S3 version of the GET object request transfers a single object or a number of object parts of a single object, the Spectra S3 API version transfers up to 500,000 objects or object parts, allowing the BlackPearl system to efficiently organize each individual object or object part transfer into job chunks.
The client may need to issue multiple GET requests for a single object if its large size caused it to be broken up into multiple pieces. For example, in the Create Bulk GET request Sample Response, the BlackPearl system split the object “test.aaf” into one 100 GB part and one 50 GB part. To retrieve test.aaf, you must GET the first 100 GB, specifying the job ID and an offset of 0, and then GET the remaining 50 GB, specifying the job ID and an offset of 107374182400 (since 100 GB = 100 × 2^30 bytes).
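Continuing the test.aaf example, the two part reads could be sketched in Python as follows. The job and offset query parameter names are assumptions for illustration; see the GET object request documentation for the exact syntax.
    # Sketch only: parameter names are assumed from the flow described above.
    import requests

    ENDPOINT = "http://blackpearl.example.com"   # hypothetical host
    JOB_ID = "replace-with-job-id-from-create-bulk-get"

    with open("test.aaf", "wb") as out:
        # Part 1: 100 GB starting at offset 0; part 2: 50 GB at 107374182400.
        for offset in (0, 100 * 2**30):
            resp = requests.get(f"{ENDPOINT}/my_bucket/test.aaf",
                                params={"job": JOB_ID, "offset": offset},
                                stream=True)
            out.seek(offset)
            for block in resp.iter_content(chunk_size=1 << 20):
                out.write(block)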
Processing a Bulk PUT Job
The steps for processing a bulk PUT job are similar to the steps for a bulk GET job. The first step in creating a bulk PUT job is to issue a Create Bulk PUT request (see page 155). The Create Bulk PUT request specifies the bucket name and a list of object names and sizes.
The BlackPearl system then breaks up the objects and organizes them for efficient transfer. The Create Bulk PUT response provides a job ID and a list of chunks, where each chunk has a list of objects or object parts. For each object part, the response provides the object name, offset, and length.
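A Create Bulk PUT request could be sketched as follows; note that, unlike a Create Bulk GET, each object entry must include its size. The endpoint path, query parameter, and XML shape are assumptions for illustration; see the Create Bulk PUT request (see page 155) for the exact format.
    # Sketch only: the path, parameter, and XML shapes below are assumed.
    import requests
    import xml.etree.ElementTree as ET

    ENDPOINT = "http://blackpearl.example.com"   # hypothetical host
    BUCKET = "my_bucket"

    # Each object entry carries its size so the system can plan the split.
    body = ("<Objects>"
            "<Object Name='video.mov' Size='161061273600'/>"   # 150 GB
            "</Objects>")
    resp = requests.put(f"{ENDPOINT}/_rest_/bucket/{BUCKET}",
                        params={"operation": "start_bulk_put"},
                        data=body)
    job_id = ET.fromstring(resp.text).get("JobId")             # assumed attribute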
The PUT object operation used in the Spectra S3 API is an extended version of the Amazon S3 PUT object operation. While the Amazon S3 version of the PUT object request transfers entire objects, the Spectra S3 API version transfers parts of an object as defined by a byte offset and a byte length. The client may need to specify multiple PUT operations per object.
For example, if you want to PUT an object that is 150 GB, the BlackPearl system splits that object into one 100 GB part (the largest object part length for a bulk PUT) and one 50 GB part. You must PUT the first 100 GB, specifying the job ID and an offset of 0, and then PUT the remaining 50 GB, specifying the job ID and an offset of 107374182400 (since 100 GB = 100 × 2^30 bytes). The offset to use is specified in the response to the Create Bulk PUT request. The length of the object part you are transferring is specified by the Content-Length HTTP request header in the PUT object request.
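The two part uploads for the 150 GB example might look like the following Python sketch, in which the Content-Length header is derived from the size of the data passed to each PUT. The job and offset parameter names are assumptions for illustration.
    # Sketch only: the job and offset parameter names are assumed.
    import requests

    ENDPOINT = "http://blackpearl.example.com"   # hypothetical host
    JOB_ID = "replace-with-job-id-from-create-bulk-put"

    def put_part(name, offset, length):
        # Upload one object part; requests derives Content-Length from the data.
        with open(name, "rb") as f:
            f.seek(offset)
            data = f.read(length)        # read the slice; stream in real code
        requests.put(f"{ENDPOINT}/my_bucket/{name}",
                     params={"job": JOB_ID, "offset": offset},
                     data=data)

    put_part("video.mov", 0, 100 * 2**30)            # first 100 GB part
    put_part("video.mov", 100 * 2**30, 50 * 2**30)   # remaining 50 GB part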
For each job chunk, the client should issue a Get Job Chunks Ready for Processing request (see page 204) using the job ID. This will allocate a working window of job chunks, if possible, and return a list of the job chunks that the client can upload. The client should PUT all of the object parts from the list of job chunks returned and repeat this process until all chunks are transferred. Chunks must be sent by the client in order; however, objects within a given chunk may be sent in any order.
The following is a conceptual code example:
while ( true ) {
    ready_chunks, http_return_code = GetJobChunksReadyForClientProcessing(job_id)
    if ( 410 == http_return_code ) {
        // all chunks have been processed; the job is done
        break;
    }
    if ( ready_chunks is empty ) {
        // the cache is saturated; wait the number of seconds given in the
        // Retry-After header before asking again
        sleep(retry_after_seconds);
        continue;
    }
    for ( object_part in ready_chunks ) {
        put_object(object_part.name, object_part.offset, object_part.length);
    }
}
If the Get Job Chunks Ready for Processing request (see page 204) returns an empty list, then the server's cache is currently saturated and the client must wait before sending more data. The client should wait the number of seconds specified in the Retry-After HTTP response header.
Processing a Bulk VERIFY Job
To process a bulk VERIFY job, issue a Create VERIFY Job request (see page 163). The Create VERIFY Job request specifies the bucket name and a list of object names. The job reads the data for each object from the permanent data store and verifies that the CRC of the data read matches the expected CRC. No additional requests are required.
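A Create VERIFY Job request could be sketched as follows. The endpoint path, query parameter, and XML shape are assumptions for illustration; see the Create VERIFY Job request (see page 163) for the exact format.
    # Sketch only: the path, parameter, and XML shapes below are assumed.
    import requests

    ENDPOINT = "http://blackpearl.example.com"   # hypothetical host
    BUCKET = "my_bucket"

    # List the objects whose stored data should be read back and CRC-checked;
    # no follow-up requests are needed once the job is created.
    body = "<Objects><Object Name='test.aaf'/></Objects>"
    requests.put(f"{ENDPOINT}/_rest_/bucket/{BUCKET}",
                 params={"operation": "start_bulk_verify"},
                 data=body)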