How to pull blobs out of Archive Storage

If you’re building a modern application, you have a lot of options for storing data: traditional relational databases (SQL Server, MySQL, etc.), NoSQL stores (MongoDB, Cosmos DB, etc.), or plain blob storage. Of these, blob storage is by far the cheapest, making it a very low-cost option for storing data long term.

The best way to get the most value out of blob storage is to take advantage of its access tiers (such as Hot, Cool, and Archive). With a tiering strategy for your data, you can pay significantly less to store it long term. You can find current prices on the Azure Blob Storage pricing page.

Many people are hesitant to use the Archive tier because having to wait for data to be rehydrated tends to scare them off. In my experience, though, most data used for business operations has a shelf life, and archiving it is a viable option, especially for data that is rarely accessed. I would challenge anyone storing blobs to measure how often their older data is actually read. When you weigh the need to “wait for retrieval” against the cost savings of Archive, the balance in my experience tends to favor Archive.

How do you move data to archive storage?

Uploading a blob to Azure Blob Storage is fairly straightforward, and moving it to the archive tier only takes setting its access tier to “Archive”.

The code below uploads a local file to blob storage and then sets its access tier to Archive:

using System;
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// connectionString, containerName, blobName and localFilePath are assumed to be defined elsewhere
var accountClient = new BlobServiceClient(connectionString);
var containerClient = accountClient.GetBlobContainerClient(containerName);

// Get a reference to a blob
BlobClient blobClient = containerClient.GetBlobClient(blobName);

Console.WriteLine("Uploading to Blob storage as blob:\n\t {0}\n", blobClient.Uri);

// Open the file and upload its data, overwriting any existing blob with the same name
using (FileStream uploadFileStream = File.OpenRead(localFilePath))
{
    blobClient.Upload(uploadFileStream, overwrite: true);
}

Console.WriteLine("Setting Blob to Archive");

// Move the uploaded blob to the Archive tier
blobClient.SetAccessTier(AccessTier.Archive);
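If you already know a blob belongs in Archive, you can also set the tier as part of the upload instead of with a second call. The sketch below assumes version 12 of the Azure.Storage.Blobs package, where BlobUploadOptions exposes an AccessTier property; the ArchiveUploader class and its method names are hypothetical.

```csharp
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

public static class ArchiveUploader
{
    // Upload options that write the blob straight into the Archive tier
    public static BlobUploadOptions BuildArchiveUploadOptions() =>
        new BlobUploadOptions { AccessTier = AccessTier.Archive };

    // Uploads a local file with the tier set as part of the upload itself,
    // instead of uploading to the default tier and calling SetAccessTier afterwards
    public static void UploadToArchive(BlobContainerClient containerClient, string blobName, string localFilePath)
    {
        BlobClient blobClient = containerClient.GetBlobClient(blobName);
        using FileStream uploadFileStream = File.OpenRead(localFilePath);
        blobClient.Upload(uploadFileStream, BuildArchiveUploadOptions());
    }
}
```

Keep in mind that Archive has a 180-day minimum storage period, so blobs deleted or moved out of Archive earlier than that incur an early deletion charge.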

How to re-hydrate a blob in archive storage?

There are two ways to rehydrate an archived blob:

  1. Copy it to a new blob in an online tier (Hot or Cool)
  2. Change its access tier to Hot or Cool

The second option really is that simple; it takes a single call:

using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

var accountClient = new BlobServiceClient(connectionString);
var containerClient = accountClient.GetBlobContainerClient(containerName);

// Get a reference to a blob
BlobClient blobClient = containerClient.GetBlobClient(blobName);

// Changing the tier from Archive to Hot starts rehydration
blobClient.SetAccessTier(AccessTier.Hot);
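The first option, copying, leaves the archived original in place and rehydrates a new blob in an online tier. A minimal sketch, again assuming the v12 Azure.Storage.Blobs package (BlobCopyFromUriOptions and StartCopyFromUriAsync); the RehydrateByCopy class, the destination blob name, and the choice of Standard priority are illustrative assumptions. It also assumes source and destination live in the same storage account and the client is authorized with the account's connection string.

```csharp
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

public static class RehydrateByCopy
{
    // The destination blob lands in Hot; Standard priority is the cheaper, slower option
    public static BlobCopyFromUriOptions BuildCopyOptions() =>
        new BlobCopyFromUriOptions
        {
            AccessTier = AccessTier.Hot,
            RehydratePriority = RehydratePriority.Standard
        };

    // Starts copying an archived blob to a new blob name; the source stays in Archive
    public static async Task<CopyFromUriOperation> StartAsync(
        BlobContainerClient containerClient, string archivedBlobName, string rehydratedBlobName)
    {
        BlobClient source = containerClient.GetBlobClient(archivedBlobName);
        BlobClient destination = containerClient.GetBlobClient(rehydratedBlobName);
        return await destination.StartCopyFromUriAsync(source.Uri, BuildCopyOptions());
    }
}
```

The destination blob only becomes readable once rehydration finishes, so monitor the destination rather than the source. SetAccessTier accepts the same Standard or High choice through its optional rehydratePriority parameter.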

This call returns immediately and starts rehydrating the blob in the background. Rehydration is not instant (at Standard priority it can take up to 15 hours), so you need to monitor the blob's properties to see when it has finished.
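Conceptually, monitoring boils down to reading ArchiveStatus and AccessTier from the blob's properties until rehydration is no longer pending. The sketch below keeps that logic free of the SDK so it can be tested on its own. HydrationStatus, HydrationState and the fetch delegate are hypothetical names, and the "rehydrate-pending-" prefix check assumes status values such as rehydrate-pending-to-hot and rehydrate-pending-to-cool. It also reports a blob that is still in Archive with no pending status as its own outcome, since that means rehydration is not actually running.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public enum HydrationState { Pending, Online, StillArchived }

public static class HydrationStatus
{
    // archiveStatus mirrors BlobProperties.ArchiveStatus, e.g. "rehydrate-pending-to-hot";
    // accessTier mirrors BlobProperties.AccessTier, e.g. "Hot", "Cool" or "Archive".
    public static HydrationState Classify(string? archiveStatus, string? accessTier)
    {
        if (archiveStatus != null &&
            archiveStatus.StartsWith("rehydrate-pending-", StringComparison.OrdinalIgnoreCase))
        {
            return HydrationState.Pending;
        }

        // Not pending but still archived: rehydration is not running for this blob
        if (string.Equals(accessTier, "Archive", StringComparison.OrdinalIgnoreCase))
        {
            return HydrationState.StillArchived;
        }

        return HydrationState.Online;
    }

    // Polls until the blob is no longer pending. fetch is a hypothetical delegate that
    // returns the blob's current (ArchiveStatus, AccessTier) pair.
    public static async Task<HydrationState> WaitAsync(
        Func<Task<(string? ArchiveStatus, string? AccessTier)>> fetch,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        while (true)
        {
            var (archiveStatus, accessTier) = await fetch();
            HydrationState state = Classify(archiveStatus, accessTier);
            if (state != HydrationState.Pending)
            {
                return state;
            }

            await Task.Delay(pollInterval, cancellationToken);
        }
    }
}
```

With the SDK, the fetch delegate could wrap blobClient.GetPropertiesAsync() and return the ArchiveStatus and AccessTier strings from the resulting BlobProperties.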

Monitoring the re-hydration of a blob

One easy pattern for monitoring blobs as they are rehydrated is to pair a storage queue with an Azure Function that checks the blob each time a message arrives. I implemented it as follows:

For the message model, I used the following to track the hydration process:

using System;

public class BlobHydrateModel
{
    public string BlobName { get; set; }
    public string ContainerName { get; set; }
    public DateTime HydrateRequestDateTime { get; set; }
    public DateTime? HydratedFileDataTime { get; set; }
}
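The message that travels through the queue is just this model serialized to JSON and then Base64-encoded, which is what the provider code below sends. The sketch shows that round trip with System.Text.Json instead of Newtonsoft.Json so it needs no extra packages; HydrateMessageCodec is a hypothetical name. Inside an Azure Function the queue trigger hands you the already-decoded string, so Decode is mainly useful in tests or other consumers.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Same shape as the BlobHydrateModel message model above
public class BlobHydrateModel
{
    public string BlobName { get; set; }
    public string ContainerName { get; set; }
    public DateTime HydrateRequestDateTime { get; set; }
    public DateTime? HydratedFileDataTime { get; set; }
}

public static class HydrateMessageCodec
{
    // JSON-serialize the model, then Base64-encode the UTF-8 bytes for the queue
    public static string Encode(BlobHydrateModel model) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(JsonSerializer.Serialize(model)));

    // Reverse of Encode: Base64-decode, then deserialize the JSON
    public static BlobHydrateModel Decode(string message) =>
        JsonSerializer.Deserialize<BlobHydrateModel>(
            Encoding.UTF8.GetString(Convert.FromBase64String(message)));
}
```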

Then I implemented the following class to start the rehydration and queue the tracking message:

using System;
using System.Text;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Queues;
using Newtonsoft.Json;

public class BlobRehydrationProvider
{
    private readonly string _cs;

    public BlobRehydrationProvider(string cs)
    {
        _cs = cs;
    }

    public void RehydrateBlob(string containerName, string blobName, string queueName)
    {
        var accountClient = new BlobServiceClient(_cs);
        var containerClient = accountClient.GetBlobContainerClient(containerName);

        // Get a reference to a blob
        BlobClient blobClient = containerClient.GetBlobClient(blobName);

        // Start rehydration by moving the blob from Archive to Hot
        blobClient.SetAccessTier(AccessTier.Hot);

        // Record the request time in UTC so it compares cleanly across machines
        var model = new BlobHydrateModel()
        {
            BlobName = blobName,
            ContainerName = containerName,
            HydrateRequestDateTime = DateTime.UtcNow
        };

        // Queue a Base64-encoded JSON message for the monitoring function
        QueueClient queueClient = new QueueClient(_cs, queueName);
        var json = JsonConvert.SerializeObject(model);
        string message = Convert.ToBase64String(Encoding.UTF8.GetBytes(json));
        queueClient.SendMessage(message);
    }
}

With the code above, setting the blob to Hot and queuing a message triggers an Azure Function, which then checks the blob's properties:

using System;
using System.Text;
using Azure.Storage.Blobs;
using Azure.Storage.Queues;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

public static class CheckBlobStatus
{
    [FunctionName("CheckBlobStatus")]
    public static void Run([QueueTrigger("blobhydrationrequests", Connection = "StorageConnectionString")] string msg, ILogger log)
    {
        var model = JsonConvert.DeserializeObject<BlobHydrateModel>(msg);

        var connectionString = Environment.GetEnvironmentVariable("StorageConnectionString");
        var accountClient = new BlobServiceClient(connectionString);
        var containerClient = accountClient.GetBlobContainerClient(model.ContainerName);
        BlobClient blobClient = containerClient.GetBlobClient(model.BlobName);

        log.LogInformation($"Checking Status of Blob: {model.BlobName} - Requested : {model.HydrateRequestDateTime}");

        var properties = blobClient.GetProperties();

        // ArchiveStatus reads "rehydrate-pending-to-hot" (or "-to-cool") while rehydration runs
        if (properties.Value.ArchiveStatus?.StartsWith("rehydrate-pending", StringComparison.Ordinal) == true)
        {
            log.LogInformation($"File {model.BlobName} not hydrated yet, requeuing message");
            QueueClient queueClient = new QueueClient(connectionString, "blobhydrationrequests");
            string requeueMessage = Convert.ToBase64String(Encoding.UTF8.GetBytes(msg));
            // Keep the requeued message hidden for 5 minutes before the next check
            queueClient.SendMessage(requeueMessage, visibilityTimeout: TimeSpan.FromMinutes(5));
        }
        else
        {
            log.LogInformation($"File {model.BlobName} hydrated successfully, sending response message.");
            // Trigger appropriate behavior
        }
    }
}

By checking the ArchiveStatus, we can tell when the blob is re-hydrated and can then trigger the appropriate behavior to push that update back to your application.
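One gap in the function above is that it requeues for as long as the status is pending, with no upper bound. A small, hypothetical decision helper can add a ceiling on how long to wait. The 18-hour limit is an assumption chosen to leave headroom over the up to 15 hours Azure documents for Standard-priority rehydration; HydrationPolicy and HydrationAction are illustrative names, not part of any SDK.

```csharp
using System;

public enum HydrationAction { Requeue, Complete, GiveUp }

public static class HydrationPolicy
{
    // Assumed ceiling: Standard priority can take up to 15 hours, plus some headroom
    public static readonly TimeSpan DefaultMaxWait = TimeSpan.FromHours(18);

    // stillPending: whether ArchiveStatus still reports a pending rehydration.
    // requestedUtc/nowUtc: when rehydration was requested and the current time, both in UTC.
    public static HydrationAction Decide(bool stillPending, DateTime requestedUtc, DateTime nowUtc, TimeSpan maxWait)
    {
        if (!stillPending)
        {
            return HydrationAction.Complete;
        }

        // Still pending: keep waiting until the ceiling is exceeded, then stop requeuing
        return nowUtc - requestedUtc > maxWait ? HydrationAction.GiveUp : HydrationAction.Requeue;
    }
}
```

In the function, a GiveUp result would be the place to log an error or raise an alert instead of sending another requeue message.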
