I'm uploading a CSV file compressed with gzip to Azure Blob Storage using C# .NET 8. I can read the file correctly with .NET code, but I face two issues when downloading it:
- When I download the file to my local Windows laptop, opening it results in the error "Windows cannot open the file (Archive is invalid)".
- The file size in Blob Storage is reported as 4.9 KiB, but the downloaded file is 12 KB — far more than the KiB-to-KB difference can account for.
Additionally, when I try to process the file with Azure Databricks, it is not recognized as a valid gzip file, whereas a gzip file generated on Windows and then uploaded works fine. Interestingly, I can open and view the file's data in Azure Blob Storage Explorer without issues.
Below is the code I use to upload the file:
```csharp
public async Task SaveAsync
(
    IEnumerable<MyData> data,
    string containerName,
    string blobName,
    CancellationToken cancellationToken
)
{
    using var ms = new MemoryStream();

    var containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
    await containerClient.CreateIfNotExistsAsync();
    var blobClient = containerClient.GetBlobClient(blobName);

    // leaveOpen: true so ms stays usable after the gzip stream is disposed
    await using var compress = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true);
    await using var writer = new StreamWriter(compress);
    await using var csv = new CsvWriter(writer, CultureInfo.InvariantCulture, leaveOpen: true);

    csv.Context.RegisterClassMap<MyData>();
    await csv.WriteRecordsAsync(data.OrderBy(x => x.Date), cancellationToken);
    await writer.FlushAsync(cancellationToken);
    await ms.FlushAsync(cancellationToken);
    ms.Position = 0;

    var blobHttpHeader = new BlobHttpHeaders
    {
        ContentType = "application/csv",
        ContentEncoding = "gzip",
    };

    IDictionary<string, string> metaData = new Dictionary<string, string>();
    metaData.Add("date", DateTime.UtcNow.ToString(CultureInfo.InvariantCulture));

    await blobClient.UploadAsync
    (
        ms,
        blobHttpHeader,
        metaData,
        conditions: null,
        progressHandler: null,
        accessTier: null,
        transferOptions: default,
        cancellationToken
    );
}
```
I suspect the gzip stream is never properly finalized: its footer is only written on disposal, and the `await using` declarations here dispose it at the end of the method's scope, i.e. after the `MemoryStream` has been rewound and uploaded. Wrapping the gzip stream and its associated writers in properly nested using blocks (or disposing them explicitly before `ms.Position = 0`) could solve the issue. Any insights on ensuring the stream is completely finalized before upload would be appreciated.
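To check that hypothesis in isolation, here is a self-contained sketch using only `System.IO.Compression` (no Azure SDK or CsvHelper; the `GzipFinalizeDemo` class and its method names are made up for illustration). It shows that data written through a `GZipStream` only becomes a complete gzip member once the stream is disposed — `Flush()` pushes pending compressed bytes but does not emit the trailing footer (CRC-32 and uncompressed length):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipFinalizeDemo
{
    // Compress text to gzip bytes; optionally dispose the GZipStream first.
    public static byte[] Compress(string text, bool disposeBeforeRead)
    {
        var ms = new MemoryStream();
        var gzip = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true);
        var bytes = Encoding.UTF8.GetBytes(text);
        gzip.Write(bytes, 0, bytes.Length);
        gzip.Flush();       // flushes pending compressed data, but NOT the footer
        if (disposeBeforeRead)
            gzip.Dispose(); // the gzip footer is written here
        // With disposeBeforeRead: false, the buffer is a truncated gzip stream
        // that gzip tools reject as invalid.
        return ms.ToArray();
    }

    // True when the gzip bytes decompress back to the original text.
    public static bool RoundTrips(byte[] gzipBytes, string original)
    {
        using var input = new MemoryStream(gzipBytes);
        using var gunzip = new GZipStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(gunzip);
        return reader.ReadToEnd() == original;
    }

    public static void Main()
    {
        const string text = "id,value\n1,hello\n";
        Console.WriteLine(RoundTrips(Compress(text, disposeBeforeRead: true), text)); // True
    }
}
```

In the upload method above, the same finalization would have to happen before `ms.Position = 0`, not at the end of the method's scope.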