Last Updated: Feb 1st 2010
Block BLOBs
- Max BLOB size: 200GB
- Max Block size: 4MB
- Max # of blocks: 50,000
- BLOB doesn't need to be created before blocks can be written
- Block IDs have to be Base64-encoded and then URL-encoded
- Within a blob, all Block IDs (not a typo: IDs) must be the same length (up to 64 bytes)
- Blocks themselves do not need to be the same length
- Updating blocks does NOT update the last modified time of the BLOB
- Block operations cannot be performed on Page BLOBs
- Blocks must be committed after writing (uncommitted blocks are deleted after a week)
- The block order in the blob is set by PutBlockList, not by the block IDs or by the
order in which the blocks were put.
- MSDN says: "optimized for streaming"
- Closing the stream uploads and commits the blocks. Unclear whether all blocks have to be
in memory before any are uploaded.
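A minimal Python sketch of the block rules above, not tested against the service. It splits a payload into blocks of at most 4MB (and at most 50,000 of them), gives every block a same-length Base64 ID built from a zero-padded counter, URL-encodes that ID for the Put Block query string, and builds a Put Block List body whose element order is the blob's block order. The helper names, the "block-NNNNNN" ID scheme and the <BlockList><Latest> body format are assumptions from my reading of the REST docs; check them against the API version in use.

```python
import base64
import math
from urllib.parse import quote

MAX_BLOCK_SIZE = 4 * 1024 * 1024   # 4MB per block (from the notes above)
MAX_BLOCKS = 50_000                # max blocks per blob (from the notes above)


def make_block_id(index: int, width: int = 6) -> str:
    """Base64-encode a zero-padded counter so every ID in the blob has the same length."""
    raw = f"block-{index:0{width}d}".encode("ascii")  # hypothetical ID scheme
    return base64.b64encode(raw).decode("ascii")


def split_into_blocks(data: bytes, block_size: int = MAX_BLOCK_SIZE):
    """Yield (block_id, chunk) pairs; only the last chunk may be shorter.

    Limits are checked when iteration starts, since this is a generator.
    """
    if not 0 < block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block size must be between 1 byte and 4MB")
    count = math.ceil(len(data) / block_size)
    if count > MAX_BLOCKS:
        raise ValueError("payload needs more than 50,000 blocks")
    for i in range(count):
        yield make_block_id(i), data[i * block_size:(i + 1) * block_size]


def put_block_query(block_id: str) -> str:
    """Put Block query string: the Base64 ID ('+', '/', '=') must be URL-encoded."""
    return "comp=block&blockid=" + quote(block_id, safe="")


def put_block_list_body(block_ids) -> str:
    """Put Block List XML body; the blob's block order follows this list."""
    items = "".join(f"<Latest>{b}</Latest>" for b in block_ids)
    return f'<?xml version="1.0" encoding="utf-8"?><BlockList>{items}</BlockList>'
```

Passing the same IDs to put_block_list_body in a different order would commit the blocks in that new order, which is what the PutBlockList note describes. Also, 50,000 blocks of 4MB comes to roughly 200GB, consistent with the max BLOB size above.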
Page BLOBs
- Page writes must be aligned to 512-byte boundaries at both ends (start and end offset)
- Max data in a single put: 4MB. The docs contradict each other on this.
- Max BLOB size: 1TB. Only charged for pages used in it.
- Page ranges can be cleared through a special header rather than by sending empty data.
Unclear whether this is bounded by the 4MB limit (probably not).
- Cleared pages are 'no longer tracked as part of the BLOB'
- Special failure case handling needed around 'Page Blob Sequence Number'
to avoid retry issues
- Writes are automatically committed.
- MSDN says: "optimized for random read/write"
- Backing store for XDrive
- Use GetPageRanges to avoid downloading the empty parts of sparse blobs
- GetBlob(from, to) does not have to be 512-byte aligned
- OpenWrite throws 'NotImplemented' if called on Page BLOBs (OpenRead is fine)
- Known bug: GetPageRanges does not set EndOffset. Update: this is a known bug in the Azure
'high-end' API wrappers; the REST API returns correct values.
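A minimal sketch of the page write rules, assuming the Put Page REST headers x-ms-range (byte range with an inclusive end) and x-ms-page-write (update or clear). It checks 512-byte alignment at both ends and enforces the 4MB per-put limit for updates only; the clear case is left unchecked because that limit is unverified. It also shows the read-modify-write a client would need for an unaligned write: widen to the enclosing pages, download them, patch the bytes, upload. The function names are hypothetical and nothing here calls the service.

```python
PAGE_SIZE = 512                    # page blobs are addressed in 512-byte pages
MAX_PUT_PAGE = 4 * 1024 * 1024     # 4MB per Put Page (docs disagree, see notes)


def is_page_aligned(offset: int, length: int) -> bool:
    """True if a write starts and ends on a 512-byte boundary."""
    return length > 0 and offset % PAGE_SIZE == 0 and length % PAGE_SIZE == 0


def enclosing_pages(offset: int, length: int) -> tuple[int, int]:
    """Smallest 512-aligned [start, end) byte range covering an arbitrary write."""
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    end = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE  # round up to a page
    return start, end


def put_page_headers(offset: int, length: int, clear: bool = False) -> dict:
    """Put Page headers for an aligned write; x-ms-range has an inclusive end."""
    if not is_page_aligned(offset, length):
        raise ValueError("page writes must be 512-byte aligned at both ends")
    # Only updates are size-checked: whether clear is bounded by 4MB is unverified.
    if not clear and length > MAX_PUT_PAGE:
        raise ValueError("a single Put Page update is limited to 4MB")
    return {
        "x-ms-range": f"bytes={offset}-{offset + length - 1}",
        "x-ms-page-write": "clear" if clear else "update",
    }


def patch_pages(pages: bytes, pages_start: int, offset: int, payload: bytes) -> bytes:
    """Read-modify-write step: overlay payload onto pages downloaded from pages_start."""
    rel = offset - pages_start
    if rel < 0 or rel + len(payload) > len(pages):
        raise ValueError("payload falls outside the downloaded pages")
    return pages[:rel] + payload + pages[rel + len(payload):]
```

For example, a 10-byte write at offset 1000 gives enclosing_pages(1000, 10) == (512, 1024): a single page to download, patch and re-upload.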
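A sketch of using page ranges to skip the empty parts of a sparse blob. It parses a Get Page Ranges REST response (assumed shape: PageList containing PageRange elements with Start and End, ends inclusive; the REST response is used because the wrappers' EndOffset is unreliable), merges touching or overlapping ranges so fewer GETs are needed, and produces one x-ms-range value per merged range. The function names are hypothetical and no requests are sent.

```python
import xml.etree.ElementTree as ET


def parse_page_ranges(xml_bytes: bytes):
    """Return [(start, end), ...] in document order from a Get Page Ranges body."""
    root = ET.fromstring(xml_bytes)
    return [(int(r.findtext("Start")), int(r.findtext("End")))
            for r in root.iter("PageRange")]


def coalesce(ranges):
    """Merge touching or overlapping inclusive (start, end) byte ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Adjacent or overlapping: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def range_headers(ranges):
    """One x-ms-range value per merged range, so empty pages are never fetched."""
    return [f"bytes={s}-{e}" for s, e in coalesce(ranges)]
```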
Misc
- Leases last 1 min and apply to writes only
- Leasing not supported by the wrappers (REST only)
- Container Metadata: 8KB per container, so of no use here.
- Blob Metadata: 8KB per blob (not verified, no details in MSDN). May be better than
reserving pages at the start of the file. Probably not backed up by GFS. Can be
read/written together with the blob under some circumstances.
- GetBlob(from, to) over a sparse page blob would make generating summaries easy, but
might be expensive (empty pages are streamed as zeros).
- App probably needs an inexpensive way of finding the min and max dates for which
we have data
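One possible approach to the min/max dates note, sketched under a hypothetical layout: the page blob holds one fixed-size slot per day starting at an EPOCH date, so the lowest and highest page ranges reported by GetPageRanges give the earliest and latest dates with data, without streaming any zeros. EPOCH, SLOT_PAGES and the one-slot-per-day layout are made up for illustration. If cleared pages really stop being tracked, clearing a day's slot should also drop that day from the result.

```python
from datetime import date, timedelta

PAGE_SIZE = 512
EPOCH = date(2010, 1, 1)             # hypothetical: first day the blob can hold
SLOT_PAGES = 8                       # hypothetical: 8 pages (4KB) reserved per day
SLOT_BYTES = SLOT_PAGES * PAGE_SIZE


def offset_for(day: date) -> int:
    """Byte offset of a day's slot under the hypothetical layout."""
    return (day - EPOCH).days * SLOT_BYTES


def date_span(page_ranges):
    """(min_date, max_date) with data, from inclusive (start, end) page ranges."""
    if not page_ranges:
        return None  # nothing written (or everything cleared)
    first = min(start for start, _ in page_ranges)
    last = max(end for _, end in page_ranges)
    return (EPOCH + timedelta(days=first // SLOT_BYTES),
            EPOCH + timedelta(days=last // SLOT_BYTES))
```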
XDrive
- Page Blob formatted as an NTFS VHD.
- Each VM limited to 8 of these drives
- Each drive limited to 1TB
- Each drive limited to one VM at a time. Snapshots can be mounted by multiple VMs
as read-only drives
Questions
- XDrive VHD: where is the work done? For example, if I write 10 bytes, do 10 bytes
go to the cloud, or does the client download the page, write the 10 bytes and upload the
page?
- Storage account limited to 100TB. While we are not charged for empty pages, do they
contribute towards this limit?
- Does OpenWrite support append?
- Efficiency of reading lots of pages at once