Interoperable Image Storage

cubes

The idea is probably not new, bits and pieces of it exist in multiple forms for many years and likely constitute some Platonic truth that is above and beyond anyconcrete implementation. This applies to image storage systems(*), object storage systems, file archival storage systems and even block storage. There’s often a content checksum that accompanies the (image, object, file, block) as part of the corresponding metadata – checksum of the corresponding content that is used to validate the latter AND, at least sometimes, address it – that is, find its logical and/or physical location when reading/searching.

Historically those content checksums were 16bit and 32bit wide CRCs and simlar; nowadays, however, more often than not storage systems use cryptographically-secure, collision- and pre-image-attack resistant SHA-2 and SHA-3, possibly Edon-R and Blake2 as well.

Now.. example. Let’s say, Vendor A and Vendor B both store a given popular image, say, one of those Ubuntu LTS as per https://wiki.ubuntu.com/LTS. For instance. Vendor A would store this content as a file over blocks, while Vendor B would use object backend to store the same image as bunch of distributed chunks.

When storing the image, in addition to (and preferably, in parallel with) checksumming blocks – A, or chunks – B, both vendors would compute the whole-image checksum: SHA-2 and/or SHA-3 for starters (NIST’s final report provides other viable alternatives). The latter then must become the stored-image metadata – alias on the image’s name, or more exactly, a specific version of the latter. Further, users may request this crypto-alias, or aliases, to be returned as part of the operation of storing the image, or later, in “exchange” for the stored version name.

Let’s see what it gives..

  1. User can now perform
    Get(hash-type, crypto-hash alias)

    – on any vendor that supports the above, to uniquely retrieve and validate a given version of the image

  2. The like/dislike and trust/distrust feedback can be metadata-attached to the image’s cryptohash and thus becomes globally cumulative and available to all users
  3. Vendors can begin partnering to deduplicate common content, independently of their respective vendor-specific and often-proprietary backend mechanisms
  4. The content can be uniquely identified and end-to-end validated with 100% certainty

The long and short of it is that secure crypto-hash on a given content can be used for vendor-neutral  access to trusted images which can further enable cross-vendor interoperability, global internet-wide deduplication, peer-to-peer networking with torrent-like service to distribute the same trusted self-validating content, and beyond.

See related: Global Namespace for Docker