HyperConverged Storage Directions — How to get users down the path

I wanted to title this Hypo Diverged Storage, as there seem to be many ideas in the industry of what “hyper converged” storage and architectures are. Targeted at developers and solution providers alike, this piece strives to tie together a string of ideas about where I think we want to go with our solutions, and to dispel some arguments that are assumptive or techno-religious in nature. Namely:
  • What is really necessary for a strong foundational technology/product
  • Emulate the service architectures
  • It just needs to be good enough
  • Containers: Just build out the minimum and rely on VMs/hypervisors to handle the rest
  • You must build fresh cloud native, immutable applications only
In my role at Stanford Electrical Engineering I have numerous interactions with researchers and vendors, including research projects that have since become formative infrastructure companies in the industry. My hand in many of these has been slight, but we have had real effect in guiding the build-out of usable solutions, taking many into production use first at our facilities. Among the many projects over time, we have dealt with the beginnings and maturation of virtualization in compute, storage, and networking. Today’s emerging on-prem and cloud infrastructure takes as a given that we need all three. Let’s target the assumptions.
1) To build out a production quality solution that works well in the ecosystem, you just need to make a killer (storage/compute/network) appliance or product.
In many discussions, people may notice a mantra: most companies rightfully focus on building technology from a position of strength, such as storage or networking, but miss out when they only attempt to integrate with one of the other legs of the virtualization stool. They rush to market but fail to execute well or deliver what end users actually want, because the integration is half-baked at best. I’ve worked with great storage products that focus on network access alone but can’t deal with bare-metal container hypervisors; SDN products that can make OpenStack work well for tenants but ignore the storage component that itself rides on the network; storage products that require dedicated network interfaces for I/O, with no awareness of virtualized interfaces; hypervisors that use networks for storage but can’t themselves hide transient network failures from the guests they host. These are just samples, but they show that it’s hard to get it all right.
Although it’s a lot for any going concern to address, products and solutions need well-thought-out scale-up and scale-out approaches that address all three legs of the stool. One could argue that orchestration is a fourth leg, and it may need to be taken into account as well. Regardless, focusing on an integrated storage/networking/virtualization solution for the simpler cases, and building from there, is more advantageous to customers looking to grow with these solutions long term.
2) To build a successful hyper converged product, you need to emulate how Google/AWS/etc build out their cloud infrastructure. 
I’d argue that most potential customers won’t want to pick a random vendor to bring AWS-like infrastructure on site. The economics and operational complexity would likely sway them to just host their products on AWS or similar, regardless of how much performance is lost in the layered services Amazon offers. In most cases, mid-enterprise and larger customers already have various solutions in place, necessitating a brownfield approach to adopting hyperconverged solutions.
This gets to one important belief I have: regardless of your solution, a site should be able to bootstrap its use with minimal initial nodes (one or more), truly migrate workloads onto it, and, with time and confidence, grow out the installation. This dispels another assumption: that the hardware is uniform. With gradual rollouts, you should expect variation in the hardware as a site goes from proof of concept, to initial rollout, and finally to full deployment. All the while, management of the solution should allow the use of existing networks where possible and the phase-out or upgrade of nodes over time. There is no “end of deployment”; it is a continuous journey between site and solutions vendor.
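To make the non-uniform-hardware point concrete, here is a minimal sketch of replica placement across a cluster that grew unevenly. All node names and capacities are invented for illustration; real placement algorithms (CRUSH in Ceph, for example) are far more involved.

```python
# Hypothetical sketch: replica placement for a cluster that grew
# non-uniformly from a proof-of-concept node to full deployment.
# Node names and capacities (in TB) are invented for illustration.

def place_replicas(nodes, num_replicas):
    """Pick distinct nodes for replicas, favoring larger (newer) nodes.

    nodes: dict mapping node name -> usable capacity in TB.
    Returns at most num_replicas node names, one replica per node.
    """
    # Rank by capacity so bigger, later hardware absorbs more data;
    # Python's sort is stable, so equal-capacity nodes keep their order.
    ranked = sorted(nodes, key=nodes.get, reverse=True)
    return ranked[:num_replicas]

# A deployment that started as one small node and grew in phases:
cluster = {"poc-node": 4, "rollout-a": 12, "rollout-b": 12, "full-c": 24}
print(place_replicas(cluster, 3))  # -> ['full-c', 'rollout-a', 'rollout-b']
```

Note that the original proof-of-concept node naturally falls out of the placement as bigger nodes arrive, which is exactly the gradual phase-out described above.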
You’ll notice a theme: bootstrapping initial storage and networking simply is more important than emulating the full stack from the get-go. Don’t try to be Amazon; just make sure you have a strong foundation of storage/compute/networking so that you can span multiple installations and even pre-existing solutions like Amazon.
3) Just get the product in the door with good initial bootstrapping.
Extending from the above, simply bootstrapping is not enough. Supporting a product tends to kill many startups and even seasoned companies, and it really bedevils startups as they strive for the initial market acceptance their investors expect. In the end, a hyperconverged solution is a strong relationship between vendor(s) and customer. Deployment should be considered continuous from the start, so the ability to add, maintain, and upgrade a product is not only central to the relationship, it informs the product design. Consider what hyperconverged truly means: each node is self-sufficient, yet any workload should be able to migrate to another node, and the loss of any given node should not overly strain the availability of the infrastructure.
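That last property, that losing any one node should not threaten availability, is easy to state and easy to check. A minimal sketch, with invented volume and node names:

```python
# Hypothetical sketch: verify that the loss of any single node leaves
# every volume with at least one surviving replica. The placement map
# is invented for illustration.

def survives_any_single_failure(placement):
    """placement: dict mapping object name -> set of nodes holding replicas."""
    all_nodes = set().union(*placement.values())
    for failed in all_nodes:
        for obj, replicas in placement.items():
            if replicas <= {failed}:  # every replica sat on the failed node
                return False
    return True

placement = {
    "vol-a": {"node1", "node2"},
    "vol-b": {"node2", "node3"},
    "vol-c": {"node1", "node3"},
}
print(survives_any_single_failure(placement))  # -> True
```

A solution that runs a check like this continuously, rather than only at install time, fits the "deployment is continuous" framing above.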
I don’t want to take away too much from the focus on ease of initial deployment; that can win customers over. But keeping users of converged infrastructure happy with the relationship requires satisfaction with the overall architecture, not just the initial technology.
4) Containers can run well enough in virtual machines. Just build on top of VMs and then layer on top our storage and network integrations.
VMs alone are no longer the target deployment; applications and/or app containers are. In the rush to enable container deployment, vendors take the shortcut of building on top of existing VM infrastructure. It’s worse with existing hyperconverged solutions that have heavy infrastructure requirements to support VMs. If one sees lightweight containers as the target, one’s assumptions about what makes hyperconverged infrastructure change. Merely adopting existing, mature VM infrastructure becomes its own issue.
I love the work that Joyent has done in this space. They’ve argued well for using mature container technology not found in Linux, and brought Linux and Docker workloads into their infrastructure. What sets them apart from the present competition is that they run their containers on bare metal, right on top of virtualized storage (ZFS-backed) and maturing network virtualization. The solution works wonders for those operating the tech as an IaaS. Market acceptance of a non-native Linux solution, and some of the present non-clustered FS components, may limit the appeal of excellent technology. At the same time, competing primarily as a public cloud places them in the shadow of the larger players there.
A Joyent-like technological approach married with full SDN integration could be the dream product. Already, the tech shows the assumption of building on VMs to be a weak proposition. What VM infrastructure has, and containers need for wider adoption, is SDN, namely overlay networks. If we enable not only pooled storage but pooled networking, we gain better utilization of the network stack, inherent path redundancy, and, via an SDN overlay, a safe way to build tenant storage networks arbitrarily over multiple physical networks, perhaps across multiple vendors’ data centers. Think how this serves customers’ potential container mobility.
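The overlay idea can be sketched in a few lines: each tenant storage network gets a virtual network identifier (a VNI, as in VXLAN), and its members can sit on different physical networks or even different data centers. Tenant names, VNIs, and network labels below are all invented for illustration.

```python
# Hypothetical sketch of an SDN overlay table: a tenant's storage
# network is identified by a VNI, decoupled from physical placement.

overlays = {}  # tenant -> {"vni": int, "members": set of (node, physical_net)}

def attach(tenant, vni, node, physical_net):
    """Attach a node on some physical network to a tenant's overlay."""
    net = overlays.setdefault(tenant, {"vni": vni, "members": set()})
    net["members"].add((node, physical_net))

attach("acme", 5001, "node1", "dc1-rack1")
attach("acme", 5001, "node7", "dc2-rack3")  # same tenant net, other DC

# A container migrating between nodes keeps the same tenant VNI, so its
# storage network follows it across physical networks.
print(overlays["acme"]["vni"])           # -> 5001
print(len(overlays["acme"]["members"]))  # -> 2
```

The point of the sketch is the decoupling: mobility becomes a change to the membership set, not a re-plumbing of physical networks.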
5) Docker is containers / Containers need to be immutable by design for a scale out architecture
All the above leads to this point. From past experience and intuition, I think what holds back adoption of next-generation infrastructure throughout the industry is in fact the religious argument around immutability. The workloads that could make use of better solutions coming to market require some level of persistence per node/container. Though live migration may not be warranted in many cases, the ability to upgrade a container, scale it vertically, or run “legacy” applications using these new operational approaches will be required of a mature container solution. We just haven’t gotten there yet. A scale-out infrastructure can allow for persistent containers, as long as end users architect for the failure modes expected of this application profile.
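The in-place upgrade case is the crux of the persistence argument, and it reduces to one property: the data volume outlives the image. A minimal sketch, with invented container and volume names:

```python
# Hypothetical sketch: a persistent container keeps its data volume
# across an upgrade; only the image changes. This is the property that
# immutable-only designs give up. All names are invented.

class Container:
    def __init__(self, name, image, volume):
        self.name = name
        self.image = image
        self.volume = volume  # persistent state, managed by the storage layer

def upgrade(container, new_image):
    """Replace the image while reattaching the same persistent volume."""
    return Container(container.name, new_image, container.volume)

db = Container("legacy-db", "postgres:9.4", "vol-db-001")
db2 = upgrade(db, "postgres:9.6")
print(db2.image, db2.volume)  # -> postgres:9.6 vol-db-001
```

In a real solution the volume would live on the pooled, replicated storage described earlier, so the container can also be re-created on a surviving node after a failure with its state intact.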
What matters to customers is the ability to use the build-out and scale-out nature of the cloud, with known data and operational redundancy and recovery at its core. In many cases, the workloads the market presents may not yet fit into the Docker world, but they can definitely be containerized and virtualized onto hyperconverged storage solutions.
If you believe that these are the target users (those who don’t necessarily need to develop the whole solution with a ground-up “cloud” architecture, don’t wish to run on the public cloud in all cases, and need a way to on-board cloud technology into the present organization progressively), you will see where we all collectively need to go. It’s not so much a hyperconverged world as an inviting, enabling solution space that starts simple (perhaps one compute+storage node) and scales out non-uniformly, using SDNs to tie in nodes as they come and go. Customers can adopt small and grow with you as you build out better, more complete infrastructure.
Forward thoughts:
My experience shows that the IT workplace is changing, and those managing data centers need to adjust and adopt the skill sets of cloud operators. I’ve seen pressure in many organizations to convert both the business and the workforce to cloud technology. The opposing forces are institutional resistance (or requirements) against adopting the public cloud, and critical workloads that can’t trivially be converted to Docker-style immutable containers. That is a present market condition needing solutions. A standalone product won’t work, but a long-term approach that grows with an organization through this IT landscape change will serve both customers and solution providers well.