From Cloud Computing to Sky Computing
Today I’m going to cover a paper that was mentioned in the Twitter thread from Erik Bernhardsson’s blog post, where @sh_reya mentioned that Berkeley RISELab is turning into Sky Lab.
I was intrigued enough to check out the paper, and decided to cover its major points with my own color.
Driving towards a commodity service
We have seen technologies that started with limited and proprietary access slowly become commodities. The technologies and communication products we use today, such as the phone, the internet, and PC hardware, were once manufactured by only a few vendors, with proprietary interfaces and no open standards.
All of these have since become commodities: we can use any phone on any service and access a website from any part of the world, because open standards and shared agreements between vendors and governments enforce a level of uniformity.
As enterprises and software all move to the Cloud, the Cloud infrastructure layer becomes the de facto platform that all applications and developers run on. Some companies are spending 80% of their revenue on the Cloud alone, and for many the Cloud is as essential as electricity.
Now looking at Cloud providers: even though each cloud IaaS provider offers a large number of similar services, each has opted to implement completely different API interfaces, identity systems, pricing models, and data patterns, which makes porting from one cloud to another very difficult.
This means each developer needs to carefully choose the specific Cloud provider they want to deploy on, as it gets increasingly difficult to port to another Cloud later.
Why should we abstract away from a single Cloud?
The paper argues that the difficulty of moving easily from one Cloud to another is inhibiting users’ ability to choose the best services and prices for their specific needs.
Take training an ML model on ImageNet: the price and performance tradeoff differs by a factor of sometimes 2-10x, depending on the type of instance and the cloud you run on.
(credit from MosaicML)
Some large companies are also setting up multi-cloud deployments to reduce the risk of catastrophic failure from a single provider (though not everyone thinks this is worthwhile).
Another interesting argument in the paper is that having each “monolith cloud” not interoperate with the others stifles innovation outside of those clouds. For example, vendors of hardware accelerators (GPU, FPGA, AI accelerators) who want to offer their devices as a cloud service would need to either partner with and convince a large cloud provider to offer them, or create their own managed cloud solution.
If there is a way to create an open standard that provides an abstraction layer for Cloud computing, then it could create a “Big Clouds” vs “The Rest” split, where the Big Clouds, given their size, continue to offer their proprietary interfaces, while all the other long-tail Cloud players become the open interconnected cloud and operate at lower margins given their ability to interoperate with each other.
Moving from the Clouds, into the Sky
To move away from applications written for and deployed into a single cloud, can we come up with a “Sky Computing” abstraction that allows apps to move from any cloud to another seamlessly? Or, in classic Marc Benioff marketing style, bring on “No Cloud”, where your software never feels like it’s locked into a Cloud…
Similar to how the internet is a fully interoperable layer between several networks, the paper proposes two major components to enable “Sky Computing” to have applications fully interoperable between Clouds (write-once, run-anywhere).
The components are a Compatibility Layer and an Intercloud Layer.
Cloud Compatibility Layer
The Cloud Compatibility Layer is an abstraction layer that abstracts away the interfaces and APIs the cloud provider provides, so applications can be built in a way that’s not directly dependent on the Cloud APIs.
Most open-source data and infrastructure projects and libraries today are already abstracted away from Cloud specifics, so users can run them on any of the Cloud providers. However, each of these bespoke data applications has undergone its own significant effort to become portable, and there isn’t a standard stack underneath, given the huge number of possible services and APIs to integrate with. Cloud storage is one of the more common integrations: many OSS libraries include a component that understands how to read and write data from each Cloud provider’s blob storage.
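To make the idea concrete, here is a minimal sketch of what a storage compatibility layer could look like. Everything in it is hypothetical: `BlobStore`, `InMemoryBlobStore`, `save_report`, and `load_report` are names I made up for illustration and do not come from the paper or from any real library. A real backend would wrap a provider’s SDK behind the same two methods.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical provider-neutral interface for blob storage."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store `data` under `key`."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the bytes stored under `key`."""


class InMemoryBlobStore(BlobStore):
    """Stand-in backend so the sketch runs without any cloud account.

    A real backend would translate put/get into one provider's API calls.
    """

    def __init__(self) -> None:
        self._objects = {}  # maps key -> bytes

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_report(store: BlobStore, name: str, text: str) -> None:
    # Application code depends only on the interface, never on a provider.
    store.put(f"reports/{name}", text.encode("utf-8"))


def load_report(store: BlobStore, name: str) -> str:
    return store.get(f"reports/{name}").decode("utf-8")
```

The point of the design is that `save_report` and `load_report` never change when the backend does; only the class passed in as `store` is swapped.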
There are certainly vendors looking to abstract away Cloud differences and expose general SDKs or APIs. Backend frameworks like Encore and Appwrite let users write a backend that stays Cloud agnostic when they deploy. Upbound abstracts multi-Cloud operation and development into a single unified interface. Localstack abstracts the Cloud integrations from a developer perspective and allows users to test Cloud integrations from their laptops.
All of these are different takes on moving the Cloud specifics away from the user so they can focus on their applications instead. However, all of these are early efforts rather than a well-recognized standard, and there are still many missing pieces before a full compatibility layer can work.
Once we have a compatibility layer that allows the application to move to any Cloud, the user still needs to decide for themselves which Cloud to run on. The Intercloud Layer automates this choice by transparently moving the application to the best provider for a particular workload, just as internet users don’t track how their packets are routed among network zones through BGP.
The Intercloud layer should allow the user to specify how the workload should run in terms of performance, cost, and availability, but not the specific implementation details. For example, “this is a TensorFlow job, it involves data that cannot leave Germany, and must be finished within the next two hours for under a certain cost.”
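Here is a rough sketch of how a requirement like that could be written down as data instead of as provider-specific configuration. The `WorkloadSpec` and `Offer` types and the `satisfies` function are my own invention for illustration, not something the paper defines, and the cost ceiling in the usage note below is a placeholder since the example above leaves it unspecified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadSpec:
    """Hypothetical declarative description of what the user needs."""
    framework: str         # e.g. "tensorflow"
    data_region: str       # region the data must stay in, e.g. "DE"
    deadline_hours: float  # job must finish within this many hours
    max_cost_usd: float    # cost ceiling chosen by the user


@dataclass(frozen=True)
class Offer:
    """Hypothetical quote from one provider for running the job."""
    provider: str
    region: str
    est_hours: float
    est_cost_usd: float


def satisfies(spec: WorkloadSpec, offer: Offer) -> bool:
    """True if the offer meets the residency, deadline, and cost constraints."""
    return (
        offer.region == spec.data_region
        and offer.est_hours <= spec.deadline_hours
        and offer.est_cost_usd <= spec.max_cost_usd
    )
```

With something like `WorkloadSpec("tensorflow", "DE", 2.0, 50.0)`, the user states the what (framework, residency, deadline, budget) and leaves the where to the Intercloud layer.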
The intercloud layer can be composed of three services: 1) Service Naming Scheme 2) Directory Service 3) Accounting and Charging.
The Service Naming Scheme is a similar concept to DNS, but each identity is a particular Cloud Service that can be invoked, with metadata about pricing, availability, provider name, etc.
The Directory Service is an aggregated list of Services provided by each Cloud provider that’s dynamically updated. A new application request specifying its performance, price, and availability requirements will be submitted to the Directory Service, where the Service will automatically resolve and return the best option to launch and run.
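As a toy illustration of the naming and directory ideas, the sketch below keeps a table of service entries that providers can re-publish at any time, and resolves a request to the cheapest entry that meets the stated requirements. The `ServiceEntry` fields, the `DirectoryService` class, and the "cheapest qualifying entry wins" rule are all assumptions on my part; the paper does not prescribe a data model or a selection policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ServiceEntry:
    """Hypothetical directory record: a service name plus its metadata."""
    name: str              # e.g. "cloud-a/gpu-training" (made-up naming scheme)
    provider: str
    price_per_hour: float
    availability: float    # fraction of uptime, e.g. 0.999
    performance: float     # relative score, higher is faster


class DirectoryService:
    """Toy in-memory directory; providers update it by re-publishing."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def publish(self, entry: ServiceEntry) -> None:
        # Publishing under an existing name replaces the old record.
        self._entries[entry.name] = entry

    def resolve(
        self,
        min_performance: float,
        min_availability: float,
        max_price_per_hour: float,
    ) -> Optional[ServiceEntry]:
        """Return the cheapest entry meeting all requirements, or None."""
        candidates = [
            e
            for e in self._entries.values()
            if e.performance >= min_performance
            and e.availability >= min_availability
            and e.price_per_hour <= max_price_per_hour
        ]
        return min(candidates, key=lambda e: e.price_per_hour, default=None)
```

Because the table is updated in place, a provider that lowers its price can become the resolved choice for the very next request, without the application changing anything.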
The Accounting & Charging layer aggregates the billing from each Cloud provider and understands how to collect and distribute payment for each service used.
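The aggregation half of this can be sketched in a few lines: take usage records from several providers and work out how much is owed to each. The record shape (provider, service, hours, hourly rate) and the `settle` function are assumptions made up for this example; real billing involves far more detail (tiers, discounts, currencies) than this captures.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Hypothetical usage record: (provider, service, hours used, price per hour).
# Amounts are passed as strings so Decimal keeps them exact.
UsageRecord = Tuple[str, str, str, str]


def settle(records: Iterable[UsageRecord]) -> Dict[str, Decimal]:
    """Aggregate usage into the total amount owed to each provider."""
    owed = defaultdict(Decimal)  # missing providers start at Decimal("0")
    for provider, _service, hours, rate in records:
        owed[provider] += Decimal(hours) * Decimal(rate)
    return dict(owed)
```

The user would then see a single bill (the sum over providers), while the layer distributes each provider’s share behind the scenes.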
Although there is no technical limitation around building this Intercloud abstraction layer, there hasn’t been any meaningful motivation for the major Cloud providers to come up with an intercloud standard, given their strong drive to differentiate themselves.
More Speculation about the Sky
Now we understand what Sky computing is: essentially, the ability for a developer to deploy an app or workload into the “Sky”, which then decides which cloud(s) to run it on and which services and configurations it should consume.
There are a few questions worth asking:
Will the Sky exist in the next few years?
It’s very hard to believe that we will have a full abstraction layer like this available in the next few years, as there is so much surface to cover and there are ongoing changes at each provider. For the Sky to exist, we need everyone to agree on and adopt a set of standards across many layers and services, which the major cloud providers at this point have no incentive to do.
IMO the closest standard we have for the Sky today is Kubernetes. Most companies building a multi-cloud product or service today build on it, as it’s the fastest-growing piece of infrastructure and every cloud provider has its own distribution. In theory, a Kubernetes pod that runs on AWS could just run on Google without any changes. In reality, there are many leaky abstractions that prevent applications from being truly portable, from different kinds of authentication to Cloud config plugins, storage configurations, and more. Therefore, there is still a significant amount of work on top of Kubernetes to allow complex applications to be ported from one cloud to another.
What are further implications once the Sky exists?
The paper mentions that one future effect, if the Sky computing layers exist, is that smaller providers and vendors will choose to embrace this commodity layer to compete as a whole with the larger “stand-alone” providers. These large providers will be priced higher and innovate across the board to retain an advantage, while the Sky computing providers will accept lower margins but offer differentiated and focused offerings in the catalog of services.
The interesting dynamic we see today is that most infra companies are offering multi-cloud solutions, but none of these solutions is truly embraced as the standard.
What will be the new standards in the infra stack that can continue to bring more consolidation? Docker and Kubernetes certainly brought the standard up the stack from VMs and machines, and Wasm has an opportunity to bring a new standard coming from the language and beyond.
There is an opportunity to continue exploring this new design space as more paradigms appear, like the Lakehouse that we covered, and to see how this space continues to evolve.