New Directions in Cloud Programming (CIDR 2021)
Today, developing applications that run in the cloud in a highly available and performant manner isn’t straightforward at all. Many factors can cause downtime (VMs crashing or slowing down, underlying infrastructure degrading, outages, etc.). This has led to where we are today: what we all know as the “cloud-native” way of building applications. Even though tools like Docker and Kubernetes have made some of this complexity easier to manage, the overall toolchain is hard to learn, and a typical application developer needs a lot of specialized knowledge to leverage these frameworks well.
What if there were a new programming model and language that could capture the intent of the application, with a compiler that generates the full application internals and understands how to be “cloud-native”? We’re already seeing commercial languages going down this path, such as Dark and Unison.
The “New Directions in Cloud Programming” paper does not attempt to specify a complete solution in detail; instead, it proposes a high-level programming stack and outlines the further work needed to complete the puzzle.
Moving towards a declarative model with a gradual approach
Today’s programming languages are mostly imperative, which gives developers a lot more control but also means each developer has to handle many of the service’s details themselves.
Just as we moved database querying to a declarative language (SQL), we should move as much of cloud programming as possible toward a declarative style.
The paper describes a new compiler/language stack called Hydro, designed around a staged approach that ultimately generates Hydroflow programs capable of orchestrating various cloud resources to deploy the application.
At a high level, Hydro first accepts programs written in various existing DSLs and frameworks, such as Spark, actor frameworks like Orleans, or futures-based frameworks like Ray. These are translated into a human-readable Hydrologic IR program.
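To give a rough sense of the shape of such an IR, here is a loose, hypothetical sketch in Python (operator names are made up; this is not Hydrologic’s actual syntax): the program is expressed as a graph of declarative, relational-style operators rather than a sequence of imperative steps.

```python
# Hypothetical illustration only -- not the paper's Hydrologic syntax.
# The point is that the IR describes *what* dataflow happens, not *how*.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Op:
    kind: str                       # e.g. "source", "group_by", "aggregate", "sink"
    arg: str = ""
    inputs: List["Op"] = field(default_factory=list)

# "count votes per item" as a small operator graph
votes   = Op("source", "votes")
grouped = Op("group_by", "item_id", [votes])
counts  = Op("aggregate", "count", [grouped])
output  = Op("sink", "leaderboard", [counts])
```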
This IR layer lets developers fine-tune or configure different aspects of a cloud application; we’ll touch on the areas of configuration that Hydrologic supports in the next section.
After the IR program is generated and configured, it is compiled into an actual implementation.
PACT: Program Semantics, Availability, Consistency, and Targets for Optimization
Diving a bit deeper into the design space, Hydrologic allows developers to configure four facets in isolation. This is an interesting paradigm: typically, a developer has to consider multiple facets at once when writing each line of code.
The Hydrologic model breaks the declarative specification down into these facets so developers can declare the level of optimization or configuration their application needs for each one. Each facet has its own syntax and code generator, and code generation takes a search-based approach to producing optimized code rather than the typical rule-based approach.
The first facet, Program Semantics, is essentially about expressing the application’s actual business logic in declarative syntax, so the compiler can automatically optimize many distributed and parallel aspects. However, most programs aren’t written declaratively, and learning a declarative distributed programming language (e.g., CALM-style languages) isn’t simple for developers. The paper proposes using Verified Lifting, which, in summary, translates code written in a sequential language (Java, etc.) or framework (Spark, etc.) into a target language (Hydrologic in this case) by searching over candidate programs instead of applying traditional rule-based translation.
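As a rough illustration of the lifting idea (a toy Python analogy, not the paper’s Verified Lifting tooling): the imperative loop and the declarative expression below compute the same result, and it’s the declarative form that a compiler is free to reorder, batch, or distribute.

```python
# Toy illustration of "lifting" imperative code to a declarative equivalent.
# Verified lifting searches for such an equivalent form and proves the equivalence.

orders = [{"user": "a", "total": 30}, {"user": "b", "total": 75}, {"user": "a", "total": 50}]

# Imperative: iteration order and mutation are spelled out by the developer.
big_spenders = []
for o in orders:
    if o["total"] > 40:
        big_spenders.append(o["user"])

# Declarative: only the *what* is specified; a compiler can reorder or shard it.
big_spenders_decl = [o["user"] for o in orders if o["total"] > 40]

assert big_spenders == big_spenders_decl
```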
The second facet is Availability, which lets each API declare its availability requirement, such as “this API should tolerate one availability zone being unavailable.” As one example, the compiler can translate this into deploying the backend across two availability zones with a load balancer in front that fails over automatically, and it can also replicate the backend state across both AZs.
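Hydrologic’s concrete syntax for this isn’t reproduced here, but as a hypothetical sketch (the decorator name and fields are made up), a per-API availability declaration could look something like this, with the compiler deriving replication, load balancing, and failover from the annotation rather than the developer wiring it up by hand.

```python
# Hypothetical per-API availability annotation -- names are invented for illustration.

def availability(tolerate_az_failures: int):
    """Attach an availability requirement for a compiler/deployer to read."""
    def wrap(fn):
        fn.availability_spec = {"tolerate_az_failures": tolerate_az_failures}
        return fn
    return wrap

@availability(tolerate_az_failures=1)
def get_cart(user_id: str) -> dict:
    # Business logic only; placement across AZs would be derived from the annotation.
    return {"user_id": user_id, "items": []}
```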
The third facet is Consistency, which covers the consistency requirements of your application state (ACID isolation, invariants on a property, etc.), also configured on a per-API basis. Lifting the consistency specification to the API level lets the code generator holistically optimize both client and server to avoid coordination where possible and improve performance. It also lets the code generator explore different implementations for enforcing consistency based on analysis of the data, for example choosing between CRDTs and a more consensus-based approach.
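To make one of those options concrete, here is a minimal grow-only counter (G-Counter), a classic CRDT: any replica can update it without coordination, and replicas converge by merging. This is an illustrative sketch, not code from the paper.

```python
# G-Counter CRDT: coordination-free increments, convergence via element-wise max.

class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}                      # replica_id -> local count

    def increment(self, n: int = 1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter"):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("az-1"), GCounter("az-2")
a.increment(3); b.increment(2)
a.merge(b); b.merge(a)
assert a.value() == b.value() == 5            # replicas converge without consensus
```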
The last facet is Targets for Optimization, which simply states the cost and performance targets the application should optimize for. This lets the code generator decide on instance types, number of machines, and other factors that trade off cost against performance.
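A hypothetical sketch of what this could look like (field names and candidate data are made up): the developer declares targets, and the generator searches candidate deployments for one that satisfies them instead of the developer hard-coding a deployment.

```python
# Hypothetical declarative targets plus a toy search over candidate deployments.

targets = {"max_monthly_cost_usd": 500, "p99_latency_ms": 200}

candidates = [
    {"instance": "m5.large",  "count": 4, "monthly_cost_usd": 280, "p99_latency_ms": 180},
    {"instance": "c5.xlarge", "count": 2, "monthly_cost_usd": 250, "p99_latency_ms": 240},
]

feasible = [
    c for c in candidates
    if c["monthly_cost_usd"] <= targets["max_monthly_cost_usd"]
    and c["p99_latency_ms"] <= targets["p99_latency_ms"]
]
best = min(feasible, key=lambda c: c["monthly_cost_usd"])   # picks the m5.large plan
```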
The output of the code generators for these facets is another IR, from which a full implementation and deployment can be generated, leveraging various existing cloud resources and managed services to fulfill these goals.
Conclusion
While this paper raises many open research areas and questions, it is really interesting to see what a possible path could look like when developers’ concerns are lifted into a much higher-level language.
Setting aside the proposed research areas, it’s also interesting to ask what kinds of declarative abstractions developers would like to leverage in today’s development and infrastructure stack. The rise of IaC (Terraform, Pulumi, etc.) has brought a higher-level declarative approach to repeatably deploying and maintaining infrastructure state.
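As a small example of that declarative style, a minimal Pulumi program in Python (assuming an existing Pulumi project and AWS credentials are already configured) declares the desired resource rather than scripting a sequence of API calls.

```python
# Minimal Pulumi (Python) sketch: declare desired state, let the tool converge to it.
# Assumes a configured Pulumi project and AWS credentials.

import pulumi
import pulumi_aws as aws

bucket = aws.s3.Bucket("app-assets")        # desired resource, not imperative API calls
pulumi.export("bucket_name", bucket.id)
```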
I believe there will also be innovation at the application level that lets you declare application logic and entities in a more declarative way. Chiselstrike (an Essence VC portfolio company) is one step in that direction: it lets front-end developers declare entities/schemas and the logic around them in a TypeScript library, which in turn gets compiled into a backend API and more.
It’s exciting to see how both research and today’s developer stack will continue to evolve, and how the move to serverless will continue as well.