Topic: [EN] From Borg to Kubernetes
03-03-2016, 20:54 #1
March 2, 2016
Volume 14, issue 1
Borg, Omega, and Kubernetes
Lessons learned from three container-management systems over a decade
Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, and John Wilkes, Google Inc.
Though widespread interest in software containers is a relatively recent phenomenon, at Google we have been managing Linux containers at scale for more than ten years and built three different container-management systems in that time. Each system was heavily influenced by its predecessors, even though they were developed for different reasons. This article describes the lessons we've learned from developing and operating them.
The first unified container-management system developed at Google was the system we internally call Borg [7]. It was built to manage both long-running services and batch jobs, which had previously been handled by two separate systems: Babysitter and the Global Work Queue. The latter's architecture strongly influenced Borg, but was focused on batch jobs; both predated Linux control groups. Borg shares machines between these two types of applications as a way of increasing resource utilization and thereby reducing costs. Such sharing was possible because container support in the Linux kernel was becoming available (indeed, Google contributed much of the container code to the Linux kernel), which enabled better isolation between latency-sensitive user-facing services and CPU-hungry batch processes.
As more and more applications were developed to run on top of Borg, our application and infrastructure teams developed a broad ecosystem of tools and services for it. These systems provided mechanisms for configuring and updating jobs; predicting resource requirements; dynamically pushing configuration files to running jobs; service discovery and load balancing; auto-scaling; machine-lifecycle management; quota management; and much more. The development of this ecosystem was driven by the needs of different teams inside Google, and the result was a somewhat heterogeneous, ad-hoc collection of systems that Borg's users had to configure and interact with, using several different configuration languages and processes. Borg remains the primary container-management system within Google because of its scale, breadth of features, and extreme robustness.
Omega [6], an offspring of Borg, was driven by a desire to improve the software engineering of the Borg ecosystem. It applied many of the patterns that had proved successful in Borg, but was built from the ground up to have a more consistent, principled architecture. Omega stored the state of the cluster in a centralized Paxos-based transaction-oriented store that was accessed by the different parts of the cluster control plane (such as schedulers), using optimistic concurrency control to handle the occasional conflicts. This decoupling allowed the Borgmaster's functionality to be broken into separate components that acted as peers, rather than funneling every change through a monolithic, centralized master. Many of Omega's innovations (including multiple schedulers) have since been folded into Borg.
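Omega's store and its interface are internal to Google and not described here, so the following is only a minimal Python sketch of the general technique the paragraph names, optimistic concurrency control: a scheduler reads a machine's state together with a version number, plans a placement without holding a lock, and commits only if the version is still unchanged. The names `CellStateStore`, `compare_and_swap` and `place_task`, the CPU-only resource model and the retry limit are all hypothetical, and the toy store is a single in-memory dictionary rather than a replicated Paxos-based one.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a commit is based on a stale version of the state."""


@dataclass
class Versioned:
    value: dict
    version: int


class CellStateStore:
    """Hypothetical in-memory stand-in for a shared, versioned cell-state store."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        entry = self._data.get(key, Versioned({}, 0))
        # Hand out a copy so callers cannot mutate committed state directly.
        return dict(entry.value), entry.version

    def compare_and_swap(self, key, expected_version, new_value):
        current = self._data.get(key, Versioned({}, 0))
        if current.version != expected_version:
            raise ConflictError(
                f"{key}: expected v{expected_version}, found v{current.version}"
            )
        self._data[key] = Versioned(dict(new_value), current.version + 1)
        return current.version + 1


def place_task(store, machine, task, cpu, capacity=4.0, max_retries=3):
    """Optimistically claim `cpu` cores on `machine`, retrying on conflicts."""
    for _ in range(max_retries):
        state, version = store.read(machine)
        if sum(state.values()) + cpu > capacity:
            return False  # does not fit; a real scheduler would try another machine
        state[task] = cpu
        try:
            store.compare_and_swap(machine, version, state)
            return True
        except ConflictError:
            continue  # another scheduler committed first: re-read and re-plan
    return False


if __name__ == "__main__":
    store = CellStateStore()
    _, v = store.read("m1")                          # schedulers A and B both read v0
    store.compare_and_swap("m1", v, {"web": 2.0})    # A commits first
    try:
        store.compare_and_swap("m1", v, {"batch": 3.0})  # B's plan is now stale
    except ConflictError as err:
        print("conflict:", err)
    print(place_task(store, "m1", "batch", 1.5))     # re-reads v1, then commits
```

A conflict costs the losing scheduler a re-read and a retry rather than a lock held by everyone, which is a good trade when, as the paragraph says, conflicts are only occasional.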
The third container-management system developed at Google was Kubernetes [4]. It was conceived of and developed in a world where external developers were becoming interested in Linux containers, and Google had developed a growing business selling public-cloud infrastructure. Kubernetes is open source—a contrast to Borg and Omega, which were developed as purely Google-internal systems. Like Omega, Kubernetes has at its core a shared persistent store, with components watching for changes to relevant objects. In contrast to Omega, which exposes the store directly to trusted control-plane components, state in Kubernetes is accessed exclusively through a domain-specific REST API that applies higher-level versioning, validation, semantics, and policy, in support of a more diverse array of clients. More importantly, Kubernetes was developed with a stronger focus on the experience of developers writing applications that run in a cluster: its main design goal is to make it easy to deploy and manage complex distributed systems, while still benefiting from the improved utilization that containers enable.
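To make the "components watching for changes" pattern more concrete, here is a minimal Python sketch: a toy API layer that validates each write, stamps it with an increasing version, and notifies watchers, plus a controller that reacts to any relevant change by comparing desired and actual state. The object kinds borrow Kubernetes names, but `ToyApiServer`, `ReplicaController` and the dictionary schema are hypothetical simplifications; real controllers consume watch streams asynchronously rather than through synchronous callbacks as here.

```python
from collections import defaultdict
from itertools import count


class ToyApiServer:
    """Hypothetical stand-in for an API layer in front of a shared store."""

    def __init__(self):
        self._objects = {}                  # (kind, name) -> object
        self._watchers = defaultdict(list)  # kind -> callbacks
        self._versions = count(1)

    def watch(self, kind, callback):
        self._watchers[kind].append(callback)

    def put(self, kind, name, spec):
        if not isinstance(spec, dict):
            raise ValueError("spec must be a mapping")  # stand-in for API validation
        obj = {"kind": kind, "name": name, "spec": dict(spec),
               "resourceVersion": next(self._versions)}
        self._objects[(kind, name)] = obj
        for callback in list(self._watchers[kind]):
            callback("PUT", obj)
        return obj["resourceVersion"]

    def delete(self, kind, name):
        obj = self._objects.pop((kind, name))
        for callback in list(self._watchers[kind]):
            callback("DELETE", obj)

    def list(self, kind):
        return [obj for (k, _), obj in self._objects.items() if k == kind]


class ReplicaController:
    """Level-triggered: on any relevant event, move actual state toward desired."""

    def __init__(self, api):
        self.api = api
        self._ids = count()
        api.watch("ReplicaSet", lambda event, obj: self.reconcile())
        api.watch("Pod", lambda event, obj: self.reconcile())

    def reconcile(self):
        for rs in self.api.list("ReplicaSet"):
            name, desired = rs["name"], rs["spec"]["replicas"]
            pods = [p["name"] for p in self.api.list("Pod")
                    if p["spec"].get("owner") == name]
            if len(pods) < desired:
                # Each write triggers another reconcile, one pod per step.
                self.api.put("Pod", f"{name}-{next(self._ids)}", {"owner": name})
            elif len(pods) > desired:
                self.api.delete("Pod", pods[-1])
```

Because the controller re-reads the whole desired and actual state on every event instead of acting on the event's payload, a missed or duplicated notification does no lasting harm: the next reconcile pass still converges.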
This article describes some of the knowledge gained and lessons learned during Google's journey from Borg to Kubernetes.
03-03-2016, 21:02 #2
Kubernetes Container Juggling Reaches Towards Hyperscale
March 3, 2016
Timothy Prickett Morgan
Putting legacy monolithic applications into production is like moving giant boulders across the landscape. Orchestrating applications coded in a microservices style is a bit more like creating weather, with code in a constant state of flux and containers flitting in and out of existence as that code changes, carrying it into the production landscape and out again as it expires.
Because container deployment is going to be such an integral part of the modern software stack – it basically creates a coordinated and distributed runtime environment, managing the dependencies between chunks of code – the effort and time that it takes to deploy collections of containers, usually called pods, is one of the bottlenecks in the system. This is a bottleneck that Google and its partners in the development of the Kubernetes container controller are well aware of, and as they have explained in the past, the idea with Kubernetes is to get the architecture and ecosystem right and then get to work to scale the software so it is suitable for even the largest enterprises, cloud builders, hyperscalers, and even HPC organizations. The Mesosphere distributed computing platform can also be used to manage containers (as well as have Kubernetes run atop it) and is working to push its scale up and to the right, too.
One of the reasons that containers have long been used at hyperscalers like Google and Facebook is that a container is, by its very nature, a much more lightweight form of virtualization than a virtual machine implemented atop hypervisors such as ESXi, Hyper-V, KVM, or Xen. (We realize that some companies will use a mix of containers and virtualization, particularly if they are worried about security and want to ensure another level of isolation for classes of workloads. Google, for instance, uses homegrown containers that predate Docker for its internal applications, but it puts KVM inside of containers to create instances on the Google Compute Engine public cloud and then allows this capacity to be further diced and sliced using Docker containers and the Kubernetes controller.)
Ultimately, whether the containers run on bare metal or virtualized iron, the schedulers inside of controllers like Kubernetes need to scale well: they must not only fire up containers quickly but also juggle large numbers of containers and pods, because microservices-style applications will require this.
Stack ‘Em High
CoreOS has put Kubernetes at the heart of its Tectonic container management system, and is keen on improving the scale of both the Kubernetes container scheduler and its own key/value data store, called etcd, which serves as the configuration management backend for pods and containers. Last fall, Google showed how Kubernetes could scale to around 100 nodes with 30 Docker containers per node, and the CoreOS team, working with other members of the Kubernetes community, has been able to ratchet up the scale by a factor of ten and is looking to push it even further. CoreOS engineers have just published a performance report outlining how they were able to goose the scheduler so it could load up 30,000 container pods in 587 seconds across 1,000 nodes, a dramatic improvement over Google's demonstration last fall, in which 3,000 pods on 100 nodes took 8,780 seconds to schedule. That is a 10X improvement in the scale of the scheduler and a roughly 150X reduction in the time it takes to schedule each pod. These performance metrics are based on the Kubemark benchmark test, which was introduced recently by the Kubernetes community.
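The 10X and 150X figures can be checked directly from the numbers quoted above. The short Python calculation below uses only those figures; it shows that the 150X number describes the per-pod scheduling rate, while total wall-clock time fell by about 15X because the second run also handled ten times as many pods.

```python
# Figures quoted in the paragraph above.
BEFORE_PODS, BEFORE_SECONDS = 3_000, 8_780   # Google's 100-node run
AFTER_PODS, AFTER_SECONDS = 30_000, 587      # CoreOS 1,000-node Kubemark run

before_rate = BEFORE_PODS / BEFORE_SECONDS   # pods scheduled per second
after_rate = AFTER_PODS / AFTER_SECONDS

scale_factor = AFTER_PODS / BEFORE_PODS              # how many more pods
per_pod_speedup = after_rate / before_rate           # the headline ~150X
wall_clock_speedup = BEFORE_SECONDS / AFTER_SECONDS  # total time saved

print(f"before: {before_rate:.3f} pods/s, {BEFORE_SECONDS / BEFORE_PODS:.2f} s per pod")
print(f"after:  {after_rate:.1f} pods/s, {1000 * AFTER_SECONDS / AFTER_PODS:.1f} ms per pod")
print(f"scale {scale_factor:.0f}x, per-pod {per_pod_speedup:.0f}x, "
      f"wall clock {wall_clock_speedup:.0f}x")
```

The per-pod time drops from roughly 2.9 seconds to under 20 milliseconds, which is where the 150X comes from.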
These kinds of performance jumps are what is needed for Kubernetes to compete against Mesos, which is also being positioned as a container management system, and they are precisely what the Kubernetes community said was in the works from the get-go.
“The focus of the Kubernetes community has been to lay down a solid foundation,” Brandon Philips, chief technology officer at CoreOS, tells The Next Platform. “There was some low hanging fruit that we could grab to increase the performance of the scheduler and I think there are some easy wins that we [have] inside of Kubernetes to increase the scale over time. But like any complex distributed system, there are a lot of moving parts and so the initial work we did focused on the Kubernetes scheduler itself, and we are pretty happy with the performance of 1,000 nodes with 30 pods per node and scheduling all of that work in under a minute.”
Philips says that the impending Kubernetes 1.2 release should include the updates that have been made to improve the performance of the scheduler. The next focus for the team will be on the etcd configuration database caching layer, which is based on a key/value store created by CoreOS for its Tectonic container management system that has also been picked up by Google as the backend for its Kubernetes container service on its Compute Engine public cloud.
Philips says that in a typical Kubernetes setup, the control plane part of the Kubernetes stack is usually run on three to five systems, with the worker machines that actually run Docker containers ranging from a few hundred to a thousand nodes. The focus now is reducing the memory and CPU requirements on those control plane servers, and also reducing the chatter between Kubernetes and the etcd backend. One way this is being accomplished is by replacing the JSON interfaces between the two bits of code with protocol buffers, which also helps cut down on CPU and memory usage in the control plane. The upstream Kubernetes code should get these improvements with the 1.3 release in three months or so, and they will then cascade down to Tectonic, Google Cloud Platform, and other Kubernetes products soon thereafter. (Univa, which sells its popular Grid Engine cluster job scheduler for HPC environments, has a modified version of its tools that brings together Docker containers, Red Hat’s Atomic Host, Kubernetes, and elements of Grid Engine to create a stack Univa calls Navops, which presumably will also have very high performance.)
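Part of the saving from protocol buffers comes from a more compact wire format: field names are replaced by small numeric tags that both sides know from a shared schema, and integers are written as variable-length binary. The Python sketch below hand-encodes two integer fields following the protobuf wire format (key = field number shifted left three bits, OR'd with wire type 0 for varints) and compares the result with the equivalent JSON. The message and its field names are hypothetical, the sketch handles only non-negative integers, and it is not how Kubernetes itself serializes objects; the CPU saved by not parsing text is not measured here.

```python
import json


def encode_varint(n):
    """Base-128 varint, as in the protocol buffers wire format (n >= 0 only)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_int_field(field_number, value):
    """One integer field: key = (field_number << 3) | wire type 0, then the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


# Hypothetical schema: field 1 = replicas, field 2 = cpuMillicores.
binary = encode_int_field(1, 150) + encode_int_field(2, 2500)
text = json.dumps({"replicas": 150, "cpuMillicores": 2500}).encode()

print(f"protobuf-style: {len(binary)} bytes ({binary.hex()})")
print(f"JSON:           {len(text)} bytes ({text.decode()})")
```

Here the binary form is 6 bytes against 40 for the JSON text, and the receiver never has to scan field names or parse decimal digits.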
Having pushed Kubernetes and etcd so they can scale to 1,000 nodes, CoreOS is now figuring out how to scale this up so it can support 2,000 to 3,000 machines. The tweaks to the etcd layer will go into alpha testing in April with etcd V3, work that the CoreOS team has been doing for the past six months or so. This later round of enhancements is being targeted for the Kubernetes 1.3 release and, at 30 pods per node, should scale a cluster to 60,000 to 90,000 container pods.
With Kubernetes scale now ramping fast, the question is how fast scalability will keep ramping and how far it actually needs to scale for most enterprise customers. Even Google’s clusters, which are arguably the largest in the world, average around 10,000 nodes, with some being as large as 50,000 or 100,000 in rare cases.
“I think that once we get to 5,000 to 10,000 nodes, we are to the point where most people will be in a discomfort zone and they will want to shard the control plane and not have one gigantic API in charge of the infrastructure,” says Philips. “By the end of the year, I think we will be able to get Kubernetes to that point, and it will have sufficient scale for the vast majority of use cases.”
The sharding of the control plane for very large scale Kubernetes clusters is being undertaken by the Ubernetes project, and in an analog to what Google has done internally, that makes Kubernetes akin to the Borgmaster cluster scheduler and Ubernetes a higher-level abstraction for management across clusters and schedulers akin to Google’s Borg.
It is important not to take the analogy above too far. Just like Kubernetes is not literally Borgmaster code that has been open sourced, Ubernetes is inspired by Borg but not literally based on it. Borg is a very Google-specific beast, while Kubernetes and its Ubernetes cluster federation overlay are aimed at a more diverse set of workloads and users.
Not Just About Scale Out
The thing to remember about Kubernetes scale is that the software engineers are not just trying to push up the number of machines that can support long-running workloads like web application servers or databases inside of containers. Scale also means decreasing the latencies in setting up workloads with Kubernetes, particularly for more job-oriented work like MapReduce and other analytics or streaming workloads.
It can take seconds to schedule such work now, and that latency needs to come down so Kubernetes can juggle many short-running jobs. The etcd V3 work that the CoreOS team is doing also includes some help so the Kubernetes scheduler can do more sophisticated bin-packing routines, adding support for low- and high-priority jobs.
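The paragraph does not say which algorithm the scheduler will use, so the following is only an illustration of priority-aware bin packing in general: jobs are sorted by priority and then by size, and each one goes to the feasible node it fills most tightly (best fit). The `Node` and `Job` types, the CPU-only resource model and the absence of preemption are simplifying assumptions, and this is not the Kubernetes scheduler's actual algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_free: float
    pods: list = field(default_factory=list)


@dataclass
class Job:
    name: str
    cpu: float
    priority: int  # higher value = more important


def pack(jobs, nodes):
    """Place jobs on nodes, highest priority first; return jobs left unplaced."""
    unplaced = []
    # Priority first, then larger jobs first, as in best-fit-decreasing packing.
    for job in sorted(jobs, key=lambda j: (-j.priority, -j.cpu)):
        fits = [n for n in nodes if n.cpu_free >= job.cpu]
        if not fits:
            unplaced.append(job)  # no preemption in this sketch
            continue
        # Best fit: choose the node left with the least spare CPU afterwards.
        target = min(fits, key=lambda n: n.cpu_free - job.cpu)
        target.cpu_free -= job.cpu
        target.pods.append(job.name)
    return unplaced
```

With a single 4-CPU node and two 2-CPU batch jobs arriving before a 3-CPU high-priority service, arrival order would fill the node with batch work; sorting by priority places the service first and leaves the batch jobs pending instead.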
As for Kubernetes having multiple job schedulers like Borg does, Philips thinks that it is likely that Kubernetes will ultimately end up with a single, general purpose job scheduler but that it will get more sophisticated over time, adding features for quality of service distinctions and load balancing and live migration of workloads across the nodes in Kubernetes clusters.
04-03-2016, 08:04 #3
04-03-2016, 08:08 #4
Why We Chose Kubernetes over AWS ECS, Docker Swarm, Mesos DCOS & Tutum
04-03-2016, 08:14 #5
Kubernetes: 80% of the market for cluster managers
Rachael King
Nov 24, 2015
Google has capitalized on the growing popularity of so-called containers, which are standardized building blocks of code that easily can be moved around the Internet and across a broad range of devices. In June 2014, as containers were taking off in the world of software development, Google open sourced Kubernetes, its technology for managing clusters of containers. Since then, Google has captured about 80% of the market for cluster managers, according to consulting firm Cloud Technology Partners Inc.
Kubernetes has been adopted by a widening range of companies including financial services firms and tire manufacturers. Google has used this software internally for years to manage its own containers on a large scale.
Corporations increasingly view containers, building blocks of code that contain a small application and its dependencies and are designed to work across platforms and servers, as a way to become more agile in a market where digital natives set the pace. The technology is hot, in other words, and not something the search giant would normally want to open source, the company admitted last week at the Structure Conference in San Francisco.
“Google isn’t historically focused on open source, especially of things that are considered new and innovative,” said Eric Brewer, vice president of infrastructure at Google. “But in Kubernetes, it’s very important because we actually want the hybrid use cases and the on premise cases in the sense that people can control part of their destiny,” he said.
Kubernetes is technically a cluster manager that’s able to take containers and automatically add or delete resources. A container encloses a program (or a piece of one) in a layer of software that connects seamlessly to the operating system and other computing resources. One advantage is that it can be moved easily from one computer or server to another. If traffic to a certain application spikes, Kubernetes is able to automatically replicate containers and expand capacity without manual intervention. The software can schedule containers, allocate them and make sure the computing environment has enough memory, disk space and storage, David Linthicum, senior vice president of Cloud Technology Partners told CIO Journal. The software also works with other clouds besides Google’s.
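Kubernetes implements this kind of automatic replication as horizontal pod autoscaling, whose documentation describes the core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The Python function below is a minimal sketch of that rule with min/max clamping; the default bounds are hypothetical, and the real controller adds refinements, such as a tolerance band around the target, that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to utilization versus target.

    Utilizations are percentages, e.g. 90 means 90% of requested CPU.
    min_replicas/max_replicas are hypothetical defaults for this sketch.
    """
    if current_replicas < 1 or target_utilization <= 0:
        raise ValueError("need at least one replica and a positive target")
    # Multiply before dividing so integer inputs stay exact until the division.
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))
```

For example, four replicas running at 90% against a 60% target give ceil(4 × 90 / 60) = 6 replicas, while a traffic spike that would call for more than `max_replicas` is capped at that bound.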
These small containers work well with DevOps, the process that more and more companies are using to develop software. DevOps departs from the traditional practice of massive corporate IT development projects written to management spec, which are followed by periods of testing and operation conducted by separate teams. DevOps breaks projects into smaller, quickly executed pieces, with single teams developing and testing software, gathering user feedback, and revising their work as they go.
“Instead of putting out a big monolithic application, we can build an application out of hundreds of containers that do different things,” said Mr. Linthicum. “We can put them in a cluster manager like Kubernetes and have them automatically scaled and managed,” he said.
Companies such as eBay Inc. and e-commerce business Zulily are starting to speak publicly about working with the software. Zulily, which was acquired in October by the QVC division of Liberty Interactive Corp., said one team had tried to use Docker containers in production in May 2014 but abandoned it soon after because it was complex to manage, said Steve Reed, principal software engineer of core engineering at the retailer, speaking at the OSCON conference in July. “The hardest part is operating the container, especially in a production environment,” he said.
The team began working with Docker containers again once it had access to Kubernetes and found the management was much easier. “Kubernetes is production ready, even for a deployment like ours at Zulily where we’re not talking about hundreds of nodes,” he said. Zulily CIO Luke Friang told CIO Journal in August of 2014 that the retailer works at a fast pace and has begun developing its own software because commercial software can’t keep pace with growth of its online business. Zulily confirmed that the company is working with Kubernetes software to manage Docker containers.
Docker, which makes, among other things, a container manager of its own called Docker Swarm, said that the third-party reports it has seen indicate that Docker Swarm is in the lead. In one survey of 138 respondents, conducted by O’Reilly Media, 38% said they used Docker Swarm, while 22% were using or planned to use Kubernetes. The report, released in September 2015, said that those using Kubernetes tended to be from larger organizations.
Kubernetes represents one part of the search giant’s effort to get serious about the enterprise cloud. Urs Hölzle, head of the Google Cloud Platform business, speaking at the same event on Wednesday, predicted that Google’s cloud platform revenues could surpass its advertising revenues by 2020. A day later, the company tapped enterprise-technology veteran Diane Greene to run its cloud-computing businesses, including Google for Work, Cloud Platform and Google Apps. Mr. Hölzle will report to Ms. Greene.
04-03-2016, 08:27 #6
Kubernetes vs. CloudFoundry
KarlKFI | Stack Overflow
Sep 1 '15
As both a CloudFoundry (past) and Kubernetes (present) committer, I'm probably uniquely qualified to answer this one.
I like to call CloudFoundry an "Application PaaS" and Kubernetes a "Container PaaS", but the distinction is fairly subtle and fluid, given that both projects change over time to compete in the same markets.
The distinction between the two is that CF has a staging layer that takes a (12-factor) user app (e.g. jar or gem) and a Heroku-style buildpack (e.g. Java+Tomcat or Ruby) and produces a droplet (analogous to a Docker image). CF doesn't expose the containerization interface to the user, but Kubernetes does.
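The staging step can be pictured as a tiny pipeline: offer the uploaded app to each buildpack in turn, let the first one whose detect check accepts it build the app, and record the result as the droplet. Cloud Foundry and Heroku buildpacks implement detection as a script shipped with the buildpack; in the Python sketch below, the `Buildpack` type, the marker-file rules (`pom.xml`, `*.jar`, `Gemfile`) and the dictionary standing in for a droplet are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Buildpack:
    name: str
    # Stand-in for a buildpack's detect step: does this app look like ours?
    detect: Callable[[FrozenSet[str]], bool]


def stage(app_files, buildpacks):
    """Return a toy 'droplet' built with the first buildpack that detects the app."""
    files = frozenset(app_files)
    for bp in buildpacks:
        if bp.detect(files):
            # A real compile step would add a runtime and dependencies; this sketch
            # only records which buildpack was chosen and what the app contained.
            return {"buildpack": bp.name, "app": sorted(files)}
    raise RuntimeError("no buildpack detected this app")


# Hypothetical detection rules, checked in order.
BUILDPACKS = [
    Buildpack("java", lambda f: "pom.xml" in f or any(n.endswith(".jar") for n in f)),
    Buildpack("ruby", lambda f: "Gemfile" in f),
]
```

In the Kubernetes model described above, this step simply does not exist inside the platform: the user hands over a finished container image.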
CloudFoundry's primary audience is enterprise application devs who want to deploy 12-factor stateless apps using Heroku-style buildpacks.
Kubernetes' audience is a little broader, including both stateless application and stateful service developers who provide their own containers.
This distinction could change in the future:
- CloudFoundry could start to accept docker images (Lattice accepts Docker images).
- Kubernetes could add an image generation layer (OpenShift does something like this).
As both projects mature and compete, their similarities and differences will change. So take the following feature comparison with a grain of salt.
Both CF and K8s share many features, like containerization, namespacing, and authentication.
Kubernetes competitive advantages:
- Group and scale pods of containers that share a networking stack, rather than just scaling independently
- Bring your own container
- Stateful persistence layer
- Larger, more active OSS community
- More extensible architecture with replaceable components and 3rd party plugins
- Free web GUI
CloudFoundry competitive advantages:
- Mature authentication, user grouping, and multi-tenancy support [x]
- Bring your own app
- Included load balancer
- Deployed, scaled, and kept alive by BOSH [x]
- Robust logging and metrics aggregation [x]
- Enterprise web GUI [x]
[x] These features are not part of Diego or included in Lattice.
One of CloudFoundry's competitive advantages is that it has a mature deployment engine, BOSH, which enables features like scaling, resurrection and monitoring of core CF components. BOSH also supports many IaaS layers with a pluggable cloud provider abstraction. Unfortunately, BOSH's learning curve and deployment configuration management are nightmarish. (As a BOSH committer, I think I can say this with accuracy.)
Kubernetes' deployment abstraction is still in its infancy. Multiple target environments are available in the core repo, but they're not all working, well tested, or supported by the primary developers. This is mostly a maturity thing. One might expect this to improve over time and increase in abstraction. For example, Kubernetes on DCOS allows deploying Kubernetes to an existing DCOS cluster with a single command.
Diego is a rewrite of CF's Droplet Execution Agent. It was originally developed before Kubernetes was announced and has taken on more feature scope as the competitive landscape has evolved. Its original goal was to generate droplets (user application + CF buildpack) and run them in Warden (renamed Garden when rewritten in Go) containers. Since its inception it's also been repackaged as Lattice, which is somewhat of a CloudFoundry-lite (though that name was taken by an existing project). For that reason, Lattice is somewhat toy-like, in that it has a deliberately reduced user audience and scope, explicitly missing features that would make it "enterprise-ready", features that CF already provides. This is partly because Lattice is used to test the core components without some of the overhead from the more complex CF, but you can also use Lattice in internal high-trust environments where security and multi-tenancy aren't as much of a concern.
It's also worth mentioning that CloudFoundry and Warden (its container engine) predate Docker as well, by a couple of years.
Kubernetes, on the other hand, is a relatively new project that was developed by Google based on years of container usage with Borg and Omega. Kubernetes could be thought of as 3rd generation container orchestration at Google, the same way Diego is 3rd generation container orchestration at Pivotal/VMware (v1 written at VMware; v2 at VMware with Pivotal Labs help; v3 at Pivotal after it took over the project).
Kubernetes doesn't actually include a load balancer implementation yet, though work in that direction is progressing. It provides a way to ask your cloud provider for a load balancer, but only a few cloud providers actually give you one (GCE and AWS, I think). CF gives you a load balancer by default, automatically.
As for "robust logging & metrics aggregation", I think this mostly comes down to maturity. It looks like both can integrate logs with Elasticsearch and Kibana, but CF's solution is fully HA for both logging and metrics over a provided message bus (NATS). K8s exposes metrics, but AFAIK there's no included or integrated handling, storage, processing or aggregation.
As of Kubernetes 1.1, Kubernetes now supports autoscaling and HTTP path-based load balancing (blog.kubernetes.io/2015/11/…) – Brendan Burns
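Path-based load balancing routes each HTTP request to a backend service according to its URL path. The Python sketch below shows the idea with a hypothetical `PathRouter`: the longest matching path prefix wins, matches respect path-segment boundaries, and requests are spread round-robin across that service's endpoints. It is not the Kubernetes Ingress API or any particular controller's matching rules, and the endpoint names are made up.

```python
from itertools import cycle


class PathRouter:
    """Hypothetical router: the longest matching path prefix picks the service."""

    def __init__(self, rules):
        # rules maps a path prefix to that service's endpoints,
        # e.g. {"/api": ["api-1", "api-2"], "/": ["web-1"]}
        self._backends = {prefix: cycle(endpoints) for prefix, endpoints in rules.items()}

    @staticmethod
    def _matches(prefix, path):
        # Match whole path segments so "/api" covers "/api/users" but not "/apix".
        if prefix == "/":
            return True
        base = prefix.rstrip("/")
        return path == base or path.startswith(base + "/")

    def route(self, path):
        candidates = [p for p in self._backends if self._matches(p, path)]
        if not candidates:
            raise LookupError(f"no rule matches {path!r}")
        best = max(candidates, key=len)    # most specific rule wins
        return next(self._backends[best])  # round-robin across endpoints
```

With rules for `/api` and `/`, requests to `/api/users` alternate between the two API endpoints, while `/apix` falls through to the catch-all web service.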
04-03-2016, 08:33 #7
Cloud Foundry has landed on the DCOS
December 15, 2015
Mesosphere is all about promoting user choice on our Datacenter Operating System (DCOS), which is why we’re proud to be working with networking giant Huawei to bring the popular Cloud Foundry platform-as-a-service (PaaS) project to the DCOS and Apache Mesos.
Cloud Foundry, as most readers probably know, is the open source PaaS system created by VMware in 2011 and now led by Pivotal along with a large user community. While the popularity of many PaaS offerings has faded over the past few years—partially as a result of developer excitement over Docker and containers, in general—Cloud Foundry is still thriving, especially among enterprise developers.
The power of PaaS environments such as Cloud Foundry is that they manage the whole application lifecycle—from packaging to deployment to execution. Typically, a developer will hand over the source code and Cloud Foundry decides the best way to package it (using appropriate buildpacks, etc.) and then deploy and run it. With technologies like Docker Swarm and Kubernetes, on the other hand, packaging becomes the developer’s responsibility.
In a nutshell, Cloud Foundry’s secret sauce is its ability to abstract the entire application lifecycle in such a manner that, once the application is built and deployed, it can move between cloud providers. However, such an opinionated PaaS environment comes at the price of being dependent upon the rigidity of the Cloud Foundry platform.
The goal of CloudFoundry-Mesos, which was originally developed by Huawei, is to make Cloud Foundry applications more scalable and to allow them to share cluster resources with other datacenter applications. Huawei is building out its cloud services division and, ultimately, wants to run all supported environments—Cloud Foundry, Kubernetes, Hadoop, Spark and other data systems—on top of Mesos.
The way CloudFoundry-Mesos works right now—in its very early stages—is to replace the native Cloud Foundry Diego scheduler with a Mesos framework, CloudFoundry-Mesos. Doing this does not affect the user experience or performance of other Cloud Foundry components, but would let Cloud Foundry applications share a cluster with other DCOS services without worrying about resource contention.
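What it means to replace Diego's scheduler with a Mesos framework is easiest to see from Mesos's two-level model: the Mesos master offers spare resources on each agent to frameworks, and each framework's own scheduler decides which offers to use for its pending tasks and declines the rest so that other frameworks can have them. The Python function below is a minimal, hypothetical sketch of that framework-side decision with greedy first-fit placement; the types and names are illustrative and do not mirror the Mesos scheduler API or the CloudFoundry-Mesos code.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


def handle_offers(offers, pending):
    """Framework side of two-level scheduling (hypothetical sketch).

    Packs pending tasks into each offer first-fit and declines offers it
    cannot use. Returns (launches, declined_agents, still_pending).
    """
    launches, declined = [], []
    queue = list(pending)
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_mb
        remaining = []
        for task in queue:
            if task.cpus <= cpus and task.mem_mb <= mem:
                launches.append((offer.agent, task.name))
                cpus -= task.cpus
                mem -= task.mem_mb
            else:
                remaining.append(task)
        if len(remaining) == len(queue):
            # Nothing fit: decline so Mesos can offer these resources elsewhere.
            declined.append(offer.agent)
        queue = remaining
    return launches, declined, queue
```

The key design point is that the framework, not the Mesos master, decides what runs where; Mesos only arbitrates which framework sees which resources, which is what lets Cloud Foundry share a cluster with other DCOS services.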
As CloudFoundry-Mesos matures, it will give DCOS users yet another option for running their cloud-native, microservice-based applications in a single, reliable environment (we already support a number of PaaS projects, including Yelp’s recently open-sourced PaaSta, as well as container-orchestration systems such as Kubernetes). Different teams, developers and business units all have their own requirements, and we can give them the freedom to choose their application runtimes, databases and other components without having to worry about managing a new infrastructure rollout, as well.
Check back for updates as the CloudFoundry-Mesos project progresses, and follow or contribute to it on GitHub.