The third cloud wave: the multi-cloud dilemma

The third cloud wave

While cloud computing is already used in most companies, the cloud is becoming ever more important for achieving digital transformation.

As industries move through successive waves, technology cycles rise and fall, and cloud technology is no exception.

Companies are now updating their processes and upskilling their teams to maintain the necessary control as cloud technologies enter new waves.

During the first wave, companies focused on moving their systems to the cloud through IT-driven initiatives, often without business involvement. The focus was on cost savings and agility. Once companies learned that no single cloud strategy fits all needs, they began to differentiate towards vendors with strong services and offerings rather than depending on a single vendor.

As a consequence, multi-cloud strategies became more popular in the second wave. Because cloud providers had become more differentiated, companies started to choose the best cloud service for each of their IT systems and applications.

This pick-and-mix approach, however, caused costs to spiral out of control, and CIOs had difficulty justifying the total return on their cloud technology investments.

The multi-cloud dilemma

The multi-cloud strategy reduced vendor dependencies; however, application portability and integration still remain a challenge.

Companies therefore started to withdraw from their various multi-cloud solutions, especially in the public cloud, and to move back towards an on-premises, data-center and private-cloud approach.

Private-cloud workloads are still business-critical, and companies modernize their private-cloud infrastructure in parallel with public-cloud initiatives because they need to understand how many and what kinds of workloads will remain in the private cloud. Most private clouds are hosted, and the approaches revolve around virtualization, orchestration and the control plane.

One of the major drivers of the third wave is the return to cost control. On one hand, companies want to maximize the return on their application investments. On the other hand, as digital transformation progresses, the cost of transforming those applications rises sharply with their sheer volume, and the returns fall short of expectations despite the economies of scale promised by public clouds. Those economies of scale do not materialize when applications are simply shifted to the cloud on a traditional on-premises architecture. A container approach, for example, can be seen as one answer for better cloud adoption.

So the question comes back to redesigning an appropriate architecture that achieves better cost savings and uses modern cloud technologies to get the maximum out of the cloud service providers.

Examples described in my previous blog series about Rising Cloud Technologies are:

  • Distributed Cloud architecture
  • API-Centric SaaS
  • Cloudlets
  • Blockchain PaaS
  • Cloud-Native architecture
  • Containers
  • Site Reliability Engineering
  • Edge Computing
  • Service Mesh
  • Microservices

Over the next couple of years, legacy applications migrated to public cloud infrastructure as a service (IaaS) will require optimization to become more cost-effective.

The focus is on small, discrete functions and on scaling only those business functions that are in demand, instead of scaling all functions as in previous waves. The homework, though, is to understand the current architecture, identify silos and redundant applications, and replace them with, for example, microservices, cloud-native architectures and other concepts from the rising cloud technologies to achieve reliability while reducing complexity and costs.

In a traditional IT infrastructure burdened with existing silo architectures, data is unstructured, available in various file formats and distributed across different storage systems with different hierarchies. This makes it difficult to come up with a unified data model and avoid inconsistent data in the cloud. The situation is common in regulated industries such as financial services, where companies depend on highly complex, old legacy systems that cannot simply be “lifted and shifted” to the cloud.

Companies that get the maximum from the cloud will challenge their multi-cloud environment by investing only in the cloud services that are best for the business. The leading cloud service providers will expand their portfolios by offering a subset of their services for low-latency application requirements.

The cloud in your own data center

Most regulated organizations such as banks, government agencies and pharmaceutical companies still run their IT systems on their own premises rather than on public cloud infrastructure, for security reasons: they want to keep their data in their own data centers.

These companies miss out on the advantages of cloud technologies such as embedded machine learning, artificial intelligence and autonomous databases, which reduce costs and security risks by eliminating user errors through automation. The cloud simply does not work the same way a traditional data center does.

Cloud service providers are now introducing a new business model to give these companies the benefits of the cloud, such as pay-as-you-go and pay-per-use pricing, rapid elasticity and the latest patches, with infrastructure that is run by the vendor but physically sits in the customer's data center.

Vendors place their own cloud hardware and software in the customer's data center, for example a cloud-based autonomous database. This allows system administrators, database administrators and developers to focus on innovation instead of time-consuming maintenance, which exposes them to higher risks of data breaches and failures.

Customers pay only for what they use, and the infrastructure sits in their own data center behind their firewall. Data does not travel over the public internet between a public or private cloud and the user; it all stays in house.

The same business model can be leveraged for a public cloud inside an enterprise data center. Companies can then use a wider range of cloud services in their own data center, including new technologies from the cloud service providers' portfolios such as machine learning, artificial intelligence, the Internet of Things and blockchain.

Conclusion

Companies using in-house cloud services from an external vendor have the advantage that hardware, software and data remain in their own data center while the vendor manages the infrastructure, patching, security and technology updates through a remote connection. Using autonomous services provided by the vendor, for example, the customer can benefit from built-in machine learning instead of buying third-party tools and trying to integrate them into their systems. And if performance can be increased by running the infrastructure in-house, the impact on cost savings is also beneficial.

However, the vendor must not have access to sensitive data. The vendor's role is the same as if the customer were using a public cloud, except that this cloud is physically inside the enterprise and the data does not travel over the public internet.

It is important to understand how the cloud differs from traditional data centers. Companies need to help their IT data-center staff build new skills, as the cloud requires a different approach, cost model and infrastructure management; it is not about building or replicating an existing data center in the cloud.

The focus should be on real value: transforming how you operate today and benefiting from data analytics, automation, machine learning and artificial intelligence to become more agile and efficient. Whether this is achieved with a public, private, hybrid or multi-cloud does not matter. If a company wants to survive in the future, it needs to transform its people and its culture.

Rising Cloud Technologies: Service Mesh

New technologies help companies transform into digital organizations. Identifying emerging cloud technologies and understanding their impact on the existing cloud landscape can help companies become more successful.

While some companies do not have a formal cloud strategy in place, most are using at least one cloud technology such as SaaS, IaaS or PaaS, whether in a private, public or hybrid cloud.

Other companies follow a multi-cloud strategy, since it allows them to select cloud services from different providers, some of which are better at certain tasks than others. For example, some cloud platforms specialize in large data transfers or have integrated machine learning capabilities.

The most popular cloud models today are hybrid and multi-cloud. Having seen the first benefits of cost savings and increased efficiency, companies now focus more on agility, speed and time to market to enable digital business success.

New cloud capabilities increase the deployment options. Companies want the benefits of the cloud across all of their IT systems, and with the growing offerings of cloud service providers, customers can now decide on technology, services, providers, locations, form factors and level of control.

Since the digitalization journey raises new considerations and expectations, companies are now looking into technical areas to improve their cloud landscape, such as the distributed cloud, API-Centric SaaS, Cloudlets, Blockchain PaaS, Cloud Native, Site Reliability Engineering, Containers, Edge Computing and Service Mesh.

Service Mesh

A service mesh controls how the different parts of an application share data with one another. Unlike other communication management systems, a service mesh is a dedicated, configurable infrastructure layer integrated directly into the application. It can be used to document how well (or poorly) the various components of an application interact. In this way, communication is optimized and failures are minimized, even as an application grows.

Each part of an application, a “service”, relies in turn on other services to provide the user with the desired function. For example, if you buy a product via an e-commerce application, you want to know whether the product is in stock. So the service that communicates with the company's inventory database needs to communicate with the product website, which in turn needs to communicate with the user's online shopping cart. To increase business value, the retailer may eventually develop a service that recommends products to the user in the application. This new service communicates with a database of product tags for its recommendations, but also with the same inventory database that the product website accesses. So we are dealing with a large number of reusable moving parts.

Modern applications are often unbundled in this way, as a network of services, each performing a specific business function. To perform its function, a service may need to request data from other services. But what happens if some of those services are overloaded with requests, such as our retailer's inventory database? This is where the service mesh comes in: a layer that routes requests from one service to another and optimizes the interaction of all the moving parts.

The difference between a service mesh and microservices

With a microservice architecture, developers can change individual services of an application without having to redeploy it from scratch. In contrast to application development in other architectures, individual microservices are built by small teams that can freely choose their tools and programming languages. Microservices are developed largely independently of one another, communicate with one another, and can fail individually without bringing down the entire application.

The basis of microservices is the communication between the individual services, known as inter-service communication. Communication logic can be programmed into each service without a service mesh, but a service mesh becomes more and more useful as the complexity of the communication increases; the sketch below shows the kind of logic that would otherwise live in every single service. In cloud-native applications built on a microservice architecture, a service mesh can combine a large number of separate services into one functional application.
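
To make this concrete, here is a minimal Python sketch of the communication logic a single service would otherwise have to carry itself: a timeout, a retry loop and simple backoff around one downstream call. The service name, endpoint and numbers are hypothetical, chosen only for illustration.

```python
import time
import urllib.request

# Hypothetical client for an internal inventory service. Without a service
# mesh, every service has to embed this kind of communication logic itself:
# timeouts, retries and backoff for each downstream call it makes.
INVENTORY_URL = "http://inventory-service/stock/"  # assumed internal endpoint

def get_stock(product_id: str, retries: int = 3, timeout: float = 2.0) -> str:
    """Fetch stock information, retrying transient network failures."""
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(INVENTORY_URL + product_id,
                                        timeout=timeout) as resp:
                return resp.read().decode("utf-8")
        except OSError:                    # URLError, timeouts, resets
            if attempt == retries:
                raise                      # give up after the last attempt
            time.sleep(0.5 * attempt)      # simple linear backoff
```

A service mesh moves exactly this kind of boilerplate out of the application code and into the infrastructure layer.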

Sidecar Proxies

In a service mesh, requests between microservices are transmitted via proxies in a separate infrastructure layer. The individual proxies that make up a service mesh are therefore called “sidecars”, because they run alongside each service rather than inside it. Together these sidecar proxies, decoupled from the services themselves, form a mesh network.

Technically, each sidecar proxy is assigned to a microservice, and all of that service's communication is conducted through it. Sidecar proxies use standardized protocols and interfaces to exchange information and can be used to control, manage and monitor communication. Introducing this additional infrastructure layer offers numerous advantages: the microservices interact securely and reliably, and by monitoring the traffic, the service mesh detects problems in service-to-service communication and reacts to them automatically.
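
The following toy sketch, written in Python purely for illustration, mimics what a sidecar does at its core: accept a request on behalf of the service, forward it to the application on localhost, and record a latency metric for each call. Production meshes use dedicated proxies such as Envoy rather than anything like this, and the ports and addresses below are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import time
import urllib.request

# Toy sidecar-style proxy: the kind of telemetry and traffic control a
# service mesh provides automatically for every service in the mesh.
APP_URL = "http://127.0.0.1:8080"  # assumed address of the co-located service

class SidecarProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(APP_URL + self.path, timeout=5) as resp:
                body, status = resp.read(), resp.status
        except OSError:
            body, status = b"upstream unavailable", 502
        latency_ms = (time.perf_counter() - start) * 1000
        print(f"{self.path} -> {status} in {latency_ms:.1f} ms")  # metric line
        self.send_response(status)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # The mesh directs traffic destined for the service to the proxy's port.
    HTTPServer(("0.0.0.0", 15001), SidecarProxy).serve_forever()
```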

Without a service mesh, every microservice must be programmed with its own inter-service communication logic, which distracts developers from business objectives. It also makes communication errors harder to diagnose, because the logic for inter-service communication is hidden inside each individual service.

Each newly added service, and each new instance of an existing service running in a container, makes the communication environment of an application more complicated and adds another potential point of failure. In a complex microservice architecture, it can become almost impossible to diagnose the root cause of problems without a service mesh.

This is because a service mesh captures all aspects of inter-service communication as performance metrics. Over time, the data made visible by the service mesh can be fed back into the rules for inter-service communication, improving the efficiency and reliability of service requests.

For example, when a service fails, the mesh can collect data on how long it took for a retry to succeed. Based on the collected downtime data, rules can then be written that determine the optimal waiting time before a new call is made, ensuring that the system is not overloaded by unnecessary retries.
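
A minimal sketch of such a retry rule, assuming exponential backoff with jitter (a common policy, though the post does not prescribe one), might look like this in Python:

```python
import random
import time

# Illustrative retry policy of the kind a service mesh lets operators
# declare as configuration instead of hand-coding in every service.
# All numbers below are assumptions for the sketch, not recommendations.
def call_with_backoff(request_fn, max_attempts=4, base_delay=0.2, cap=5.0):
    """Call request_fn, retrying failures with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter prevents synchronized
            # retry bursts from overloading a recovering service.
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
```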

Well-known service mesh products include Istio, Linkerd, Tetrate, Kuma, Consul, Maesh and in-house products from cloud providers such as AWS App Mesh.

Advantages of a service mesh

By creating an additional infrastructure layer through which all microservice communication is routed, a service mesh offers numerous advantages. All aspects of service-to-service communication can be captured, controlled and managed, and the efficiency, security and reliability of the services increase. In addition, services can be scaled more easily and quickly because their functionality is decoupled from the communication.

  • Developers can fully concentrate on programming the microservices without having to worry about how the services are connected.
  • The request logic runs in a visible infrastructure layer parallel to the services, so problems are easier to detect and diagnose; the service mesh spots dysfunctional services and automatically redirects requests.
  • The microservice architecture becomes more stable and fault-tolerant, because the service mesh redirects requests away from failing services in time.
  • The sidecar proxies authenticate the services and encrypt and decrypt the transmitted data, adding security to the service mesh.
  • Microservices can be seamlessly integrated into the service mesh regardless of the platform and provider used.
  • Traffic and load control are possible regardless of the respective cloud or IT environment.
  • KPIs reveal opportunities for optimizing communication in the runtime environment.

Disadvantages of a service mesh

A service mesh must first be understood conceptually in order to decide whether it is worthwhile for an application and which technology is the most suitable. The development team is then faced with the complex task of configuring the service mesh, which involves functional as well as technical effort. The control-plane components and the additional proxies deployed alongside each container require extra CPU and memory, which in turn increases the cost of operating the cluster. The actual additional resource requirements depend on the number of requests and on the service mesh product and its configuration; Istio, for example, needs more resources than Linkerd.

Another disadvantage of a service mesh is that the sidecar proxies can impact performance compared to direct communication between the services. Latency can increase because every request is processed by the proxies, which can affect the end-user experience: instead of a direct call between containers, two proxies (one on the sender side, one on the receiver side) are now involved in each request. The added delay depends on the specific microservice system and the service mesh configuration and should therefore be measured before the service mesh is deployed to the production system.
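
One way to do such a test, sketched here with Python's standard library against a hypothetical staging endpoint, is to sample request latencies with the mesh enabled and disabled and compare the percentiles:

```python
import statistics
import time
import urllib.request

# Simple latency probe: run it against the same test deployment with and
# without the mesh to see the proxy overhead. The URL is a placeholder.
def measure_latency(url: str, samples: int = 100) -> None:
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=5).read()
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    p50 = statistics.median(timings)
    p95 = timings[int(0.95 * len(timings)) - 1]
    print(f"{url}: p50={p50:.1f} ms, p95={p95:.1f} ms")

measure_latency("http://my-service.staging.example/health")  # hypothetical
```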

Conclusion

A service mesh enables central control of monitoring, resilience, routing and security, all implemented in a decentralized way in the sidecars. It fits well into a microservice architecture and can replace API gateways and many libraries. From a vendor perspective, Istio is the most popular service mesh product; its strengths lie in environments such as Kubernetes, and it also allows individual virtual machines or containers to be integrated. Kubernetes is an open-source system for automating the deployment, scaling and management of containerized applications, originally designed by Google and donated to the Cloud Native Computing Foundation.

The price of using a service mesh, apart from the cost and skills involved in introducing any new technology, is the increased resource consumption and higher latency.

Companies that use microservices should consider a service mesh, since it improves the stability, extensibility, transparency and security of their applications.

Rising Cloud Technologies: Site Reliability Engineering


Site Reliability Engineering

How closely should software development and operations be interconnected, and which control processes are required? Site Reliability Engineering (SRE) emerged as a new service management model from this question and from implementing the answers to it.

Site Reliability Engineering is a structured approach to software development that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems.

In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning. They split their time between operations/on-call duties and developing systems and software that help increase site reliability and performance.

What is the difference between DevOps and Site Reliability Engineering?

The difference is that while DevOps teams raise problems and dispatch them to developers to solve, the SRE approach is to find problems and solve some of them directly. The ideal SRE team includes developers with different specialties, so that each can contribute useful insight.

SRE is designed to give developers more freedom to create innovative and automated software solutions. Because the software systems are built with redundancy and safeguards in place, developers are not limited by traditional operations protocols. For example, in a DevOps team the operations manager may need to approve each software update before it is published; in SRE, developers may be allowed to release updates as needed.

Since SRE is developer-focused, the manager of an SRE team must have development experience, not just operations knowledge. An SRE manager may actively help with software development instead of merely overseeing it.

SRE focuses on stability rather than agility, and on proactive engineering rather than reactive development. It bridges development and operations by applying a software engineering mindset to system administration, which also delivers services faster.

The ultimate goal for SREs is to establish service quality as perceived by the end customer. By continuously optimizing control processes and automation, the human error factor is kept to a minimum; these automatic control processes are indispensable for maintaining quality standards. One way to achieve this is to build self-service tools for the user groups that rely on SRE services, such as automatic provisioning of test environments, logs and statistics visualization. Doing so reduces work in progress for all parties, allows developers to focus on feature development, and frees SREs to move on to the next task to automate.
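
Service quality from the end customer's perspective is commonly quantified with availability targets and the downtime "budget" they imply; this is standard SRE practice, although the post does not spell out any numbers. A small illustrative calculation, with an assumed 99.9% target:

```python
# Sketch of an availability target (SLO) and the error budget it implies.
# The 99.9% target and the request counts below are assumptions.
SLO = 0.999                      # availability target: 99.9%
MINUTES_PER_30_DAYS = 30 * 24 * 60

error_budget_minutes = (1 - SLO) * MINUTES_PER_30_DAYS
print(f"Allowed downtime per 30 days: {error_budget_minutes:.1f} minutes")

# Measured from monitoring: successful vs. total requests in the window.
good_requests, total_requests = 9_985_000, 10_000_000
availability = good_requests / total_requests
budget_spent = (1 - availability) / (1 - SLO)
print(f"Availability: {availability:.4%}, error budget spent: {budget_spent:.0%}")
```

When the budget is spent (here 150%), an SRE team would typically prioritize reliability work over new releases until the service is back within its target.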

How to speed up the Software Development Life Cycle?

Whether developing custom software or implementing standard applications, projects fail when they are run without a methodology. There are, for example, methodologies based on international standards such as ISO/IEC 12207 for the Software Development Life Cycle (SDLC), or more specific ones such as Oracle's Unified Method (OUM) for implementing Oracle Applications.

Experience with these methodologies was positive: project teams were able to produce high-quality software that met customer expectations and reached completion. However, as IT landscapes and regulations grew more complex, projects increasingly failed to complete on time and on budget.

Companies moved to shorter development cycles and became more agile in order to respond to a faster-changing environment, introducing methodologies such as Scrum to break deliverables down into shorter cycles (sprints) and enable continuous improvement. However, Scrum projects often ended in chaos for lack of leadership, teamwork and discipline. Strong, professional change management is needed to establish commitment, courage, focus, openness and respect in such projects, and it is often difficult to get brilliant developers with a prima donna attitude to adopt those values.

Can new technology help?

Nowadays, companies rely on DevOps and agile methodologies in the cloud to speed up the software development process, and there is growing cooperation between traditional and DevOps-oriented companies on common standards to enable collaboration. In the financial industry, for example, the Fintech Open Source Foundation (FINOS) promotes open-source solutions for financial services by providing an independent setting in which to deliver software and standards that address common industry challenges and drive innovation.

A foundation in which developers, IT experts and industry leaders agree on standards and collaborate on open-source projects lets financial services companies take full advantage of DevOps cloud platforms (e.g. GitLab) and move from the traditional SDLC to a modern, cloud-based and service-oriented software development life cycle, with the aim of developing software faster and more efficiently while keeping up high regulatory and quality standards.

Choosing a single DevOps platform may look risky; however, it helps redefine development and engineering work, because business product owners, software developers, operators, test engineers, project managers and others all have access to the same information, use a common standard and have to work closely together. This yields faster software development at a time when maintenance budgets are constantly being cut.

Rising Cloud Technologies: Cloud Native


Cloud Native

Cloud Native is about designing modern applications that embrace rapid change, large scale and resilience in dynamic environments such as public, private and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal effort.

Challenges

Applications have become increasingly complex, with users demanding more functionality and faster response times. Users expect rapid responsiveness, innovative features and zero downtime. Performance problems, recurring errors and an inability to move fast are no longer acceptable.

Companies are therefore looking more at cloud-native concepts to get the most out of the cloud, instead of relying on lift-and-shift migrations.

Benefits

Cloud Native is very much about speed and agility. Business systems are evolving from merely enabling business capabilities into weapons of strategic transformation that accelerate business velocity and growth. It is imperative to get ideas to market immediately.

Examples

Companies such as Netflix, Uber and WeChat deploy hundreds of services in production on a weekly basis and achieve speed, agility, and scalability using Cloud Native technologies.

Since the Cloud Native approach provisions each instance as a virtual machine or container, you avoid the concentrated risk of a single server's downtime taking everything down.

Cloud service provider platforms support this type of highly elastic infrastructure with automatic scaling, self-healing and monitoring capabilities.
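
As a minimal illustration of the hook that self-healing relies on, the following Python sketch exposes a health-check endpoint that an orchestrator (for example, a Kubernetes liveness probe) can poll and act on; the port and path are assumptions:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Minimal health-check endpoint: the orchestrator polls a URL like this
# and restarts or replaces the instance when it stops answering 200.
class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":           # assumed probe path
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```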

Conclusion

Without a paradigm shift in IT departments, one that covers more than just the technical aspects, the path to Cloud Native IT is hardly possible for companies.

Among the major hurdles are legacy systems, which still run the core processes in many companies. Cloud Native is not about simply moving legacy applications unchanged into the cloud with the lift-and-shift method. The alternatives before a cloud migration are modernization, replacement or new development. The result is usually integration into a hybrid IT comprising both on-premises systems and Cloud Native components.

Companies want to deploy and operate their applications in a way that fully leverages the potential of the cloud.