Abstract
Customers are migrating from traditional data centers and adopting cloud to take advantage of the agility and scalability offered by public and private cloud services.
This shift has significantly changed IT operations. Cloud infrastructure and services are dynamic and constantly changing, so they require newer monitoring approaches, tools, and solutions that can monitor infrastructure at cloud scale.
This article provides a viewpoint for an effective cloud operations solution for modern cloud infrastructure and applications, however complex or dynamic they may be.
CloudOps and why it is important.
Cloud adoption alone is not enough. Once workloads are on the cloud, they need to be managed efficiently to ensure that they run optimally and securely. This matters not only for the agility, scalability and resiliency expected from cloud adoption but also for optimizing cost.
This is where cloud operations (CloudOps) comes in. CloudOps refers to the practices and processes, enabled by technology solutions, that are used to manage cloud environments. It includes tasks such as provisioning, scaling, monitoring, and troubleshooting, while ensuring performance, security, and compliance. CloudOps differs from traditional IT operations (ITOps): ITOps is mostly concerned with managing static IT infrastructure in data centers, whereas CloudOps is about managing and maintaining IT systems hosted on cloud(s) across various service models (IaaS, PaaS, SaaS, BPaaS) and deployment models (Private, Public, Hybrid and Multi Clouds).
Challenges in managing Cloud with current ITOps practices.
Key Principles for a CloudOps solution
When we look at the IT operations transformation from current ITOps to next-gen CloudOps, the following are the key principles of such a solution:
Technology Implications – CloudOps Solution
The technology solution for a CloudOps platform comprises various capabilities, each catering to an individual area. The solution capability map below depicts the various solution components of a NextGen CloudOps delivery platform. All these capabilities are part of the overall target cloud operating model (ToM) created for the enterprise. The ToM provides a standardized cloud management and operations service across the customer's on-prem and cloud environments.
Exhibit 1: CloudOps Solution Capability Map (please refer to the end of the article for the diagram)
a) Unified Dashboard and Reporting
Getting a view of aggregate system health with new metrics that focus on business SLAs, overall system health and the performance trends of services rather than isolated data points from individual hosts. For example, rather than focusing on a host-level issue such as elevated CPU, the focus could be on latency for a web application, with action taken immediately if it starts to surge. Similarly, visualizations of aggregate system health such as time-series graphs, heat maps and host maps are needed.
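To illustrate the latency example, here is a minimal Python sketch. It assumes latency samples for a web application are already collected as a list of milliseconds. The function names, the 95th-percentile choice and the 500 ms SLO threshold are illustrative assumptions, not part of any specific monitoring product.

```python
from statistics import quantiles


def p95(latencies_ms):
    """Return the 95th percentile of latency samples, in milliseconds."""
    # quantiles(..., n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(latencies_ms, n=20, method="inclusive")[-1]


def evaluate_service(latencies_ms, cpu_percent, slo_ms=500.0):
    """Decide whether to alert based on service latency, not host CPU.

    cpu_percent is carried along as context only; it never triggers the alert.
    The 500 ms default SLO is a hypothetical value.
    """
    observed = p95(latencies_ms)
    return {
        "alert": observed > slo_ms,
        "p95_ms": observed,
        "cpu_percent": cpu_percent,
    }


if __name__ == "__main__":
    # Busy host, healthy service: CPU is high but users are not affected.
    print(evaluate_service([120] * 100, cpu_percent=92))
    # Quiet host, degraded service: a latency surge is what matters.
    print(evaluate_service([120] * 80 + [900] * 20, cpu_percent=35))
```

The design choice here is that the alert condition is expressed in terms of what the business SLA measures, while host-level data is kept only as supporting context for troubleshooting.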
b) Monitoring and Observability
Instrumenting, gathering, and monitoring data from all aspects of the infrastructure, including compute resources, applications, and cloud services, to analyze the connections between them. This also involves making the collected metrics readily available on a centralized platform, giving a comprehensive understanding of the system's performance and operation. The goal is observability across a diverse and distributed technology mix, business processes and customer journeys.
c) Log Management
Integrated tagging and labelling for compute, cloud services, APIs, security, network, firewall, etc. Consistent tags help in identifying and aggregating log and metrics data, which in turn helps in identifying and resolving problems quickly.
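As a rough sketch of how tags support aggregation, the Python below groups log records by a tag and counts errors per group. The record shape (a dict with "level", "message" and "tags" keys) and the "untagged" bucket are assumptions made for illustration. Real log platforms have their own schemas.

```python
from collections import defaultdict


def group_by_tag(records, tag_key):
    """Group log records by the value of one tag (e.g. 'service' or 'env')."""
    groups = defaultdict(list)
    for record in records:
        # Records without the tag land in a hypothetical 'untagged' bucket,
        # which makes tagging gaps visible instead of silently dropping data.
        value = record.get("tags", {}).get(tag_key, "untagged")
        groups[value].append(record)
    return dict(groups)


def error_counts(records, tag_key):
    """Count ERROR-level records per tag value."""
    return {
        value: sum(1 for r in items if r.get("level") == "ERROR")
        for value, items in group_by_tag(records, tag_key).items()
    }


if __name__ == "__main__":
    sample = [
        {"level": "ERROR", "message": "timeout", "tags": {"service": "payments"}},
        {"level": "INFO", "message": "ok", "tags": {"service": "payments"}},
        {"level": "ERROR", "message": "denied", "tags": {"service": "firewall"}},
        {"level": "ERROR", "message": "no tags on this record"},
    ]
    print(error_counts(sample, "service"))
```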
d) Automation
Moving from tactical to strategic automation. This ranges from typical runbook automation for day-to-day tasks (operational automation, shift-left tasks and scripted incident/alert resolution) to activities like user access management, hotfix and patch deployment, database housekeeping, backup task management, etc. It can also include automation for event correlation and handling, self-healing capabilities for automatically resolving and preventing issues, and a move to an everything-as-code environment.
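Scripted alert resolution with a self-healing flavour can be sketched as a lookup from alert type to a remediation function, with escalation to a human when no runbook exists or the runbook fails. Everything named here (the alert fields, the runbook functions and the status values) is hypothetical. The remediation functions only return a description rather than touching real systems.

```python
def restart_service(alert):
    """Hypothetical remediation: pretend to restart the affected service."""
    return f"restarted {alert['resource']}"


def purge_temp_files(alert):
    """Hypothetical remediation: pretend to free disk space."""
    return f"purged temp files on {alert['resource']}"


# Assumed mapping of alert types to runbooks; a real catalogue would be larger.
RUNBOOKS = {
    "service_down": restart_service,
    "disk_full": purge_temp_files,
}


def handle_alert(alert, runbooks=RUNBOOKS):
    """Run the matching runbook, or escalate if none exists or it fails."""
    action = runbooks.get(alert["type"])
    if action is None:
        return {"status": "escalated", "detail": "no runbook for " + alert["type"]}
    try:
        detail = action(alert)
    except Exception as exc:  # a failed remediation must never be hidden
        return {"status": "escalated", "detail": f"runbook failed: {exc}"}
    return {"status": "resolved", "detail": detail}


if __name__ == "__main__":
    print(handle_alert({"type": "service_down", "resource": "web-01"}))
    print(handle_alert({"type": "cert_expiring", "resource": "api-gw"}))
```

Keeping the escalation path explicit matters: automation that silently swallows failures tends to undermine trust in self-healing.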
e) Security and Compliance
This ensures that the cloud platform always meets security and compliance requirements. It includes security event monitoring, endpoint protection, vulnerability checks, threat detection, patch and certificate management, etc., to ensure that industry standards and regulatory compliance requirements are met.
f) Availability and Resiliency
Ensuring that the system is always operational and can handle unexpected events. This includes implementing early warning systems, redundancy and failover mechanisms (backup and restore), a disaster recovery plan, and archival systems. It also includes proactive monitoring to identify potential failure points before failures occur.
g) Application Services
Interfacing with application services, which include application development, change and maintenance, integration, non-functional requirements, and quality assurance, and ensuring that the controls required for CloudOps are designed and built into the applications themselves rather than added as a post-release exercise.
h) Platform Engineering Services
It involves core platform engineering services for the cloud platform such as service design, provisioning, capacity management, service catalogue management, etc.
i) Integrated Service Management (iSM)
Implementing ITSM processes, service catalog and request fulfilment, major incident management (MIM), self-service and knowledge management to ensure that services meet the organization's overall goals and objectives.
j) FinOps / Cost Management
This allows organizations to understand and baseline their cloud needs, gain visibility into spend on cloud services, use tools to optimize usage, implement recommendations, share costs transparently with stakeholders, and charge back or show back costs to the lines of business (LoBs). Essentially, it is the monitoring and control of cloud cost from a holistic perspective. It includes defining and implementing a clear cloud cost management framework to manage cloud economics: consumption and metering, performance, optimization (initial and continuous), trend analytics and cost assignment. For details on cloud cost management, you may refer to the following article: https://www.finextra.com/blogposting/21026/cloud-cost-management--an-emerging-focus-area
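Show-back to lines of business can be sketched in a few lines of Python. The sketch assumes each billing line item carries a cost and, where tagged, a "lob" field. Spreading untagged (shared) spend across LoBs in proportion to their direct spend is one possible allocation policy chosen for illustration, not a prescribed method.

```python
from collections import defaultdict
from decimal import Decimal


def showback(line_items):
    """Return cost per LoB, with untagged spend allocated proportionally.

    Each item is assumed to look like {"cost": "12.50", "lob": "retail"};
    items without a "lob" key are treated as shared spend.
    """
    direct = defaultdict(Decimal)  # Decimal() defaults to 0
    shared = Decimal("0")
    for item in line_items:
        cost = Decimal(str(item["cost"]))
        lob = item.get("lob")
        if lob is None:
            shared += cost
        else:
            direct[lob] += cost

    total_direct = sum(direct.values(), Decimal("0"))
    if total_direct == 0:
        # Nothing to apportion against, so report shared spend as unallocated.
        return {"unallocated": shared} if shared else {}

    # Each LoB carries its direct spend plus its proportional share of shared spend.
    return {
        lob: cost + shared * cost / total_direct
        for lob, cost in direct.items()
    }


if __name__ == "__main__":
    items = [
        {"cost": "60", "lob": "retail"},
        {"cost": "40", "lob": "cards"},
        {"cost": "10"},  # untagged shared service
    ]
    for lob, cost in showback(items).items():
        print(lob, cost)
```

Decimal is used instead of float so that allocated amounts add up without binary rounding surprises.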
k) BizSecDevOps
Practices and approaches that integrate the three areas of efficiency, security, and product quality. It includes automated solutions for areas such as CI/CD pipeline management, self-service, orchestration, change, release and deployment.
l) Ops Analytics
Monitoring and optimizing performance of cloud-based systems and applications using data and analytics tools. This includes gathering log and metrics data from various sources such as cloud native monitoring services, APM and infrastructure management tools, to gain insights into system performance trends for business SLA measurement and to make informed decisions to optimize the system.
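As a small illustration of trend insight for SLA measurement, the sketch below computes two numbers from a series of per-interval latency readings: the fraction of intervals that met the SLO, and a least-squares slope showing whether latency is drifting up or down. The function names and the idea of using one reading per interval are assumptions. A real analytics platform would work on much richer data.

```python
def sla_attainment(latencies_ms, slo_ms):
    """Fraction of measurement intervals whose latency met the SLO."""
    if not latencies_ms:
        raise ValueError("need at least one reading")
    met = sum(1 for value in latencies_ms if value <= slo_ms)
    return met / len(latencies_ms)


def trend_slope(values):
    """Least-squares slope of values against their index (units per interval)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


if __name__ == "__main__":
    readings = [100, 110, 120, 130]  # hypothetical latency per interval, in ms
    print("SLA attainment:", sla_attainment(readings, slo_ms=120))
    print("Trend (ms per interval):", trend_slope(readings))
```

A positive slope while SLA attainment is still high is the kind of early signal that lets a team act before the SLA is actually breached.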
Conclusion
CloudOps helps in realizing the benefits envisaged during cloud adoption. It involves practices and processes underpinned by technology solutions for various aspects of CloudOps. Monitoring and logging, Automation, Analytics and Security are the key pillars of such a solution.
Migrating from traditional ITOps to CloudOps is a journey. All organizations adopting cloud should consider CloudOps early in their cloud adoption journey so that all the operational requirements are captured and operational controls are built into the cloud infrastructure and the applications being built on or migrated to it.
This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.