Writing 20 lines

The following information is drawn from chapter 10 of the CompTIA Cloud+ Study Guide (Montgomery, 2016).

Disaster Recovery Methods and Concepts

Cloud computing operations can be amazingly complex, as you have learned throughout this book. Failures can and will happen, often when least expected. It is important for you to expect that these failures are going to happen and plan for disruptions ahead of the events so you can be ready when they occur. Some outages, such as a web server failure, are relatively easy to plan for by implementing multiple web servers behind a load balancer. With the amount of virtualization in the cloud datacenters, virtual machines can easily be moved quickly to new hardware platforms in the event of an outage. If a complete datacenter goes offline, then the complexity of recovery increases dramatically.

Business continuity is defined as the planning and preparation for a failure or outage and the steps for a business to quickly recover back to an operational state. We will focus on business continuity specifically as it pertains to cloud operations and discuss some strategies to protect against outages.

The best defense against an outage is a solid plan for recovery. When a company’s computing operations are migrated to the cloud, the cloud service provider will maintain hosting facilities that are designed and built to be highly resilient and to offer protection from service disruptions. Redundant systems for power, cooling, networking, storage, and computing are commonly implemented in the datacenter to reduce the frequency and probability of outages and to quickly recover critical systems in the event of an outage. As you move computing operations to the cloud, it still remains your responsibility to plan for and be able to recover from any disruptions in the cloud computing center. Many types of natural disasters, such as weather-related events, may cause power or communication interruptions in the datacenter. Other events that may cause disruptions are key infrastructure outages, such as power and cooling systems; cyber-attacks; virus infections; critical service or equipment suppliers going out of business; or labor disruptions.

This chapter will investigate the methods and concepts for recovering from a service disruption. You will learn the options available for proactive planning for an outage and investigate how to build resiliency into your cloud deployment. You will also learn about deploying cloud operations, with business continuity as a design requirement, which will allow your operations to quickly recover in the event of a service disruption.

High Availability and Fault Tolerance in the Cloud

Cloud computing is now a mainstream service offering that hosts many of the world’s most mission-critical applications and services. Major websites and content distribution companies host their offerings entirely in the public cloud. As the cloud has taken prominence in computing, companies have shifted from maintaining their own datacenters to the cloud, and site reliability, fault tolerance, and survivability have become critical metrics for a cloud provider.

Fault Tolerance

Fault tolerance is the ability of a service to remain available to end users in the event of a device or component failure in the system. Cloud deployments are architected with fault tolerance in mind, and the infrastructure is designed and implemented to tolerate the failure of various components and keep operating. Servers installed in the cloud datacenter will have redundant power supplies and multiple CPUs and LAN and SAN interfaces for storage. RAID is designed for fault tolerance of disk drives. The SAN can run two separate Fibre Channel fabrics that allow for each SAN to back up the other in the event of an unplanned outage. The LAN backbone in the datacenter and connecting out to the Internet or WAN will be redundant and able to forward traffic even if a switch or router goes offline. The infrastructure of the datacenter itself will usually have backup power generation systems to locally generate electricity should the primary power source fail. Redundant cooling systems are also installed to ensure that the datacenter is properly cooled at all times, with standby cooling systems available. Virtualized systems have multiple fault-tolerant designs, where a virtual machine can be moved from one server or datacenter to another in the event of a system failure. Storage that is virtualized allows for replication and backup systems in different locations for fault tolerance.

High Availability

High availability refers to the uptime of a system and is usually measured as a percentage of time the service is expected to be available. You will often see the five nines specification, which refers to 99.999 percent uptime and is often expressed as the total expected downtime per year. This is only about 5.26 minutes of downtime in a whole year! The availability ratings usually measure unplanned downtime only and not regularly scheduled maintenance. Downtime is normally defined as a user not being able to access a service, but there can be many different definitions of what downtime means. Therefore, you need to understand what the cloud provider is stating when they present their uptime numbers.
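The downtime figure above follows directly from the availability percentage. The short sketch below shows the arithmetic, assuming an average year of 365.25 days and counting all downtime the same way; the function name is illustrative only and is not taken from the study guide.

```python
# Minutes in an average year (assumption: 365.25 days to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, allowed by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is one reason higher availability targets tend to cost considerably more to meet.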

Many cloud datacenters are deployed in availability zones, which are segmented and isolated areas of a cloud provider’s operation. Each availability zone will offer redundancy and high availability services inside each zone. Devices deployed in the datacenter are often configured to be in high availability (HA) pairs. When systems are in an HA pair, such as firewalls or load balancers, a network interconnection will exist between the two devices, where the active system will be constantly updating the standby in the HA pair with state information and each system will verify the health of the other over this interconnection. In the event of a primary failure, the secondary in the pair will automatically take over and there will be no loss of service.
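The HA pair behavior described above can be illustrated with a small simulation. This is a simplified sketch, not how any particular firewall or load balancer implements failover: the class names, the session table, and the health flag are all hypothetical, and a real device would exchange state and health checks over a dedicated network link.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One member of a hypothetical HA pair."""
    name: str
    healthy: bool = True
    sessions: dict = field(default_factory=dict)  # state information, e.g. connection table


class HAPair:
    """Simulates an active/standby pair with state sync and health-based failover."""

    def __init__(self, active: Node, standby: Node) -> None:
        self.active = active
        self.standby = standby

    def handle(self, session_id: str, state: str) -> None:
        # The active node records the session, then updates the standby
        # so it holds the same state information.
        self.active.sessions[session_id] = state
        if self.standby.healthy:
            self.standby.sessions[session_id] = state

    def health_check(self) -> str:
        # If the active node has failed and the standby is healthy,
        # the standby takes over. Returns the name of the active node.
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active
        return self.active.name


if __name__ == "__main__":
    pair = HAPair(Node("fw-a"), Node("fw-b"))
    pair.handle("session-1", "established")
    pair.active.healthy = False           # simulate a primary failure
    print(pair.health_check())            # the standby is promoted
    print(pair.active.sessions)           # synced state survives the failover
```

Because the standby already holds the session state when it is promoted, existing sessions can continue, which is what makes the takeover appear seamless to users.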

Routers can also be deployed in pairs for redundancy using protocols such as Virtual Router Redundancy Protocol (VRRP), which maintains a single default gateway for multiple routers. VRRP monitors the router interfaces for availability, and should one router go down, the backup router will begin forwarding traffic with no loss of service or need to change the default gateway on the servers. As you have learned throughout this book, it is you, the cloud consumer, who is ultimately responsible for your site’s availability. The different approaches outlined in this chapter can be used as a guide to achieve the level of availability required for your needs. You must determine the realistic availability requirement of your site and how much you are willing to spend for higher availability.
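The idea of a single default gateway shared by several routers can be sketched as a priority-based election. The example below is a deliberately simplified model and not an implementation of VRRP itself: it ignores advertisement timers, preemption settings, and tie-breaking, and the router names, priorities, and gateway address (taken from a documentation address range) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical virtual gateway address that servers use as their default gateway.
VIRTUAL_GATEWAY = "192.0.2.1"


@dataclass
class Router:
    name: str
    priority: int      # a higher priority is preferred as the forwarding router
    up: bool = True


def elect_master(routers: list[Router]) -> Optional[Router]:
    """Return the available router with the highest priority, or None if all are down."""
    candidates = [r for r in routers if r.up]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.priority)


if __name__ == "__main__":
    routers = [Router("router-1", priority=200), Router("router-2", priority=100)]
    print(VIRTUAL_GATEWAY, "is served by", elect_master(routers).name)

    routers[0].up = False  # simulate the primary router going offline
    print(VIRTUAL_GATEWAY, "is served by", elect_master(routers).name)
```

The servers keep pointing at the same virtual gateway address throughout; only the router answering for that address changes, so no server configuration needs to be touched during a failover.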

References

Montgomery, T. (2016). CompTIA Cloud+ study guide: Exam CV0-001. Indianapolis, IN: John Wiley & Sons. ISBN 978-1119243229

Prompt

Cloud systems enable businesses and individuals to be connected to each other and their data at all times. Today, we have become accustomed to constant connectivity and are hyper-aware of any loss of it, even if only for a few seconds. If a business loses data or connectivity even for a few minutes, it can mean significant financial losses, depending on the type of business. This puts immense pressure on cloud service providers and demands that redundancy and disaster recovery systems are in place and tested often to ensure uptime. For this discussion, research either the company you currently work for or one with which you are familiar, and identify business functions that may currently utilize cloud services (e.g., email, SharePoint, Google Drive, Dropbox). Discuss why they may use these cloud services, and address what the impacts may be if those services went down for a long period of time. Would the business be able to continue? Would people within the company still be able to communicate with each other and collaborate?

Your initial primary post must be made by Wednesday night at midnight CT (2 pts). Your responses to other students must be made by Sunday night at midnight. Support your discussion post with at least one external citation and full reference in APA format.