Cloud Computing and Data Security

We cannot attribute the beginning of cloud computing to a particular person or time. It evolved along with the Internet and enterprise computing. We may be able to trace its roots all the way back to 1969, when Dr. Larry Roberts developed the ARPANET. (Whitman & Mattord, 2016)

As the ARPANET evolved into Ethernet and then the Internet, enterprises were discovering new ways to compute, moving from mainframes to multi-tier architectures. In the early stages of enterprise computing, enterprises purchased hardware and software to host internally. Though not in the form we see today, enterprises had an early version of the cloud in networked mainframe systems with dumb terminals. They then slowly began to outsource their information systems to Internet Service Providers (ISPs) and Application Service Providers (ASPs).

The concept of using computing as a utility was probably first proposed by Professor Noah Prywes of the University of Pennsylvania in the fall of 1994, in a talk at Bell Labs: “All they need is just to plug in their terminals so that they receive IT services as a utility. They would pay anything to get rid of the headaches and costs of operating their own machines, upgrading software, and what not.” (Faynberg, Lu, & Skuler, 2016). The concept came to fruition when Amazon launched its limited beta of Elastic Compute Cloud (EC2) in 2006. By then, Salesforce.com had already mastered delivering an enterprise application through a simple website.

The author has been involved with cloud computing since 2009. At that time, however, there were no precise industry definitions or standards such as those we see today from the National Institute of Standards and Technology (NIST). This essay is an attempt to look at where we are with cloud computing today.

What is Cloud Computing?

There are many definitions of cloud computing, but the most commonly used is that of the National Institute of Standards and Technology (NIST) in its Special Publication 800-145. According to NIST, cloud computing is “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” (Mell & Grance, 2011)

NIST characterizes cloud computing in terms of five essential characteristics, three service models, and four deployment models.

Cloud Service Provider

Any organization that offers services exhibiting the five essential cloud characteristics defined by NIST, and that supports one of the service models so that an individual or enterprise can deploy a solution in the cloud, is a Cloud Service Provider (CSP).

Cloud Service Customer

Any individual or enterprise that consumes the services of a Cloud Service Provider to deploy a solution, as defined in the NIST cloud deployment models, is a Cloud Service Customer (CSC).

Cloud Service Providers can be characterized using the essential characteristics and service models defined by NIST. (Mell & Grance, 2011)

Essential Characteristics of Cloud as defined by NIST

On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.

Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

Cloud Service Models as defined by NIST

Software as a Service (SaaS). The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer can deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

Cloud Service Providers

Some of the well-known CSPs are Amazon AWS, Google Cloud Platform, IBM Cloud, and Microsoft Azure.

Amazon AWS. AWS uses a pay-as-you-go pricing model for more than 60 of its cloud services. Customers pay only for what they use, rounded up to the hour. They can also pay in advance and qualify for Amazon’s incentives as they consume more of its resources. A CSC can estimate its cloud cost up front using the Simple Monthly Calculator. Though most of its services are offered on short-term contracts, Amazon discounts long-term, locked-in agreements through Reserved Instances (RIs), especially for the EC2 and RDS services. Amazon’s support billing is based on each account’s usage. (Amazon AWS) Amazon AWS has its own Linux version for the cloud and charges extra for Microsoft Windows.
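The trade-off between pay-as-you-go and a Reserved Instance commitment can be sketched as simple arithmetic. The hourly rates and upfront fee below are hypothetical, not current AWS prices; the point is only the shape of the comparison a CSC would run in the Simple Monthly Calculator.

```python
# Illustrative comparison of on-demand vs. Reserved Instance (RI) pricing.
# All rates here are made-up placeholders, not actual AWS prices.

HOURS_PER_YEAR = 24 * 365  # 8760

def on_demand_cost(hourly_rate, hours):
    """Pay-as-you-go: billed only for the hours actually used."""
    return hourly_rate * hours

def reserved_cost(upfront, discounted_rate, hours):
    """Reserved Instance: upfront commitment plus a reduced hourly rate."""
    return upfront + discounted_rate * hours

# Hypothetical rates for a small instance running all year.
od = on_demand_cost(0.10, HOURS_PER_YEAR)        # ~$876
ri = reserved_cost(200.0, 0.04, HOURS_PER_YEAR)  # ~$550
print(f"on-demand: ${od:.2f}, reserved: ${ri:.2f}")
```

For a workload that runs around the clock, the reserved commitment wins; for a workload that runs only a few hours a month, on-demand does.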

Google Cloud Platform. Google requires no contract in its pricing model. However, it offers automatic discounts to loyal customers for sustained use of its services. It bills per minute, which supports “greater business agility”: an enterprise does not need to estimate and pay up front for future consumption. Customers pay only for the time they use Google resources and can estimate their costs using the TCO Pricing Calculator. (Kaufmann & Dolan, 2015) Google Cloud does not offer dedicated hosts.

IBM Cloud. Unlike the networks of IBM mainframes that enterprises once ran, the new IBM cloud allows enterprises to build cloud applications on custom-configured, bare-metal hardware. (Wayner, 2016) Big Blue lets you have the public cloud your way, with bare metal, private instances, and even custom-configured hardware options. What stands out at IBM Cloud is its application store, from which customers can pick and choose applications to assemble innovative products and services that adapt to market needs. (Satell, 2015) Since IBM Cloud is offered as IaaS, its CSCs have the liberty to use applications on the IBM Bluemix PaaS platform or to use third-party applications. IBM Bluemix only supports Java. IBM Cloud offers two types of pricing: (1) reserved capacity on a 6- to 12-month contract at reduced pricing, or (2) pay-as-you-go for each service utilized.

Microsoft Azure. The only CSP that offers Windows at no extra charge, along with Linux support, is Microsoft Azure. This attracts enterprises rooted in Microsoft technologies. (Wayner, Cloud review: Amazon, Microsoft, Google, IBM, and Joyent, 2016) The cost of its cloud platform can be relatively high compared to other providers, and the overall performance of the cloud is rated average. Nevertheless, Azure is regarded as a cloud where enterprises can get up and running quickly thanks to its easy-to-use interface. Due to the downtimes reported in 2014, some enterprises are still waiting for its cloud to mature. (Vaughan-Nichols, 2015)

Deployment Model

All of the above CSPs offer services through virtualization at the infrastructure, platform, and application levels.

Public Cloud. Cloud services, as described above, made available to customers over the Internet as a utility constitute the Public cloud.

Private Cloud. While the Public cloud offers significant cost reduction, enterprises have also started virtualizing their internal infrastructure, platforms, and applications to reduce cost further and avoid paying an external party. These are called Private clouds.

Hybrid Cloud. Some enterprises may combine a public cloud stack (e.g., long-term data object storage in Amazon S3) with a private cloud stack to form what is known as the Hybrid cloud. They pick and choose the best of both worlds to develop a solution for their customers.

Community Cloud. A cloud built to solve a particular problem for a community of consumers is called a Community cloud.

Architecture

Virtualization. Virtualization helps the CSP save costs on space, energy, and personnel while increasing CPU utilization; in traditional enterprise computing, the CPU is never utilized to its full capacity. Virtualization also helps in cloning a master image for testing or debugging purposes. The cloned Virtual Machine (VM) image can sit on the same virtualized infrastructure, isolated from the master image, reducing hardware cost for the CSC. The isolation offered through virtualization improves the security of a virtualized image, provided the virtualized environment and the image are hardened. A CSC can scale the number of isolated images up or down with user demand, and it pays the CSP only for the space its images occupy, the bandwidth for interacting with them, the time it uses the application and infrastructure services, and the energy needed to sustain the demand for its resources. The CSC can do all this with minimal human interaction because of the automation available at the CSP; it need not hire dedicated system administrators to host its solution as long as its solution developers know how to configure its tenancy in the cloud.

Virtualization is possible using hypervisors, of which there are two types. Those that run directly on the physical machine are Type-1, or bare-metal, hypervisors. Hypervisors that run on top of an operating system are Type-2 hypervisors.

Xen hypervisors are Type-1 and do not depend on a host operating system; they can concurrently run Virtual Machines (VMs) with different operating systems. Another Type-1 hypervisor is KVM, which also supports a guest operating system model. Unlike Xen, KVM creates each VM as a Linux process. Both Xen and KVM are open-source projects. VMware Workstation and Oracle VM VirtualBox are commercially available Type-2 hypervisors. (Faynberg, Lu, & Skuler, 2016)
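Hypervisors such as KVM and Xen (in its hardware-assisted mode) rely on CPU virtualization extensions, which Linux advertises as feature flags in /proc/cpuinfo: "vmx" for Intel VT-x and "svm" for AMD-V. A minimal sketch of checking for them, parsing a sample string so it stays self-contained:

```python
# Sketch: detect hardware virtualization support from cpuinfo text.
# "vmx" = Intel VT-x, "svm" = AMD-V; these flags are what KVM requires.

def virtualization_support(cpuinfo_text):
    """Return 'vmx', 'svm', or None based on the flags line of cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            if "vmx" in flags:
                return "vmx"  # Intel VT-x present
            if "svm" in flags:
                return "svm"  # AMD-V present
    return None

sample = "processor\t: 0\nflags\t\t: fpu vme de pse msr vmx sse2\n"
print(virtualization_support(sample))  # vmx
# On a real Linux host one would pass open("/proc/cpuinfo").read() instead.
```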

Since cloud computing is heavily dependent on virtualization, the security of the virtualized environment is paramount for its tenants. When designing their solutions, tenants need to assume that hypervisors are always susceptible to threats from other tenants as well as from the host. The same is true on the other side of the coin: the host of the virtualized environment needs to ensure that the network, infrastructure, platform, and software it uses for internal operations are not threatened. NIST Special Publication 800-125A provides security recommendations for hypervisor deployment. (Chandramouli, 2014)

Data Network. A set of technologies that enables communication between two processes located on different computers is data networking. Without the evolution of data networking, the cloud wouldn’t be where it is today. Cloud computing leverages physical interconnection within the cloud, between any two federated clouds, and between the cloud and any computer that needs to access it. Operating systems in the cloud manage these communications using the Internet Protocol (IP), Multi-Protocol Label Switching (MPLS), Virtual Private Networks (VPNs), and Software-Defined Networking (SDN), leveraging appliances such as the Domain Name System (DNS) with load balancing and Network Address Translation (NAT) for deceptive controls, and firewalls for resistive security controls. (Shimeall & Spring, 2014)
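The DNS load balancing mentioned above is, at its simplest, round-robin rotation through a pool of server addresses. A toy sketch of that rotation follows; the hostname and the addresses (drawn from the RFC 5737 documentation range) are hypothetical, and a real DNS server would of course consult zone data rather than a single hardcoded pool.

```python
# Sketch of round-robin DNS-style load distribution: each lookup hands
# out the next address in the pool, spreading clients across servers.
import itertools

class RoundRobinResolver:
    """Hands out addresses from a fixed pool in rotating order."""
    def __init__(self, addresses):
        self._cycle = itertools.cycle(addresses)

    def resolve(self, hostname):
        # A real resolver would look up `hostname` in zone data;
        # here every name maps to the same rotating pool.
        return next(self._cycle)

pool = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
resolver = RoundRobinResolver(pool)
print([resolver.resolve("app.example.com") for _ in range(4)])
# ['192.0.2.10', '192.0.2.11', '192.0.2.12', '192.0.2.10']
```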

Database and Storage. In cloud computing, virtualization of infrastructure leads to virtual data centers with no well-defined physical boundaries. These virtual data centers are intended to host multiple tenants, and some may span numerous physical data centers, especially in hybrid clouds. Depending on the connection to the host, clouds may use three types of storage: (1) direct-attached storage, (2) network-attached storage, and (3) a Storage Area Network (SAN). Direct-attached storage makes it easy to store data but hard to share it; that is where network-attached storage helps, by sharing files over an IP network. However, network-attached storage does not work well with databases, and it has throughput limitations due to the underlying networking media. A SAN addresses these problems. To enable resource pooling, storage resources need to be virtualized, simplifying management tasks such as database snapshots and migration.

Amazon offers the Relational Database Service (RDS), which allows customers to run MySQL, Oracle, or SQL Server. It also provides a schema-less database called Amazon SimpleDB for lighter workloads, while Amazon DynamoDB is its solid-state-drive (SSD) backed database with high replication capability. Google Cloud SQL is a MySQL-like relational database from Google, and Google BigQuery is an analysis tool for querying large data sets stored in the cloud. Microsoft offers its SQL Server either as a cloud service or in a VM. IBM offers IBM Cloudant, a NoSQL data service, and also provides its powerful DB2 in the cloud. Some of these databases are used by the CSPs for their own data storage purposes. Some CSPs (not the ones mentioned above), especially SaaS providers, have nothing but a floor and a couple of laptops; everything else, including their data, is in the cloud.
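From the application's point of view, a managed relational service such as RDS MySQL looks like any SQL database reached over the network. The sketch below uses SQLite in memory purely as a local stand-in, since a real connection would need an endpoint and credentials issued by the CSP; the table and data are invented for illustration.

```python
# Sketch: SQL access pattern against a relational database. SQLite stands
# in for a managed cloud service (e.g., Amazon RDS MySQL), whose endpoint
# and credentials would come from the CSP's console or API.
import sqlite3

conn = sqlite3.connect(":memory:")  # a cloud DB would use a network endpoint
conn.execute("CREATE TABLE tenants (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO tenants (name) VALUES (?)", ("acme",))
conn.commit()

rows = conn.execute("SELECT name FROM tenants").fetchall()
print(rows)  # [('acme',)]
conn.close()
```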

Applications. Applications that can be reused repeatedly by consumers over the Internet are good candidates for the cloud. However, not all applications should be in the cloud. Enterprises must be very cautious about putting applications that process highly sensitive data with personally identifiable information in the cloud: because of the cloud’s multi-tenancy, there is always a chance for one tenant to see another tenant’s data. Sometimes building a private cloud is better than sending everything to the public cloud. Some enterprises use a dedicated MPLS line between a public IaaS and their private cloud to leverage the computing power of the public IaaS. This hybrid model mitigates direct threats from the Internet; however, it is still susceptible to threats from other cloud tenants, though perhaps with a shorter window of opportunity.

Locations. Depending on where the tenant is, where the CSC operates its business, or where the data originates, data contained in the cloud is subject to the laws and regulations of the countries in which it resides. In the US, states such as Massachusetts and California have privacy laws, while some markets are subject to industry regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry Data Security Standard (PCI DSS). Canadian entities, and custodians of data originating from Canada, must adhere to the Canadian Personal Information Protection and Electronic Documents Act (PIPEDA) while also complying with PCI DSS if they process credit cards.

Supply Chain. Depending on the deployment model, there may be multiple CSPs in the supply chain from a CSC’s perspective. A CSC might use a SaaS application that sits on top of a PaaS provided by another entity; that PaaS may in turn sit on top of an IaaS hosted in a different country. The data the CSC sends to the cloud is then subject to the laws and regulations of two different countries, and possibly of multiple industries (e.g., health and payment cards).

Data Security

Since cloud computing combines multiple technologies, security needs to be approached by understanding the threats, measuring the risks, and applying mitigating controls at (1) each layer of the technology stack, (2) the service model level, and (3) the end-to-end solution level.

CSP Perspective. From a CSP perspective, the confidentiality, integrity, and availability of its configuration data are paramount, as this data is what gives it its competitive edge. This intellectual property must be protected as the most valuable data asset in the company. The enterprise also needs to ensure the data security of its employees, customers, and partners, and to protect intellectual property such as research materials, code, and market analysis. A CSP must employ access controls, data-in-transit and data-at-rest encryption, and network segregation and zoning to protect its data. It needs to ensure that the infrastructure, operating systems, databases, and data stores it uses for internal operations are separate from those of its customers, to reduce threats originating from the CSC. The separation must be such that a CSC cannot see CSP assets at all; this can be achieved through a complete air gap between CSP internal operations assets and CSC assets.

CSC Perspective. The CSC has to deploy security controls to protect its data depending on the sensitivity of the data it sends to the cloud, the location of the data’s owner, and where the data will reside. These include, but are not limited to, access controls, data-in-transit and data-at-rest encryption, and network segregation and zoning. When designing a solution for the cloud, the CSC must consider all possible threats to its application and apply the required mitigating controls. It must consider all threats to its data from other tenants in the cloud as well as from the host, and it must develop controls to protect its data assuming an attack can happen at any time.
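One concrete data-at-rest control a CSC can apply before handing data to a CSP is an integrity tag, so that tampering by another tenant or by the host is detectable on retrieval. A minimal standard-library sketch using HMAC-SHA256 follows; the key and record are invented for illustration, and confidentiality would additionally require encryption, which this integrity-only sketch does not provide.

```python
# Sketch: integrity protection for data stored in the cloud. The CSC keeps
# the key; the CSP stores only the tagged blob, so any modification of the
# stored data is detected when the tag fails to verify.
import hmac
import hashlib

TAG_LEN = 32  # HMAC-SHA256 digest size in bytes

def protect(key, data):
    """Append an HMAC-SHA256 tag so integrity can be verified later."""
    return data + hmac.new(key, data, hashlib.sha256).digest()

def verify(key, blob):
    """Split data and tag; raise if the stored blob was modified."""
    data, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed: data was modified")
    return data

key = b"tenant-secret-key"  # hypothetical key, kept off the cloud
blob = protect(key, b"customer record")
print(verify(key, blob))  # b'customer record'
```

The constant-time comparison (`hmac.compare_digest`) matters here: a naive `==` would leak timing information an attacker could exploit.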

Conclusion

The business of cloud computing has come a long way. As more and more enterprises adopt cloud computing, they should be able to use the guidance provided by NIST and the Cloud Security Alliance. While enterprises continue to perceive hosting sensitive data outside their network as a risk, cloud service providers will continue to perfect their security controls to build the confidence of their consumers. Each of them is trying its best to protect its assets.

“Do not figure on opponents not attacking; worry about your lack of preparation.” – Sun Tzu, The Art of War

 

References

 
