Wednesday, May 15, 2013

Energy Use and Ecological Impact of Large-scale Data Centres


The energy use and ecological impact of large-scale data centres can be analysed through the concept of an energy-proportional system.

[Figure: Power usage and energy efficiency, as percentages of peak, plotted against the percentage of system utilization; the typical operating region of data-centre servers is marked between roughly 10% and 50% utilization.]
The diagram shows that although power requirements scale roughly linearly with the load, the energy efficiency of a computing system is not a linear function of the load; even when idle, a system may use about 50% of the power it draws at full load. Data collected over long periods shows that the typical operating region for servers at a data centre is between about 10% and 50% of full load.

This observation is central to resource management in a cloud environment, where the load is concentrated on a subset of servers and the remaining servers are switched to standby mode whenever possible. An energy-proportional system consumes no power when idle, very little power under light load, and gradually more power as the load increases.
An ideal energy-proportional system always operates at 100% efficiency. The operating efficiency of a system is captured by its performance per watt of power. An energy-proportional network consumes energy in proportion to its communication load; InfiniBand is an example of such a network.
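To make the gap between real servers and the energy-proportional ideal concrete, here is a minimal sketch that models power and performance per watt. It assumes a linear power model with a 50% idle floor, consistent with the figure above; the numbers are illustrative, not measurements of any specific server.

```python
# Minimal sketch: power and efficiency of a non-energy-proportional server.
# Assumes a linear power model with a 50% idle floor, as suggested by the figure;
# the numbers are illustrative, not measurements of any specific server.

IDLE_FRACTION = 0.5  # fraction of peak power drawn at 0% utilization

def power(utilization: float) -> float:
    """Power drawn as a fraction of peak, for utilization in [0, 1]."""
    return IDLE_FRACTION + (1.0 - IDLE_FRACTION) * utilization

def efficiency(utilization: float) -> float:
    """Useful work per unit power (performance per watt), normalized to 1 at full load."""
    if utilization == 0.0:
        return 0.0
    return utilization / power(utilization)

for u in (0.0, 0.1, 0.3, 0.5, 1.0):
    print(f"utilization {u:4.0%}: power {power(u):4.0%} of peak, efficiency {efficiency(u):.2f}")
# In the typical 10%-50% operating region the efficiency stays well below 1,
# which is why an energy-proportional system is the ideal.
```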
One strategy to reduce energy consumption is to concentrate the load on a small number of disks and allow the others to operate in a low-power mode.
Another technique is based on data migration. The system stores data in virtual nodes managed with a distributed hash table; the migration is controlled by two algorithms: a short-term optimization algorithm that gathers or spreads virtual nodes according to the daily variation of the workload, so that the number of active physical nodes is reduced to a minimum, and a long-term optimization algorithm that copes with changes in the popularity of data over a longer period, e.g. a week. A sketch of the short-term consolidation idea follows.
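The sketch below illustrates the short-term consolidation step: packing virtual nodes onto as few physical nodes as possible when the load drops. The capacity value and the first-fit-decreasing heuristic are assumptions for illustration, not the algorithm of any specific system.

```python
# Minimal sketch of short-term consolidation: pack virtual nodes onto as few
# physical nodes as possible using a first-fit-decreasing heuristic.
# The capacity value and the heuristic itself are illustrative assumptions.

from typing import Dict, List

def consolidate(virtual_loads: Dict[str, float], capacity: float = 1.0) -> List[List[str]]:
    """Return a list of physical nodes, each holding the virtual nodes assigned to it."""
    physical: List[List[str]] = []   # virtual nodes per physical node
    free: List[float] = []           # remaining capacity per physical node
    # Place the heaviest virtual nodes first (first-fit decreasing).
    for name, load in sorted(virtual_loads.items(), key=lambda kv: -kv[1]):
        for i, room in enumerate(free):
            if load <= room:
                physical[i].append(name)
                free[i] -= load
                break
        else:
            physical.append([name])      # start a new physical node
            free.append(capacity - load)
    return physical

# The daytime load needs more physical nodes than the night-time load.
day = {"v1": 0.6, "v2": 0.5, "v3": 0.4, "v4": 0.3}
night = {k: v / 4 for k, v in day.items()}
print("day:", consolidate(day))      # two physical nodes active
print("night:", consolidate(night))  # virtual nodes gathered onto a single node
```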
Dynamic resource provisioning is also necessary to minimize power consumption. Two critical issues are (i) the amount of resources allocated to each application and (ii) the placement of individual workloads.

Thursday, May 9, 2013

Cloud Interoperability


Since the Internet is a network of networks, an intercloud seems plausible.
A network offers high-level services for transporting digital information from a source, a host inside or outside the network, to a destination, another host or another network that can deliver the information to its final destination. This transport is possible because there is agreement on (a) how to uniquely identify the source and destination of the information, (b) how to navigate through a maze of networks, and (c) how to actually transport the data between a source and a destination. The three elements on which agreement is in place are the IP address, the IP protocol, and a transport protocol such as TCP or UDP.
In the case of clouds there are no such standards for storage or processing, and the clouds offer different delivery models; the problem space is not only large but also open. Following the example of the Internet, however, cloud interoperability seems feasible. Standards are required for items such as naming, addressing, identity, trust, presence, messages, multicast, time, and the means to transfer, store, and process information. An intercloud would require the development of an ontology for cloud computing; each service provider would then have to describe all its resources and services using this ontology. Each cloud would also require an interface, which may be termed an Intercloud Exchange, to translate the common language describing all objects and actions included in a request originating from another cloud into internal objects and actions.

The diagram above shows communication among disparate clouds.
The Open Grid Forum has created a working group to standardize such an interface, called the Open Cloud Computing Interface (OCCI). OCCI has adopted a resource-oriented architecture (ROA) to represent the key components of cloud infrastructure services. Each resource, identified by a URI (Uniform Resource Identifier), can have multiple representations that may or may not be hypertext. The OCCI working group plans to map the API to several formats, so a single URI entry point defines an OCCI interface.
The API implements the CRUD operations Create, Retrieve, Update, and Delete, mapped to HTTP POST, GET, PUT, and DELETE respectively. OCCI provides the capabilities to govern the definition, creation, deployment, operation, and retirement of infrastructure services, as the sketch below illustrates.
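The following is a minimal sketch of the CRUD-to-HTTP mapping using Python's requests library. The endpoint URL, resource path, and attribute payload are hypothetical placeholders; a real OCCI request also carries Category and attribute headers whose exact syntax is defined by the OCCI specification and is not reproduced here.

```python
# Minimal sketch of the OCCI-style CRUD-to-HTTP mapping using the requests library.
# The endpoint URL, resource path, and payload below are hypothetical placeholders;
# a real OCCI request also carries Category and attribute headers defined by the spec.
import requests

BASE = "http://example.com/occi/compute/"   # hypothetical OCCI entry point

# Create a resource (CRUD "Create" -> HTTP POST).
created = requests.post(BASE, data={"occi.compute.cores": "2"})
resource_url = created.headers.get("Location", BASE + "demo-instance")

# Retrieve it (CRUD "Retrieve" -> HTTP GET).
print(requests.get(resource_url).status_code)

# Update it (CRUD "Update" -> HTTP PUT).
requests.put(resource_url, data={"occi.compute.cores": "4"})

# Delete it (CRUD "Delete" -> HTTP DELETE).
requests.delete(resource_url)
```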
Using OCCI, cloud clients can invoke a new application stack, manage its life cycle, and manage the resources it uses. The Storage Networking Industry Association (SNIA) has established a working group to address the standardization of cloud storage; the Cloud Data Management Interface (CDMI) serves this purpose. In CDMI the storage space exposed by various types of interfaces is abstracted using the notion of a container. A container groups the data stored in the storage space and also forms a point of control for applying data services in aggregate. CDMI provides a data object interface with CRUD semantics and can also be used to manage containers exported for use by the cloud infrastructure.

The diagram above shows OCCI and CDMI in an integrated cloud environment.
To achieve interoperability, CDMI provides a type of export that contains information obtained via the OCCI interface. In addition, OCCI provides a type of storage that corresponds to exported CDMI containers. The sketch below shows what a basic CDMI container request might look like.
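Here is a minimal, hedged sketch of creating and reading a CDMI container over HTTP. The media type application/cdmi-container comes from the SNIA CDMI specification, but the host, path, metadata, and version header value are illustrative assumptions.

```python
# Minimal sketch: create and read a CDMI container over plain HTTP.
# The endpoint, path, metadata, and version value are illustrative assumptions;
# the application/cdmi-container media type comes from the SNIA CDMI spec.
import json
import requests

BASE = "https://storage.example.com/cdmi/"          # hypothetical CDMI entry point
HEADERS = {
    "X-CDMI-Specification-Version": "1.0.2",         # assumed spec version
    "Content-Type": "application/cdmi-container",
    "Accept": "application/cdmi-container",
}

# Create (or update) a container that will group data objects.
resp = requests.put(BASE + "backups/", headers=HEADERS,
                    data=json.dumps({"metadata": {"purpose": "nightly-backups"}}))
print(resp.status_code)

# Read the container back; the response lists its children and metadata.
print(requests.get(BASE + "backups/", headers=HEADERS).json().get("children", []))
```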

Sunday, May 5, 2013

Cloud Storage Diversity and Vendor Lock-in


There are several risks involved when a large organization relies solely on a single cloud provider.
One way to guard against vendor lock-in is to replicate the data across multiple cloud service providers. Straightforward replication, however, is costly and poses technical challenges: the overhead of maintaining data consistency could drastically affect the performance of a virtual storage system consisting of multiple full replicas of the organization's data spread over several vendors.
Another solution extends the design principle of a RAID-5 system used for reliable data storage. A RAID-5 system uses block-level striping with distributed parity over a disk array: the disk controller distributes sequential blocks of data across the physical disks and computes a parity block by bitwise XOR-ing the data blocks. The parity block is written to a different disk for each stripe, unlike RAID 4, where all parity blocks are written to a dedicated disk. This technique allows data to be recovered after a single disk is lost.



The diagram above shows a RAID-5 controller. If disk 2 is lost, file 3 still has all its blocks, and the missing blocks of the other files can be reconstructed by applying the XOR operation (e.g., a2 = a1 XOR ap XOR a3).

The same process can be applied to detect and correct errors in a single block. We can replicate this scheme in the cloud: data may be striped across four clouds, with a proxy providing transparent access to the data. The proxy carries out the function of the RAID controller, which allows concurrent access to the data; for example, blocks a1, a2, and a3 can be read and written concurrently, as sketched after the figure caption below.

The diagram above shows the concept of RAID 5 applied to the cloud to avoid vendor lock-in.
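The sketch below illustrates the parity arithmetic behind this scheme: striping data blocks across four "clouds", keeping an XOR parity block, and rebuilding a block when one provider is lost. The block size and provider names are illustrative assumptions.

```python
# Minimal sketch of RAID-5-style parity: stripe a file across four "clouds",
# keep an XOR parity block, and rebuild a block when one provider is lost.
# The block size and provider names are illustrative assumptions.

def xor_blocks(*blocks: bytes) -> bytes:
    """Bitwise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Three data blocks (a1, a2, a3) plus one parity block (ap), as in the diagram.
a1, a2, a3 = b"AAAA", b"BBBB", b"CCCC"
ap = xor_blocks(a1, a2, a3)
clouds = {"cloud1": a1, "cloud2": a2, "cloud3": a3, "cloud4": ap}

# Suppose cloud2 (holding a2) becomes unavailable: recover a2 from the rest.
recovered_a2 = xor_blocks(clouds["cloud1"], clouds["cloud3"], clouds["cloud4"])
assert recovered_a2 == a2
print("recovered block:", recovered_a2)
```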

The proxy carries out the functions of the RAID controller as well as authentication and other security-related functions. It ensures before-or-after atomicity as well as all-or-nothing atomicity for data access. The proxy buffers the data, possibly converts data-manipulation commands, and optimizes data access, for example by aggregating multiple write operations or converting data to the format specific to each cloud. This model is used by Amazon. The performance penalty due to the proxy overhead is minor, and the cost increase is also minor.


Monday, April 29, 2013

Case Study: Amazon Cloud Storage


Amazon Web Services (AWS), whose core infrastructure services were launched in 2006, is based on the IaaS delivery model.
In this model the cloud service provider offers an infrastructure consisting of compute and storage servers interconnected by high-speed networks, and supports a set of services to access these resources. An application developer is responsible for installing applications on a platform of his choice and for managing the resources provided by Amazon.


[Figure: Services offered by AWS — EC2 instances running various operating systems, S3, EBS, SimpleDB, SQS, CloudWatch, Auto Scaling, and Virtual Private Cloud — all accessible from the AWS Management Console.]
Applications running under a variety of operating systems can be launched using EC2. Multiple EC2 instances can communicate using SQS. Several storage services are available: S3, SimpleDB, and EBS. CloudWatch supports performance monitoring, and Auto Scaling supports elastic resource management. The Virtual Private Cloud allows the direct migration of parallel applications.
Elastic Compute Cloud (EC2) is a web service with a simple interface for launching instances of an application under several operating systems. An instance is created from a predefined Amazon Machine Image (AMI), digitally signed and stored in S3, or from a user-defined image. The image includes the operating system, the run-time environment, the libraries, and the application desired by the user. An AMI creates an exact copy of the original image but without configuration-dependent information such as the host name or MAC address. A user can (1) launch an instance from an existing AMI and terminate an instance, (2) start and stop an instance, (3) create a new image, (4) add tags to identify an image, and (5) reboot an instance. EC2 is based on the Xen virtualization strategy, and each EC2 virtual machine, or instance, functions as a virtual private server. An instance type specifies the maximum amount of resources available to an application, the interface for that instance, as well as the cost per hour. A user can interact with EC2 using a set of SOAP messages to list the available AMIs, boot an instance from an image, terminate an instance, display the running instances of a user, display console output, and so on. The user has root access to each instance in the elastic and secure computing environment of EC2. Instances can be placed in multiple locations.
EC2 allows the import of virtual machine images from the user's environment to an instance through a facility called VM Import. It also automatically distributes incoming application traffic among multiple instances using the Elastic Load Balancing facility. EC2 associates Elastic IP addresses with an account; this mechanism allows a user to mask the failure of an instance and remap a public IP address to any instance of the account without the need to interact with support staff. A sketch of the instance life cycle follows.
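The following is a minimal sketch of the instance life cycle using the boto3 Python SDK (a newer interface than the SOAP messages mentioned above). The AMI ID, instance type, and region are hypothetical placeholders, and credentials are assumed to be configured in the environment.

```python
# Minimal sketch of the EC2 instance life cycle with the boto3 SDK.
# The AMI ID, instance type, and region are hypothetical placeholders;
# credentials are assumed to be configured in the environment.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one instance from an existing AMI.
result = ec2.run_instances(ImageId="ami-0123456789abcdef0",  # hypothetical AMI
                           InstanceType="t2.micro",
                           MinCount=1, MaxCount=1)
instance_id = result["Instances"][0]["InstanceId"]

# Tag it so it can be identified later.
ec2.create_tags(Resources=[instance_id], Tags=[{"Key": "Name", "Value": "demo"}])

# Stop, start, and finally terminate the instance.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.start_instances(InstanceIds=[instance_id])
ec2.terminate_instances(InstanceIds=[instance_id])
```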
Simple Storage Service (S3) is a storage service designed to store large objects. It supports a minimal set of functions: write, read, and delete. S3 allows an application to handle an unlimited number of objects ranging in size from 1 byte to 5 terabytes. An object is stored in a bucket and retrieved via a unique, developer-assigned key. A bucket can be stored in a region selected by the user. For each object, S3 maintains the name, the modification time, an access control list, and up to 4 kilobytes of user-defined metadata. Object names are global. An authentication mechanism ensures that data is kept secure; objects can be made public, and write access can be granted to other users.
S3 supports the PUT, GET, and DELETE primitives to manipulate objects but does not support primitives to copy, rename, or move an object from one bucket to another. Appending to an object requires a read followed by a write of the entire object. S3 computes the MD5 checksum of every object written and returns it in a field called the ETag. A user is expected to compute the MD5 of an object stored or written and compare it with the ETag; if the values do not match, the object was corrupted during transmission or storage, as in the sketch below.
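Here is a minimal sketch of the PUT/GET/DELETE primitives with the MD5-versus-ETag integrity check, using the boto3 SDK. The bucket and key names are hypothetical placeholders; note that for multipart uploads the ETag is not a plain MD5, so the check shown applies only to simple, single-part objects.

```python
# Minimal sketch of S3 PUT/GET/DELETE with an MD5-vs-ETag integrity check,
# using the boto3 SDK. The bucket and key names are hypothetical placeholders.
import hashlib
import boto3

s3 = boto3.client("s3")
bucket, key, body = "example-bucket", "reports/demo.txt", b"hello, cloud storage"

# Write the object and compare the returned ETag with a locally computed MD5.
put = s3.put_object(Bucket=bucket, Key=key, Body=body)
local_md5 = hashlib.md5(body).hexdigest()
assert put["ETag"].strip('"') == local_md5, "object corrupted in transit"

# Read it back and verify again.
data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
assert hashlib.md5(data).hexdigest() == local_md5

# Delete the object.
s3.delete_object(Bucket=bucket, Key=key)
```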
The Amazon S3 SLA guarantees reliability. S3 uses standards-based REST and SOAP interfaces; the default download protocol is HTTP, but a BitTorrent protocol interface is also provided to lower the cost of high-scale distribution.
Elastic Block Store (EBS) provides persistent block-level storage volumes for use with Amazon EC2 instances. A volume appears to an application as a raw, unformatted, reliable physical disk. The size of a storage volume ranges from 1 gigabyte to 1 terabyte. Volumes are grouped together in availability zones and are automatically replicated within each zone. An EC2 instance may mount multiple volumes, but a volume cannot be shared among multiple instances. EBS supports the creation of snapshots of the volumes attached to an instance, which can then be used to restart an instance. The storage strategy provided by EBS is suitable for database applications, file systems, and applications using raw data devices; a brief sketch of the volume and snapshot workflow follows.
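The sketch below shows the basic volume-and-snapshot workflow with boto3. The availability zone, device name, and instance ID are hypothetical placeholders.

```python
# Minimal sketch of the EBS volume and snapshot workflow using boto3.
# The availability zone, device name, and instance ID are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a 10 GiB volume in one availability zone.
volume_id = ec2.create_volume(AvailabilityZone="us-east-1a", Size=10)["VolumeId"]

# Attach it to a running instance, where it appears as a raw block device.
ec2.attach_volume(VolumeId=volume_id,
                  InstanceId="i-0123456789abcdef0",   # hypothetical instance
                  Device="/dev/sdf")

# Take a point-in-time snapshot that can later seed a new volume.
snapshot_id = ec2.create_snapshot(VolumeId=volume_id,
                                  Description="nightly backup")["SnapshotId"]
print("snapshot:", snapshot_id)
```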
SimpleDB is a non-relational data store that allows developers to store and query data items via web service requests. It supports store and query functions traditionally provided only by relational databases. SimpleDB creates multiple geographically distributed copies of each data item and supports high-performance web applications; at the same time it automatically manages infrastructure provisioning, hardware and software maintenance, indexing of data items, and performance tuning.
Simple Queue Service (SQS) is a hosted message queue for supporting automated workflows. It allows multiple Amazon EC2 instances to coordinate their activities by sending and receiving SQS messages. Any computer connected to the Internet can add or read messages without any installed software or special firewall configuration.
Applications using SQS can run independently and asynchronously and do not need to be developed with the same technologies. A received message is "locked" during processing; if processing fails, the lock expires and the message becomes available again. The timeout for locking can be changed dynamically via the ChangeMessageVisibility operation. Developers can access SQS through standards-based SOAP and query interfaces. Queues can be shared with other AWS accounts, and queue sharing can be restricted by IP address and time of day. The sketch below illustrates this message life cycle.
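Here is a minimal sketch of the SQS message life cycle with boto3: send, receive (which locks the message for the visibility timeout), extend the lock, and delete on success. The queue name, work item, and timeout values are hypothetical placeholders.

```python
# Minimal sketch of the SQS message life cycle with boto3.
# The queue name and work item are hypothetical placeholders.
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.create_queue(QueueName="demo-work-queue")["QueueUrl"]

# A producer instance enqueues a work item.
sqs.send_message(QueueUrl=queue_url, MessageBody="resize image 42")

# A consumer instance receives it; the message is now invisible to others.
msgs = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                           WaitTimeSeconds=5).get("Messages", [])
if msgs:
    receipt = msgs[0]["ReceiptHandle"]
    # Processing takes longer than expected: extend the lock dynamically.
    sqs.change_message_visibility(QueueUrl=queue_url, ReceiptHandle=receipt,
                                  VisibilityTimeout=120)
    # Work done: delete the message. If the worker crashed instead, the lock
    # would expire and the message would become visible again for another worker.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt)
```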
CloudWatch is a monitoring infrastructure used by application developers, users, and system administrators to collect and track the metrics needed to optimize application performance and to increase the efficiency of resource utilization. Without installing any software, a user can monitor approximately a dozen preselected metrics and then view graphs and statistics for them. Basic monitoring is free; detailed monitoring is subject to charges.
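The sketch below reads one of the preselected metrics, average CPU utilization of a single EC2 instance over the last hour, using boto3. The instance ID and region are hypothetical placeholders.

```python
# Minimal sketch of reading a preselected metric from CloudWatch with boto3.
# The instance ID and region are hypothetical placeholders.
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)

stats = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,                 # one datapoint per 5 minutes
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'{point["Average"]:.1f}%')
```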
Virtual Private Cloud (VPC) provides a bridge between the existing IT infrastructure of an organization and the AWS cloud: the existing infrastructure is connected via a virtual private network (VPN) to a set of isolated AWS compute resources. VPC allows existing management capabilities such as security services, firewalls, and intrusion detection systems to operate seamlessly within the cloud.
Auto Scaling exploits cloud elasticity and provides automatic scaling of EC2 instances. The service supports the grouping of instances, the monitoring of instances in a group, and the definition of triggers, pairs of CloudWatch alarms and policies, which allow the size of the group to be scaled up or down.
An Auto Scaling group consists of a set of instances described in a static fashion by a launch configuration. When a group scales up, new instances are started using the parameters of the RunInstances EC2 call provided by the launch configuration; when the group scales down, the instances with the older launch configuration are terminated first. The monitoring function of the Auto Scaling service carries out health checks to enforce the specified policies; for example, a user may specify a health check for Elastic Load Balancing, and Auto Scaling will terminate an instance exhibiting low performance and start a new one. A trigger uses CloudWatch alarms to detect events and then initiate specific actions; for example, a trigger could detect that the CPU utilization of the instances in the group has risen above 90% and then scale up the group by starting new instances, as in the sketch below.
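The following is a minimal sketch of such a trigger with boto3: a launch configuration, a group, a scale-up policy, and a CloudWatch alarm that invokes the policy when average CPU utilization exceeds 90%. All names, the AMI, and the availability zone are hypothetical placeholders.

```python
# Minimal sketch of an Auto Scaling trigger with boto3. All names, the AMI,
# and the availability zone are hypothetical placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

autoscaling.create_launch_configuration(
    LaunchConfigurationName="demo-lc",
    ImageId="ami-0123456789abcdef0",      # hypothetical AMI
    InstanceType="t2.micro")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="demo-group",
    LaunchConfigurationName="demo-lc",
    MinSize=1, MaxSize=4,
    AvailabilityZones=["us-east-1a"])

policy_arn = autoscaling.put_scaling_policy(
    AutoScalingGroupName="demo-group",
    PolicyName="scale-up-by-one",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1)["PolicyARN"]

# The alarm plays the role of the trigger: above 90% CPU, invoke the policy.
cloudwatch.put_metric_alarm(
    AlarmName="demo-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "demo-group"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=90.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy_arn])
```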
AWS services:
1.       Elastic MapReduce (EMR): a service supporting the processing of large amounts of data.
2.       Simple Workflow (SWF) service: for workflow management; allows scheduling, management of dependencies, and coordination of multiple EC2 instances.
3.       ElastiCache: a service enabling web applications to retrieve data from a managed in-memory caching system rather than a much slower disk-based database.
4.       DynamoDB: a scalable, low-latency, fully managed NoSQL database service.
5.       CloudFront: a web service for content delivery.
6.       Elastic Load Balancing: automatically distributes incoming requests across multiple instances of the application.
7.       Elastic Beanstalk: interacts with other AWS services and automatically handles deployment, capacity provisioning, load balancing, auto scaling, and application monitoring.
8.       CloudFormation: allows the creation of a stack describing the infrastructure of an application.
The AWS SLA allows the cloud service provider to terminate service to any customer at any time for any reason, and it contains a covenant not to sue Amazon or its affiliates for any damage that might arise out of the use of AWS.
Users have several ways to interact with and manage AWS resources:
1.       The AWS Management Console, available at http://aws.amazon.com/console/;
2.       Command-line tools (aws.amazon.com/developertools);
3.       AWS SDK libraries and toolkits for Java, PHP, C++, and Objective-C;
4.       Raw REST requests.


[Figure: Configuration of an availability zone supporting AWS services — compute servers running EC2 instances; servers running AWS services such as the AWS Management Console, CloudWatch, ElastiCache, CloudFormation, Elastic Beanstalk, Elastic Load Balancing, CloudFront, and SQS; AWS storage services (S3, EBS, SimpleDB); a cloud interconnect; and a NAT connection to the Internet.]

Amazon offers cloud services through a network of data centres on several continents. In each region there are several availability zones interconnected by high-speed networks. Regions do not share resources and communicate through the Internet.
An availability zone is a data centre consisting of a large number of servers. A server may run multiple virtual machines, or instances, started by one or more users. An instance may use storage services as well as other services provided by AWS. Storage is automatically replicated within a region: S3 buckets are replicated within an availability zone and between the availability zones of a region, while EBS volumes are replicated only within the same availability zone.
An instance is a virtual server. The user chooses the region and the availability zone where this virtual server should be placed and selects from a limited menu of instance types the one providing the resources (CPU cycles, main memory, secondary storage, communication and I/O bandwidth) needed by the application. When launched, an instance is provided with a DNS name; this name maps to a private IP address for communication within the internal EC2 network and a public IP address for communication outside Amazon's internal network, for example with the user who launched the instance. NAT maps the external IP addresses to the internal ones. The public IP address is assigned for the lifetime of an instance and is returned to the pool of available public IP addresses when the instance is stopped or terminated. An instance can also request an Elastic IP address, a static public IP address that need not be released when the instance is stopped or terminated but must be released when no longer needed, as in the sketch below. EC2 offers several instance types: standard instances, high-memory instances, high-CPU instances, and cluster computing instances.
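Here is a minimal sketch of requesting an Elastic IP address and mapping it to a running instance with boto3. The instance ID is a hypothetical placeholder; the address persists across instance stop and terminate until it is explicitly released.

```python
# Minimal sketch of Elastic IP allocation, association, and release with boto3.
# The instance ID is a hypothetical placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Allocate a static public IP address from Amazon's pool.
address = ec2.allocate_address(Domain="vpc")
print("elastic IP:", address["PublicIp"])

# Map it onto an instance; remapping it to another instance later masks failures.
assoc = ec2.associate_address(InstanceId="i-0123456789abcdef0",   # hypothetical
                              AllocationId=address["AllocationId"])

# Later, when the address is no longer needed, detach and release it.
ec2.disassociate_address(AssociationId=assoc["AssociationId"])
ec2.release_address(AllocationId=address["AllocationId"])
```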

Sunday, April 28, 2013

Storage as a Service


Let's start with a real-world example to get a clear understanding of it.
Sandeep started home delivery of pizzas ordered by customers. He worries about delivering the pizzas on time and managing the delivery operation, and he is also concerned about hiring drivers and keeping the bikes running. The scenario below makes him especially worried.
He is a pizza delivery businessman, not a mechanic: maintaining the fleet is a distraction. With his limited budget, keeping the deliveries running costs a lot, and his business suffers when he can't keep up with deliveries, since he has promised a 20% deduction on the total bill whenever a delivery is late. It is also difficult for him to estimate how many bikes are needed for the expected deliveries. Then he learns about a new kind of delivery company that supplies bikes and drivers. He can use its services on an as-needed basis and pay only for what he uses, and he can now depend on that company to handle the deliveries.
A few years later his business has expanded to more branches, and new problems appear. It is difficult for him to maintain all the software and data on local servers, which costs a lot; his business is constantly running out of storage space and fixing broken servers. Then he learns about cloud computing, which offers a solution. Instead of maintaining all the data and software on local machines, he can depend on another company that provides all the features he requires, accessible through the web. At the same time his data is secure and backed up at another location. This is Storage as a Service, one of the delivery models of cloud computing.
With cloud computing, procuring and maintaining software and hardware resources no longer limits the business. The business can grow without worry, as cloud companies have nearly unlimited storage and resources, and the information is safe because the servers are normally backed up in multiple locations. Sandeep is happy because he pays only for what he uses: if his business grows, his computing costs grow with it, and vice versa.