ECEA 5372 Cloud Computing
Link to CU Boulder page
Cloud Background
-
AWS is one of the originators of cloud computing. It's only like 20 years old. See this article.
It allowed teams within Amazon to reuse infrastructure services without reinventing the service each time. Eventually they made
this a product for sale. Berkeley paper on defining cloud computing.
-
Cloud computing allows for:
1. Elimination of an up-front commitment (no need to buy equipment or locations)
2. Pay for use on a short-term basis, so you only pay for what's needed in the moment
3. The illusion of infinite computing resources
-
This model works because of the economy of scale. A provider can build a huge datacenter vs. using many small datacenters. Providers
can also have expertise on running these datacenters.
-
Infrastructure as a service = delivering resources like compute + network + virtualization, but the customer has to manage the OS,
middleware + applications etc.
-
Platform as a Service = Delivers and manages all the hardware and software resources to develop applications through the cloud.
Developers and IT operations can use PaaS to develop, run, and manage apps without having to build and maintain the
underlying infrastructure themselves.
-
Software as a service = Provides the entire application stack. This could be something like Salesforce (the entire CRM)
-
Trade capital expense for variable expense. Benefit from massive economies of scale. Stop guessing about capacity. Increase speed
and agility. Go global in minutes.
-
Using GCP in this class.
- Suggested Readings. Microsoft paper on VL2.
VL2 is a way of running a network inside of a datacenter. It uses Valiant load balancing (randomness in connections) to
use as little networking capacity as possible while still retaining reliability: traffic can bounce off of
any particular switch to get to an end destination. Traffic patterns are highly volatile and unpredictable.
It can be complicated to manage L2/L3 with traditional networking while still allowing people to provision
networks/machines dynamically. VL2 creates the illusion of one huge L2 switch so the user doesn't have to manage switches.
Instead of using ARP, it uses an addressing shorthand based on the location of the server with known values. It then places
the server's physical location in a directory service, which can be used to find a particular MAC address.
ARP broadcasts over 10,000+ physical machines are not feasible.
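The directory-service idea can be sketched in a few lines of Python (all names here are hypothetical, not VL2's actual API): instead of broadcasting ARP, a host asks a lookup service where an address currently lives.

```python
# Toy sketch of VL2's directory-service idea (names hypothetical):
# servers publish their current location, and hosts resolve addresses
# with a unicast lookup instead of an ARP broadcast.

class DirectoryService:
    def __init__(self):
        self._location = {}  # server address -> location (e.g., ToR switch)

    def register(self, addr, location):
        """A server (or its agent) publishes where it currently lives."""
        self._location[addr] = location

    def resolve(self, addr):
        """Replaces an ARP broadcast with a single unicast query."""
        return self._location.get(addr)

directory = DirectoryService()
directory.register("10.0.5.7", "tor-switch-42")
print(directory.resolve("10.0.5.7"))  # tor-switch-42
```

If a VM migrates, only its directory entry changes; no broadcast is needed to tell 10,000+ machines.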
-
Virtualization decouples the OS/service from the hardware. You can often share multiple services on
a common host hardware. Can migrate from host to host as needed. This eliminates underutilization of bare metal servers,
allows running different operating systems on the same host, and allows applications to have conflicting dependencies if they
are in different virtualization units.
-
A virtual machine provides an interface that is identical to the underlying bare hardware. The
virtual machine operating system (hypervisor) creates the illusion of multiple processors.
Virtual machines can be heavy on CPU/memory/disk because each carries a complete copy of an OS.
-
Containers share the same operating system between different instances. This means
you only need the app + libs/bins to run, and the operating system does the "virtualization".
-
Some pre-Docker examples include chroot, FreeBSD Jails, Virtuozzo/OpenVZ, Linux-VServer and LXC.
-
Infrastructure as Code allows you to use software practices in infra!
-
Imperative = you say how to get what you want (e.g., C programming).
-
Declarative = you say what you want (e.g., Terraform).
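The contrast can be sketched in Python (illustrative only; instance names are made up). A declarative tool like Terraform takes a desired state and internally derives the imperative steps:

```python
# Declarative model: state the *desired* set of instances; a reconciler
# works out the imperative create/delete steps needed to get there.
# This mirrors what Terraform's plan/apply does, in miniature.

def reconcile(current, desired):
    """Return the imperative actions needed to reach the desired state."""
    actions = [("create", name) for name in sorted(desired - current)]
    actions += [("delete", name) for name in sorted(current - desired)]
    return actions

current = {"vm-a", "vm-b"}
desired = {"vm-b", "vm-c"}
print(reconcile(current, desired))  # [('create', 'vm-c'), ('delete', 'vm-a')]
```

The user never writes the create/delete calls; they only edit the `desired` set.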
-
VPC = Virtual Private Cloud. An isolated unit of infrastructure within a datacenter. VPC needs a subnet + region + firewall rule to setup.
-
netcat can listen and send basic data on ports with TCP or UDP. Socat can listen and also output to a particular file/port.
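What netcat does on each end of a TCP exchange can be sketched with Python's socket module (the port number is arbitrary):

```python
# One process plays `nc -l 15301` (listen, accept, read); the other plays
# `nc 127.0.0.1 15301` (connect, send). An Event avoids a connect-before-
# listen race.
import socket
import threading

ready = threading.Event()
received = []

def listen_once(port):
    # The `nc -l <port>` side: listen, accept one connection, read the data.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", port))
        srv.listen(1)
        ready.set()
        conn, _ = srv.accept()
        with conn:
            received.append(conn.recv(1024))

t = threading.Thread(target=listen_once, args=(15301,))
t.start()
ready.wait()

# The `nc 127.0.0.1 15301` side: connect and send some bytes.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", 15301))
    cli.sendall(b"hello over tcp\n")

t.join()
print(received[0])
```

socat generalizes this pattern: either end can be a port, a file, a pipe, etc.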
-
Software Defined Network (SDN) - Networks used to be simple, but new control requirements led to greater complexity.
-
4D = the Management Plane, Control Plane, and Data Plane turned into the Decision, Dissemination, Discovery, and Data planes.
This would eventually become OpenFlow. The key innovation of
OpenFlow is a match-action table that gets translated. It unifies a router (match on destination IP prefix, action forward
out on a link), a switch (match on destination MAC address, action forward or flood), a firewall, and NAT. The problem was there
were only certain things that could be matched on.
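A toy match-action table makes the unification concrete (field names and actions here are made up for illustration, not the OpenFlow wire format):

```python
# One table abstraction covers both a router-like rule (match on a
# destination IP prefix) and a switch-like rule (match on a destination
# MAC), each paired with a forwarding action.
import ipaddress

def prefix(p):
    net = ipaddress.ip_network(p)
    return lambda v: v is not None and ipaddress.ip_address(v) in net

def equals(x):
    return lambda v: v == x

def lookup(flow_table, packet):
    for match, action in flow_table:
        if all(matcher(packet.get(field)) for field, matcher in match.items()):
            return action
    return "drop"  # table-miss default

flow_table = [
    ({"dst_ip": prefix("10.1.0.0/16")}, "forward:port1"),          # router-like
    ({"dst_mac": equals("aa:bb:cc:00:00:01")}, "forward:port2"),   # switch-like
]

print(lookup(flow_table, {"dst_ip": "10.1.2.3"}))             # forward:port1
print(lookup(flow_table, {"dst_mac": "aa:bb:cc:00:00:01"}))   # forward:port2
print(lookup(flow_table, {"dst_ip": "192.168.1.1"}))          # drop
```

The fixed set of field names in the table is exactly the limitation the notes mention: you can only match on what the table was built to express.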
-
This issue of only being able to match on certain fields was fixed with the Reconfigurable Match Table ("OpenFlow 2.0"),
realized in a language called P4. It can parse any protocol. https://p4.org/
-
Most VMs don't actually communicate with each other, and most pairs that do have pretty low traffic rates between them.
Google uses "Hoverboards" that hold all the routes and can be queried when a route is not
already in a specific dataplane.
-
Zone in GCP = How google provides fault isolation within a region among shared resources.
-
Google uses Andromeda to virtualize networks, which can involve setting particular rules on each host to simulate some
network components.
-
Azure's network is called AccelNet, which centers on a SmartNIC. It's pretty similar to Google's Andromeda. Azure optimizes further
by adding a rule lookup from their equivalent of a Hoverboard into a hash table for any subsequent packets. They also
use a NIC card to store some of these rules for networking, and an FPGA to make this faster. The first packet in each
flow is processed in a virtual switch in software, and subsequent ones can be handled on the FPGA using its cache.
-
Tail latency = the latency seen at high percentiles (e.g., p99): the outliers rather than the average.
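A quick numeric illustration of why the average hides the tail (numbers invented; nearest-rank percentile is one of several common definitions):

```python
# 98 fast requests and 2 slow outliers: the mean looks healthy while the
# 99th percentile is what a real user occasionally experiences.

def percentile(samples, p):
    """Nearest-rank percentile; good enough for illustration."""
    s = sorted(samples)
    k = max(0, int(round(p / 100 * len(s))) - 1)
    return s[k]

latencies_ms = [10] * 98 + [500, 900]
avg = sum(latencies_ms) / len(latencies_ms)
print(avg)                           # 23.8 -- looks fine
print(percentile(latencies_ms, 50))  # 10
print(percentile(latencies_ms, 99))  # 500 -- the tail users actually feel
```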
-
Wide Area Network = WAN. Can connect two different LANs (VPCs) over the internet to make a single network.
-
Microsoft's software-driven WAN (SWAN) authors propose that it is a lack of coordination and local greedy resource allocation that make
legacy routing inefficient for their purposes.
-
Private IPs = 10/8, 172.16/12, 192.168/16
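Python's stdlib knows these RFC 1918 ranges, which is handy for checking:

```python
# Check addresses against the private ranges (10/8, 172.16/12, 192.168/16)
# using the stdlib ipaddress module.
import ipaddress

for addr in ["10.4.2.1", "172.16.0.5", "192.168.1.10", "8.8.8.8"]:
    print(addr, ipaddress.ip_address(addr).is_private)
```

The first three print `True`, the public `8.8.8.8` prints `False`.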
-
VPC Peering docs. HashiCorp also has docs that give Terraform instructions on peering. Peering allows
VMs in two different VPCs to communicate using private IP addresses.
-
Can't have subnets with overlapping address space when peering two VPC's. Traffic through the Peers still
traverses the firewall.
-
Cloud NAT = a regular NAT, sometimes can do the address translation at the local VM level with a virtual switch on each host.
-
Google Espresso is Google's approach to global internet peering for their datacenters. It uses BGP, but allows specific BGP
"speakers" so that not everything has to peer and can use software defined configs to make the process more malleable.
-
A load balancer serves as the point of entry for a service and directs traffic to one of N
servers that can handle the request. Used for both scaling and resilience.
-
Application load balancer L7 vs Network Load Balancer L4
-
Layer 4 (IP) network load balancing can pass traffic through, mainly for balancing load, and may ensure affinity. Can be implemented
in the virtual switch on each host (e.g., via Andromeda if it's inside the network for GCP).
-
Layer 7 (HTTP) application load balancing is also called a web proxy. It terminates the TCP connection and can direct specific URLs to
specific servers.
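The L7 routing decision is essentially a prefix table over URL paths. A minimal sketch (backend pool names are hypothetical):

```python
# An L7 load balancer has already terminated the TCP connection and
# parsed the HTTP request, so it can route on the path.

ROUTES = [
    ("/api/", "api-backend-pool"),
    ("/static/", "cdn-backend-pool"),
]
DEFAULT_POOL = "web-backend-pool"

def choose_backend(path):
    for url_prefix, pool in ROUTES:
        if path.startswith(url_prefix):
            return pool
    return DEFAULT_POOL

print(choose_backend("/api/users"))   # api-backend-pool
print(choose_backend("/index.html"))  # web-backend-pool
```

An L4 balancer can't do this: it never sees the HTTP path, only IPs and ports.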
-
External load balancing = public IP address, Internal load balancing = between applications. Internal network can
use Andromeda so that there isn't an internal proxy.
-
Standard tier = exits the network closest to the source, i.e. traverses the internet.
-
Premium tier = exits the network closest to the end user using Google's network.
External IP addresses are advertised at many different endpoints to get closer to
the user. The same address can be advertised via BGP from multiple nodes. This is called Anycast.
-
GCP requires a dedicated subnet for its application load balancer. This is for running Envoy proxies!
-
Terraform example of a load balancer between instances in GCP.
-
Autoscaling = monitoring usage and creating/killing instances as needed.
Can be based off of many different thresholds and potentially predictive
(time of day, past behavior, ML algo, etc.)
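A minimal threshold-style scaling decision can be sketched like this (the target and limits are invented for illustration; real autoscalers add cooldowns and smoothing):

```python
# Target-tracking heuristic: size the pool so the same total load would
# land at the target utilization, clamped to [lo, hi] instances.
import math

def desired_instances(current, avg_cpu, target=0.6, lo=1, hi=10):
    if avg_cpu <= 0:
        return lo
    wanted = math.ceil(current * avg_cpu / target)
    return max(lo, min(hi, wanted))

print(desired_instances(4, 0.9))  # 6 -> scale out
print(desired_instances(4, 0.3))  # 2 -> scale in
```

Predictive autoscaling would replace `avg_cpu` with a forecast (time of day, past behavior, an ML model) rather than the current measurement.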
-
Configuration information is given via a "metadata server". It is local to the current network only and is always at
169.254.169.254. It provides DNS, NTP and the metadata service.
-
Google Maglev = Google's network load balancer. It does L4 load balancing and is designed to be fast and pass traffic through.
-
Consistent hashing was originally designed by Akamai for content distribution. The naive assignment is server = hash(x) modulo k, where x is the file
and k is the server count, but when k changes nearly every assignment changes. Consistent hashing fixes that situation by placing servers and keys on
a ring and assigning each key to the closest server clockwise, so a change in the server set only remaps the affected arc. This lets a load balancer
keep sending a flow to the same backend server even as servers change: "Ensures that in the face of failure, existing TCP
connections are directed to the same server"
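A bare-bones ring (one point per server; real systems like Maglev use more elaborate schemes with virtual nodes or lookup tables):

```python
# Consistent-hash ring: each key goes to the nearest server point
# clockwise. Adding or removing one server only remaps the keys in that
# server's arc, unlike hash(x) mod k where nearly everything moves.
import bisect
import hashlib

def h(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        self._ring = sorted((h(s), s) for s in servers)

    def server_for(self, key):
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, h(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["backend-1", "backend-2", "backend-3"])
before = {f"conn-{i}": ring.server_for(f"conn-{i}") for i in range(20)}
bigger = HashRing(["backend-1", "backend-2", "backend-3", "backend-4"])
moved = [k for k in before if bigger.server_for(k) != before[k]]
# Only the connections falling in backend-4's arc move; the rest stay put.
```

This "stay put" property is what lets an L4 balancer keep an existing TCP connection pinned to its backend through membership changes.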
-
Content distribution network = CDN. Has a lot of servers at the edge of the internet (all over the world, paired with various ASes in the internet).
This can be used to cache data that a server is serving so subsequent accesses are faster. This alleviates load on the origin server.
-
Akamai is a company that was early in the CDN space. They use DNS load balancing to steer clients to different servers around the world.
-
IPsec underpins VPNs. This can be used to encrypt data to an office. It creates an encrypted tunnel between two endpoints. Can use
ISAKMP/IKE to share parameters + keys and authenticate each side; then traffic can be exchanged. The endpoint could be an individual user or a whole site.
High availability uses two tunnels.
-
Colocation allows customers to put their physical equipment in a shared building. Each company can get its own secure cage. Network-provider
equipment is put in the "meet-me" room, which can get direct connections to a particular network provider. Some facilities have
a physical Google switch that connects to GCP (or another cloud provider's equivalent). It may also be another provider that has access to a direct
connection to a cloud provider. This shows up as a VLAN in the cloud provider (at least in GCP).
-
You could use a provider that has physical connections to multiple clouds. Cross-Cloud Interconnect is essentially a direct connect via an
intermediary cloud router. Google can provide this!
-
Cross-cloud interconnection providers mentioned were Google, Megaport, Console Connect, and Equinix.