ECEA 5371 Linux Networking

Link to CU Boulder page

Linux Data Plane

  1. Routers have to handle many layers, like what we saw in the previous course! Throughout the Linux kernel there are a lot of hook points to get into a specific layer.
  2. iproute2 manages interfaces, bridges, and forwarding; iptables does classification, NAT, and filtering; tc handles queueing disciplines; ipvsadm does load balancing.
  3. Netlink tells all utilities how to process traffic. It moves data between the kernel and userspace and is based on Berkeley sockets. Socket programming! Each application is a client. The data plane has functions with tables organized in a pipeline that can be interfaced with at the user-space level.
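To make "netlink is just socket programming" concrete, here is a minimal sketch (assumes a Linux host; the constants are from the uapi netlink headers) that sends an RTM_GETLINK dump request over an AF_NETLINK socket, the same channel iproute2 uses, and counts the interface messages that come back:

```python
# Minimal netlink sketch: ask the kernel's data plane for its link table.
import os
import socket
import struct

NLMSG_DONE = 3          # end of a multipart dump
RTM_GETLINK = 18        # "list network devices" request
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300      # NLM_F_ROOT | NLM_F_MATCH

def list_link_count():
    s = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    s.bind((0, 0))  # let the kernel assign our netlink port id
    # nlmsghdr (len, type, flags, seq, pid) followed by a zeroed ifinfomsg (16 bytes)
    hdr = struct.pack("=IHHII", 16 + 16, RTM_GETLINK,
                      NLM_F_REQUEST | NLM_F_DUMP, 1, os.getpid())
    s.sendto(hdr + b"\x00" * 16, (0, 0))
    count = 0
    done = False
    while not done:
        data = s.recv(65535)
        off = 0
        while off < len(data):
            msg_len, msg_type = struct.unpack_from("=IH", data, off)
            if msg_type == NLMSG_DONE:
                done = True
                break
            count += 1                     # one RTM_NEWLINK reply per device
            off += (msg_len + 3) & ~3      # netlink messages are 4-byte aligned
    s.close()
    return count

print(list_link_count())  # every host has at least the loopback device
```

Reading the link table is unprivileged, which is why `ip link show` works without sudo; writes (like `ip link add`) use the same socket but require CAP_NET_ADMIN.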
  4. tshark = command line version of wireshark. You can use this to capture packets on a specific interface. This is good to test whether a specific packet was received or not.
  5. Types of devices: physical interfaces (ethernet), attach to another device (vlan), connect together multiple devices (bridge, bond), tunnel traffic (vxlan, geneve), virtual devices (veth)
  6. iproute2 - a collection of utilities for controlling tcp/ip networking and traffic control in linux
  7. net-tools - collection of base networking tools for linux. ifconfig, arp, route.
  8. Using iproute2 in this course.
  9. ip link = network device config for linux. Key commands add, show, set, delete
  10. ip link add has a bunch of different options depending on what you're adding (eth, tunnel, etc.)! These include setting up a GRE tunnel, an Ethernet device, a VXLAN, and a bridge.
  11. A bridge is a device that effectively implements a learning switch. We will create the bridge device then make devices slaves to the bridge. A learning switch floods a frame out all other ports when the destination is unknown, and learns which port each source MAC address lives on.
  12. Creating a bridge and enslaving interfaces:

    ip link add name mybridge type bridge
    ip link set mybridge up
    ip link set eth1 master mybridge
    ip link set eth2 master mybridge
    ip link set eth3 master mybridge
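The learning-switch behavior the bridge implements can be sketched as a toy model (this is an illustration, not the kernel bridge code): learn the source MAC per ingress port, flood when the destination is unknown.

```python
# Toy learning switch: MAC table maps MAC address -> port it was learned on.
class LearningSwitch:
    def __init__(self, ports):
        self.ports = ports
        self.mac_table = {}

    def handle_frame(self, src_mac, dst_mac, in_port):
        self.mac_table[src_mac] = in_port               # learn sender's location
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]            # forward out the known port
        return [p for p in self.ports if p != in_port]  # unknown dst: flood

sw = LearningSwitch(["eth1", "eth2", "eth3"])
print(sw.handle_frame("aa:aa", "bb:bb", "eth1"))  # unknown -> ['eth2', 'eth3']
print(sw.handle_frame("bb:bb", "aa:aa", "eth2"))  # aa:aa was learned -> ['eth1']
```

Once both hosts have sent a frame, no more flooding happens between them: the table answers every lookup.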
  13. VLAN is useful for enabling isolation in a shared L2 network. A device only sees traffic for its own VLAN on the switch.
  14. ip link add link eth0 name eth0.2 type vlan id 2
  15. Tunnel encapsulates traffic for transmission over some network. (geneve, vxlan) ip link add name gen0 type geneve id 55 remote 1.2.3.4
  16. Disaggregation provides more choice/extensibility
  17. Netlink is used to communicate between Linux user space processes and the Linux kernel i.e. the Data Plane.
  18. Containerlab is a tool to use docker containers to create networking experiments.
  19. pcap is a file format for storing a set of packets
  20. scapy is useful for crafting a single packet with a given source and destination MAC address.
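scapy assembles headers for you, but a hand-rolled sketch of an Ethernet II header shows exactly what "a packet with a given source and destination MAC" means on the wire: 6 bytes destination MAC, 6 bytes source MAC, 2 bytes EtherType.

```python
# Build a raw 14-byte Ethernet II header by hand (scapy's Ether() does this).
import struct

def mac_to_bytes(mac: str) -> bytes:
    return bytes(int(b, 16) for b in mac.split(":"))

def ethernet_header(dst_mac: str, src_mac: str, ethertype: int = 0x0800) -> bytes:
    # 0x0800 = IPv4; "!H" packs the EtherType big-endian (network order)
    return mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)

hdr = ethernet_header("ff:ff:ff:ff:ff:ff", "02:42:ac:11:00:02")
print(len(hdr))       # 14
print(hdr[:6].hex())  # ffffffffffff (broadcast destination)
```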

IP Layer

  1. IP (Internet Protocol) provides communication between multiple, heterogeneous networks.
  2. The final /N at the end of an IPv4 address, e.g. 10.10.0.0/24, signifies the number of high-order bits that uniquely specify the group of IP addresses in that subnet.
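Python's ipaddress module makes the /24 concrete: 24 fixed high-order bits leave 32 - 24 = 8 host bits, i.e. 256 addresses.

```python
# Prefix-length arithmetic with the stdlib ipaddress module.
import ipaddress

net = ipaddress.ip_network("10.10.0.0/24")
print(net.netmask)        # 255.255.255.0  (24 ones, 8 zeros)
print(net.num_addresses)  # 256 = 2**(32-24)
print(ipaddress.ip_address("10.10.0.55") in net)  # True
print(ipaddress.ip_address("10.10.1.55") in net)  # False: 24th-bit region differs
```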
  3. Linux as an end host (either a client or a server) needs an IP address and needs to know where to send traffic.
  4. Firewall/load balancer. Sits on the edge of a network. One NIC for external and one for internal. Linux is awesome because you can use it to cover the full stack of IP.
  5. Routing = compute paths by coordinating with neighbors
  6. Forwarding = direct traffic to a destination
  7. ip route is the utility for linux forwarding table management. ip route add 11.11.0.0/16 dev eth3
  8. via is an option within ip route add. via indicates that there is a hop in between at the IP layer; the next hop won't be the destination. Compare ip route add 11.11.0.0/16 via 192.168.2.1 dev eth3 vs ip route add 11.11.0.0/16 dev eth3. Linux will ARP for the destination in the packet (without via) or ARP for the next hop (with via).
  9. ARP = Address Resolution Protocol. This is "I'm looking for this IP address; what's your MAC address?" This can be used to reach the final IP! The replies end up creating the ARP/neighbor/MAC address table.
  10. ip neigh show can be used to look at the ARP table. Example:

    jack@grapefruit:~$ ip neigh show
    192.168.0.214 dev wlp4s0 lladdr e4:5f:01:2e:37:bd REACHABLE
    192.168.0.167 dev wlp4s0 lladdr 00:11:32:af:99:21 STALE
    192.168.0.1 dev wlp4s0 lladdr e4:38:83:3c:62:8a REACHABLE
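The neighbor table is easy to treat as data. A small parser for that output (field layout assumed from the sample: IP, "dev", ifname, "lladdr", MAC, state):

```python
# Parse `ip neigh show` lines into a dict keyed by IP address.
sample = """\
192.168.0.214 dev wlp4s0 lladdr e4:5f:01:2e:37:bd REACHABLE
192.168.0.167 dev wlp4s0 lladdr 00:11:32:af:99:21 STALE
192.168.0.1 dev wlp4s0 lladdr e4:38:83:3c:62:8a REACHABLE"""

def parse_neigh(text):
    table = {}
    for line in text.splitlines():
        f = line.split()
        table[f[0]] = {"dev": f[2], "lladdr": f[4], "state": f[5]}
    return table

neigh = parse_neigh(sample)
print(neigh["192.168.0.1"]["lladdr"])   # e4:38:83:3c:62:8a
print(neigh["192.168.0.167"]["state"])  # STALE
```

Note that real `ip neigh` output has a few more variants (entries without lladdr, FAILED state), so a production parser would need to be more defensive.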
  11. You can add manually as well (but this is pretty rare)
  12. Forwarding = data plane. Directs a packet to an output port/link. Uses a forwarding table.
  13. Routing: Control plane. Computes paths by coordinating with neighbors. Creates the forwarding table.
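Forwarding-table lookup can be sketched in miniature: the table maps destination prefixes to output devices (mirroring `ip route add PREFIX dev DEV`), and the lookup picks the longest matching prefix, as the kernel's FIB does.

```python
# Longest-prefix-match lookup over a toy forwarding table.
import ipaddress

table = {
    ipaddress.ip_network("0.0.0.0/0"): "eth0",     # default route
    ipaddress.ip_network("11.11.0.0/16"): "eth3",
    ipaddress.ip_network("11.11.7.0/24"): "eth2",  # more specific wins
}

def lookup(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    matches = [net for net in table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix match
    return table[best]

print(lookup("11.11.7.9"))    # eth2 (matched by /24 and /16; /24 is longer)
print(lookup("11.11.200.1"))  # eth3
print(lookup("8.8.8.8"))      # eth0 (only the default matches)
```

Routing protocols (BGP below) are what populate this table; forwarding is just the per-packet lookup.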
  14. BGP is used to coordinate various inter connected networks (Autonomous Systems)
  15. Routers will "peer" with a neighbor. A router establishes a TCP connection, the peers exchange some properties about each other, then a state machine leads to the "Established" state once peered. Now they can exchange routes.
  16. Lots of routing software available: Quagga, Bird, FRR, GoBGP.
  17. Bird is provided a config file and will do routing. You can configure protocol blocks related to Linux. They can often do more than just BGP.
  18. The Bird routing daemon needs the IP address and Autonomous System (AS) number of the neighbor to configure a BGP peering session. See the following block.
  19.     protocol bgp uplink1 {
            description "My BGP uplink";
            local 198.51.100.1 as 65000;
            neighbor 198.51.100.10 as 64496;
            hold time 90;
            password "secret";
            # You can then add a filter to ignore certain things either as an import/export
            ipv4 {
                import filter rt_import;
                export where source ~ [ RTS_STATIC, RTS_BGP ];
            }
        }
    
  20. MRT = a file format that routing tables are stored in (e.g., the table dumps from RIPE).
  21. birdc show route all will show the current routing table in the Bird routing daemon.
  22. ExaBGP is a BGP agent meant for testing. It can add static routes with actual AS paths and also programmatically add and remove routes. It provides a means to emulate a large set of network nodes as a single process (so you don't have to build a full Internet topology with real Internet routers).
  23. SEED Internet Emulator = a way of emulating lots of different parts of the Internet (Autonomous Systems, Internet Exchanges, etc.) in a Python program to create a "real" Internet.
  24. Filtering can be used to block/allow traffic according to some policy. It could be on the edge of a network or at the host itself.
  25. Match/Classification compare inputs to a rule set in the table to see if there is a match. There is a corresponding action in the table for a particular match.
  26. Good practice = default drop. Then you have to whitelist everything you want to allow.
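Match/classification with a default-drop policy can be sketched as follows (a toy model of an iptables chain whose policy is DROP, not the netfilter implementation): rules are checked in order, the first match decides, and anything unmatched is dropped.

```python
# First-match-wins rule list with a default-drop fallback.
rules = [
    # (predicate over a packet dict, action)
    (lambda p: p["proto"] == "tcp" and p["dport"] == 22, "ACCEPT"),  # ssh
    (lambda p: p["proto"] == "tcp" and p["dport"] == 80, "ACCEPT"),  # http
]

def classify(packet):
    for match, action in rules:
        if match(packet):
            return action
    return "DROP"  # default drop: only whitelisted traffic gets through

print(classify({"proto": "tcp", "dport": 80}))  # ACCEPT
print(classify({"proto": "udp", "dport": 53}))  # DROP (never whitelisted)
```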
  27. Address Translation can manipulate the address (or any field i.e. port) in a packet.
  28. iptables is the userspace tool for configuring the Linux kernel's netfilter filtering framework. The default table is filter, but another table can be selected with -t.
  29. Tables = IP Packet filter rules in the linux kernel. iptables is a tool to set those rules!
  30. iptables -t nat -A PREROUTING -d 111.111.0.1 -p tcp --dport 1000 -j DNAT --to-destination 10.10.0.2 This example command rewrites ingress TCP traffic with destination 111.111.0.1 port 1000 to the new destination 10.10.0.2
  31. Each table contains chains including built in and maybe user-defined.
  32. Each chain is a list of rules which can match a set of packets. Rules match on criteria and take some action: -p protocol, -s source, -d destination, -i in-interface, -o out-interface. There are also extensions for specific protocols, e.g. --dport (destination port) and --tcp-flags for TCP. -m [module] allows for loading more extensions; -m connlimit allows for filtering on a connection limit.
  33. Actions take the form of -j target: ACCEPT, DROP, RETURN, QUEUE. There are also extensions in actions, e.g. -j REJECT --reject-with or -j LOG --log-level.
  34. There are four defined tables. Filter = default table. nat = Used when a packet that creates a new connection is encountered. Mangle = used for specialized packet alteration. Raw = used for configuring exemptions from connection tracking.
  35. Cool diagram on kernel networking regarding how a packet traverses the kernel: here it is
  36. A chain is a set of rules that are evaluated sequentially. Rules can be terminating or non-terminating. Evaluation steps between rules until it reaches an ACCEPT or DROP. There are already some predefined chains in the Linux kernel. filter is the default table (if you didn't specify -t).
  37. ipvs sets up transport layer load balancing inside linux kernel
  38. ipvsadm allows for configuring ipvs
  39. modprobe ip_vs to load ipvs.
  40. ipvsadm -A -t 207.175.44.110:80 -s rr adds a TCP service at IP address 207.175.44.110 port 80 and will select a backend server in a round robin manner.
  41. Quality of Service (QoS) could care about Bandwidth, Latency, Jitter, Loss. It could care about a combination of all of them, but it depends on the service to define it!
  42. Classification = Inspecting packets to identify what class of traffic they belong to. Could classify based on packet headers (layer 3 and 4) i.e. port 80, or a specific IP Address range i.e. 1.2.3.0/24. Could also classify based on deep packet inspection. Expensive but more detailed. Could also classify based on fingerprinting. Protocols/applications/OS all have fingerprints that could be identified with stats.
  43. Shaping = Ensuring a given class of traffic conforms to some shape (properties). Could care about traffic rate and burst rate. Traffic rate is the average allowed, but burst allows for small bumps in traffic. Burst = bucket size, traffic rate = size of hole. Called policing on the ingress side.
  44. Scheduling = Given a queue that is starting to fill up, determine what/when to drop. Given multiple queues, determine which packet to transmit next. Many different ways to balance these queues including Round Robin.
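The "which queue transmits next" decision can be sketched with the simplest of those balancers, round robin: visit the queues in turn and transmit one packet from each non-empty queue per pass.

```python
# Round-robin scheduling across multiple packet queues.
from collections import deque

def round_robin(queues):
    order = []                    # transmission order
    while any(queues):            # while any queue still holds packets
        for q in queues:
            if q:
                order.append(q.popleft())  # one packet per queue per pass
    return order

queues = [deque(["a1", "a2", "a3"]), deque(["b1"]), deque(["c1", "c2"])]
print(round_robin(queues))  # ['a1', 'b1', 'c1', 'a2', 'c2', 'a3']
```

Note how flow b, with only one packet, never waits behind flow a's backlog: that per-flow fairness is the point of scheduling across queues instead of one shared FIFO.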
  45. tc = linux traffic control utility. Used for QoS setup and more. Key constructs: qdisc, class, filter.
  46. In traffic shaping two key properties are traffic rate and allowed burst. In the token bucket algorithm, the rate at which tokens are added to the bucket = traffic rate, and the allowed burst = depth of the bucket.
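A token-bucket sketch makes the rate/burst relationship concrete (a simplified model of what tc's tbf qdisc enforces, counting tokens in bytes): tokens accumulate at `rate` per second up to `burst`, and a packet of n bytes is conformant only if n tokens are available.

```python
# Token bucket: rate = average allowed traffic, burst = bucket depth.
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate      # tokens (bytes) added per second
        self.burst = burst    # bucket depth: max tokens that can accumulate
        self.tokens = burst   # start with a full bucket

    def tick(self, seconds=1.0):
        # Refill, but never beyond the bucket depth.
        self.tokens = min(self.burst, self.tokens + self.rate * seconds)

    def allow(self, nbytes):
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False          # non-conformant: would be delayed or dropped

tb = TokenBucket(rate=1000, burst=2000)
print(tb.allow(1500))  # True: the burst allowance absorbs it
print(tb.allow(1500))  # False: only 500 tokens left
tb.tick(1.0)           # one second passes, +1000 tokens
print(tb.allow(1500))  # True: 500 + 1000 tokens available
```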
  47. qdisc is the main object for shaping/scheduling. Every interface has an ingress and an egress qdisc. Packets go into a qdisc and come out the other side. A simple example is pfifo_fast, a regular FIFO and the default.
  48. tc qdisc syntax:

    tc qdisc [add | delete | replace] dev DEV \
        [parent qdisc-id | root] [handle qdisc-id] \
        qdisc [qdisc-specific parameters]
  49. Example qdisc command: tc qdisc add dev eth0 root handle 1: tbf rate 1mbit burst 32kbit latency 400ms uses token bucket filter qdisc as the queuing discipline.
  50. qdisc can be hierarchical and can specify a handle (like 1, use this to refer to a qdisc)
  51. You may want to filter traffic to keep traffic from reaching an end host application and overwhelming it.
  52. When doing address translation for port forwarding, in the ingress direction you need to modify the destination IP address.
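DNAT in miniature (a toy model of the rewrite, not the netfilter/conntrack implementation): port forwarding rewrites the destination of matching ingress packets and leaves everything else alone.

```python
# Destination NAT sketch: rewrite dst IP for packets matching (dst, dport).
def dnat(packet, match_dst, match_dport, new_dst):
    if packet["dst"] == match_dst and packet["dport"] == match_dport:
        return dict(packet, dst=new_dst)  # rewritten copy
    return packet                         # non-matching traffic is untouched

pkt = {"src": "8.8.8.8", "dst": "111.111.0.1", "dport": 1000}
print(dnat(pkt, "111.111.0.1", 1000, "10.10.0.2")["dst"])  # 10.10.0.2
```

The reverse (egress) direction is handled by connection tracking in the kernel, which rewrites the source of reply packets back to the public address.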

Virtual Networking with Linux

  1. virtualization allows for a more efficient use of resources and a simpler deployment.
  2. namespace = what resources and naming of those resources a process sees (file descriptors, IP addresses)
  3. cgroup (control group) = groups processes and allocates resources (CPU, memory) that the kernel enforces
  4. The Linux kernel maintains data structures on a per-process basis (file system, process IDs, etc.)
  5. nsproxy is a struct in the task/process struct that holds a struct for each namespace, one in particular called net. This has all the variables for networking! Things like netns_ipv4 for IPv4 rules and config, and the FIB (forwarding information base = forwarding table). Each namespace has its own set of tables.
  6. ip netns is the Linux utility for modifying network namespaces. Processes inherit their namespaces from their parent (up to the root process).
  7. Repo for this section
  8. ip netns add NAME = create a network namespace. ip netns ls to list. Need to run as sudo.
  9. veth = virtual ethernet device. Always created in interconnected pairs. Packets transmitted on one are automatically shown on the corresponding other one.
  10. ip link add _p1-name_ type veth peer name _p2-name_ to add a veth.
  11. ethtool combined with ip link can be used to find the peer of a veth network interface. Could also generate a specific packet with scapy and inspect the other side with tshark.
  12. ip link set _p2-name_ netns _p2-namespace_
  13. Can connect two namespaces by attaching the other end of a veth pair to another namespace.
    ip netns add nsDemo1
    ip netns add nsDemo2
    ip link add vethY type veth peer name vethZ
    ip link set vethZ netns nsDemo1
    ip link set vethY netns nsDemo2
    
  14. To execute commands within a netns: ip netns exec _namespace_ _cmd_. Can then use regular ip commands!
  15. Could move a veth device out of a namespace by setting a new namespace! If you moved it to PID 1's namespace, it would be in the root namespace!
  16. Can shorthand ip netns exec via ip -n
  17. Docker doesn't create namespaces in the same place as running from root does. This prevents ip netns ls from resolving a new network created inside a container.
  18. docker0 is the default bridge that each veth device in the root namespace is connected to in Docker.
  19.     sudo touch /var/run/netns/$container_name
    
        pid=$(sudo docker inspect -f '{{.State.Pid}}' $container_name)
    
        echo $pid
    
        sudo mount -o bind /proc/$pid/ns/net /var/run/netns/$container_name
    
  20. Basically, bind-mount the container's net namespace from /proc into the place where ip netns looks up namespaces.
  21. The more reasonable thing to do would be to use the docker network commands rather than running ip link commands inside of a container to create a network. docker network create and docker network connect _network_ _container_

Kubernetes Networking with Linux

  1. K8s handles where to run (scheduler picks nodes), how to communicate (networking as containers change), scaling the service, and handling failure (restarts)
  2. Super useful way of testing k8s locally
  3. Each container in a pod does not necessarily get a unique IP address.
  4. CNI = Container Network Interface. Configuration and API of a network plugin follows the CNI spec. One popular network plugin is called Flannel.
  5. kube-proxy makes sure clients can connect to the services you define, load balanced when needed. It runs on every node, but it's not in the path of traffic (not an actual proxy, i.e. it does not actually process network traffic). It just interfaces with iptables/ipvs.
  6. Can also do load balancing with iptables using the module statistic. Here's an article on this.
  7. Ingress controller is a load balancer such as nginx, traefik, haproxy etc. Need to deploy an ingress controller first. Ingress nginx already has a kubectl config (look it up!)
  8. Then you can define a path to go to a certain service.
  9. To create a k8s network plugin you need an executable that implements the CNI spec and a configuration file that describes the network plugin.
  10. Can get k8s logs via kubectl describe pod podname, kubectl describe node nodename, cat /path/to/network-plugin-log, and journalctl -u kubelet