ECEA 5305 - Linux System Programming and Introduction to Buildroot


Link to course description
Followed on by ECEA 5306

Books for the Class: Mastering Embedded Linux Programming


Linux has special files

  1. Everything is a file in Linux (I think everyone hears that at some point). I learned that there are explicitly "special files", which include character devices, block devices, and named pipes. Character devices, such as keyboards and graphics devices, are accessed as a sequential stream of bytes with no fixed block size. Block devices, such as storage, are accessed in fixed-size blocks that can be read in any order; this is the kind of device you mount a filesystem from. Named pipes are exactly what they sound like: a file version of "|" that can be used to pass information between different processes/threads. They are FIFO (first in, first out) and can be created using mkfifo nameofpipe. See https://opensource.com/article/18/8/introduction-pipes-linux
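The named pipe idea can also be tried from C. This is only a minimal sketch: fifo_roundtrip is a hypothetical helper name, and the FIFO path is whatever the caller passes in. It creates the FIFO with mkfifo(), forks a child that writes a message into it, and reads the message back in the parent.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send msg through a FIFO at path, from a child
 * process to the parent. out must have room for at least 1 byte.
 * Returns the number of bytes read into out, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t outlen)
{
    if (mkfifo(path, 0600) == -1) {
        perror("mkfifo");
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        unlink(path);
        return -1;
    }

    if (pid == 0) {
        /* Child: open() blocks until the parent opens the read end. */
        int wfd = open(path, O_WRONLY);
        if (wfd == -1)
            _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w == (ssize_t)strlen(msg) ? 0 : 1);
    }

    /* Parent: read one chunk. A real reader would loop until read()
     * returns 0, which means every writer has closed the FIFO. */
    ssize_t n = -1;
    int rfd = open(path, O_RDONLY);
    if (rfd != -1) {
        n = read(rfd, out, outlen - 1);
        if (n >= 0)
            out[n] = '\0';
        close(rfd);
    }

    waitpid(pid, NULL, 0);
    unlink(path);   /* the FIFO is a real directory entry, so remove it */
    return n;
}
```

The two processes here are parent and child only to keep the sketch in one file. Any two unrelated processes that agree on the path could use the FIFO the same way.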

Cool linux commands I didn't know

  1. gcc -print-search-dirs prints the default paths that gcc searches; I always forget some of them. I also really didn't know the directories that get searched for shared libraries by default. They are: the contents of LD_LIBRARY_PATH (searched first), then /lib, /lib64, /usr/lib and /usr/lib64 (the exact set can vary by distribution). I've previously just been setting the LD_LIBRARY_PATH env variable like a chump, I should use the /usr/lib directory more!

Errno

  1. I have never really used errno before. I've used POSIX return values before, but I didn't realize how to explicitly do this in C. Using perror is a super nice way to view these messages.
  2. For example if you wanted to try to read a file and print an error if it couldn't be found:

                    
                        FILE *myfile = fopen(argv[1], "r");
                        if (myfile == NULL) {
                            perror("fopen");
                            return 1;
                        }
                    
                

  3. This will print "fopen: No such file or directory" to stderr, all for you!
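For cases where the message needs to go somewhere other than a plain perror line, errno can be read directly and turned into text with strerror(). A minimal sketch, assuming a hypothetical helper named try_open:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open path read-only.
 * Returns 0 on success, or the errno value describing the failure. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;  /* save it: later calls may overwrite errno */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

Saving errno into a local right after the failing call is the important habit, since almost any later library call is allowed to change it.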

Logging

Syslog is super nice. I found out that Debian does not write to /var/log/syslog by default. You have to install rsyslog to get that file written. Otherwise you can look at the log via journalctl.

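Open a log via openlog(ident, option, facility). See the Linux man page. If ident is NULL, the program name is used as the prefix on each message. Where the messages end up (for example /var/log/syslog) is decided by the system's logging daemon, not by openlog.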
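A minimal sketch of the openlog/syslog/closelog sequence. The helper name log_startup, the message text and the port parameter are made up for illustration; LOG_PID, LOG_USER, LOG_INFO and LOG_ERR are standard constants from syslog.h.

```c
#include <syslog.h>

/* Hypothetical helper: log two messages under the given identity. */
void log_startup(const char *ident, int port)
{
    /* LOG_PID adds the PID to each message; LOG_USER is the generic
     * facility for user-level programs. */
    openlog(ident, LOG_PID, LOG_USER);

    /* The priority picks the severity; the format works like printf. */
    syslog(LOG_INFO, "listening on port %d", port);
    syslog(LOG_ERR, "example error message");

    closelog();
}
```

Priorities are ordered so that a smaller number is more severe (LOG_ERR is more severe than LOG_INFO), which is what log filters key on.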

Linux File I/O

  1. Files are accessed via file descriptors. The kernel maintains a per-process list of open files, the file table.

  2. Every process in Linux normally starts with at least 3 file descriptors open: stdin (terminal input device, fd 0), stdout (terminal display, fd 1) and stderr (fd 2, also the terminal display, but useful so we can redirect it somewhere else)

  3. This course is using the non-"f" open/read/write. The fopen family is buffered I/O from the C library, not system calls. We want to interact directly with the kernel. fopen uses open underneath.
  4. fopen returns a FILE pointer (a stream). open returns the integer file descriptor.
  5. fopen's buffering happens in userspace rather than in the kernel.
  6. Multiplexed I/O is a better solution than polling each descriptor in a loop. It watches several file descriptors at once: select(), pselect(), poll(). poll() is the preferred mechanism.
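A minimal sketch of multiplexed I/O with poll(), watching two descriptors at once. The helper name wait_readable is hypothetical, and it is limited to two descriptors just to keep the example short.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for either descriptor to
 * become readable. Returns the index (0 or 1) of the first ready
 * descriptor, or -1 on timeout or error. */
int wait_readable(int fd0, int fd1, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = fd0, .events = POLLIN },
        { .fd = fd1, .events = POLLIN },
    };

    int ready = poll(fds, 2, timeout_ms);
    if (ready <= 0)
        return -1;      /* 0 means timeout, -1 means error (see errno) */

    for (int i = 0; i < 2; i++)
        if (fds[i].revents & POLLIN)
            return i;

    return -1;          /* only POLLHUP/POLLERR style events were reported */
}
```

The process sleeps inside poll() until the kernel has something to report, instead of spinning over read() calls on each descriptor.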

strace

Shows the actual system calls a program makes. Does not require any source code and is handy for debugging.

Use fsync,fdatasync, O_SYNC

  1. Can use these especially to guard against an unclean shutdown, so we don't have something in the kernel buffer that isn't written to disk.
  2. Avoid sync() for performance reasons. It works globally, flushing every dirty buffer on the system, and can be very slow.
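A minimal sketch of the write-then-fsync pattern. The helper name write_durable is hypothetical, and treating a short write as a failure is a simplification.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write text to path and force it to storage.
 * Returns 0 on success, -1 on failure. */
int write_durable(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    size_t len = strlen(text);

    /* A short write is treated as failure here; real code would loop.
     * fsync() returns only after this file's data and metadata have
     * been handed to the storage device. */
    if (write(fd, text, len) != (ssize_t)len || fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

fdatasync(fd) could be swapped in when metadata such as timestamps does not need to be flushed, and opening with O_SYNC makes every write behave as if it were followed by fsync.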

Sparse Files

  1. Linux is smart enough to make a "hole" when you seek past the end of a file and then write: the file reports the larger size, but storage is not actually allocated for the hole.
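A minimal sketch of creating a hole with lseek(). The helper name make_sparse is hypothetical; it seeks past the end of a new file, writes a single byte, and hands back the stat result so the caller can compare sizes.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file whose only real data is one byte
 * located hole_size bytes from the start. Fills *st with the result.
 * Returns 0 on success, -1 on failure. */
int make_sparse(const char *path, off_t hole_size, struct stat *st)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    int rc = -1;
    if (lseek(fd, hole_size, SEEK_SET) == hole_size &&  /* skip, no write */
        write(fd, "x", 1) == 1 &&                        /* one real byte  */
        fstat(fd, st) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

After the call, st_size is the apparent size (hole plus one byte), while st_blocks counts the 512-byte units actually allocated. On filesystems that support holes the second number is usually far smaller, though that depends on the filesystem.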

Process Management

  1. What is a PID? - This is the unique identifier that represents a process in Linux. It is guaranteed to be unique at any single point in time.
  2. The idle process has the pid 0. This is what runs when there are no other runnable processes.
  3. The init process has the pid 1. Most likely located at /sbin/init. The linux kernel tries four locations in the following order:
    1. /sbin/init
    2. /etc/init
    3. /bin/init
    4. /bin/sh
    If all four fail, the kernel will halt the system with a panic
  4. Running a program from C? Use execl(). This will replace the currently running process image.
  5. There are other exec-like functions in the POSIX API to start a program (execlp, execv, execvp, execle). They differ in whether they search PATH, pass a new environment, or take the arguments as a list or as an array.
  6. PATH lists the default locations that are searched for executable files.
  7. Avoid relying on PATH, especially on an embedded system where we are in full control.
  8. fork() is often used together with exec, but on its own it does not replace the current image. Often used to start daemons. Creates a new process with a new PID for the child. The child process gets a copy of all the parent's memory (variables, file handles etc.). The Linux kernel will only physically copy memory if one of the processes modifies it.
  9. A zombie process is a child process that has terminated but has not yet been waited on by its parent. The process entry remains until the parent has been notified. The parent uses wait() to obtain information about terminated children. It gives the reason for termination and the return code.
  10. waitpid() and waitid() allow to wait for a specific process.
  11. system() will do fork(), exec() and wait(). It runs the command through the shell, so it uses the PATH and can have PATH injection concerns. It also expands shell input like $HOME. This may not be what you want to happen.
  12. There are multiple user/group IDs associated with a process: the user/group ID of whoever originally ran the process (real user/group ID) and the user the process is currently running as (effective user ID).
  13. The effective user/group ID can be changed with setuid() and/or setgid(). The real user and group can be obtained with getuid() and/or getgid(). The effective user and group can be obtained via geteuid() and/or getegid().
  14. A process group is a collection of processes. A session is a collection of process groups. A session is associated with a controlling terminal, also known as a tty (TeleTYpewriter). A tty is a device for terminal I/O.
  15. A session is created for the login shell on a tty.
  16. There is always a foreground process group in a session. This is the one interrupted with Ctrl+C.
  17. There can also be 0 or more background process groups. Can use a trailing & in bash to start a process in the background.
  18. Daemons were originally intended to be pronounced as "dee-men"... A daemon is a process which runs in the background, i.e. is not connected to a controlling terminal. Often started at boot time, run as root or a special user. Name often ends with d (sshd, crond etc.). Often a child of the init process.
  19. Syslog is a good way of sending feedback to the user from a daemon. Logging to a file can be funky if the file lives on a filesystem that something is trying to unmount.
  20. "Modern Unix systems have superior behavior. Instead of a wholesale copy of the parent’s address space, modern Unix systems such as Linux employ copy-on-write (COW) pages." This basically means that as long as the child does not modify a resource, the kernel transparently lets it share the parent's instance of that resource. If the child does write to it, the kernel transparently makes a copy, and the child program will not know that it is accessing a copy.
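The fork/exec/wait steps above can be sketched in a few lines. The helper name run_and_wait is hypothetical, and execv() is used instead of execl() only because it takes the arguments as an array; neither searches PATH, so the caller passes a full path.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at path with the given argv and
 * return its exit status, or -1 if it could not be started or did not
 * exit normally. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execv(path, argv);
        _exit(127);     /* only reached if execv() failed */
    }

    /* Parent: reap the child so it does not linger as a zombie. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This is roughly what system() does, minus the shell: nothing in argv is expanded and PATH is never consulted. The 127 exit code for a failed exec is a shell convention borrowed here, not something the kernel imposes.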

File Descriptors/File I/O

Building the Linux Kernel

  1. Kconfig is a framework that can be used to select what will be built into the kernel. The selections end up in a text file named .config
  2. Boot artifacts

  3. Get a vmlinux output file. This is an ELF binary that may or may not include debug symbols.
  4. Binaries in arch/$ARCH/

Sync/Parallel

  1. Race conditions are different program behavior depending on which thread gets there first
  2. Can be on hardware, a kernel resource, or memory (a data race is the most common)
  3. Critical region is the region of code which needs synchronization
  4. Code is not necessarily thread safe, even if the hardware does not support parallelism. Concurrency could still be an issue. A thread could load a value into a register and be preempted before it stores the result back.
  5. An atomic operation is indivisible: unable to be interleaved with another operation, it appears instantaneous
  6. To make it thread safe, need to make sequences atomic.
  7. Can use a mutex for this! pthread is the POSIX library for this
  8. Deadlock is when two threads are each waiting for the other to release something
  9. Could also happen when a thread is blocked on a mutex it already holds
  10. How to avoid (suggestions from the book): lock data not code, make sure multiple data locks are actually required
  11. pthread_create is the way to create these types of threads
  12. A thread can terminate by returning from its start routine
  13. Threads will consume system resources until joined (or detached), can cause a memory leak
  14. pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock
  15. Scoped locks are a C++ idiom built on constructors/destructors (RAII): acquire the mutex on creation, release the mutex when out of scope
  16. C++ will unlock the mutex for you when the scoped lock falls out of scope
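A minimal sketch of the pthread calls listed above, protecting a shared counter. The struct, the helper name count_with_threads, the 16-thread cap and the INCREMENTS constant are all made up for the example. On older toolchains this may need to be built with -pthread.

```c
#include <pthread.h>

/* Lock data, not code: the mutex lives next to the value it guards. */
struct counter {
    pthread_mutex_t lock;
    long value;
};

#define INCREMENTS 100000

static void *worker(void *arg)
{
    struct counter *c = arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;                     /* critical region */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;    /* returning from the start routine ends the thread */
}

/* Hypothetical helper: run nthreads workers (1 to 16) against one
 * counter. Returns the final count, or -1 on error. */
long count_with_threads(int nthreads)
{
    struct counter c = { .value = 0 };
    pthread_t tids[16];

    if (nthreads < 1 || nthreads > 16)
        return -1;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &c) == 0)
        started++;

    /* Joining waits for each thread and releases its resources. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

Without the lock/unlock pair around value++, two threads could both load the old value before either stores the new one, and increments would be lost.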

BuildRoot/Yocto

  1. Build system vs distribution. A distribution might not match your use case, something like an embedded vs desktop/user environment. Upgrades and installs might not be interactive if you're on an embedded system
  2. It's a different goal!
  3. Reducing the image size of a distribution may be challenging. Likely needs customization to generate a production image.
  4. Binary compatibility on different upgrade paths may be challenging.
  5. Steps performed by a build system
    1. Download source for common packages from upstream
    2. Apply patches for cross compilation, arch dependent bugs etc
    3. Build Components
    4. Assemble rootfs in staging area
    5. Create image files
  6. Build system makes it easy to add your own packages
  7. Select system profiles (e.g. selecting with/without graphics)
  8. Track open source licenses used, this can be useful to help with open source compliance when distributing source
  9. Buildroot started in 2001, focused on simplicity. Collection of Linux host/target packages with build instructions.
  10. Licensed GPLv2, expected to share changes to Buildroot source. Uses Makefiles + Kconfig/menuconfig, the same config system used in the class for building the Linux kernel
  11. Based on GNU make and utilities. Buildroot uses "packages" to build and install, by default located in the package directory. Can use your own tree (BR2_EXTERNAL trees). This can be useful if you can't share something upstream/publicly.
  12. Packages can reference git repos for source code. This can be used to build a custom package.
  13. Packages need at least two files: Config.in, the Kconfig code adding the package to the config menu, and "package_name".mk, the make instructions for the package.

Signals

  1. Signals are software interrupts for handling async events
  2. Events outside the system (Ctrl + C)
  3. Events from the program or kernel (divide by 0)
  4. Rudimentary interprocess communication method
  5. The event is async and the handler is async
  6. Signal is raised, stored by the kernel, then handled by the kernel in a way that depends on the process
  7. Ignore (cannot ignore SIGKILL and SIGSTOP)
  8. Catch and handle, suspend execution of the process (including signal handler!)
  9. Jump to a previously registered function
  10. SIGINT and SIGTERM are common options
  11. If not configured, perform the default action; see the signal(7) man page for defaults
  12. bunch of examples in linux system programming chapter 10
  13. SIGABRT (assert()) - terminates and generates core file
  14. SIGHUP - May be used to reread config files
  15. SIGINT - interrupts the process
  16. SIGKILL - unconditionally kills the process
  17. SIGSEGV - Seg fault, terminates and generates core file as default action
  18. SIGTERM - gracefully terminates a process (can catch and teardown)
  19. SIGSTOP - unconditionally stop a program
  20. GDB can be used to analyze core files; may need to set up core dumps on your distro
  21. man signal = the docs
  22. sigaction() is the POSIX alternative to the C library's signal()
  23. Has better signal handling capability: can retrieve state information, can block new signals during the handler; only need to set up sa_handler in struct sigaction
  24. pause() waits until a signal is delivered. It cannot be told which signal number to wait for; sigwait() takes a signal set if you need that.
  25. kill() sends a signal to a process; can send any signal, not just SIGKILL
  26. kill -TERM pid (from the shell)
  27. sigqueue() can send a payload with a signal (rudimentary) IPC
  28. Child inherit signal actions of the parent on fork
  29. Starting a program with exec() resets all caught signals to their default actions (signals ignored by the parent stay ignored)
  30. Signal handlers: suspend execution of a process, jump to our signal handler function. Not a thread switch or new thread; reuses the existing thread
  31. These signal handlers must be async-signal-safe: they must be reentrant and block signals safely
  32. Look at the table of allowed functions for signal-safe functions. Generally free/malloc are not signal safe. The handler should not manipulate static data that is also used outside the signal handler
  33. Should only manipulate stack-allocated data or data provided by the caller
  34. Minimize global data access, confirm access is async-signal-safe, save/restore errno (still on the same thread), call the absolute minimum set of functions
  35. Signals are an "old antiquated mechanism for kernel to user communication"
  36. Signal safety problems are easy to introduce and difficult to track down
  37. Not the ideal IPC method

Timing/Sleep

  1. sleep - returns the number of seconds not slept, since it could be interrupted
  2. usleep - sleeps at least the number of usec's
  3. nanosleep - if interrupted, returns the remaining time in the rem value
  4. clock_nanosleep can be used for more precise sleep sequences.
  5. Sleep functions are generally not accurate below 1ms on many systems
  6. Sleep is appropriate for short less than 1 sec infrequent events
  7. Better to block and allow the kernel to help you out
  8. People often use sleeps to band-aid issues
  9. Consider using timers instead at least
  10. Timers give SIGALRM
  11. Alarm concerns: signal reentrancy limitations. sleep/usleep/setitimer can use SIGALRM. It can also be sent from outside the process
  12. Interval Timers also use SIGALRM, but they rearm themselves
  13. POSIX timers can use threads instead of signals with SIGEV_THREAD. Can be more accurate (or can use absolute time)
  14. This allows you to use things that are not signal safe, but are thread safe

Sockets

  1. Sockets are one of several forms of IPC. They can communicate across different systems over TCP/IP
  2. Also known as BSD/Berkeley Sockets
  3. More versatile than signals (no OS restriction)
  4. TCP = Transmission Control Protocol: a connection-oriented protocol; the connection is established and maintained while programs are exchanging messages
  5. Accepts packets, manages flow control, will retransmit dropped packets
  6. Handles acknowledgment of packets
  7. IP addresses packets and supports routing between sender and receiver. There is IPv4 and IPv6
  8. Two types of Sockets
  9. SOCK_STREAM (stream sockets) reliable two way connected TCP streams
  10. Messages are delivered in order, retried as necessary
  11. SOCK_DGRAM - Datagram Sockets
  12. Connectionless sockets - use UDP instead of TCP
  13. Use socket() POSIX function to obtain a socket file descriptor
  14. domain - PF_INET or PF_INET6
  15. type - SOCK_STREAM or SOCK_DGRAM
  16. protocol - use 0 to get the proper protocol for the type
  17. bind() assigns an address to the socket. sockfd is the fd for the socket
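The socket() and bind() calls above can be sketched as follows. The helper name bind_loopback is hypothetical, and binding to 127.0.0.1 is an assumption made to keep the example local to one machine.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to 127.0.0.1:port.
 * Port 0 asks the kernel to pick a free port.
 * Returns the socket file descriptor, or -1 on failure. */
int bind_loopback(uint16_t port)
{
    /* domain PF_INET (IPv4), type SOCK_STREAM (TCP), protocol 0 */
    int sockfd = socket(PF_INET, SOCK_STREAM, 0);
    if (sockfd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    if (bind(sockfd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```

A server would go on to call listen() and accept() on the returned descriptor. Swapping SOCK_STREAM for SOCK_DGRAM gives a UDP socket with the same bind step.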