ECEA 5305 - Linux System Programming and Introduction to Buildroot
Link to course description
Followed on by ECEA 5306
Books for the Class: Mastering Embedded Linux Programming
Linux has special files
- Everything is a file in Linux (I think everyone hears that at some point). I learned that there are
explicitly "special files",
which include character devices, block devices, and named pipes. Character devices (keyboards, for example)
are accessed as a linear stream of bytes with no fixed block size.
Block devices are something like storage, which is accessed as an array of fixed-size blocks and can be
mounted as a filesystem.
Named pipes are exactly what they sound like: a file version of "|" that can be used to pass
information between
different processes. They follow FIFO ordering and can be created using
mkfifo nameofpipe
See
https://opensource.com/article/18/8/introduction-pipes-linux
Cool linux commands I didn't know
-
gcc -print-search-dirs
I always forget some of these default paths that gcc searches
I also really didn't know the directories that the dynamic linker (ld.so) searches by default. They are:
/lib
/lib64
/usr/lib
/usr/lib64
content of LD_LIBRARY_PATH
I've previously just been setting the LD_LIBRARY_PATH env variable like a chump, I should use the /usr/lib
directory more!
Errno
-
I have never really used errno before. I've used POSIX return values before, but I didn't realize how to
check the reason for a failure explicitly in C.
Using perror is a super nice way to view these messages: it prints your prefix followed by the message for the current errno.
For example if you wanted to try to read a file and print an error if it couldn't be found:
FILE *myfile = fopen(argv[1], "r");
if (myfile == NULL) {
    perror("fopen");
    return 1;
}
-
This will print
fopen: No such file or directory
all for you!
Logging
Syslog is super nice. I found out that on Debian it does not write to /var/log/syslog by default. You have
to install
rsyslog to get those files written there. Otherwise you can look at the log via journalctl.
Open a log via openlog(ident, option, facility)
. See the linux man page
If ident is NULL, the program name is used
as the prefix for each message.
Linux File I/O
-
Files are accessed via file descriptors. The kernel maintains a per-process list of open files, called the file table.
-
Every process in Linux normally starts with at least 3 file descriptors open: stdin (terminal input device, fd 0), stdout (terminal display, fd 1), and
stderr (fd 2, also the terminal display, but kept separate so we can redirect it somewhere else)
-
This course is using the non-"f" open/read/write. The fopen family is buffered I/O from the C library, not system calls. We want to interact directly
with the kernel. fopen uses open internally.
-
fopen returns a FILE * stream pointer (which wraps a file descriptor). open returns the integer file descriptor.
-
fopen's buffering happens in userspace rather than in the kernel.
-
Multiplexed I/O. A better solution than polling each descriptor in a loop. Watches several file descriptors at once.
select() pselect() poll()
.
poll()
is the preferred mechanism.
strace
Shows the actual system calls a program makes. It does not require any source code and is handy for debugging.
Use fsync,fdatasync, O_SYNC
-
Can use these especially to guard against an unclean shutdown, so we don't have something in the kernel buffer that isn't written to disk.
-
Avoid sync() for performance reasons. It works globally (it flushes every buffer on the system) and can be very slow.
Sparse Files
-
Linux is smart enough to make a "hole" when you seek past the end of a file and write: the file reports the larger size, but the
storage for the hole is not actually allocated.
Process Management
- What is a PID? - This is the unique identifier that represents a process in Linux. It is guaranteed to be unique
at any single point in time.
-
The idle process has the pid 0. This is what runs when there are no other runnable processes.
-
The init process has the pid 1. Most likely located at /sbin/init. The linux kernel tries four locations in the following order:
-
/sbin/init
-
/etc/init
-
/bin/init
-
/bin/sh
If all four fail, the kernel will halt the system with a panic
-
Running a program in C? Use
execl()
. This will replace the currently running image.
-
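There are other exec-like functions in the POSIX API (execlp, execle, execv, execvp, execve) to start a program. They differ in whether they search PATH, take an explicit environment, or take the arguments as a list or an array.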
-
The Linux PATH is the default locations where executable files are searched.
-
Avoid relying on PATH, especially on an embedded system where we are in full control.
-
fork()
creates a new process but, unlike execl, it does not replace the current image. Often used to start daemons. Creates a new PID for the child.
The child process gets a copy of all the parent's memory and resources (variables, file handles etc.). The Linux kernel only actually copies memory when the parent or child modifies
it.
-
A zombie process is a child that has terminated but whose parent has not yet waited on it. The process entry remains until the parent has been notified. The parent uses
wait() to obtain information about terminated children. It gives the reason for termination and the return code.
-
waitpid() and waitid() allow waiting for a specific process.
-
system()
will do fork(), exec() and wait(). It uses the PATH, so there can be PATH injection concerns. It runs the command through the shell, which expands input like $HOME.
This may not be what you want to happen.
-
There are multiple user/group IDs associated with a process: the user/group ID that originally ran the process (real user/group ID)
and the user the process is currently running as (effective user ID).
-
The effective user/group ID can be changed with
setuid()
and/or setgid()
. Real user can be obtained with getuid()
and/or getgid()
. Effective user and group can be obtained via geteuid()
and/or getegid()
-
A process group is a collection of processes. A session is a collection of process groups. A session is associated with a controlling terminal,
also known as a tty (TeleTYpewriter). A tty is a device for terminal I/O.
-
A session is created for the login shell on a tty. There is always a foreground process group in a session. This is the one
interrupted with Ctrl+C.
-
There can also be 0 or more background process groups. Can use
&
in bash to start a process in the background.
-
Daemons were originally intended to be pronounced as "dee-men"... A daemon is a process which runs in the background, i.e. it is not connected
to a controlling terminal. Often started at boot time and run as root or a special user. The name often ends with d (sshd, crond etc.). Often
a child of the init process.
-
Syslog is a good way of sending feedback to the user from a daemon. Logging to a file can be funky in case the file ends up
on a filesystem that something is trying to unmount.
-
"Modern Unix systems have superior behavior. Instead of a wholesale copy of the parent’s
address space, modern Unix systems such as Linux employ copy-on-write (COW) pages." This basically means that as long as
a child does not modify its copy of a resource, the kernel transparently points it at the parent's instance of the
resource. If the child does write to it, the kernel transparently makes the copy, and the child program will not know that it is accessing
a copy.
File Descriptors/File I/O
Building the Linux Kernel
-
Kconfig is the framework used to select what will be built into the kernel. The selections end up in a text file
named .config
Boot artifacts
-
Get a vmlinux output file. This is an ELF binary that may or may not include debug symbols.
-
Binaries in arch/$ARCH/
Sync/Parallel
- Race conditions are different program behavior depending on which thread gets there first
- Can be over hardware, a kernel resource, or memory (a data race is the most common)
- Critical region is the region of code which needs synchronization
- Code is not necessarily thread safe, even if the hardware does not support parallelism.
Concurrency could still be an issue: a thread could load a value into a register and be preempted before it stores the result back.
- An atomic operation is indivisible, unable to be interleaved with another operation, and appears instantaneous
- To make code thread safe, need to make the critical sequences atomic.
- Can use a mutex for this!
pthread
is the POSIX library for this
- Deadlock is when two threads are waiting for each other to finish
- Could also happen when a thread is blocked on a mutex it already holds
- How to avoid (suggestions from the book): lock data not code, and make sure multiple data locks are actually required
- pthread_create is the way to create these types of threads
- A thread can terminate by returning from its start routine
- Threads will consume system resources until joined, which can cause a memory leak
pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock
- Scoped locks are a C++ constructor/destructor idiom (RAII): acquire the mutex on creation, release the mutex when it goes out of scope
- C++ will unlock the mutex for you when the lock object falls out of scope
BuildRoot/Yocto
-
Build system vs distribution. A distribution might not match your use case.
Something like an embedded vs desktop/user env. Upgrades and installs might not
be interactive if you're on an embedded system
-
It's a different goal!
-
Reducing the image size of a distribution may be challenging. Likely needs customization
to generate a production image.
-
Binary compatibility on different upgrade paths may be challenging.
-
Steps performed by a build system
-
Download source for common packages from upstream
-
Apply patches for cross compilation, arch dependent bugs etc
-
Build Components
-
Assemble rootfs in staging area
-
Create image files
-
Build system makes it easy to add your own packages
-
Select system profiles (e.g. selecting with/without graphics)
-
Track open source licenses used, this can be useful to help with open source compliance
when distributing source
-
Buildroot started in 2001, focused on simplicity. Collection of Linux host/target packages with build
instructions.
-
Licensed GPLv2, expected to share changes to buildroot source. Uses Makefiles + Kconfig/menuconfig,
the same config system used in the class for building the linux kernel
-
Based on GNU make and utilities. Buildroot uses "packages" to build and install, by default located in a package directory.
Can use your own external trees (BR2_EXTERNAL). This can be useful if you can't share something upstream/publicly.
-
Packages can reference git repos for source code. This can be used to build a custom package.
-
Packages need at least two files: Config.in (Kconfig code adding the package to the config menu) and "package_name".mk (make instructions for the package).
Signals
- Signals are software interrupts for handling async events
- Events outside the system (Ctrl + C)
- Events from the program or kernel (divide by 0)
- Rudimentary interprocess communication method
- Events are async and the handler runs async
- Signal is raised, stored by the kernel, then handled by the kernel depending on what the process asked for
- Ignore (cannot ignore SIGKILL and SIGSTOP)
- Catch and handle: suspend execution of the process (including a running signal handler!)
- Jump to a previously registered function
- SIGINT and SIGTERM are common ones to catch
- If not configured, perform the default action; see the man page for defaults
- bunch of examples in Linux System Programming chapter 10
- SIGABRT (abort(), failed assert()) - terminates and generates core file
- SIGHUP - May be used to reread config files
- SIGINT - interrupts process
- SIGKILL - unconditionally kills the process
- SIGSEGV - Seg fault, terminates and generates core file as default action
- SIGTERM - gracefully terminates a process (can catch and tear down)
- SIGSTOP - unconditionally stops a program
- GDB can be used to analyze core files, may need to set up on your distro
- man signal = the docs
- sigaction() is the POSIX alternative to the C library signal()
- Has better signal handling capability: can retrieve state information, can block new signals during the handler; only need to set up sa_handler in struct sigaction
pause()
waits until a signal is caught or terminates the process (use sigsuspend() or sigwait() to choose which signals to wait for)
- kill() sends a signal to a process; it can send any signal, not just SIGKILL
kill -TERM pid
sigqueue()
can send a payload with a signal (rudimentary IPC)
- Children inherit the signal actions of the parent on fork
- Starting a process with
exec()
resets all caught signals to their default actions (signals ignored by the parent stay ignored)
- Signal handlers - suspend execution of a process, jump to our signal handler function; not a thread switch or a new thread, reuses the existing thread
- These signal handlers must be async-signal-safe: must be reentrant and block signals safely
- Look at the table of allowed functions for signal-safe functions. Generally free/malloc are not signal safe. The handler should not manipulate static data that is used outside the signal handler
- Should only manipulate stack-allocated data or data provided by the caller
- Minimize global data access, confirm access is async-signal-safe, save/restore errno (still on the same thread), call the absolute minimum set of functions
- Signals are an "old antiquated mechanism for kernel to user communication"
- Signal safety problems are easy to introduce and difficult to track down
- Not the ideal IPC method
Timing/Sleep
sleep()
returns the number of seconds not slept, since it could be interrupted
- usleep - sleeps at least the given number of microseconds
- nanosleep - returns the remaining time in the rem value if interrupted
clock_nanosleep
can be used for more precise sleep sequences.
- Sleep functions are generally not accurate below 1ms on many systems
- Sleep is appropriate for short (less than 1 sec), infrequent events
- Better to block and allow the kernel to help you out
- People often use sleeps to band-aid issues
- Consider using timers instead at least
- Timers give SIGALRM
- Alarm concerns - signal reentrancy limitations. sleep/usleep/setitimer can use SIGALRM. Can be sent from outside the process
- Interval timers also use SIGALRM, but they rearm themselves
- POSIX timers can use threads instead of signals with SIGEV_THREAD. Can be more accurate (or can use absolute time)
- Allows you to use things that are not signal safe, but are thread safe
Sockets
- Sockets are one of several forms of IPC. They can communicate across different systems over TCP/IP
- Also known as BSD/Berkeley Sockets
- More versatile than signals (no OS restriction)
- TCP = Transmission Control Protocol - connection-oriented protocol; the connection is established and maintained while programs are exchanging messages
- Accepts packets, manages flow control, will retransmit dropped packets
- Handles acknowledgment of packets
- IP addresses packets and supports routing between sender and receiver. There is IPv4 and IPv6
- Two types of Sockets
- SOCK_STREAM (stream sockets) reliable two way connected TCP streams
- Messages are delivered in order, retried as necessary
- SOCK_DGRAM - Datagram Sockets
- Connectionless sockets - use UDP instead of TCP
- Use
socket()
POSIX function to obtain a socket file descriptor
- domain - PF_INET or PF_INET6
- type - SOCK_STREAM or SOCK_DGRAM
- protocol - use 0 for the proper protocol for the given type
bind()
assigns an address to the socket. sockfd is the fd for the socket