ECEA 5306 Linux Kernel Programming and Introduction to Yocto

Follow-on class from ECEA 5305
Course description

Book for the class: Linux Device Drivers, 3rd Edition (LDD3)

Operating System Structures: Monolithic (Linux) - the entire kernel runs in a protected area (kernel space) and all user applications run in user space
Microkernel (FreeRTOS) - user applications run in user space, and some operating system services also run in user mode
Hybrid Kernel (Windows, macOS) - some combination of the two. Some device drivers and other services run in user mode, others in the kernel
Device driver goal: make a piece of hardware respond to a well-defined abstract interface (read(), write(), ioctl(), etc.). A driver should be able to be compiled separately and added to the kernel dynamically.
Driver should provide mechanism, not policy. Mechanism is what capabilities are provided; policy is how those capabilities are used. E.g. a disk driver should let you organize the HDD's blocks in whatever manner you choose.
Policy-free driver characteristics: support both async and sync operation, allow the device to be opened multiple times at once, and don't add extra layers to "simplify things" (this ends up creating policy). The book suggests bundling the driver with sample user space programs that demonstrate possible policies.
Kernel roles: process management (create/destroy processes, handle their I/O, schedule processes), memory management, file systems (everything is a file; multiple filesystem types), device control (the endpoint of system operations; the kernel must have a device driver for every device in the system), networking (collect, identify, and dispatch network packets; routing)
User space roles - applications such as an FTP server or a graphical session manager, and utilities that interact with drivers
Modules: Kernel code added at runtime. Drivers are kernel code that control hardware and may or may not be added at runtime.
insmod - loads (links) a module into the running kernel
rmmod - removes a module from the running kernel
modprobe - loads a module into the running kernel, including its dependencies
Each device belongs to one of three classes (devices in a class can share common access code and methods):
Character, block, network
Character device class - accessed as a stream of bytes, e.g. /dev/console, /dev/ttyS0. It may or may not be possible to seek or memory-map the device the way you can with regular files, but the basics of read and write are the same.
Block device class - a device which can host a filesystem. Transfers at the device level are always on block boundaries, usually 512 bytes or larger. Linux lets you access a block device in pieces smaller than one block; the kernel manages the split blocks for you, e.g. /dev/sda1
Network interface class - may be a hardware device, or a software device such as the loopback interface. Handles packets and uses a name like eth0. Its use case, packet I/O, doesn't map well to read() and write().
Kernel Filesystems - maps low level blocks on a disk to directories and files. Independent of the data transfer mechanism used to transfer blocks to/from disk. Uses devices and device drivers to perform transfers.
Security checks are ultimately enforced by kernel code, which makes the kernel a critical point of exploit. Only authorized users can load modules.
Drivers typically do not encode security policy; the sysadmin controls access through the permissions of the device node associated with the driver. Operations that change global resources (e.g. interacting with interrupts, firmware updates) may have security implications.
Security best practices. Don't trust user data without verifying. Don't allow anything the user sends to write any areas of mem it should not. Zero out memory obtained in the kernel before passing outside (avoids information leakage). Length check user space data before copying - avoids buffer overrun.
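A userspace sketch of the last two practices: zero the destination first, then length-check the caller's data before copying. The helper name copy_request and the -EINVAL return convention are assumptions for illustration; in a real driver the copy itself would go through copy_from_user().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy an untrusted request into a fixed-size buffer.
 * Returns 0 on success, or -EINVAL if the request would overrun dst. */
int copy_request(char *dst, size_t dst_len, const char *src, size_t src_len)
{
    /* Zero first so no stale bytes (an information leak) survive in dst. */
    memset(dst, 0, dst_len);

    /* Length-check before copying to avoid a buffer overrun. */
    if (src == NULL || src_len > dst_len)
        return -EINVAL;

    memcpy(dst, src, src_len);
    return 0;
}
```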
Kernel is versioned with major.minor, no longer feature based.
Tainted kernel - the kernel's content was changed while it was running by loading a non-GPL module. The taint signals to kernel developers that they don't have the information needed to debug a bug report
printk is used to print messages to the kernel log. We don't have libc for kernel programming (no stdio.h)
module_init(funcname) identifies a function called when the module starts (from insmod). Similar to an event driven application
module_exit(funcname) identifies a function called when the module is unloaded. In kernel space, if you forget to free memory, it stays allocated; there is no automatic cleanup when a process exits. Make sure to clean up everything you set up in the init function.
Module vs Application
MODULE_LICENSE macro declares that the module carries a free license. Some kernel functions aren't available to modules with proprietary licenses.
A segfault in kernel code most likely affects more than your code, and may bring down the entire system
Kernel stack is small compared to applications. Application default stack ~2MB, kernel stack may be 4k in size
All active functions share this small stack space. Don't declare large automatic (stack-allocated) variables; allocate memory instead
Floating point is generally not supported.
Modern processors support enforcing protection against unauthorized access to resources, via separate address spaces and operating (privilege) levels
User space runs at the lowest privilege level
Kernel space uses the highest operating level (supervisor mode), entered via a system call or an interrupt.
The kernel runs multiple processes, and multiple processes may be trying to use your driver at once. Interrupts (and kernel timers) run asynchronously. All kernel code should therefore be reentrant, and shared data access must be handled correctly. Data structures must keep threads of execution separate.
Build process for modules is different from user space applications. The Linux source documentation for kbuild is here. The Makefile used for the course is here.
ls -la /lib/modules lists your currently available kernels; uname -r gives you the currently running version. The build directory (/lib/modules/$(uname -r)/build) should contain a Makefile corresponding to the one used to build the currently running kernel.
Recursive make means the Makefile is read twice. The first time, KERNELRELEASE isn't set, so the Makefile invokes the kernel build system (make -C $(KERNELDIR) M=$(PWD) modules); kbuild uses the M variable to find your Makefile and reads it a second time with KERNELRELEASE set. It's done this way so the kernel's make infrastructure can be shared, while all the details about how to build a module stay within the module's own Makefile.
.ko file is what can be loaded into the kernel via insmod after linking to kernel symbol table to load the module. Similar to the linker, ld used when linking user space programs. Links unresolved symbols to the kernel symbol table. Modifies an in memory copy rather than the module binary on disk.
When loading a module you can optionally pass parameters into it. modprobe can also load modules; it handles dependencies by loading dependent modules automatically, which avoids "unresolved symbols" failures when the needed modules exist but aren't yet loaded
rmmod can be used to remove modules
lsmod lists currently loaded modules and which other modules depend on each one. It uses info from /proc/modules and /sys/module
/sys and /proc - "virtual" filesystem trees used to interact with the kernel. Contents of these directories are populated by the kernel on demand.
/sys and /proc could be mounted on any directory; they are there just by convention.
Your driver is only allowed to load against the kernel version for which it was compiled. This is enforced with vermagic: target kernel version, compiler version, processor, and config variables MUST match
If it doesn't match, insmod will complain "Error inserting module-name:-1 Invalid module format"
APIs break between kernel revisions. The KERNEL_VERSION macro (compared against LINUX_VERSION_CODE) can be used to handle compatibility.
How do people keep up with these variations? Release under a GPL compatible license and let your users handle it. Could also get it added to the mainline kernel.
Could also distribute in source form with scripts to compile. Dynamic Kernel Module Support (DKMS) helps with this for anything that is open source but not in the mainline kernel.
Or build only for a single target kernel config
Kernel symbol table - Your module can export symbols for use by other modules (like the dependencies in modprobe). This is called "module stacking"
EXPORT_SYMBOL exports a symbol for use by any module; EXPORT_SYMBOL_GPL restricts it to modules with GPL-compatible licenses, so licenses stay compatible
MODULE_LICENSE, MODULE_AUTHOR, MODULE_DESCRIPTION, MODULE_VERSION are some of the common module information macros
Module init should be static; it's not meant to be used outside of the file. In the init function, register kernel facilities (e.g. devices, filesystems, crypto transforms, sysfs, proc) and initialize any data structures. Data structures should be in allocated memory! It's typically marked __init so the function's code is discarded after initialization
Module exit should unregister interfaces in reverse order and free any memory allocated in init. No auto cleanup like user space! Often marked __exit so the code can be discarded if it can never run (e.g. when the module is built into the kernel)
Make sure to unregister everything you registered if init fails partway through! Check the error from each registration individually. Cleanup typically uses goto, which is common in kernel programming.
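The goto unwind pattern can be sketched in userspace. register_step()/unregister_step() are hypothetical stand-ins for real registration calls (device numbers, a cdev, a sysfs entry), and fail_at lets the error paths be exercised. Each label undoes only what succeeded, in reverse order.

```c
#include <errno.h>

/* Hypothetical stand-ins for kernel registration calls. */
static int registered;      /* how many facilities are currently registered */
static int fail_at = -1;    /* index of the step that should fail, -1 = none */

static int register_step(int n)
{
    if (n == fail_at)
        return -ENOMEM;
    registered++;
    return 0;
}

static void unregister_step(int n)
{
    (void)n;
    registered--;
}

/* Sketch of an init function: each failure jumps to a label that undoes
 * the steps that already succeeded, newest first. */
int my_init(void)
{
    int err;

    err = register_step(0);
    if (err)
        goto fail_0;
    err = register_step(1);
    if (err)
        goto fail_1;
    err = register_step(2);
    if (err)
        goto fail_2;
    return 0;

fail_2:
    unregister_step(1);   /* step 2 failed: undo step 1 ... */
fail_1:
    unregister_step(0);   /* ... then step 0 */
fail_0:
    return err;
}
```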
Module parameters can be defined and passed as arguments at load time. Define one with module_param(var, type, permissions)

Intro to Yocto

Yocto is a way to roll your own Linux distro, not just your own root filesystem. It includes redistributable packages for each piece of software. Launched in 2011. Includes embedded build tools and an embedded reference distribution, Poky
Build system is MIT licensed. Uses text file configuration and the "bitbake" tool, based on Python/Bash. Builds almost everything from source, including all build tools, which helps make binaries reproducible.
source file is the same as . file in bash (both run the file in the current shell, so its environment changes persist)
Yocto builds are built by the bitbake utility. Recipes are set of instructions processed by the build engine. Recipes are .bb and .inc text files
.bb typically contains source and version information. .inc contains build and deploy instructions, may contain python or bash fragments
Packages contain binary artifacts from the build. Images are build outputs (binary root filesystem, linux kernel image, uboot or grub bootloader image)
Yocto Layer Model - a collection of directories on the filesystem, virtually "layered" to make an equivalent build hierarchy. Can be used to override or add to recipes, e.g. for applying a patch or configuring something. In the course we will use this for aesd
bitbake-layers create-layer and bitbake-layers add-layer
Yocto uses a MACHINE variable in build/conf/local.conf to set the target architecture

Singly linked lists in C

queue.h has a singly linked list implementation
man pages here
Suggested assignment 1 structure
Try to deallocate memory in only one place, main thread is probably the best place.
Linked list is not thread safe! You would have to do some form of locking
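A minimal sketch of the SLIST macros from <sys/queue.h> (available on glibc and the BSDs). The node type, list name, and helper functions are hypothetical; in a multi-threaded program every call would need to be wrapped in a lock, since the list itself does no locking.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* One node; SLIST_ENTRY embeds the "next" pointer inside the struct. */
struct thread_node {
    int id;                            /* hypothetical payload, e.g. a thread id */
    SLIST_ENTRY(thread_node) entries;
};

/* Declares "struct node_list" as the list head type. */
SLIST_HEAD(node_list, thread_node);

/* Push a new node at the head; returns 0 on success, -1 on allocation failure. */
int list_push(struct node_list *head, int id)
{
    struct thread_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->id = id;
    SLIST_INSERT_HEAD(head, n, entries);
    return 0;
}

/* Read-only traversal: sum the ids. */
int list_sum(struct node_list *head)
{
    struct thread_node *n;
    int sum = 0;
    SLIST_FOREACH(n, head, entries)
        sum += n->id;
    return sum;
}

/* Free every node from a single place. */
void list_free(struct node_list *head)
{
    while (!SLIST_EMPTY(head)) {
        struct thread_node *n = SLIST_FIRST(head);
        SLIST_REMOVE_HEAD(head, entries);
        free(n);
    }
}
```

A head is declared with struct node_list head = SLIST_HEAD_INITIALIZER(head);.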

Connecting to Userspace

scull - Simple Character Utility for Loading Localities
Acts on memory as if it were a device
Portable across architectures, not hardware dependent
Not a "real" device driver, so no hardware interrupts. Still good for learning
scull0 to scull3 - four devices, each with a global and persistent memory area. Global: data is shared among all file descriptors that open the device. Persistent: data is not lost if the device is closed and reopened
scullpipe0 to scullpipe3 - FIFO devices used to show blocking and non-blocking I/O
scullsingle - like scull, but only allows use by a single process. scullpriv - private to each virtual console. sculluid and scullwuid - can be opened by only one user at a time
/dev is one of the entry points into kernel space. Device numbers (assigned to a node with mknod) map the node to a kernel module.
mknod name type major minor
ls -l shows the major and minor device numbers. The major number identifies the driver (device type); the minor number identifies a specific device.
Common device drivers already have dedicated major numbers (and sometimes minor numbers)
Everyone else should allocate these numbers dynamically.
module_init and module_exit start and stop the device drivers. Register with register_chrdev(major, name, &fopstable)
You are responsible for unregistering chrdev! Use unregister_chrdev
dev_t type holds both the major and minor numbers, accessed via macros. It's just a 32-bit unsigned integer under the hood. Make one via MKDEV(int major, int minor); extract with MAJOR(dev_t dev) and MINOR(dev_t dev)
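To make the packing concrete, here is a userspace re-creation of the kernel's internal dev_t layout from include/linux/kdev_t.h (20 minor bits, the rest major). The type and function names are hypothetical. Note that the dev_t user space sees from stat() uses a different glibc encoding, so don't mix the two.

```c
#include <stdint.h>

/* Mirrors the kernel's MINORBITS/MINORMASK definitions. */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)

typedef uint32_t kern_dev_t;   /* hypothetical stand-in for the kernel's dev_t */

static inline kern_dev_t mkdev(unsigned int major, unsigned int minor)
{
    return ((kern_dev_t)major << MINORBITS) | minor;   /* like MKDEV() */
}

static inline unsigned int dev_major(kern_dev_t dev)
{
    return dev >> MINORBITS;                           /* like MAJOR() */
}

static inline unsigned int dev_minor(kern_dev_t dev)
{
    return dev & MINORMASK;                            /* like MINOR() */
}
```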
Example usage of scull is here
/proc/devices contains current list of allocated devices and associated drivers. You will see dynamically allocated devices/drivers here.
You can't create device nodes (with mknod) in advance when using dynamic allocation. Instead, parse /proc/devices after loading the module to find the major number.
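Load scripts usually do this lookup with awk. The same parse in C might look like the sketch below, which scans text in /proc/devices format and only searches the "Character devices:" section. The function name and the driver name used in the test are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return the major number registered for character driver `name`, scanning
 * text in /proc/devices format, or -1 if it isn't listed. */
int find_char_major(FILE *fp, const char *name)
{
    char line[256];
    int in_char_section = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, "Character devices:", 18) == 0) {
            in_char_section = 1;
            continue;
        }
        if (strncmp(line, "Block devices:", 14) == 0)
            break;                          /* past the character section */
        if (in_char_section) {
            int major;
            char dev_name[128];
            if (sscanf(line, "%d %127s", &major, dev_name) == 2 &&
                strcmp(dev_name, name) == 0)
                return major;
        }
    }
    return -1;
}
```

On a target this would be called with a stream from fopen("/proc/devices", "r").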

Device Driver File Operations

file_operations connects driver operations (open, read, write, etc.) to device numbers.
file is an object, and functions are methods
also called fops
file structure is NOT related to FILE* for buffered I/O
Represents an open file (specific open/close instance) i.e. Driver open from mknod node /dev/yourdev or file in the filesystem
Called either a file or filp
inode is the kernel's representation of a file, not of an open file descriptor
file_operations has a large number of options, but typically only a few are defined; read, write, and owner are usually set. This basically makes file_operations a "class" object.
Use designated initializers (.owner = THIS_MODULE, .read = ..., etc.); any field you don't name is initialized to NULL
Cool website with an intelligent search of the linux kernel
Char Device Registration, can use cdev_alloc(). Gives a chunk of memory and then set it in the ops table my_cdev->ops = &my_fops;
More common way to do this is to use your own structure via cdev_init, then cdev_add
open prototype: int (*open)(struct inode *, struct file *);. Open should check for device errors, initialize the device on first open (if needed), update the filp->f_op pointer if necessary, and allocate and fill any data for filp->private_data
No cdev structure in the prototype! The cdev is reached through the inode (inode->i_cdev, part of a union within struct inode).
Use the container_of(pointer, container_type, container_field) macro to find the structure that contains a field, e.g. struct scull_dev *dev = container_of(inode->i_cdev, struct scull_dev, cdev);
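Here is how container_of works under the hood: subtract the member's offset from the member's address. This is a simplified userspace version (the real kernel macro also type-checks), and the two stub structs are hypothetical stand-ins for struct cdev and struct scull_dev.

```c
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct cdev_stub { int dummy; };      /* hypothetical stand-in for struct cdev */

struct scull_dev_stub {               /* hypothetical stand-in for struct scull_dev */
    int quantum;
    struct cdev_stub cdev;            /* embedded member, like scull_dev.cdev */
};

/* What open() does with inode->i_cdev: recover the enclosing device struct. */
struct scull_dev_stub *dev_from_cdev(struct cdev_stub *c)
{
    return container_of(c, struct scull_dev_stub, cdev);
}
```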
release is the reverse of open. Deallocate anything open() allocated in filp->private_data
What if the user space programmer forgets to close the file? The kernel cleans up automatically at process exit. Your driver is guaranteed exactly one release() per open()
Make driver code -> load via insmod or modprobe -> addresses for functions (e.g. open, register_chrdev) are now resolvable -> /proc/devices is updated to include the new device -> register in the fops table -> read /proc/devices via a module load script -> use mknod to make the device node -> new entry in the root filesystem -> user space application can open /dev/mydriver -> calls link to the driver code in the kernel
kmalloc(size_t size, gfp_t flags) is similar to malloc. malloc/free are not available because there is no libc (glibc) in the kernel. We're just using GFP_KERNEL as our flag for now.
Allocate memory! Stack space is very small within the kernel.
fops read prototype: ssize_t (*read)(struct file *filp, char __user *buff, size_t count, loff_t *offp); write is the same except buff is const char __user *. buff is a user space pointer: it cannot be directly dereferenced by kernel code and is not guaranteed to be valid within the kernel.
Use unsigned long copy_to_user(void __user *to, const void *from, unsigned long count) and unsigned long copy_from_user(void *to, const void __user *from, unsigned long count) to move data between user and kernel space. They're similar to memcpy but handle any architecture-specific issues, and they return the number of bytes that could NOT be copied (0 on success).
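The offset bookkeeping inside a typical read() method can be sketched in userspace: clamp the request to what's left, copy, advance the file position, and return 0 at end of file. fake_copy_to_user() is a hypothetical memcpy stand-in with copy_to_user's return convention, and off_t stands in for loff_t.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in: same return convention as copy_to_user()
 * (bytes NOT copied), but a plain memcpy since this runs in user space. */
static unsigned long fake_copy_to_user(void *to, const void *from, unsigned long n)
{
    memcpy(to, from, n);
    return 0;
}

/* Serve bytes from a device buffer starting at *f_pos and advance it. */
ssize_t dev_read(const char *devbuf, size_t devlen,
                 char *ubuf, size_t count, off_t *f_pos)
{
    if (*f_pos >= (off_t)devlen)
        return 0;                               /* end of file */
    if (count > devlen - (size_t)*f_pos)
        count = devlen - (size_t)*f_pos;        /* clamp to what's left */
    if (fake_copy_to_user(ubuf, devbuf + *f_pos, count) != 0)
        return -EFAULT;                         /* bad user pointer */
    *f_pos += (off_t)count;
    return (ssize_t)count;
}
```

Returning fewer bytes than requested is normal; user space keeps calling read() until it gets 0.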

Assignment 7

Buildroot rootfs-overlay is a way to add content to the root filesystem, or override content from other packages, after they are built. Content is placed in the rootfs at the specified path (use a relative path!)
Will be compiling out-of-tree kernel modules. See the Buildroot manual section on this
Good SO Example as well
Also will be making a circular buffer in the userspace, this will move into the kernel space on the next assignment.
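One possible shape for such a buffer, as a hedged sketch rather than the assignment's required API: a fixed number of slots, each holding one write; when full, adding overwrites the oldest entry, and a lookup maps a byte offset onto the entry that holds it. All names and the slot count are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define CB_SLOTS 4   /* hypothetical capacity, small for the example */

struct cb_entry {
    const char *buf;   /* data for one write (not owned by the buffer) */
    size_t size;
};

struct circ_buf {
    struct cb_entry entry[CB_SLOTS];
    unsigned in;       /* next slot to write */
    unsigned out;      /* oldest valid slot */
    bool full;
};

/* Add an entry; when full, overwrite the oldest one. Returns the evicted
 * entry's buffer (so the caller can free it) or NULL. */
const char *cb_add(struct circ_buf *cb, struct cb_entry e)
{
    const char *evicted = NULL;
    if (cb->full) {
        evicted = cb->entry[cb->in].buf;        /* in == out when full */
        cb->out = (cb->out + 1) % CB_SLOTS;
    }
    cb->entry[cb->in] = e;
    cb->in = (cb->in + 1) % CB_SLOTS;
    cb->full = (cb->in == cb->out);
    return evicted;
}

/* Find the entry holding byte `pos` (counted from the oldest entry) and
 * store the offset within that entry in *entry_off. */
struct cb_entry *cb_find(struct circ_buf *cb, size_t pos, size_t *entry_off)
{
    unsigned idx = cb->out;
    unsigned count = cb->full ? CB_SLOTS
                              : (cb->in + CB_SLOTS - cb->out) % CB_SLOTS;

    for (unsigned i = 0; i < count; i++) {
        struct cb_entry *e = &cb->entry[idx];
        if (pos < e->size) {
            *entry_off = pos;
            return e;
        }
        pos -= e->size;
        idx = (idx + 1) % CB_SLOTS;
    }
    return NULL;
}
```

The buffer does no locking; once it moves into the driver, callers would hold a mutex around both functions.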
Kernel debugging is harder. You can use gdb with /proc/kcore, but you can't halt the kernel, set breakpoints, or modify memory
Not easily traced
Often difficult to reproduce bugs, especially timing related. Often bugs crash the system and destroy evidence. No cleanup methods are guaranteed
To debug - enable "kernel hacking" in your kernel menuconfig
CONFIG_DEBUG_KERNEL, see Linux Device Drivers Chapter 4 for full options
printk() is the most common thing people use
Log levels KERN_EMERG, KERN_ALERT, KERN_CRIT, KERN_ERR, KERN_WARNING, KERN_NOTICE, KERN_INFO, KERN_DEBUG are supported
printk(KERN_DEBUG "I'm printing pointer %p\n", ptr);
dmesg prints kernel output
/proc/sys/kernel/printk can be used to control prints redirected to the console
printk is safe to use anywhere, including interrupt context. It writes output to a circular buffer, and a different task writes this buffer out elsewhere.
Debug prints should not be in production! A macro like PDEBUG is useful for turning these on and off: define PDEBUG as printk in debug mode, or as nothing when debug mode is off
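A sketch of such a macro in the style of LDD3's scull.h, which works in both kernel and user space. The MY_DEBUG flag, the "mydrv: " prefix, and checked_divide() are hypothetical; the named variadic args... form is a GCC extension that kernel code commonly uses.

```c
#include <stdio.h>

/* Build with -DMY_DEBUG to enable; otherwise PDEBUG expands to nothing. */
#undef PDEBUG
#ifdef MY_DEBUG
#  ifdef __KERNEL__
#    define PDEBUG(fmt, args...) printk(KERN_DEBUG "mydrv: " fmt, ## args)
#  else
#    define PDEBUG(fmt, args...) fprintf(stderr, "mydrv: " fmt, ## args)
#  endif
#else
#  define PDEBUG(fmt, args...)   /* debugging off: no code generated */
#endif

/* Hypothetical function showing a call site. */
int checked_divide(int a, int b)
{
    PDEBUG("dividing %d by %d\n", a, b);
    return b != 0 ? a / b : 0;
}
```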
Use strace to trace system calls and interactions between user space program and the driver.
Can use dynamic debug to turn on and off specific prints from the CLI. Don't need to recompile the driver, don't need special debug builds.
printk_ratelimited can be used to prevent log from being flooded
System faults - due to a bug in a specific driver; may result in a "panic" that stops the kernel. You may need to reboot the system
oops message - e.g. a null pointer dereference or use of a pointer with an incorrect value. The oops describes where in the code the fault occurred
objdump shows you assembly content associated with an object file (in this case a .ko) Will intermix source if you have debug info for kernel module build. If cross-compiling, use the corresponding objdump!

Kernel Drivers and Concurrency

Concurrency bugs "easiest to create and some of the hardest to find"
Made ubiquitous by Symmetric Multiprocessing (SMP) systems, basically CPUs sharing RAM. AMP (Asymmetric Multiprocessing) has a part of RAM dedicated to each CPU; it's harder to write an OS for.
A task switch can happen between a check and the action that depends on it, so rudimentary gate checking isn't enough! You need to properly lock anything that accesses shared memory or pointers
This may seem unlikely, but with modern processors a "one in a million event happens every few seconds". Some probabilities can also be higher on certain hardware platforms.
Dan's law = It will happen when you show your boss
Kernel code is preemptible, driver code can lose the processor at any time!
Device interrupts are completely async
The device could disappear!
Avoid race conditions by avoiding shared resources (i.e. global variables)
Race conditions can still happen without globals, because allocating memory and passing the pointer to the kernel can still create a sharing situation
Is hardware or another resource shared beyond a single thread of execution? Is it possible the thread could encounter an inconsistent view of the resource? If either answer is yes, you must manage access to the resource!
Can use locking or mutual exclusion.
When you notify the kernel about an object, it must continue to exist and function until no outside references remain. References to objects should be tracked, and don't make an object available to the kernel until it is ready.

Critical Sections/Atomic

Atomic operations happen all at once, like a single instruction that can't be preempted. Something like a = b may be atomic (depending on variable size/architecture); a++ is not.
Critical section includes code that can be executed by only 1 thread at a time. Can use Semaphores & mutexes, work with/use sleep. Can also use spinlocks (work in all cases, including when you can't sleep)
It may not be safe to sleep in a critical section: the code may be called from an interrupt handler, have latency requirements, or be holding other critical resources.
Will this critical function sleep, or call a function that sleeps?
kmalloc can sleep! It's in the docs; check out the flag we use with kmalloc, GFP_KERNEL.
A semaphore is an integer value plus two functions, P() and V()
P() when value > 0: value is decremented, process continues
P() when value <= 0: process blocks until value > 0
V() increments value
Initialize the value to 1 to use it for mutual exclusion: P() = lock() and V() = unlock()
This is an older way to do it; the kernel now has dedicated mutex operations. A mutex is a special case of the semaphore
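To show the P()/V() semantics on their own, here is a userspace counting semaphore built from a pthread mutex and condition variable. This is an illustration of the concept only; kernel code would use struct semaphore or struct mutex, and the struct and function names are hypothetical.

```c
#include <pthread.h>

struct sema {
    int value;
    pthread_mutex_t lock;
    pthread_cond_t nonzero;
};

void sema_init(struct sema *s, int value)
{
    s->value = value;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

/* P(): block while value <= 0, then decrement. */
void P(struct sema *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking P(): 0 if taken, 1 if it would block (like down_trylock). */
int P_try(struct sema *s)
{
    int busy;
    pthread_mutex_lock(&s->lock);
    busy = (s->value <= 0);
    if (!busy)
        s->value--;
    pthread_mutex_unlock(&s->lock);
    return busy;
}

/* V(): increment and wake one waiter. */
void V(struct sema *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

Initialized with a value of 1, P() and V() behave as lock() and unlock().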
Can use DEFINE_MUTEX(name_of_mutex), typically means it is global
Can also embed a mutex in a structure and initialize it with mutex_init(). Make sure the mutex is initialized before anyone calls mutex_lock!
P() maps to the down() operation in the LDD3 book. down_interruptible() is the same as down(), but the wait can be interrupted by a signal (e.g. SIGTERM/SIGKILL). A 0 return means the semaphore was obtained. This is typically what you want to use.
down_trylock(struct semaphore *sem) never sleeps: it returns 0 if the semaphore was obtained, 1 if someone else is holding it. A thread which has completed "down" successfully "holds", has "taken out", or has "acquired" the semaphore.
void up(struct semaphore *sem): on return from up() your thread no longer holds the semaphore
Each call to down should be matched by exactly one call to up. Semaphores must be released in error paths too! goto is OK in kernel programming, but not strictly necessary.
Mutex was added to the Linux kernel later. It is the same idea as a semaphore used for mutual exclusion, just simpler to understand
mutex_lock -> down
mutex_trylock -> down_trylock
mutex_lock_interruptible -> down_interruptible
mutex_unlock -> up
mutex_lock_nested -> used when taking multiple locks, specifies the ordering between them
Mutex is used in scull! It must be initialized before cdev_add(), since "no object must be made available to the kernel before it can function properly"
Use -ERESTARTSYS when retrying the operation is the right thing to do (undo any user visible changes)
Use -EINTR when you can't undo previous changes
Balance each lock with unlock (including error cases)
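A userspace sketch of balancing the lock on every path: all exits after the lock is taken go through a single out label. The device struct and my_dev_store() are hypothetical; a kernel version would typically use mutex_lock_interruptible() and return -ERESTARTSYS if interrupted before the lock was taken.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical device state protected by one mutex. */
struct my_dev {
    pthread_mutex_t lock;
    char data[16];
    size_t len;
};

/* Every path that takes the lock leaves through "out", so the lock is
 * released exactly once, including on the error path. */
int my_dev_store(struct my_dev *dev, const char *src, size_t n)
{
    int ret = 0;

    pthread_mutex_lock(&dev->lock);
    if (n > sizeof(dev->data)) {
        ret = -EINVAL;          /* error: still falls through to unlock */
        goto out;
    }
    memcpy(dev->data, src, n);
    dev->len = n;
out:
    pthread_mutex_unlock(&dev->lock);
    return ret;
}
```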
Only writer threads need exclusive access. Any number of reader threads can access the data concurrently as long as writers are excluded. This is what reader/writer semaphores are for.
Similar functions (down_read, up_read, down_write, up_write, etc.), but down_read_trylock/down_write_trylock return 1 (nonzero) on success, instead of 0 like down_trylock.
downgrade_write converts a write lock to a read lock
Writers get priority. Use when write access is rarely required and held only briefly.
Completions are similar to semaphores (and you could use semaphores) and are used to wait for an activity to complete. They are designed for the case where the thing you're waiting on is usually not yet available.
complete/complete_all can be called from interrupt handlers.
If you are thinking about using yield() or msleep() to wait for something, use wait_for_completion() and complete() instead! Check out the API here!
There are timeout, interruptible, and interruptible_timeout versions of wait_for_completion!
There's also complete_all, which wakes all current waiters and lets future wait_for_completion calls return immediately (until the completion is reinitialized).
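The completion semantics can be modeled in userspace as a done counter guarded by a mutex and condition variable: complete() adds one and wakes one waiter, each wait consumes one, and complete_all() pins the counter at UINT_MAX so every current and future waiter passes. This is a model of the behavior, not the kernel's struct completion, and the _u names are hypothetical.

```c
#include <limits.h>
#include <pthread.h>

struct completion_u {
    unsigned int done;          /* outstanding complete() calls */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

void init_completion_u(struct completion_u *x)
{
    x->done = 0;
    pthread_mutex_init(&x->lock, NULL);
    pthread_cond_init(&x->cond, NULL);
}

void wait_for_completion_u(struct completion_u *x)
{
    pthread_mutex_lock(&x->lock);
    while (x->done == 0)                  /* re-check after every wakeup */
        pthread_cond_wait(&x->cond, &x->lock);
    if (x->done != UINT_MAX)
        x->done--;                        /* consume one complete() */
    pthread_mutex_unlock(&x->lock);
}

void complete_u(struct completion_u *x)
{
    pthread_mutex_lock(&x->lock);
    if (x->done != UINT_MAX)
        x->done++;
    pthread_cond_signal(&x->cond);        /* wake one waiter */
    pthread_mutex_unlock(&x->lock);
}

void complete_all_u(struct completion_u *x)
{
    pthread_mutex_lock(&x->lock);
    x->done = UINT_MAX;                   /* all current and future waits pass */
    pthread_cond_broadcast(&x->cond);     /* wake every waiter */
    pthread_mutex_unlock(&x->lock);
}
```

Typically one thread calls wait_for_completion_u() while another thread (in the kernel, possibly an interrupt handler) calls complete_u() when the work is done.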

Spinlocks

Spinlocks can be used in code which cannot sleep i.e. interrupt handlers
Higher performance than semaphores when properly used
Concept: a single bit in an integer value plus a tight spin loop. The bit is atomically tested and set; waiting processors execute a tight loop until it clears
Need to be careful not to use it for too long, otherwise it's just checking the value over and over!
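The test-and-set concept can be sketched with a C11 atomic_flag. This userspace version only illustrates the idea; kernel code uses spinlock_t with spin_lock()/spin_unlock() and their irqsave variants, and the type and function names here are hypothetical.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag held;
} tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

void tas_acquire(tas_lock *l)
{
    /* test_and_set returns the previous value: spin while it was already set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait burns CPU, so hold times must be short */
}

/* Single attempt: 1 if acquired, 0 if already held (like spin_trylock). */
int tas_try_acquire(tas_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void tas_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```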
spin_lock_irqsave/spin_unlock_irqrestore are versions of spin_lock/spin_unlock that disable interrupts on the local processor. irqsave stores whether interrupts were previously enabled in a flags variable; irqrestore uses it to restore (re-enable) interrupts.
Regular spin_lock/spin_unlock does not disable/re-enable interrupts. This could cause a deadlock if an interrupt handler tries to take the same lock while it's held!
All code must be atomic while holding the spinlock: it can't sleep, and can't relinquish the processor for anything other than interrupts
It's difficult to know which functions sleep. Most functions which might allocate memory might also sleep
Second rule: hold the spinlock for as little time as possible
Locking rules should not be ambiguous. Define a lock to control access to specific data, and design this from the beginning! Write functions which assume the caller has acquired the lock, and document those assumptions explicitly.
If there are multiple locks, make sure you always acquire in the same order every time. Otherwise you could cause a deadlock where multiple threads are waiting for each other to release a specific lock. They should also be released in the opposite order.
Lock ordering rules are poorly documented (read the source :( )
Avoid using multiple locks whenever possible.
Obtain your driver's locks before locks used in other parts of the kernel. This minimizes the chance you block while holding the most popular lock
Acquire semaphores before spinlocks: semaphores may sleep, and you can't sleep while holding a spinlock
Alternatives to locking: use atomic variables and bit operations, which are guaranteed atomic on all architectures
Use a lock-free algorithm (e.g. a circular buffer with exactly 2 threads and atomic count values). Read-Copy-Update (RCU): old copies remain valid for existing readers, and cleanup happens when references are released.
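A sketch of the two-thread lock-free circular buffer, using C11 atomic head/tail indices rather than a shared count. Only the producer writes head and only the consumer writes tail, and acquire/release ordering makes each slot write visible before the index that publishes it. One slot is kept empty to tell full from empty. The kernel's kfifo applies the same single-reader/single-writer idea; this userspace code and its names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8   /* hypothetical capacity */

struct spsc_ring {
    int slot[RING_SIZE];
    atomic_uint head;   /* next slot the producer fills (producer-owned) */
    atomic_uint tail;   /* next slot the consumer empties (consumer-owned) */
};

bool ring_put(struct spsc_ring *r, int v)
{
    unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned next = (h + 1) % RING_SIZE;
    if (next == atomic_load_explicit(&r->tail, memory_order_acquire))
        return false;                          /* full */
    r->slot[h] = v;
    /* release: slot contents become visible before the new head */
    atomic_store_explicit(&r->head, next, memory_order_release);
    return true;
}

bool ring_get(struct spsc_ring *r, int *v)
{
    unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if (t == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;                          /* empty */
    *v = r->slot[t];
    atomic_store_explicit(&r->tail, (t + 1) % RING_SIZE, memory_order_release);
    return true;
}
```

This is only safe with exactly one producer thread and one consumer thread; more of either would need a lock.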

more debugging

Nice to use userspace/kernel space debug print macro here
linux trace toolkit, traces events in the kernel
SystemTap
debugfs can make information available to user space. /proc is intended for process information; using it for other driver information has fallen out of favor
/sys (sysfs) - highly organized/restricted content
Can use kmemleak to detect memory leaks in the kernel
strace is super useful for looking at system calls into the kernel. Can call it like: strace -o /tmp/strace.txt ./drivertest.sh (drivertest.sh could be any script). Use -f to also trace child processes created via fork

ioctl

ioctl is a system call on a file descriptor which is passed to the driver, similar to read/write but more unstructured. It identifies a command to be performed and an optional argument (typically a pointer)
The ioctl call allows you to interact with your driver from userspace, passing a command and optional associated argument pointer
You can't issue an ioctl from the shell with tools like echo or cat; you need a program that calls ioctl(). It's a common interface for device control, i.e. requests to a device other than read/write, for example: lock a door, eject media, report information
Userspace prototype: #include <sys/ioctl.h> int ioctl(int fd, unsigned long request, ...);
... traditionally represents varargs, but in practice it's a single optional argument, char *argp. It's declared this way because void * wasn't valid C when this was written!
Easiest and most straightforward choice for device operations. It has fallen out of favor with kernel developers because the calls are unstructured, difficult to audit, and not well documented. Alternatives include embedding commands into the data stream and virtual filesystems like sysfs.
The latest kernel uses a slightly different prototype from LDD3: long (*unlocked_ioctl)(struct file *, unsigned int, unsigned long). There is a 64-bit vs 32-bit divide in ioctl because the optional arg is passed as an unsigned long, regardless of whether the user gave an integer or a pointer, and the data it points to (structure layout, long and pointer sizes) can differ between a 32-bit process and a 64-bit kernel. Implement compat_ioctl to handle calls from 32-bit processes on a 64-bit kernel.
ioctl example here; it uses a switch statement over command macros.
Macros help generate unique command codes for each device driver using magic numbers. This avoids a valid command for one driver being accepted by the wrong driver. Typically a character is used as the "magic number"; scull uses 'k'. This is super gross. A list of the ones currently used in Linux is here.
IO macros take (magic number, command sequence number, type of data transferred), as used in scull.
_IOC_READ (_IOR) - transfers from kernel to user space
_IOC_WRITE (_IOW) - transfers from user to kernel space
Some ioctls are recognized by the kernel, decoded before passing to your driver. Avoid magic type "T" to avoid these!
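A userspace sketch of the command-number macros plus the switch a driver's ioctl handler typically uses. The command names and dispatch function are hypothetical; 'k' is the magic number scull uses. A real handler would move data with put_user()/get_user() or copy_to_user()/copy_from_user() instead of dereferencing arg directly.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define MYDEV_IOC_MAGIC  'k'
#define MYDEV_IOCRESET   _IO(MYDEV_IOC_MAGIC, 0)          /* no data */
#define MYDEV_IOCGSIZE   _IOR(MYDEV_IOC_MAGIC, 1, int)    /* kernel -> user */
#define MYDEV_IOCSSIZE   _IOW(MYDEV_IOC_MAGIC, 2, int)    /* user -> kernel */
#define MYDEV_IOC_MAXNR  2

/* Reject commands that aren't ours first, then switch on the command.
 * `size` stands in for device state. */
long mydev_dispatch(unsigned int cmd, int *arg, int *size)
{
    if (_IOC_TYPE(cmd) != MYDEV_IOC_MAGIC || _IOC_NR(cmd) > MYDEV_IOC_MAXNR)
        return -ENOTTY;

    switch (cmd) {
    case MYDEV_IOCRESET:
        *size = 0;
        return 0;
    case MYDEV_IOCGSIZE:
        *arg = *size;        /* real driver: put_user()/copy_to_user() */
        return 0;
    case MYDEV_IOCSSIZE:
        *size = *arg;        /* real driver: get_user()/copy_from_user() */
        return 0;
    default:
        return -ENOTTY;
    }
}
```

The same header defining the command macros would be shared by the driver and the user space program that calls ioctl(fd, MYDEV_IOCGSIZE, &value).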
write() can be used as an ioctl alternative: write control sequences to the device. This is especially useful for devices that don't transfer data. Basically echo "start" > /dev/yourdevice. It may or may not end up being more complicated.

Sleeping in the kernel

A process sleeps when it blocks waiting for an event. This means the process is removed from the run queue.
Never sleep while holding a spinlock, seqlock, or RCU lock. Other threads trying to obtain the spinlock consume processor resources while waiting for it to be released. It's a recipe for deadlock.
Never sleep if you've disabled interrupts. Interrupt latency would suffer.
Avoid sleeping, or keep sleep durations short, while holding a semaphore/mutex, and consider whether you could introduce a deadlock. It's OK with a semaphore because other threads waiting for it aren't spinning; they're sleeping. Still, be careful: it's easy to introduce a deadlock. It's safe to sleep only if no code waiting to obtain the semaphore is needed to bring about your wake-up condition. Sometimes you can't avoid holding it when calling other functions.
Make sure you reevaluate all state after you come out of the sleep as well; state (e.g. a lock) could have changed while you slept.
Simple sleeping: wait_event(queue, condition), with timeout and interruptible versions too. The queue is a wait queue: declare wait_queue_head_t my_queue; and initialize it with init_waitqueue_head(&my_queue). The condition uses some macro magic: it's an expression that is re-evaluated each time the process wakes up, e.g. wait_event_interruptible(wq, flag != 0); where flag != 0 is re-checked until it's true. Weird-looking C code!
There is also wake_up(wait_queue_head_t *queue) (and an interruptible version). wake_up must be called from a different process or an interrupt handler. wake_up wakes all processes on the wait queue; wake_up_interruptible wakes only those in interruptible sleep.
Could use wake_up to wake the reader queue when new data is available due to a write, and then wake the writer queue when new space is available.
Example in scull's pipe.c with waiting/wake_up. This is blocking I/O!
Output and input buffers are often useful for handling blocking i/o on real devices. This means any blocking is on access to the buffer rather than access to the device. Benefit when memory access is faster than device access.
How a process sleeps: TASK_RUNNING - able to run, may not be executing. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE two types of sleep
schedule() actually calls the scheduler. You can see this within the wait_event macro! This is what actually puts the process to sleep. schedule() returns after the process is woken, and your driver code (via wait_event_interruptible or similar) then re-checks the condition.
What if a process requests O_NONBLOCK on open()? This is reflected in filp->f_flags (within the file structure). The read should return -EAGAIN immediately instead of waiting! A non-block test script is available here. C source code is here.
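A userspace analog of scull's pipe read path, hedged as a model only: a condition variable plays the wait queue, the while loop plays wait_event_interruptible()'s re-check of the condition, broadcast plays wake_up(), and the nonblock flag stands in for O_NONBLOCK in filp->f_flags. The struct and function names are hypothetical, and unlike the kernel version there is no signal handling (no -ERESTARTSYS path).

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

struct pipe_buf {
    pthread_mutex_t lock;
    pthread_cond_t readq;     /* reader wait queue analog */
    char data[64];
    size_t len;
};

ssize_t pipe_read(struct pipe_buf *p, char *out, size_t count, bool nonblock)
{
    pthread_mutex_lock(&p->lock);
    while (p->len == 0) {                         /* condition: data available */
        if (nonblock) {
            pthread_mutex_unlock(&p->lock);
            return -EAGAIN;                       /* O_NONBLOCK: don't wait */
        }
        pthread_cond_wait(&p->readq, &p->lock);   /* sleep, then re-check */
    }
    if (count > p->len)
        count = p->len;
    memcpy(out, p->data, count);
    memmove(p->data, p->data + count, p->len - count);
    p->len -= count;
    pthread_mutex_unlock(&p->lock);
    return (ssize_t)count;
}

ssize_t pipe_write(struct pipe_buf *p, const char *in, size_t count)
{
    pthread_mutex_lock(&p->lock);
    if (count > sizeof(p->data) - p->len)
        count = sizeof(p->data) - p->len;         /* accept what fits */
    memcpy(p->data + p->len, in, count);
    p->len += count;
    pthread_cond_broadcast(&p->readq);            /* wake_up() analog */
    pthread_mutex_unlock(&p->lock);
    return (ssize_t)count;
}
```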

Poll and select

poll and select determine whether a device/file can be read or written without blocking, or wait for a file descriptor to become ready. Support is implemented by the poll method in the device driver, which backs the poll/select calls made from user space. It's in file_operations: __poll_t (*poll)(struct file *, struct poll_table_struct *);
Call poll_wait on any wait queues which could indicate a poll status change; the kernel will wait on these as necessary. Return a bit mask of currently available operations. Same ideas as wait queues. Example in pipe.c is here
Read rules: when data is available in the input buffer, read should return immediately with at least 1 byte, and poll should return POLLIN|POLLRDNORM. If there's no data in the input buffer, read should block until at least one byte is there, or return -EAGAIN if O_NONBLOCK is set; poll should report the device unreadable (read flags all 0). At end of file, read should return 0 immediately and poll should report POLLHUP.
Write rules: when space is available in the output buffer, write should return without delay, accepting at least one byte, and poll should report POLLOUT|POLLWRNORM. When the output buffer is full, write should block until space is freed, or return -EAGAIN if O_NONBLOCK is set; poll should report the file not writable (write flags 0). Never make a write call wait for data transmission (transfer from the output buffer to the device), even if O_NONBLOCK is not set. The driver should provide fsync so user space can wait for the data to reach the device.
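These rules can be observed from user space on an ordinary pipe, whose kernel poll method follows them. The two helpers below are hypothetical. ready_events() polls with a zero timeout and returns whatever bits the driver's poll method reported. set_nonblock() sets O_NONBLOCK on an already open descriptor with fcntl(), which has the same effect on read() as passing the flag to open().

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* revents reported for fd right now (timeout 0), 0 if none, -1 on error. */
int ready_events(int fd, short wanted)
{
    struct pollfd pfd = { .fd = fd, .events = wanted };
    int n = poll(&pfd, 1, 0);
    if (n < 0)
        return -1;
    return n == 0 ? 0 : pfd.revents;
}

/* Switch an open descriptor to non-blocking mode. */
int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

On an empty non-blocking pipe, read() fails with EAGAIN. After a write, poll reports POLLIN. Once the write end is closed and the data drained, poll reports POLLHUP and read() returns 0.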
Seeking on a device: the default lseek just updates filp->f_pos. Implement the llseek file operation if seeking needs a custom operation for the device. Call nonseekable_open in your open function if seeking doesn't make sense for your device, e.g. something with a data flow like a serial port or keyboard.
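A custom llseek mostly comes down to position arithmetic like the sketch below, modeled on the shape of LDD3's scull_llseek. cur_pos stands in for filp->f_pos and dev_size for the amount of data the device holds. Both are assumptions of this userspace version, and the driver would store the returned position back into f_pos.

```c
#include <errno.h>
#include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <sys/types.h>

/* Compute the new file position, or -EINVAL for a bad whence or a
 * position before the start of the device. */
off_t dev_llseek(off_t cur_pos, off_t dev_size, off_t off, int whence)
{
    off_t newpos;

    switch (whence) {
    case SEEK_SET: newpos = off; break;
    case SEEK_CUR: newpos = cur_pos + off; break;
    case SEEK_END: newpos = dev_size + off; break;
    default:       return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    return newpos;
}
```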
Access control from open(). Single-open: only one process can open the device at a time (scullsingle example: obtain an atomic in open(), release it in release()). Single-user: compare the process uid in open() and return -EBUSY when the device is in use. Alternative to returning -EBUSY: block in open().
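The single-open policy can be sketched with C11 atomics standing in for the kernel's atomic_t. An "available" counter starts at 1; open() takes it and release() gives it back. This mirrors the shape of scullsingle's atomic_dec_and_test() check, but the variable and function names are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>

static atomic_int dev_available = 1;

int single_open(void)
{
    /* fetch_sub returns the old value; 1 means we took the device. */
    if (atomic_fetch_sub(&dev_available, 1) != 1) {
        atomic_fetch_add(&dev_available, 1);   /* undo and report busy */
        return -EBUSY;
    }
    return 0;
}

void single_release(void)
{
    atomic_fetch_add(&dev_available, 1);
}
```

Because the check-and-take is a single atomic operation, two processes racing in open() can't both succeed.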