KERNEL DEVELOPMENT
CSC585 Class Project
Dawn Nelson
December 2009


COMPARE TIMING AND JITTER BETWEEN A REALTIME MODULE AND NON-REALTIME MODULE
Are the results of using a realtime module worth the effort of installing RTAI?
What is the timing difference between realtime and non-realtime kernel modules for computation?
What is the jitter difference between realtime and non-realtime kernel modules for computation?
What is the jitter difference between realtime and non-realtime kernel modules for overall process time, with and without MPI?
What types of tasks are improved by using RTAI?
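The slides do not say how jitter was computed from the measurements. As a minimal sketch, assuming jitter means the peak-to-peak spread (max minus min) of repeated timing samples in nanoseconds, the comparison could be summarized as follows. The names `timing_stats` and `summarize_timings` are hypothetical and are not taken from the project code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of a set of timing samples, all in nanoseconds. */
struct timing_stats {
    int64_t min_ns;
    int64_t max_ns;
    int64_t mean_ns;    /* integer mean, truncated */
    int64_t jitter_ns;  /* assumed definition: peak-to-peak, max - min */
};

/* Compute min, max, mean and peak-to-peak jitter over n samples. */
struct timing_stats summarize_timings(const int64_t *samples, size_t n)
{
    struct timing_stats s;
    int64_t sum = 0;
    size_t i;

    assert(samples != NULL && n > 0);

    s.min_ns = samples[0];
    s.max_ns = samples[0];
    for (i = 0; i < n; i++) {
        if (samples[i] < s.min_ns)
            s.min_ns = samples[i];
        if (samples[i] > s.max_ns)
            s.max_ns = samples[i];
        sum += samples[i];
    }
    s.mean_ns = sum / (int64_t)n;
    s.jitter_ns = s.max_ns - s.min_ns;
    return s;
}
```

Running the same summary over the samples from each module (and, for MPI, over the per-node finish times) gives directly comparable numbers. A standard deviation would be a reasonable alternative measure of jitter.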

WHAT IS THE TIMING DIFFERENCE BETWEEN REALTIME AND NON-REALTIME KERNEL MODULES?

OVERALL PROCESS TIME COMPARISON FOR 8 NODES

WHAT IS THE JITTER DIFFERENCE BETWEEN REALTIME AND NON-REALTIME KERNEL MODULES FOR OVERALL PROCESS TIME?

SOURCE CODE WRITTEN
Kernel module implementing a char device, with a read/write as the signal to perform the kernel task.
Kernel module implementing RTAI, with a fifo and a semaphore as the signal to perform the kernel task.
Programs to use the kernel modules.
MPI programs to use the kernel modules.
Scripts to build and load both modules.
Scripts to run programs and save results.
Scripts to initiate MPI on all nodes (because mpdboot did not work for 8 nodes).

CHARACTER DEVICE DRIVER - READ FUNCTION

    /// read
    ssize_t mmmodule_mmmdo(struct file *filp, char *buf,
                           size_t count, loff_t *f_pos)
    {
        /* a and b are left uninitialized; only the elapsed time is used */
        int a[20][20], b[20][20], c[20][20];
        int i, j, k, extraloop, t2;
        RTIME t0, t1;

        t0 = rt_get_cpu_time_ns();

        // 50000 iterations for a good measurement
        for (extraloop = 0; extraloop < 50000; extraloop++) {
            // Matrix calculation block
            for (k = 0; k < 20; k++)
                for (i = 0; i < 20; i++) {
                    c[i][k] = 0;
                    for (j = 0; j < 20; j++)
                        c[i][k] = c[i][k] + a[i][j] * b[j][k];
                }
        }

        t1 = rt_get_cpu_time_ns();
        t2 = (int) (t1 - t0);

        // Changing reading position as best suits
        //copy_to_user(buf, mmmodule_buffer, 1);

        /* the elapsed time is handed back as the return value of read() */
        return t2;
    }
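For reference, the computation being timed can be tried outside the kernel. The following is a user-space sketch, not part of the project: it keeps the slide's 20x20 loop order but fills the inputs with known values and uses the POSIX `clock_gettime` with `CLOCK_MONOTONIC` in place of `rt_get_cpu_time_ns`. The names `matmul` and `time_matmul_ns` and the fill values are assumptions made for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define N 20

/* c = a * b for N x N integer matrices, same loop order as the slide. */
void matmul(int a[N][N], int b[N][N], int c[N][N])
{
    int i, j, k;

    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++) {
            c[i][k] = 0;
            for (j = 0; j < N; j++)
                c[i][k] += a[i][j] * b[j][k];
        }
}

/* Elapsed nanoseconds for `iterations` multiplies of fixed inputs. */
int64_t time_matmul_ns(int iterations)
{
    static int a[N][N], b[N][N], c[N][N];
    struct timespec t0, t1;
    int i, j, n;

    assert(iterations > 0);

    /* Arbitrary but defined input values (an assumption of this sketch). */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (n = 0; n < iterations; n++)
        matmul(a, b, c);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling `time_matmul_ns(50000)` mirrors the iteration count on the slide. An optimizing compiler may remove part of the work because the result is never read, so absolute numbers from this sketch should be treated with caution.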

CHARACTER DEVICE DRIVER - SETUP

    // memory character device driver to do matrix
    // multiply upon a call to it
    #include
    #include <linux/kernel.h>   // printk()
    #include <linux/slab.h>     // kmalloc()
    #include <linux/fs.h>       // everything
    #include <linux/errno.h>    // error codes
    #include <linux/types.h>    // size_t
    #include
    #include <linux/fcntl.h>    // O_ACCMODE
    #include <asm/system.h>     // cli(), *_flags
    #include

    MODULE_LICENSE("GPL");

    // Declaration of mmmodule.c functions
    int mmmodule_open(struct inode *inode, struct file *filp);
    int mmmodule_release(struct inode *inode, struct file *filp);
    ssize_t mmmodule_mmmdo(struct file *filp, char *buf,
                           size_t count, loff_t *f_pos);
    void mmmodule_exit(void);
    int mmmodule_init(void);

    /* Structure that declares the usual file */
    /* access functions */
    struct file_operations mmmodule_fops = {
        .read    = mmmodule_mmmdo,
        //.write = mmmodule_write,
        .open    = mmmodule_open,
        .release = mmmodule_release
    };

    // Declaration of the init and exit functions
    module_init(mmmodule_init);
    module_exit(mmmodule_exit);

    // Global variables of the driver
    int mmmodule_major = 60;    // Major number
    char *mmmodule_buffer;      // Buffer to store data

    int mmmodule_init(void)
    {
        int result;

        // Registering device
        result = register_chrdev(mmmodule_major, "mmmodule", &mmmodule_fops);
        if (result < 0) {
            printk("mmmodule: cannot get major number %d\n", mmmodule_major);
            return result;
        }

        // Allocating memory for the buffer
        mmmodule_buffer = kmalloc(1, GFP_KERNEL);
        if (!mmmodule_buffer) {
            result = -ENOMEM;
            goto fail;
        }
        memset(mmmodule_buffer, 0, 1);

        printk("Inserting mmmodule module\n");
        return 0;

    fail:
        mmmodule_exit();
        return result;
    }
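The "programs to use the kernel modules" are not shown in the slides. Since the driver's read handler returns the elapsed time as the result of `read()`, a user program only needs to open the device node and issue one read. The sketch below assumes a node such as `/dev/mmmodule` has been created for major number 60 (the value of `mmmodule_major`); the node name and the function `trigger_matrix_run` are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Trigger one timed run by issuing a read() on the device node.
 * Returns whatever read() returned (the driver shown on the slide
 * returns the elapsed time there), or -1 if the node cannot be opened.
 */
long trigger_matrix_run(const char *dev_path)
{
    char dummy;
    ssize_t result;
    int fd = open(dev_path, O_RDONLY);

    if (fd < 0)
        return -1;

    result = read(fd, &dummy, 1);
    close(fd);
    return (long)result;
}
```

A caller would use it as `long ns = trigger_matrix_run("/dev/mmmodule");`. Because a negative return from a read handler is interpreted as an error code, passing a measurement back this way only works while the value stays positive and fits the return type.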

REAL TIME MODULE - READ

    /* FIFO handler: wake the realtime task when a byte arrives */
    static int myfifo_handler(unsigned int fifo)
    {
        rt_sem_signal(&myfifo_sem);
        return 0;
    }

    static void Myfifo_Read(long t)
    {
        int i = 0, j = 0, k = 0, xj = 0;
        int a[20][20], b[20][20], c[20][20];
        char ch = 'd';
        RTIME t0, t1;

        while (1) {
            //rt_printk("new_shm: sem_waiting\n");
            rt_sem_wait(&myfifo_sem);
            rtf_get(Myfifo, &ch, 1);
            //rt_printk("got a char off the fifo... time to do matrix mult\n");

            t0 = rt_get_cpu_time_ns();
            //rt_printk("t0= %ld \n",t0);

            for (xj = 0; xj < 50000; xj++) {
                for (k = 0; k < 20; k++)
                    for (i = 0; i < 20; i++) {
                        c[i][k] = 0;
                        for (j = 0; j < 20; j++)
                            c[i][k] = c[i][k] + a[i][j] * b[j][k];
                    }
            }

            t1 = rt_get_cpu_time_ns();
            shm->t2 = t1 - t0;   // = (int *)t2;
        }
    }

REAL TIME MODULE - SETUP

    static RT_TASK read;

    #define TICK_PERIOD 100000LL   /* 0.1 msec ( 1 tick) */

    int init_module(void)
    {
        // shared memory section
        rt_printk("shm_rt.ko initialized: tick period = %ld\n", TICK_PERIOD);
        shm = (mtime *)rtai_kmalloc(nam2num(SHMNAM), SHMSIZ);
        if (shm == NULL)
            return -ENOMEM;
        memset(shm, 0, SHMSIZ);

        // fifo, handler and semaphores used to signal the task
        rtf_create(Myfifo, 1000);
        rtf_create_handler(Myfifo, myfifo_handler);
        rt_sem_init(&sync, 0);
        rt_typed_sem_init(&myfifo_sem, 0, SEM_TYPE);

        // create the realtime task, start the timer, then let the task run
        rt_task_init(&read, Myfifo_Read, 0, 2000, 0, 0, 0);
        start_rt_timer((int)nano2count(TICK_PERIOD));
        rt_task_resume(&read);
        return 0;
    }

CONCLUSIONS
There are cases when RTAI improves timing and jitter: mostly longer-running tasks, widely distributed tasks and deterministic tasks.
Accessing shared memory created using RTAI sadly slows the module to a 'crawl'.
My previous rt-module was giving results of 140 milliseconds per 5000 matrix multiplies. The new version gives results of 100 nanoseconds for 50,000 matrix multiplies.
I can try physical memory mapping to see if performance is improved.
I don't think modules were meant to be used for mass amounts of data, because of the slow transfer between user and kernel space via copy_to_user, shared memory and copy_from_user.
For MPI, the main advantage of using RTAI is that the nodes all finish at nearly the same rate.

LESSONS LEARNED
A kernel crash writes core dumps on all open windows.
A small tick period locks up the whole machine and is unrecoverable.
Fifos and semaphores work nicely and do not create race conditions.
Character device drivers work nicely but take a little more maintenance to set up and program.
These are the first modules I have ever written, including the rt one for the conference.
A profiler would be very useful for comparing performance, instead of graphs and text.
I will soon be writing an RT module to read a synchro device every 12 milliseconds to try out the determinism of RTAI.
1000 nanoseconds = 1 microsecond
1000 microseconds = 1 millisecond
1 millisecond = 1,000,000 nanoseconds
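The unit relations matter because the RTAI timing calls in the modules work in nanoseconds while the results are quoted in milliseconds. A small sketch of the conversions, using hypothetical helper names, is given here; the 12 millisecond period planned for the synchro device, for example, comes out as 12,000,000 nanoseconds.

```c
#include <stdint.h>

/* Standard SI relations between the units used in the timing results. */
#define NSEC_PER_USEC 1000LL
#define USEC_PER_MSEC 1000LL
#define NSEC_PER_MSEC (NSEC_PER_USEC * USEC_PER_MSEC)

/* Convert a period in milliseconds to nanoseconds. */
static inline int64_t ms_to_ns(int64_t ms)
{
    return ms * NSEC_PER_MSEC;
}

/* Convert a duration in nanoseconds to whole microseconds (truncating). */
static inline int64_t ns_to_us(int64_t ns)
{
    return ns / NSEC_PER_USEC;
}

/* Convert a duration in nanoseconds to whole milliseconds (truncating). */
static inline int64_t ns_to_ms(int64_t ns)
{
    return ns / NSEC_PER_MSEC;
}
```

Keeping all measurements in nanoseconds and converting only for display avoids mixing units when comparing the two modules.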

FUTURE WORK
There is very little existing work or example code for RTAI (findable by Google, anyway).
The matrix multiply, even at 50 thousand iterations, is not CPU-intensive enough to prove or disprove the advantages of RTAI.
Need to ask the physicists for some of their algorithms to crunch through the system. At the conference, it was the physicists who showed interest in RTAI.
Plan to use RTAI for its intended purpose of being deterministic.
Write up the results for a paper.

C107 8-NODE CLUSTER SETUP WITH CENTOS 5.3, RTAI AND MPICH2