Kernel Development
CSC585 Class Project
Dawn Nelson, December 2009
Compare Timing and Jitter Between a Realtime Module and a Non-Realtime Module

- Are the results of using a realtime module worth the effort of installing RTAI?
- What is the timing difference between realtime and non-realtime kernel modules for computation?
- What is the jitter difference between realtime and non-realtime kernel modules for computation?
- What is the jitter difference between realtime and non-realtime kernel modules for overall process time, with and without MPI?
- What types of tasks are improved by using RTAI?
What Is the Timing Difference Between Realtime and Non-Realtime Kernel Modules?
Overall Process Time Comparison for 8 Nodes
What Is the Jitter Difference Between Realtime and Non-Realtime Kernel Modules for Overall Process Time?
Source Code Written

- Kernel module implementing a character device; a read/write on the device is the signal to perform the kernel task.
- Kernel module implementing RTAI, with a FIFO and a semaphore as the signal to perform the kernel task.
- Programs to use the kernel modules.
- MPI programs to use the kernel modules.
- Scripts to build and load both modules.
- Scripts to run the programs and save the results.
- Scripts to initiate MPI on all nodes (because mpdboot is unreliable and does not work for 8 nodes).
Character Device Driver - Read Function

// read: run the matrix-multiply workload and return the elapsed time
ssize_t mmmodule_mmmdo(struct file *filp, char *buf, size_t count, loff_t *f_pos)
{
    int a[20][20], b[20][20], c[20][20];
    int i, j, k, extraloop, t2;
    RTIME t0, t1;

    t0 = rt_get_cpu_time_ns();
    // 50000 iterations for a good measurement
    for (extraloop = 0; extraloop < 50000; extraloop++) {
        // Matrix calculation block
        for (k = 0; k < 20; k++)
            for (i = 0; i < 20; i++) {
                c[i][k] = 0;
                for (j = 0; j < 20; j++)
                    c[i][k] = c[i][k] + a[i][j] * b[j][k];
            }
    }
    t1 = rt_get_cpu_time_ns();
    t2 = (int)(t1 - t0);

    // Changing reading position as best suits
    //copy_to_user(buf, mmmodule_buffer, 1);
    return t2;
}
Character Device Driver - Setup

// memory character device driver to do matrix
// multiply upon a call to it
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>    // printk()
#include <linux/slab.h>      // kmalloc()
#include <linux/fs.h>        // everything
#include <linux/errno.h>     // error codes
#include <linux/types.h>     // size_t
#include <linux/fcntl.h>     // O_ACCMODE
#include <asm/system.h>      // cli(), *_flags
#include <asm/uaccess.h>

MODULE_LICENSE("GPL");

// Declaration of mmmodule.c functions
int mmmodule_open(struct inode *inode, struct file *filp);
int mmmodule_release(struct inode *inode, struct file *filp);
ssize_t mmmodule_mmmdo(struct file *filp, char *buf, size_t count, loff_t *f_pos);
void mmmodule_exit(void);
int mmmodule_init(void);

/* Structure that declares the usual file */
/* access functions */
struct file_operations mmmodule_fops = {
    read:    mmmodule_mmmdo,
    //write: mmmodule_write,
    open:    mmmodule_open,
    release: mmmodule_release
};

// Declaration of the init and exit functions
module_init(mmmodule_init);
module_exit(mmmodule_exit);

// Global variables of the driver
int mmmodule_major = 60;   // Major number
char *mmmodule_buffer;     // Buffer to store data

int mmmodule_init(void)
{
    int result;

    // Registering device
    result = register_chrdev(mmmodule_major, "mmmodule", &mmmodule_fops);
    if (result < 0) {
        printk("mmmodule: cannot get major number %d\n", mmmodule_major);
        return result;
    }

    // Allocating mmmodule for the buffer
    mmmodule_buffer = kmalloc(1, GFP_KERNEL);
    if (!mmmodule_buffer) {
        result = -ENOMEM;
        goto fail;
    }
    memset(mmmodule_buffer, 0, 1);

    printk("Inserting mmmodule module\n");
    return 0;

fail:
    mmmodule_exit();
    return result;
}

(The original slide's #include lines lost their header names in extraction; the names above are reconstructed from the surviving comments and from what the code itself requires.)
Real Time Module - Read

static int myfifo_handler(unsigned int fifo)
{
    rt_sem_signal(&myfifo_sem);
    return 0;
}

static void Myfifo_Read(long t)
{
    int i = 0, j = 0, k = 0, xj = 0;
    int a[20][20], b[20][20], c[20][20];
    char ch = 'd';
    RTIME t0, t1;

    while (1) {
        //rt_printk("new_shm: sem_waiting\n");
        rt_sem_wait(&myfifo_sem);
        rtf_get(Myfifo, &ch, 1);
        //rt_printk("got a char off the fifo... time to do matrix mult\n");
        t0 = rt_get_cpu_time_ns();
        //rt_printk("t0= %ld \n", t0);
        for (xj = 0; xj < 50000; xj++) {
            for (k = 0; k < 20; k++)
                for (i = 0; i < 20; i++) {
                    c[i][k] = 0;
                    for (j = 0; j < 20; j++)
                        c[i][k] = c[i][k] + a[i][j] * b[j][k];
                }
        }
        t1 = rt_get_cpu_time_ns();
        shm->t2 = t1 - t0;
    }
}
Real Time Module - Setup

static RT_TASK read;
#define TICK_PERIOD 100000LL   /* 0.1 msec (1 tick) */

int init_module(void)
{
    // shared memory section
    rt_printk("shm_rt.ko initialized: tick period = %ld\n", TICK_PERIOD);
    shm = (mtime *)rtai_kmalloc(nam2num(SHMNAM), SHMSIZ);
    if (shm == NULL)
        return -ENOMEM;
    memset(shm, 0, SHMSIZ);

    rtf_create(Myfifo, 1000);
    rtf_create_handler(Myfifo, myfifo_handler);
    rt_sem_init(&sync, 0);
    rt_typed_sem_init(&myfifo_sem, 0, SEM_TYPE);
    rt_task_init(&read, Myfifo_Read, 0, 2000, 0, 0, 0);
    start_rt_timer((int)nano2count(TICK_PERIOD));
    rt_task_resume(&read);
    return 0;
}
Conclusions

- There are cases where RTAI improves timing and jitter: mostly longer-running tasks, widely distributed tasks, and deterministic tasks.
- Accessing shared memory created using RTAI sadly slows the module to a crawl. My previous rt-module was giving results of 140 milliseconds per 5,000 matrix multiplies; the new version gives results of 100 nanoseconds for 50,000 matrix multiplies.
- I can try physical memory mapping to see if performance is improved.
- I don't think modules were meant to be used for mass amounts of data, because of the slow transfer between user and kernel via copy_to_user, shared memory, and copy_from_user.
- For MPI, the main advantage of using RTAI is that the nodes all finish at nearly the same rate.
Lessons Learned

- A kernel crash writes core dumps on all open windows.
- A small tick period locks up the whole machine and is unrecoverable.
- FIFOs and semaphores work nicely and do not create race conditions.
- Character device drivers work nicely but are a little more maintenance to set up and program.
- These are my first modules ever written, including the rt one for the conference.
- A profiler would be very useful for comparing performance, instead of graphs and text.
- I will soon be writing an RT module to read a synchro device every 12 milliseconds, to try out the determinism of RTAI.
- Units: 1,000 nanoseconds = 1 microsecond; 1,000 microseconds = 1 millisecond; 1 millisecond = 1,000,000 nanoseconds.
Future Work

- There is very little prior work or code (findable by Google, anyway) done with RTAI.
- The matrix multiply, even at 50,000 iterations, is not CPU-intensive enough to prove or disprove the advantages of RTAI. Need to ask the physicists for some of their algorithms to crunch through the system; at the conference, it was the physicists who showed interest in RTAI.
- Plan to use RTAI for its intended purpose of being deterministic.
- Write up the results for a paper.
C107: 8-Node Cluster Setup with CentOS 5.3, RTAI, and MPICH2