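Linux Project. National Central University, Department of Computer Science and Information Engineering. Second-year master's student 江瑞敏
Outline
How to compile the Linux kernel
How to add a new system call
Some example projects and ways to solve them
– System Call Hooking by Module
– Project about Memory
– Project about Process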
Download Link wget tar xvf linux tar.bz2
The Beginning of everything
Compile Linux Kernel
Is It Hard?
No, If You Understand the Concepts
The Basic Process
0. make mrproper
1. make oldconfig
2. make -j[n]
3. make modules_install
4. make install
5. reboot
Do You Know What It Means?
make mrproper Cleans up the environment. Will remove almost everything, except….
make clean Almost the same as make mrproper, but it keeps the .config file.
make oldconfig Reuses an existing .config (typically a copy of the configuration file the current kernel is using) and asks only about options that are new. Some alternative options: – make menuconfig – …
Is config File Important?
Config file Determines which kind of kernel you are compiling. Determines which modules you want the kernel to compile. Misconfiguration can lead to a kernel crash.
make -j[n] Compiles the whole source tree according to your configuration, running n jobs in parallel.
make modules_install Installs the modules into the necessary folder. – /lib/modules/`uname -r`/
make install Installs the image into the boot directory. Sometimes, updating GRUB is necessary.
What Is a System Call?
It’s a Bridge
Between User and Device
Why System Calls?
Pop Quiz : Write A Program To Print “Hello World”
What You May Write
What Actually Happened ….
[Figure: User Application -> printf (libc.so) -> System Call -> Kernel Code -> Device Driver -> IO Device]
What If There Were No System Calls?
Everything Would Be x86 in and out Instructions
Let’s Focus On … [Figure: the System Call step between printf (libc.so) in the User Application and the Kernel Code, Device Driver, and IO Device]
Magic int 0x80
Before We Talk Further, Let’s Talk About X86 Architecture
X86 Architecture Is Interrupt Driven
[Figure: CPU, 8259 PIC, devices, the kernel's device driver, and the user application]
How Does the CPU Find the Address of the Device Driver Code?
Callback Mechanism
[Figure: a physical device signals the 8259 PIC; the CPU looks up the Interrupt Descriptor Table, whose entry points to the device driver in the kernel]
How About System Calls?
Magic int 0x80
[Figure: int 0x80 selects entry 0x80 of the Interrupt Descriptor Table, which points to the System Call Handler; the handler uses syscall_table]
[Figure sequence: what the CPU does when the user application executes int 0x80]
1. The user application executes int 0x80 (CPU registers shown: cs, ds, ss, esp, eip).
2. The CPU gets the TSS through the GDT.
3. The CPU gets entry 0x80 from the IDT, which points to ENTRY(system_call).
4. ss, esp, eflags, cs, and eip are saved on the stack, and ENTRY(system_call) uses sys_call_table.
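The two lookups in the figures above (vector 0x80 in the IDT, then the system call number in sys_call_table) can be modeled in user space with two arrays of function pointers. This is only a sketch: every `demo_` name is hypothetical, and the real kernel entry path also switches stacks and privilege levels, which the model ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel symbols. */
#define DEMO_SYSCALL_VECTOR 0x80
#define DEMO_NR_SYSCALLS    3

typedef long (*demo_syscall_fn)(long arg);
typedef long (*demo_vector_fn)(long nr, long arg);

static long demo_getpid(long arg)  { (void)arg; return 1234; }
static long demo_double(long arg)  { return 2 * arg; }
static long demo_project(long arg) { return arg; }

/* Model of sys_call_table: the system call number is the index. */
static const demo_syscall_fn demo_call_table[DEMO_NR_SYSCALLS] = {
    demo_getpid, demo_double, demo_project,
};

/* Model of ENTRY(system_call): check the number, then dispatch. */
long demo_system_call(long nr, long arg)
{
    if (nr < 0 || nr >= DEMO_NR_SYSCALLS)
        return -ENOSYS;
    return demo_call_table[nr](arg);
}

/* Model of the IDT: one handler slot per interrupt vector. */
static demo_vector_fn demo_idt[256];

void demo_idt_init(void)
{
    demo_idt[DEMO_SYSCALL_VECTOR] = demo_system_call;
}

/* Model of "int n": look up the vector and jump to its handler. */
long demo_int(unsigned vector, long nr, long arg)
{
    assert(vector < 256 && demo_idt[vector] != NULL);
    return demo_idt[vector](nr, arg);
}
```

After `demo_idt_init()`, a call such as `demo_int(0x80, 1, 21)` goes through both tables and ends in `demo_double`.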
How To Add A System Call
Add a System Call
1. cd $kernel_src
2. Edit arch/i386/kernel/syscall_table.S
3. Add the new entry at the end of the table:
   …
   .long sys_tee /* 315 */
   .long sys_vmsplice
   .long sys_move_pages
   .long sys_project /* 318 */
Kernel.org/pub/linux/kernel
Add a System Call
Edit linux/include/asm-i386/unistd.h
#define __NR_vmsplice 316
#define __NR_move_pages 317
#define __NR_project 318
#ifdef __KERNEL__
#define NR_syscalls 319
Add a System Call
Edit linux/include/linux/syscalls.h
asmlinkage long sys_set_robust_list(struct robust_list_head __user *head, size_t len);
asmlinkage long sys_project(int i);
#endif
Add a System Call
cd linux/kernel
touch project.c
Edit the Makefile:
obj-y = project.o sched.o fork.o exec_domain.o panic.o printk.o profile.o
Add a System Call
project.c
#include <linux/kernel.h>   /* printk */
#include <linux/linkage.h>  /* asmlinkage */
asmlinkage long sys_project(int i)
{
    printk("Success!! -- %d\n", i);
    return 0;
}
Add a System Call
Recompile the Linux kernel
Reboot
Create a new file “test.c”
#include <unistd.h>
int main(){ syscall( 318, 2 ); return 0; }
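The test program calls the new system call by number because libc has no wrapper for it. The same mechanism can be tried on any kernel by invoking an existing call by number, as in this sketch. `raw_getpid` and `syscall_result` are hypothetical helper names. Number 318 is only `sys_project` on the patched i386 kernel described here; on other kernels or architectures that number may belong to a different system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke getpid by number, the same way test.c invokes number 318. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* syscall() reports failure as -1 with the reason in errno;
 * fold that convention into a single negative errno value. */
long syscall_result(long ret)
{
    return (ret == -1) ? -(long)errno : ret;
}
```

Checking the return value this way makes a missing table entry visible in user space: the kernel reports an unimplemented number as ENOSYS.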
Add a System Call ALL2007/syscall.html
About 64 Bits: The idea is the same, and there are many online references, so it is not covered in these slides.
System Call Hooking by Module
System Call Hooking [Figure: a user-mode program invokes a system call; the NR_execve entry of sys_call_table points to the normal execve code]
System Call Hooking [Figure: the same picture, with the hooking code added next to the normal execve code]
System Call Hooking [Figure: the NR_execve entry of sys_call_table now points to the hooking code (modified execve), which then calls the normal execve code]
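The three figures boil down to saving one function pointer and replacing it. The following user-space model shows that pattern only; all `demo_` names are hypothetical, and a real module would additionally have to locate sys_call_table and deal with its write protection, which is not shown here.

```c
#include <assert.h>

#define DEMO_NR_EXECVE  11   /* value of __NR_execve on i386 */
#define DEMO_TABLE_SIZE 32

typedef long (*demo_hook_fn)(const char *path);

long demo_original_calls;    /* how often the normal execve code ran */
long demo_hooked_calls;      /* how often the hooking code ran */

static long demo_execve(const char *path)
{
    (void)path;
    demo_original_calls++;
    return 0;
}

/* Model of sys_call_table with only the execve slot filled in. */
static demo_hook_fn demo_table[DEMO_TABLE_SIZE] = {
    [DEMO_NR_EXECVE] = demo_execve,
};
static demo_hook_fn demo_saved_execve;

static long demo_hooked_execve(const char *path)
{
    demo_hooked_calls++;             /* hooking code runs first */
    return demo_saved_execve(path);  /* then the normal execve code */
}

void demo_install_hook(void)
{
    demo_saved_execve = demo_table[DEMO_NR_EXECVE];
    demo_table[DEMO_NR_EXECVE] = demo_hooked_execve;
}

void demo_remove_hook(void)
{
    assert(demo_saved_execve != 0);
    demo_table[DEMO_NR_EXECVE] = demo_saved_execve;
}

/* What a user-mode caller effectively triggers. */
long demo_call_execve(const char *path)
{
    return demo_table[DEMO_NR_EXECVE](path);
}
```

Restoring the saved pointer in `demo_remove_hook` mirrors what a module should do when it is unloaded, so the table never points at code that no longer exists.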
Source code links
Project about Memory
Level 1: Dump the virtual address of a process
Some Questions You May Ask
Where to Start?
Maybe Add a New System Call
1. How to find the process you want?
Process List task_struct for_each_process() If you paid attention in class, these two should be familiar.
2. What about the virtual addresses that the process is using?
The Data Structure mm_struct vm_area_struct lxr.linux.no
What It Looks Like
The rest is basic programming skill
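Before writing the kernel side, it helps to know what the answer should look like. On Linux, /proc/<pid>/maps lists the address ranges of a process, one per line, with start, end, and permissions, so a small user-space parser gives a reference to compare the system call's output against. This is a sketch; `struct demo_region` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct demo_region {        /* hypothetical user-space mirror of one area */
    unsigned long start;    /* compare with vma->vm_start */
    unsigned long end;      /* compare with vma->vm_end (exclusive) */
    char perms[5];          /* for example "rw-p" */
};

/* Parse one line of /proc/<pid>/maps; returns 1 on success, 0 otherwise. */
int parse_maps_line(const char *line, struct demo_region *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%lx-%lx %4s",
                  &out->start, &out->end, out->perms) == 3;
}

/* Print every region of the calling process; returns the count or -1. */
int dump_own_regions(void)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    int count = 0;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        struct demo_region r;
        if (parse_maps_line(line, &r)) {
            printf("%08lx-%08lx %s\n", r.start, r.end, r.perms);
            count++;
        }
    }
    fclose(fp);
    return count;
}
```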
Too easy, Let’s make it a little bit harder
Level 2: Dump the physical frame that is associated with the virtual address.
New Problem, New question
How to Translate a Virtual Address to a Physical Address?
Some Reminders and Hints
Where is CR3?
Now We Have CR3, Then?
Calculate By Yourself or
Something Smarter
follow_page()
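For the "calculate by yourself" route, the arithmetic is a matter of cutting the linear address into table indexes. The sketch below assumes classic 32-bit x86 paging without PAE (two levels, 4 KB pages, 10 + 10 + 12 bits); the `DEMO_` constants and function names are hypothetical, and other configurations split the address differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit x86, no PAE, 4 KB pages. */
#define DEMO_PAGE_SHIFT   12
#define DEMO_PGDIR_SHIFT  22
#define DEMO_PTRS_PER_PTE 1024u
#define DEMO_PAGE_SIZE    (1u << DEMO_PAGE_SHIFT)

struct demo_split {
    uint32_t pgd_index;   /* index into the page directory CR3 points to */
    uint32_t pte_index;   /* index into the page table */
    uint32_t offset;      /* byte offset inside the page */
};

struct demo_split split_linear_address(uint32_t addr)
{
    struct demo_split s;
    s.pgd_index = addr >> DEMO_PGDIR_SHIFT;
    s.pte_index = (addr >> DEMO_PAGE_SHIFT) & (DEMO_PTRS_PER_PTE - 1);
    s.offset    = addr & (DEMO_PAGE_SIZE - 1);
    return s;
}

/* Physical address = frame number shifted left by 12, plus the offset. */
uint32_t demo_physical(uint32_t pfn, uint32_t offset)
{
    assert(offset < DEMO_PAGE_SIZE);
    return (pfn << DEMO_PAGE_SHIFT) | offset;
}
```

For example, address 0x08049abc splits into directory index 32, table index 73, and offset 0xabc.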
Push Yourself More
Level 3: Log this information to a file
Ok, let’s type
dmesg | grep "myproject" >> log.txt
Dude, Are You…
…. From the Kernel, of Course
Can We Do That???
How to Write a File in User Mode
fd = open(filename, O_WRONLY | O_CREAT, 0644); write(fd, string, strlen(string)); close(fd);
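As a complete user-mode reference, the three calls can be wrapped with error checking as below. This is a minimal sketch: `append_line` is a hypothetical helper name, and it treats a short write as a failure instead of retrying.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append `text` to `path`; returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    size_t len = strlen(text);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, text, len);  /* write() takes the fd first */
    int close_failed = (close(fd) != 0);

    if (written != (ssize_t)len || close_failed)
        return -1;
    return 0;
}
```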
How about Kernel Mode
open -> do_sys_open
write -> sys_write()
close -> sys_close()
Is that all?
The magic __user
It tells the kernel that the parameter is a pointer that should come from user mode
It’s a protection mechanism
Final Step About this Project
Level 4: Modify the PTE r/w flag from read/write to read-only
L2012/linux_project1.html
Structures of Page Directories And Page Tables Entries
Wow, Looks Simple :D
Basic Idea
1. Loop through the translation table of a process according to the virtual address.
2. After finding the PTE, change the read/write flag.
3. Done.
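pte_wrprotect() Code Implementation
Code Implementation (Cont.)
for (loop_count = addr; loop_count < end; loop_count += PAGE_SIZE) {
    pgd = pgd_offset(mm, loop_count);
    if (pgd_none(*pgd)) {
        printk("pgd none happened\n");
        continue;
    }
    pud = pud_offset(pgd, loop_count);
    if (pud_none(*pud))
        continue;
    pmd = pmd_offset(pud, loop_count);
    if (pmd_none(*pmd))
        continue;
    /* Map the PTE and take the page-table lock. */
    pte = pte_offset_map_lock(mm, pmd, loop_count, &ptl);
    if (operation == 1)
        *pte = pte_mkwrite(*pte);     /* set the r/w flag */
    else
        *pte = pte_wrprotect(*pte);   /* clear the r/w flag */
    /* Release the lock taken by pte_offset_map_lock(). */
    pte_unmap_unlock(pte, ptl);
}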
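What the two helpers do to an entry can be tried in user space on a plain integer. The sketch below assumes the 32-bit x86 PTE layout, where bit 0 is Present, bit 1 is Read/Write, and bit 2 is User; the `demo_` names are hypothetical stand-ins for pte_wrprotect(), pte_mkwrite(), and pte_write().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit x86 PTE flag bits. */
#define DEMO_PAGE_PRESENT 0x001u
#define DEMO_PAGE_RW      0x002u
#define DEMO_PAGE_USER    0x004u
#define DEMO_PAGE_SHIFT   12

typedef uint32_t demo_pte_t;

/* Clear the r/w flag: the page becomes read-only. */
demo_pte_t demo_pte_wrprotect(demo_pte_t pte)
{
    return pte & ~DEMO_PAGE_RW;
}

/* Set the r/w flag: the page becomes writable. */
demo_pte_t demo_pte_mkwrite(demo_pte_t pte)
{
    return pte | DEMO_PAGE_RW;
}

/* Non-zero if the entry allows writes. */
int demo_pte_write(demo_pte_t pte)
{
    return (pte & DEMO_PAGE_RW) != 0;
}

/* The frame number lives in the upper 20 bits and is never touched. */
uint32_t demo_pte_pfn(demo_pte_t pte)
{
    assert(pte & DEMO_PAGE_PRESENT);
    return pte >> DEMO_PAGE_SHIFT;
}
```

For instance, an entry 0x12345067 becomes 0x12345065 after write protection: only bit 1 changes, and the frame number 0x12345 stays the same.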
Result
What!?
Use printk to Verify
printk Tells Us Two Things
1. We have changed the PTE r/w flag.
2. In most cases only one entry was changed back; the others were not.
Magic Happened ?
Now, Imagine You Are the CPU
What Happens When a Process Tries to Write to a Read-Only Area?
A Page Fault Happens
The Question Becomes: How Does Linux Handle a Page Fault?
You Might Ask: What Is a Page Fault?
From the CPU's Point of View
1. The present flag of the pgd or pte is clear.
2. Code running in user mode attempts to write to a read-only page.
– For more detail, check the Intel programmer's manual.
From the Kernel's Point of View
1. The present flag is clear:
– A. The page is being accessed for the first time.
– B. The page has been swapped out.
2. A write to a read-only page:
– A. The process really is writing to a read-only page.
– B. It is a page-fault optimization such as copy on write.
How Does the Linux Kernel Tell These Cases Apart?
Well, First….
And This
Then This
What The FxxK…….
This Time Let’s Look Closer
Now We Know An Important Thing
The Linux Kernel Checks vm_flags
Some Useful Knowledge
How Linux Implements COW
Cow?? Moo ?
1. COW refers to copy on write.
2. Google and the wiki are your friends.
3. How Linux recognizes a copy-on-write page:
– A. The PTE r/w flag is disabled.
– B. (vm_flags & VM_WRITE) != 0
Our project accidentally matches the above conditions!
1. The same page table entries of the parent and the child process point to the same pfn.
2. The r/w flag of both PTEs is set to read-only.
3. When a page fault happens, the page fault handler checks the vm_flags of the faulting virtual address.
4. If vm_flags has VM_WRITE, the page fault handler treats the situation as a COW condition.
5. If two PTEs point to the pfn, the handler assigns a new pfn with the r/w flag enabled.
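The decision described in steps 3 to 5 can be written down as a small pure function. This is a simplified model, not the kernel's page fault handler: the enum and function names are hypothetical, and the "reuse" branch (a page with a single user simply gets its r/w flag back) is an assumption of the model that is consistent with the PTE change being undone in the earlier experiment.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_fault_result {
    DEMO_FAULT_NONE,     /* PTE is writable: the write does not fault */
    DEMO_FAULT_SIGSEGV,  /* vm_flags lacks VM_WRITE: a real violation */
    DEMO_FAULT_REUSE,    /* treated as COW, one user: r/w flag restored */
    DEMO_FAULT_COPY      /* treated as COW, shared: copy to a new pfn */
};

/* Classify a user-mode write to a present page.
 * pte_users is the number of PTEs that point to the same pfn. */
enum demo_fault_result classify_write(bool pte_writable,
                                      bool vma_has_vm_write,
                                      int pte_users)
{
    assert(pte_users >= 1);

    if (pte_writable)
        return DEMO_FAULT_NONE;
    if (!vma_has_vm_write)
        return DEMO_FAULT_SIGSEGV;
    return (pte_users > 1) ? DEMO_FAULT_COPY : DEMO_FAULT_REUSE;
}
```

Read this way, clearing only the PTE flag lands in one of the two COW branches, which is why the process keeps running; the write is rejected only when VM_WRITE is also absent.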
Copy on Write in Linux [Figure: the task_struct of the parent and of the child each lead through a pgd to a pte; the ptes map to physical frames Pfn N, Pfn (N+1), Pfn (N+2)]
A New Idea of The Project
1. Change PTE r/w flag as we just did
2. Change the vm_flags as well
Code Implementation
down_write(&current->mm->mmap_sem);
vma = find_vma(mm, addr);
vm_start = vma->vm_start;
vm_end = vma->vm_end;
mask = VM_READ | VM_WRITE | VM_EXEC | VM_SHARED;
old_flags = vma->vm_flags;
/* Keep every old flag except VM_WRITE, and make sure VM_READ is set. */
new_flags = VM_READ | (old_flags & ~VM_WRITE);
prot = protection_map[new_flags & mask];
vma->vm_flags = new_flags;
vma->vm_page_prot = prot;
up_write(&current->mm->mmap_sem);
addr &= PAGE_MASK;
change_pte(addr, end, operation);
Result
Where is the “press enter to continue” ?
It’s time to use GDB
Set a breakpoint before the syscall happens. It seems that this time printf causes the error.
Here is the problem.
Think Slowly
Calling printf will need to push some parameters
Recall From The Last Code
We have changed vm_flags for the whole vm_area_struct, which means the entire block of linear addresses. The address of the array is not always aligned to 4 KB.
Consider The following Conditions
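Start Address Aligned, End Address Aligned
Start Address Aligned, End Address Not Aligned
[Figure: Test_array from start addr to end addr, low to high addresses; 3 pages needed in total; a problem may occur in the area outside the array]
Start Address Not Aligned, End Address Aligned
[Figure: Test_array from start addr to end addr, low to high addresses; 3 pages needed in total; a problem may occur in the area outside the array]
Start Address Not Aligned, End Address Not Aligned
[Figure: Test_array from start addr to end addr, low to high addresses; 4 pages needed in total; a problem may occur in the areas outside the array]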
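The page counts in these cases follow from simple arithmetic on the start address and the length. The sketch below assumes 4 KB pages; `pages_spanned` and `collateral_bytes` are hypothetical helper names. The second function counts the bytes that share a page with the array without belonging to it, which is exactly the area where a problem may occur.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* assumption: 4 KB pages */

/* Number of pages touched by the byte range [start, start + len). */
size_t pages_spanned(uintptr_t start, size_t len)
{
    assert(len > 0);
    uintptr_t first = start / DEMO_PAGE_SIZE;
    uintptr_t last  = (start + len - 1) / DEMO_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

/* Bytes inside the touched pages that are not part of the range. */
size_t collateral_bytes(uintptr_t start, size_t len)
{
    return pages_spanned(start, len) * DEMO_PAGE_SIZE - len;
}
```

With both ends aligned the collateral is zero; in every other case some neighbouring data becomes read-only together with the array.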
Our Case: the parameter is right here, in a page that is now read-only. [Figure: stack layout, low to high addresses] Assembly code: ….. call syscall; push $string; call printf;
Verify Our Thoughts (Test Case 1): Rewrite the user-mode program. This time use malloc instead of a local variable (heap instead of stack). char *test_array; test_array = (char *)malloc(ARRAY_SIZE);
Test Case 1 Result
Verify Our Thoughts (Test Case 2): char test1[0x2000]; char test_array[ARRAY_SIZE]; char test2[0x2000]; This also avoids the conditions just mentioned.
Test Case 2 Result: Also works
How About mprotect.c
1. Basically, the idea is the same.
– A. Change vm_flags
– B. Change the PTE r/w flag
2. Some hints:
– A. Strongly recommended reading from the textbook: Chapter 8, Memory Management; Chapter 9, Process Address Space
– B. The code that changes vm_flags is in mprotect_fixup().
– C. The code that loops through the translation table starts from change_protection(….) -> change_pud_range(….) -> change_pmd_range(…..) -> change_pte_range(…..)
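From user space, the same two changes are requested through the mprotect() system call, which is a convenient way to see the intended end result before reading the kernel source. The sketch below is an illustration under assumptions: `make_readonly_demo` is a hypothetical name, and it uses mmap() so that the buffer is page-aligned and shares its pages with nothing else.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map two pages, make them read-only, then writable again.
 * Returns 0 if every step behaved as expected, -1 otherwise. */
int make_readonly_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = 2 * (size_t)page;

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    buf[0] = 'A';

    /* Read-only: reads still work, a write here would raise SIGSEGV. */
    int ok = mprotect(buf, len, PROT_READ) == 0 && buf[0] == 'A';

    /* Writable again: the store below must succeed. */
    ok = ok && mprotect(buf, len, PROT_READ | PROT_WRITE) == 0;
    if (ok)
        buf[0] = 'B';
    ok = ok && buf[0] == 'B';

    assert(munmap(buf, len) == 0);
    return ok ? 0 : -1;
}
```

Because the region starts and ends on page boundaries, none of the alignment problems from the stack-array experiment can occur here.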
Full Source Level 1 and 2 : Level 3: