NTHU System Software Lab: The Process Address Space

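Besides managing physical memory, the kernel also manages the memory of user-space processes; this view of memory is called the process address space. Linux is a virtual memory operating system: the memory a process sees is virtualized, each process behaves as if it owned all of the memory in the system, and the address space seen by a single process can be larger than physical memory.
What describes a process address space? A process surely cannot be allowed to access every address, so what describes the memory it may actually use? And how are those addresses mapped to physical addresses? The sections below take up these questions.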

1. Address space

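The process address space is the memory that a process can address and use. Memory addresses in different processes are unrelated to each other: the same address in two processes refers to different memory. Processes that do share one address space are what we call threads.
Although a 32-bit process can address up to 4 GB, it does not have permission to access all of it. An interval of addresses that a process may legally access is called a memory area. Through the kernel, a process can dynamically add memory areas to, and remove them from, its address space.
Memory areas include: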

text section, a memory map of the executable file’s code.
data section, a memory map of the initialized global variables.
bss section, a memory map of the zero page containing uninitialized global variables.
Others.
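
As a small illustration of the first three kinds of memory area, the sketch below defines one object of each kind and reports where it lives. The names (initialized_global, uninitialized_global, text_function, where_things_live) are made up for this example, and the placement described in the comments is what typical ELF toolchains do, not something the C language guarantees.

```c
#include <stdint.h>

int initialized_global = 42;  /* has an initializer: typically placed in the data section */
int uninitialized_global;     /* no initializer: typically placed in bss, starts as 0 */

/* The machine code of this function lives in the text section. */
int text_function(void)
{
    return initialized_global + uninitialized_global;
}

/* Hypothetical helper: collects one address from each kind of area. */
struct section_addrs {
    uintptr_t text;
    uintptr_t data;
    uintptr_t bss;
};

struct section_addrs where_things_live(void)
{
    struct section_addrs a;

    a.text = (uintptr_t)&text_function;
    a.data = (uintptr_t)&initialized_global;
    a.bss  = (uintptr_t)&uninitialized_global;
    return a;
}
```

On a Linux system, printing the three addresses and comparing them with the ranges listed in /proc/<pid>/maps for the same process should show each one falling inside a different mapping with different permissions.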

Given the process address space, how does the kernel represent it? With the memory descriptor, struct mm_struct. This structure holds all the information about a process address space and is defined in linux/mm_types.h. Some of its fields:

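mm_users, the number of processes (threads) using this address space.
mm_count, the primary reference count for the mm_struct. All mm_users together hold a single reference in mm_count. For example, if 9 threads share the address space, then mm_users = 9 and mm_count = 1. Only when mm_users drops to 0 is that reference released, letting mm_count fall to 0.
mmap and mm_rb both hold the memory areas of the address space. The difference is the data structure: mmap is a linked list, mm_rb is a red-black tree.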

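The memory descriptor is stored in the mm field of the task's process descriptor, so current->mm points to the memory descriptor of the current process. During fork(), copy_mm() copies the parent's memory descriptor to the child. The mm_struct itself is obtained from the mm_cachep slab cache through allocate_mm() in kernel/fork.c.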

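When a process exits, exit_mm() in kernel/exit.c is called. It calls mmput(), which decrements mm_users. When mm_users reaches 0, mmdrop() is called to decrement mm_count. When mm_count reaches 0, free_mm() is called, which returns the mm_struct to the mm_cachep slab cache through kmem_cache_free().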
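
The relationship between the two counters can be modeled in user space. The following is a minimal sketch, not kernel code: struct toy_mm and the toy_ functions are invented for illustration, plain ints stand in for the kernel's atomic counters, and a freed flag stands in for returning the structure to the slab cache.

```c
#include <stdbool.h>

/* Toy stand-in for the two reference counts in the memory descriptor. */
struct toy_mm {
    int  mm_users;  /* tasks using the address space */
    int  mm_count;  /* primary reference count; all users share one reference */
    bool freed;     /* set when the descriptor would be freed */
};

void toy_mm_init(struct toy_mm *mm)
{
    mm->mm_users = 1;  /* the first task */
    mm->mm_count = 1;  /* the single reference held on behalf of all users */
    mm->freed = false;
}

/* Another thread starts sharing the address space. */
void toy_mmget(struct toy_mm *mm)
{
    mm->mm_users++;
}

/* Someone other than a user (for example a borrowing kernel thread) pins the descriptor. */
void toy_mmgrab(struct toy_mm *mm)
{
    mm->mm_count++;
}

/* Drop one primary reference; the descriptor goes away when none are left. */
void toy_mmdrop(struct toy_mm *mm)
{
    if (--mm->mm_count == 0)
        mm->freed = true;
}

/* A user leaves; the last user gives up the shared primary reference. */
void toy_mmput(struct toy_mm *mm)
{
    if (--mm->mm_users == 0)
        toy_mmdrop(mm);
}
```

With nine users, mm_users is 9 and mm_count stays at 1. The descriptor is marked freed only after the ninth toy_mmput(), and only if nothing else still holds a reference taken with toy_mmgrab().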

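A kernel thread has no process address space and therefore no memory descriptor, so the mm field of a kernel thread's process descriptor is NULL. This is in fact the definition of a kernel thread: a process with no user context.
Is the missing address space a problem? No, because a kernel thread never accesses user-space memory. Having no user-space pages, it has no need for its own memory descriptor and page tables. It does still need some of that data, the page tables in particular, even to access kernel memory. To avoid wasting memory on a separate descriptor, and to avoid the cost of switching address spaces, a kernel thread simply uses the memory descriptor of whichever task ran before it.
When a process is scheduled, the address space pointed to by its mm field is loaded, and its active_mm field is then set to that address space. When a kernel thread is scheduled, the kernel sees that mm is NULL, keeps the previously loaded address space in place, and sets the kernel thread's active_mm to the previous process's memory descriptor. The kernel thread can then use the page tables it needs. It only uses the part of that address space that describes kernel memory and never touches user-space memory, and that kernel part is the same in every process.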
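
The borrowing rule can be written down in a few lines. This is a user-space sketch under simplifying assumptions, not the scheduler's actual code: struct toy_addr_space, struct toy_task, and toy_switch() are hypothetical names, and reference counting of the borrowed descriptor is left out.

```c
#include <stddef.h>

struct toy_addr_space {
    int id;  /* only here to tell address spaces apart */
};

struct toy_task {
    struct toy_addr_space *mm;         /* NULL for a kernel thread */
    struct toy_addr_space *active_mm;  /* address space in use while the task runs */
};

/*
 * Model of a context switch from prev to next.
 * Returns the address space that is loaded once next is running.
 */
struct toy_addr_space *toy_switch(struct toy_task *prev, struct toy_task *next)
{
    if (next->mm == NULL) {
        /* Kernel thread: keep what is loaded and borrow the previous descriptor. */
        next->active_mm = prev->active_mm;
    } else {
        /* Ordinary process: load its own address space. */
        next->active_mm = next->mm;
    }
    return next->active_mm;
}
```

Switching from a user process to a kernel thread leaves the user process's address space loaded. Switching from that kernel thread to another user process then loads the new process's own address space.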

2. Memory area

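How does the kernel describe a memory region within an address space? With struct vm_area_struct, defined in linux/mm_types.h and commonly called a virtual memory area (VMA). The kernel treats each memory area as an object, with its own properties and its own set of operations.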

VM_READ, VM_WRITE, and VM_EXEC are three common flags. They specify whether the pages in a memory area may be read, written, or executed.
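
To make the flags concrete, here is a sketch of how an access could be checked against a list of memory areas. It is a toy model: struct toy_vma, toy_find_vma(), and toy_access_ok() are invented for this example, the numeric flag values are chosen for the sketch, and the areas are kept in a simple address-ordered linked list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag values chosen for this sketch. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

struct toy_vma {
    unsigned long   vm_start;  /* first address inside the area */
    unsigned long   vm_end;    /* first address after the area */
    unsigned long   vm_flags;  /* combination of VM_READ, VM_WRITE, VM_EXEC */
    struct toy_vma *vm_next;   /* next area, in increasing address order */
};

/* Return the first area that ends above addr; it may start above addr too. */
struct toy_vma *toy_find_vma(struct toy_vma *head, unsigned long addr)
{
    for (struct toy_vma *v = head; v != NULL; v = v->vm_next)
        if (addr < v->vm_end)
            return v;
    return NULL;
}

/* An access is legal only if addr lies inside an area that grants every needed flag. */
bool toy_access_ok(struct toy_vma *head, unsigned long addr, unsigned long need)
{
    struct toy_vma *v = toy_find_vma(head, addr);

    if (v == NULL || addr < v->vm_start)
        return false;  /* addr falls in a hole between areas or past the last one */
    return (v->vm_flags & need) == need;
}
```

For example, with a read/execute area at [0x1000, 0x2000) and a read/write area at [0x3000, 0x5000), executing at 0x1800 is allowed, writing to 0x1800 is refused, and any access to 0x2800 is refused because no area contains it.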

3. Page tables

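Although programs run on virtual memory, the processor operates directly on physical memory. So whenever a virtual address is used, it must first be translated to a physical address, and this translation is done through page tables. The virtual address is split into several chunks; each chunk is used as an index into a table, and the entry found there points either to another table or to the physical page.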
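In the Linux kernels this note covers, the page tables have three levels. The top level is the PGD (page global directory), an array of type pgd_t. The second level is the PMD (page middle directory), an array of pmd_t. The third level is simply called the page table and consists of page table entries of type pte_t. (Later kernels insert further levels, the PUD and the P4D, between the PGD and the PMD.)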
Page tables are architecture-dependent; the types are defined in asm/page.h.
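
A three-level walk can be simulated in user space. The sketch below is illustrative only: the toy_ types and functions are invented, and the split of a 32-bit virtual address into 2 PGD bits, 9 PMD bits, 9 page table bits, and a 12-bit offset is an assumption made for this example, since the real split depends on the architecture.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: | 2 bits PGD | 9 bits PMD | 9 bits PTE | 12 bits offset | */
#define TOY_PAGE_SHIFT 12
#define TOY_PTE_BITS   9
#define TOY_PMD_BITS   9
#define TOY_PGD_BITS   2
#define TOY_PMD_SHIFT  (TOY_PAGE_SHIFT + TOY_PTE_BITS)  /* 21 */
#define TOY_PGD_SHIFT  (TOY_PMD_SHIFT + TOY_PMD_BITS)   /* 30 */

typedef struct { uint32_t pfn; bool present; } toy_pte_t;  /* points to a physical page */
typedef struct { toy_pte_t *table; } toy_pmd_t;            /* points to a page table */
typedef struct { toy_pmd_t *table; } toy_pgd_t;            /* points to a PMD */

struct toy_pagetable {
    toy_pgd_t pgd[1u << TOY_PGD_BITS];
};

static unsigned pgd_idx(uint32_t va) { return va >> TOY_PGD_SHIFT; }
static unsigned pmd_idx(uint32_t va) { return (va >> TOY_PMD_SHIFT) & ((1u << TOY_PMD_BITS) - 1); }
static unsigned pte_idx(uint32_t va) { return (va >> TOY_PAGE_SHIFT) & ((1u << TOY_PTE_BITS) - 1); }

/* Map the page containing va to physical frame pfn, allocating tables on demand. */
bool toy_map(struct toy_pagetable *pt, uint32_t va, uint32_t pfn)
{
    toy_pgd_t *pgd = &pt->pgd[pgd_idx(va)];
    if (!pgd->table && !(pgd->table = calloc(1u << TOY_PMD_BITS, sizeof(toy_pmd_t))))
        return false;

    toy_pmd_t *pmd = &pgd->table[pmd_idx(va)];
    if (!pmd->table && !(pmd->table = calloc(1u << TOY_PTE_BITS, sizeof(toy_pte_t))))
        return false;

    pmd->table[pte_idx(va)] = (toy_pte_t){ .pfn = pfn, .present = true };
    return true;
}

/* Walk PGD -> PMD -> page table; on success store the physical address in *pa. */
bool toy_translate(const struct toy_pagetable *pt, uint32_t va, uint64_t *pa)
{
    const toy_pgd_t *pgd = &pt->pgd[pgd_idx(va)];
    if (!pgd->table)
        return false;

    const toy_pmd_t *pmd = &pgd->table[pmd_idx(va)];
    if (!pmd->table)
        return false;

    const toy_pte_t *pte = &pmd->table[pte_idx(va)];
    if (!pte->present)
        return false;

    *pa = ((uint64_t)pte->pfn << TOY_PAGE_SHIFT) | (va & ((1u << TOY_PAGE_SHIFT) - 1));
    return true;
}
```

Starting from a zero-initialized struct toy_pagetable, mapping the page at 0x00403000 to frame 0x1234 makes toy_translate() turn 0x00403abc into 0x1234abc: the frame number replaces the upper bits and the 12-bit offset is carried over unchanged. The sketch never frees the tables it allocates.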

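Almost every access to a virtual address requires a translation through the page tables, so performance is critical. Most processors use a TLB (translation lookaside buffer) to speed this up by caching virtual-to-physical mappings. When a virtual address is accessed, the processor first checks the TLB. On a hit, the physical address is returned immediately; on a miss, the page tables are walked to find it.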
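
The hit-or-miss logic can be sketched as a small software cache. A real TLB is hardware and is organized differently from one processor to the next, so everything here is an assumption for illustration: the direct-mapped 16-entry layout, the toy_ names, and toy_walk(), which stands in for a full page table walk by mapping virtual page n to frame n + 0x100.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_TLB_ENTRIES 16

struct toy_tlb_entry {
    bool     valid;
    uint32_t vpn;  /* virtual page number cached in this slot */
    uint32_t pfn;  /* physical frame number it maps to */
};

struct toy_tlb {
    struct toy_tlb_entry e[TOY_TLB_ENTRIES];
    unsigned hits;
    unsigned misses;
};

/* Stand-in for the slow page table walk (made-up mapping: frame = page + 0x100). */
static uint32_t toy_walk(uint32_t vpn)
{
    return vpn + 0x100;
}

/* Translate va, consulting the TLB first and walking only on a miss. */
uint32_t toy_tlb_translate(struct toy_tlb *tlb, uint32_t va)
{
    uint32_t vpn = va >> TOY_PAGE_SHIFT;
    uint32_t off = va & ((1u << TOY_PAGE_SHIFT) - 1);
    struct toy_tlb_entry *slot = &tlb->e[vpn % TOY_TLB_ENTRIES];

    if (slot->valid && slot->vpn == vpn) {
        tlb->hits++;               /* hit: reuse the cached mapping */
    } else {
        tlb->misses++;             /* miss: walk, then cache the result */
        slot->valid = true;
        slot->vpn = vpn;
        slot->pfn = toy_walk(vpn);
    }
    return (slot->pfn << TOY_PAGE_SHIFT) | off;
}
```

Starting from a zeroed struct toy_tlb, two accesses to the same page produce one miss followed by one hit. Accessing a different page that maps to the same slot evicts the earlier entry, which is the price of the direct-mapped layout chosen here.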
