Project 2: Non-Preemptive Kernel

Overview

With the bootup mechanism from the first project, we can start developing an operating system kernel. The main goal of this project is to extend the provided kernel to support multiprogramming. More specifically, you will:
Although the kernel will be very primitive, you will be dealing with several core mechanisms and applying techniques that are important for building more advanced, multiprogrammed operating systems in the future.

The kernel of this project and future projects supports both processes and threads. Note that the processes in this project are different from those in traditional operating systems (like Unix). Our processes have their own protection domains, but they share a flat address space. In the future, we will be providing processes with separate address spaces. Our threads are kernel level threads that always share the same address space with the kernel; they will never run in user mode. The kernel threads are linked together with the kernel, subsequently they can access kernel data and functions directly. On the other hand, processes use the single entry-point mechanism described in the lecture.

Students have traditionally found this project to be more complex than the the first, so please begin now and work at a steady pace.

Kelvin Zou (
xuanz@cs.princeton.edu) is the TA in charge of this project.

Design Review

Please cover these points at your design review:
  1. Process Control Block: What will be in your PCB and what will it be initialized to?
  2. Context Switching: How will you save and restore a task's context? Should anything special be done for the first task?
  3. Processes: What, if any, are the differences between threads and processes and how they are handled?
  4. Mutual exclusion: What's your plan for implementing mutual exclusion?
  5. Scheduling: Assuming that the threads below are put on the ready queue in order of their names and execution starts at Thread 1, how will execution unfold?
Thread 1
lock_init(&lock);
lock_acquire(&lock);
do_yield();
lock_release(&lock);
do_exit();
Thread 2
while(1)
{
    do_yield();
}
Thread 3
do_yield();
lock_acquire(&lock);
lock_release(&lock);
do_exit();
Thread 4
lock_acquire(&lock);
lock_release(&lock);
do_exit();
Use the signup sheet to schedule a time to see the TA in the fishbowl.

Getting Started

Use the start code which you can get from Blackboard or the fishbowl file server. Compiling this project is tricky, so we have provided the Makefile. If you type make (or use the all target) it will compile all sources and prepare an image, floppy.img, for use with bochs. We've also supplied a configuration file, bochsrc, that instructs bochs to use this image as a floppy. A make target, boot, is provided in the Makefile to load the image onto your USB stick if you'd like to test on a real machine, but make sure it is cat'ing to the correct path before using.

Kernel Threads and Scheduling

The kernel proper consists of a collection of kernel threads. These threads run in the most privileged protection ring, and their code is part of the kernel loaded by the boot block. Kernel threads share a flat address space, but they do not share register values or stacks.

After the boot block has loaded the kernel, it calls the function kernel.c:_start(). This function performs the setup necessary to run the tasks listed in tasks.c by initializing process control blocks (PCBs) for each one. You are encouraged to allocate space for the PCBs statically with pcb_t pcbs[NUM_TASKS]. Make sure to loop on NUM_TASKS when setting up PCBs as the number of tasks may be different during testing.

Threads only run in one context: the kernel. Processes have two contexts: user and kernel. You have an option in this project to decide how you want to deal with this. One option is to make room in your PCB for both contexts. The other option is to only leave room for one context and lose any kernel context for each process after crossing the user-kernel boundary. This is ok because there is not state that needs to be kept within the kernel for processes. However, if you choose to explicity save both, it is ok to "waste" space in the PCB for threads that won't user the user context area.

Stacks are STACK_SIZE bytes big and are allocated contiguously at increasing addresses starting with STACK_MIN. You may use the macro common.h:ASSERT([assertion]) to enforce the invariant that all stacks lie in the interval STACK_MIN exclusive to STACK_MAX inclusive.

Since you are writing a non-preemptive scheduler, kernel threads run without interruption until they yield or exit by calling scheduler.c:do_yield() or scheduler.c:do_exit(). These functions in turn call entry.S:scheduler_entry(), which saves the contents of the general purpose and flags registers in the PCB of the task that was just running. scheduler.c:scheduler() is then called to choose the next task, after which the next task's registers are restored, and the task is ret'd to. scheduler.c:scheduler() chooses the next task to run in a strict round-robin fashion, and in the first round, the tasks are run in the order in which they are specified in tasks.c.

Please make sure that if all the tasks have exited (in case clock_thread is not running, for example) your kernel doesn't crash. You could loop in the scheduler printing out the fact that there are no more tasks to run.

scheduler.c:scheduler() increments the global variable scheduler.c:scheduler_count every time it runs. Do not change this.

Processes and System Calls

In future projects, processes will have their own protected address spaces and run in a less privileged protection ring than the kernel. For now, however, there are only slight differences between threads and processes. If you choose to save both user and kernel contexts, processes will have two stacks: one user stack, for the process proper, and one kernel stack, for the yield and exit system calls made by the process. Moreover, process code is not linked with the kernel, thus the need for the single entry-point mechanism (see how syslib.S:kernel_entry is called).

There are two system calls: syslib.S:yield() and syslib.S:exit(), which processes call. These correspond in principle to the calls do_yield() and do_exit() that threads use. Processes cannot invoke do_yield() and do_exit() directly though because they don't know the addresses of these functions.

This project uses a dispatch mechanism to overcome the difficulty of processes not knowing the addresses of do_yield() and do_exit(). kernel.h defines a spot in memory called ENTRY_POINT at which _start() writes the address of the function label kernel_entry (in entry.S) at run-time. When a process calls yield() or exit() (in syslib.S), SYSCALL(i) (in the same file) gets called with an argument that represents which system call the caller wants (yield or exit), similar to how a number was stored in %ah before initiating a BIOS interrupt. SYSCALL(i) in turn calls kernel_entry, whose address was saved at ENTRY_POINT. kernel_entry saves the process's context to its PCB and then transfers control to either of the previously "unreachable" do_yield() or do_exit() functions. Like threads, something must restore the processes' context after it is chosen to run again by scheduler.

Task Saving Call Graphs

The call graph for task saving can seem a little complicated, so I've made the following ASCII call graph diagrams:

Threads

do_yield (scheduler.c) -> scheduler_entry (entry.S) -> scheduler (scheduler.c)
do_exit  (scheduler.c) ->           "                           "

Processes

yield (syslib.S) -> kernel_entry (entry.S) -> do_yield (scheduler.c)
exit  (syslib.S) ->            "           -> do_exit  (scheduler.c)

Mutual Exclusion

lock.c contains three functions for use by kernel threads: lock_init(l), lock_acquire(l), and lock_release(l). lock_init(l) initializes lock l; initially no thread holds the specified lock. In the event an uninitialized lock structure is passed to the other functions, the results are up to you. lock_acquire(l) must not block the thread that called it when no other thread is holding l.

Threads call lock_acquire(l) to acquire lock l. If no thread currently holds l, it marks the lock as being held and returns. Otherwise it uses scheduler.c:block(...) to indicate to the scheduler that the thread is now waiting for the lock and should be put on the blocked queue. If there are one or more threads on the blocked queue when lock_release(l) is called by the thread holding the lock, the lock remains locked and the thread at the head of the blocked queue is put on the ready queue at the back, through a regular enqueue operation. If no threads are waiting, it marks the lock as being not held and returns. The results of a thread attempting to acquire a lock it already holds or to release a lock it doesn't hold are up to you.

We have put a non-conforming implementation of mutual exclusion in lock.c so that you can write and test the kernel threads part in isolation. When you are ready to test your own implementation, change the constant SPIN in lock.c to FALSE.

Note: your lecture notes describe how to synchronize access to the fields associated with a lock. Since there is no preemption, if lock_acquire(l) and block() don't call do_yield() in the middle of manipulating data structures, you don't need to worry about race conditions while these functions do their job. In particular, it is not necessary to protect the scheduler data structures with a test-and-set lock.

Timing do_yield and yield

The function util.c:get_timer() returns the number of instructions that have been executed since boot. Using util.c:print_int() and util.c:print_str(), use the first line of the display to show the number of cycles required by do_yield and yield using thread4 and thread5 in th3.c and _start in process3.c respectively. Do not time how long do_exit, exit, if statements, print_str, or print_int take.

The measurement tasks should only print out their measurements once. The numbers do not need to be stated in the README.

Extra Credit

For extra credit, write a fair scheduler. This means that the scheduler always chooses as the next task to run the one that has had the least amount of time on the CPU thus far. Do not include the time it takes to do a context switch when keeping track of run time thus far. Prove that your scheduler is fair by writing tasks that show execution time on the CPU is fair to within the longest interval between yields.

Provide a way to turn the extra credit on and off and tell how that is done in your README. Leave the extra credit off by default.

Hints

(Borrowed from previous years. Hints will be updated as questions come out from office hours / design reviews.)

Grading Rubric

Description Points
Design review 5
OS can start a single task 2
do_yield and yield save and restore contexts correctly 3
Without locks, scheduler obeys FIFO order 1
OS can unfold execution for the 4 threads in design review correctly 2
Your lock can guarantee mutual exclusion and FIFO order for locks (that is, the thread acquire()ing the lock earlier gets it earlier) 2
th3.c and process3.c time JUST how long a do_yield or yield takes, respectively 1
OS can successfully load and run all tasks provided in the start code 4
Total 20
Extra Credit 1


Submission

Don't forget to complete the general requirements for every project.

Submit via Dropbox.


Glossary

address line 20
When PCs boot up, addresses 0x100000 to 0x1fffff are aliases for 0x00000 to 0xfffff to accommodate badly written real mode programs. Programs that use more than one megabyte of memory must disable this behavior. See http://www.win.tue.nl/~aeb/linux/kbd/A20.html.
kernel mode
Synonym for protection ring 0.
process control block
The process control block (PCB) contains all of the state the kernel needs to keep track of a process or kernel thread.
protected mode
In (16-bit) real mode, an address ds:si refers to byte 16*ds + si. In 32-bit protected mode, the segment registers contain offsets into the global descriptor table, whose entries control the translation of memory accesses into physical addresses. The boot block provided to you sets up the global descriptor table and segment registers so that no translation is applied; you can ignore the segment registers for this project. See http://www.internals.com/articles/protmode/introduction.htm.
protection ring
Intel processors since the 286 have four privilege levels, ring 0 through ring 3. Ring 0 is the most privileged; ring 3 is the least. Typically the kernel runs in ring 0, and user processes run in ring 3. For this project, all processes run in ring 0.
system call
In modern operating systems, processes are enjoined from accessing memory that they do not own. System calls are how processes communicate with the kernel. Typically, the process writes the system call number and its arguments in its own memory, and then it transfers control to the kernel with a special kind of far call. The kernel then performs the desired action and returns control to the process.
task
Following Linux, we use "task" to mean process or kernel thread.