Crashing.. but gracefully

Crashing is always a fun topic with OSes. Windows has the iconic blue screen of death, which has been around since roughly the first release of the OS (for the record, Microsoft: a sad face and a useless error code are far less helpful than the old BSOD). Linux crashes and kernel panics are rare, but they do happen. I’d recommend checking out the xscreensaver BSOD module to see a massive collection of crash screens.

Anyways, onto the topic. When developing an operating system, handling crashes is essential – especially for debugging, but also for UX (user experience).

Imagine if a user gets to see this when they start up:

Windows 1.01 BSOD (Caused by an Incorrect DOS version)

You’ve got some incredible information in there – I’m sure anyone who can extrapolate hex data from Unicode could figure something out. Jokes aside, this was caused by the lack of real “crash-handling” in 16-bit real mode. It’s a hard problem to solve.

As well as that, debugging is a core part of developing an OS. Stack traces, register dumps, memory-to-disk dumps, etc. are all very important tools for OS development.

In this article, I’ll be discussing how I wrote my operating system’s kernel panic driver, the flaws it has, and how I plan to fix them.

Design of my panic screen

As of April 13th, this is what the current reduceOS panic screen looks like:

A familiar screen for anyone who has tried to use reduceOS

Actually, that’s just one variation of the panic screen. There are two more – here’s a demo of one of them and an explanation of why I can’t show the other:

ISR kernel panic

reduceOS is in a.. strange.. state right now. To sum it up, the paging driver cannot run properly, and when I tried to jury-rig it to just page fault, it did not work. But I got another funny error screen out of it.

The structure of my driver

reduceOS uses a kernel panic driver that can be called in two ways:

  1. Kernel Exception
  2. ISR-based exception

Let’s start with the simplest – kernel exceptions. Kernel exceptions are triggered by my code, usually when a problem is detected and continuing would result in further errors. In reduceOS, they are called using panic(char *caller, char *func, char *reason). These are much easier to debug, since they give clear reasons that I wrote myself.
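To make that signature a bit more concrete, here is a minimal sketch of how the three arguments could be turned into the text of a panic screen. The helper name format_panic and the message layout are assumptions made up for illustration, not the actual reduceOS code; a real kernel would write to its own terminal driver rather than use snprintf.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from reduceOS): builds the text a panic
 * screen might show from the three arguments that panic() takes.
 * The layout below is an assumption for illustration only.
 * Returns the number of characters snprintf wanted to write. */
int format_panic(char *buf, size_t len,
                 const char *caller, const char *func, const char *reason)
{
    return snprintf(buf, len,
                    "KERNEL PANIC\n"
                    "Module:   %s\n"
                    "Function: %s\n"
                    "Reason:   %s\n",
                    caller, func, reason);
}
```

A panic(caller, func, reason) built on top of this would print the buffer and then stop the machine (on x86, typically by disabling interrupts and halting in a loop), since a panic must never return to the code that raised it.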

As well as that, kernel exceptions (as shown in the screenshot) are supposed to print register values, but those values are not real register contents. When writing that code, I was very new to OS development and didn’t realize that the CPUID instruction reports information about the processor, not the current contents of the registers. I am quite embarrassed now, but I will write some inline assembly to rectify this issue.
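For reference, reading a register with inline assembly can look roughly like the sketch below. It assumes GCC or Clang extended asm syntax and only reads the stack pointer; the function name read_stack_pointer is hypothetical, and the non-x86 fallback exists only so the snippet compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: returns the current stack pointer.
 * Assumes GCC/Clang extended asm. On non-x86 targets it returns 0
 * so the sketch still compiles. */
uintptr_t read_stack_pointer(void)
{
#if defined(__x86_64__)
    uintptr_t sp;
    __asm__ volatile ("movq %%rsp, %0" : "=r"(sp));  /* copy RSP into sp */
    return sp;
#elif defined(__i386__)
    uintptr_t sp;
    __asm__ volatile ("movl %%esp, %0" : "=r"(sp));  /* copy ESP into sp */
    return sp;
#else
    return 0;
#endif
}
```

One caveat with this approach: by the time panic runs, the general-purpose registers have already been changed by the call itself, so a dump taken this way is a snapshot of the panic function's own state rather than of the code that failed.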

Moving on from that, ISR exceptions. In my code, the ISR (Interrupt Service Routine) driver handles each incoming interrupt like so:

  1. Check if a handler has been registered for the interrupt number, and call it if so.
  2. If there is no handler and the interrupt number is < 32, call panic (<32 means a CPU exception).
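The two steps above can be sketched as a small dispatch function. Everything here is an assumption for illustration (the registers_t layout, the handler table, the names, and the result enum); in particular, a real kernel would call panic at the marked spot instead of returning a value.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_INTERRUPTS 256
#define NUM_EXCEPTIONS 32   /* vectors 0-31 are CPU exceptions on x86 */

/* Hypothetical register snapshot; a real one holds every saved register. */
typedef struct {
    uint32_t int_no;
    uint32_t err_code;
    uint32_t eip;
} registers_t;

typedef void (*isr_handler_t)(registers_t *regs);
typedef enum { ISR_HANDLED, ISR_PANIC, ISR_IGNORED } isr_result_t;

static isr_handler_t handlers[NUM_INTERRUPTS];

void isr_register_handler(uint8_t vector, isr_handler_t fn)
{
    handlers[vector] = fn;
}

/* Example handler: counts timer ticks. */
unsigned ticks;
void timer_handler(registers_t *regs)
{
    (void)regs;
    ticks++;
}

isr_result_t isr_dispatch(registers_t *regs)
{
    if (regs->int_no >= NUM_INTERRUPTS)
        return ISR_IGNORED;

    /* Step 1: a registered handler takes priority. */
    isr_handler_t fn = handlers[regs->int_no];
    if (fn != NULL) {
        fn(regs);
        return ISR_HANDLED;
    }

    /* Step 2: an unhandled CPU exception is fatal.
     * A real kernel would call panic here and never return. */
    if (regs->int_no < NUM_EXCEPTIONS)
        return ISR_PANIC;

    return ISR_IGNORED;  /* unhandled hardware interrupt */
}
```

With timer_handler registered on vector 32, dispatching vector 32 runs the handler, while dispatching vector 14 (page fault) with nothing registered takes the panic path.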

The register state at the time of the interrupt is passed to the ISR handler, which in turn hands it to panic. The panic function outputs these registers, along with a basic stack trace that does not currently work (stack traces are designed to help find the problem instruction by walking back through the chain of calling functions).
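As a rough idea of what a working stack trace involves, the sketch below walks a chain of frame records and collects return addresses. It assumes the kernel is built with frame pointers, so that each frame starts with the saved previous frame pointer followed by the return address. The struct and function names are hypothetical, and this is not the reduceOS implementation.

```c
#include <stdint.h>
#include <stddef.h>

/* Frame record layout when code is compiled with frame pointers on x86:
 * the saved previous frame pointer, then the return address. */
struct stack_frame {
    struct stack_frame *prev;
    uintptr_t return_addr;
};

/* Hypothetical walker: copies up to max_frames return addresses into out,
 * innermost call first, and returns how many were written.
 * Stops at a NULL frame pointer, which marks the outermost frame here. */
size_t walk_stack(const struct stack_frame *frame,
                  uintptr_t *out, size_t max_frames)
{
    size_t n = 0;
    while (frame != NULL && n < max_frames) {
        out[n++] = frame->return_addr;
        frame = frame->prev;
    }
    return n;
}
```

In a kernel, the starting frame would come from the saved frame pointer in the register snapshot, or from something like GCC's __builtin_frame_address(0). The max_frames limit matters: after a crash the stack may be corrupted, and an unbounded walk could loop forever or fault again inside the panic handler.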

Overall, UX and debugging are both critical parts of crash handling. I feel as though I’ve done a good job of mixing the two: the panic screen makes debugging a little easier, while also making sure that if anyone uses the OS, a crash is not that bad of an experience.

Bonus Content: Crazy Failures

Debugging is fun.. right?

“Please include the following text.” (sector failure – linked kernel to 0x1100 but loaded in at 0x1000)

Seizure, as an art (still have no idea wtf caused this one)
