Crashing.. but gracefully

Crashing is always a fun topic with OSes. You have the iconic blue screen of death in Windows, which has been around since roughly the first release of the OS (for the record, Microsoft, a sad face and a useless error code barely help, unlike the old BSOD). Linux crashes too, though rarely, in the form of kernel panics. I’d recommend checking out the xscreensaver BSOD module to see a massive collection of crash screens.

Anyways, onto the topic. When developing an operating system, crash handling is essential – especially for debugging, but also for UX (user experience).

Imagine if a user gets to see this when they start up:

Windows 1.01 BSOD (Caused by an Incorrect DOS version)

You’ve got some incredible information in there – I’m sure anyone who can extrapolate hex data from Unicode could figure something out. Jokes aside, this was caused by the lack of real “crash-handling” in 16-bit real mode. It’s a hard problem to solve.

As well as that, debugging is a core part of developing an OS. Stack traces, register dumps, memory-to-disk dumps, etc. are all very important tools for OS development.

In this article, I’ll be discussing how I wrote my operating system’s kernel panic driver, the flaws it has, and how I plan to fix them.

Design of my panic screen

As of April 13th, this is what the current reduceOS panic screen looks like:

A familiar screen for anyone who has tried to use reduceOS

Actually, that’s just one variation of the panic screen. There are 2 more – here’s a demo of one of them and an explanation of why I can’t show the other:

ISR kernel panic

reduceOS is in a.. strange.. state right now. To sum it up, the paging driver cannot run properly, and when I tried to jury-rig it to just page fault, it did not work. But I got another funny error screen out of it.

The structure of my driver

reduceOS uses a kernel panic driver that can be called in two ways:

Kernel Exception
ISR-based exception

Let’s start with the simplest – kernel exceptions. Kernel exceptions are triggered by my code, usually when a problem is detected and continuing would result in further errors. In reduceOS, they are called using panic(char *caller, char *func, char *reason). These are much easier to debug, since they give clear reasons that I wrote myself.
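To make the kernel-exception path a bit more concrete, here is a minimal hosted sketch built around the panic(char *caller, char *func, char *reason) signature. Everything else is an assumption for illustration: the panic_format helper, the message layout, and the use of stdio and abort() as stand-ins for the kernel’s own screen output and halt loop. It is not the actual reduceOS implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from reduceOS): builds the panic text in a
 * caller-supplied buffer so the formatting can be checked separately.
 * Returns the number of characters snprintf would have written. */
int panic_format(char *out, size_t out_len,
                 const char *caller, const char *func, const char *reason)
{
    return snprintf(out, out_len,
                    "*** KERNEL PANIC ***\n"
                    "Caller:   %s\n"
                    "Function: %s\n"
                    "Reason:   %s\n",
                    caller, func, reason);
}

/* Same signature as in the article. In a real kernel this would disable
 * interrupts, draw the panic screen, and halt; abort() stands in here. */
void panic(char *caller, char *func, char *reason)
{
    char msg[256];
    panic_format(msg, sizeof msg, caller, func, reason);
    fputs(msg, stderr);
    abort();
}
```

A call site might look like panic("paging", "vmm_init", "out of frames"), where the three strings are whatever the caller wants to see on the panic screen.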

As well as that, kernel exceptions (as shown in the screenshot) used to print register values. Not anymore, because those values were never real: when writing that code, I was very new to OS development and didn’t realize that the CPUID instruction does not output the contents of the registers (it fills EAX, EBX, ECX, and EDX with processor identification data instead). I am quite embarrassed now, but I will write some inline assembly to rectify this issue.
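For reference, a register snapshot with inline assembly could look roughly like the sketch below. It assumes GCC or Clang extended asm, and the reg_snapshot_t type and capture_registers name are made up for this example. It only reads the stack and frame pointers, because by the time C code runs the general-purpose registers already hold whatever the compiler put there; a faithful dump of those usually has to be taken in an assembly stub before any C executes.

```c
#include <stdint.h>

/* Hypothetical snapshot type; the field names are illustrative only. */
typedef struct {
    uintptr_t sp;    /* stack pointer at the time of capture */
    uintptr_t bp;    /* frame pointer at the time of capture */
    int       valid; /* 1 if captured, 0 on an unsupported architecture */
} reg_snapshot_t;

/* Reads the stack and frame pointers directly, rather than calling CPUID
 * (which reports processor identification data, not register contents). */
static inline void capture_registers(reg_snapshot_t *r)
{
#if defined(__i386__)
    __asm__ volatile("movl %%esp, %0" : "=r"(r->sp));
    __asm__ volatile("movl %%ebp, %0" : "=r"(r->bp));
    r->valid = 1;
#elif defined(__x86_64__)
    __asm__ volatile("movq %%rsp, %0" : "=r"(r->sp));
    __asm__ volatile("movq %%rbp, %0" : "=r"(r->bp));
    r->valid = 1;
#else
    /* Not x86: nothing to read in this sketch. */
    r->sp = 0;
    r->bp = 0;
    r->valid = 0;
#endif
}
```

Note that the frame pointer value is only meaningful if the code is compiled with frame pointers enabled.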

Moving on from that, ISR exceptions. In my code, the ISR (Interrupt Service Routine) driver handles exceptions like so:

Check if a handler has been registered for the interrupt.
If the interrupt number is < 32 and there is no handler, call panic (interrupt numbers below 32 are CPU exceptions).
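The two steps above can be sketched as a small dispatch function. This is a simplified hosted model, not the reduceOS source: isr_frame_t, isr_register_handler, isr_dispatch, and the counters are hypothetical names, and isr_panic just counts calls instead of drawing a panic screen and halting.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_INTERRUPTS 256
#define NUM_EXCEPTIONS 32   /* vectors 0-31 are CPU exceptions on x86 */

/* Hypothetical, heavily trimmed register frame. */
typedef struct {
    uint32_t int_no;
    uint32_t err_code;
} isr_frame_t;

typedef void (*isr_handler_t)(isr_frame_t *frame);

static isr_handler_t handlers[NUM_INTERRUPTS];

int panic_count;    /* stand-in: counts how often the panic path was taken */
int handled_count;  /* incremented by the example handler below */

/* Stand-in for the real panic path, which would not return. */
void isr_panic(isr_frame_t *frame) { (void)frame; panic_count++; }

/* Example handler, used only to demonstrate registration. */
void count_handler(isr_frame_t *frame) { (void)frame; handled_count++; }

void isr_register_handler(uint32_t int_no, isr_handler_t handler)
{
    if (int_no < NUM_INTERRUPTS)
        handlers[int_no] = handler;
}

void isr_dispatch(isr_frame_t *frame)
{
    isr_handler_t handler = NULL;
    if (frame->int_no < NUM_INTERRUPTS)
        handler = handlers[frame->int_no];

    if (handler != NULL) {              /* step 1: registered handler wins */
        handler(frame);
        return;
    }
    if (frame->int_no < NUM_EXCEPTIONS) /* step 2: unhandled exception */
        isr_panic(frame);
    /* Unhandled interrupts at 32 and above are ignored in this sketch. */
}
```

With nothing registered, dispatching vector 14 (page fault) takes the panic path; after isr_register_handler(14, count_handler), the same vector goes to the handler instead.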

The CPU will pass register data to the ISR handler, which in turn deals with the interrupt. The panic function outputs these registers, along with a basic (currently non-working) stack trace. Stack traces are designed to help find the problem instruction by backtracing through the code that led up to it.
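One common way to build such a stack trace on x86 is to walk the chain of saved frame pointers. The sketch below shows only the walking logic on a hosted machine; it is not the reduceOS code. It assumes the usual frame-pointer layout (each frame stores the caller’s frame pointer followed by the return address), that the code is compiled with frame pointers, and that stack_frame_t and walk_stack are names invented here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a frame under the classic "push ebp; mov ebp, esp"
 * prologue: saved frame pointer first, return address right after it. */
typedef struct stack_frame {
    struct stack_frame *next; /* caller's frame, NULL at the outermost one */
    uintptr_t ret_addr;       /* address execution returns to */
} stack_frame_t;

/* Follows the frame chain and stores up to max_depth return addresses in
 * out. Returns how many were stored. The depth limit keeps a corrupted
 * chain from looping forever, which matters inside a panic handler. */
size_t walk_stack(const stack_frame_t *frame, uintptr_t *out, size_t max_depth)
{
    size_t depth = 0;
    while (frame != NULL && depth < max_depth) {
        out[depth++] = frame->ret_addr;
        frame = frame->next;
    }
    return depth;
}
```

In a kernel, the starting frame would come from the saved frame pointer in the register dump, and each address would ideally be checked for validity before being dereferenced.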

Overall, UX is a core component of crash handling, and debugging is just as critical. I feel as though I’ve done a good job of mixing the two: the panic screen makes debugging a little easier, while also making sure that the experience isn’t too bad for anyone who uses the OS.

Bonus Content: Crazy Failures

Debugging is fun.. right?

“Please include the following text.” (sector failure – linked kernel to 0x1100 but loaded in at 0x1000)

Seizure, as an art (still have no idea wtf caused this one)
