The Kernel Trilogy, Volume I: The Trap
This article was originally published in 《2023开源操作系统训练营第二阶段总结报告-CatMe0w》 (2023 Open Source Operating System Training Camp, Phase 2 Summary Report, CatMe0w).
A while ago, probably during a chat with friends, someone mentioned rCore, a teaching kernel written in Rust. Curious, I looked it up and discovered that the training camp existed.
I’ve been using Rust for a while now, but mostly to write application-level programs such as web services. My understanding of operating systems is almost entirely theoretical, and I had never done systems development before. A chance to learn like this is rare, so naturally I couldn’t miss it.
This article won’t focus on the development process. Instead, it records the unusual problems I ran into, along with my personal reflections. I think my technical abilities fall short of many others’, so if you’re looking for technical guides and tips, go check out someone else’s. 😋
Before Phase 2
Unfortunately, shortly after Phase 1 began, I encountered some trivial matters in life. I quickly discovered I didn’t have much time to invest in rCore. Fortunately, because I already had Rust experience, speed-running rustlings was just an evening’s work.
However, as expected, the subsequent Phase 2 wouldn’t be so easy.
ch3: The Curse from Apple
The training camp’s rCore (Guide) is actually a re-simplified version of rCore Tutorial, but even so, the first task ch3 still caught me completely off guard.
Previously, my understanding of operating systems was superficial. I roughly knew what components an operating system has, and the deepest I had gone was writing scheduling-algorithm simulation demos in C for undergraduate coursework. Faced with a real operating system, I didn’t even know where to start. For example, in the no_std world, where does stdout come from? How do logs end up on the screen? (Unfortunately, rCore doesn’t answer this question itself, because SBI handles that part.)
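To make the stdout question concrete, here is a minimal sketch of the general pattern, not the rCore source. A type implements core::fmt::Write, and every byte is handed to a low-level "put one byte" routine. In a real kernel that routine would be an SBI call into the firmware. Here it is replaced by a hypothetical ConsoleSink trait backed by a buffer, purely so the example can run on an ordinary host. The names ConsoleSink, BufferSink, Stdout, and render are all made up for illustration.

```rust
use core::fmt::{self, Write};

/// Hypothetical stand-in for the firmware routine that prints one byte.
/// In a kernel this would be an SBI call; here it is only a trait.
trait ConsoleSink {
    fn put_byte(&mut self, byte: u8);
}

/// Host-side sink used only so this sketch can run and be inspected.
struct BufferSink {
    bytes: Vec<u8>,
}

impl ConsoleSink for BufferSink {
    fn put_byte(&mut self, byte: u8) {
        self.bytes.push(byte);
    }
}

/// The kernel's "stdout": implementing `core::fmt::Write` is enough for
/// `write!` and `format_args!` to work, and it does not require `std`.
struct Stdout<S: ConsoleSink> {
    sink: S,
}

impl<S: ConsoleSink> Write for Stdout<S> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Forward the formatted text one byte at a time.
        for byte in s.bytes() {
            self.sink.put_byte(byte);
        }
        Ok(())
    }
}

/// Formats `args` through `Stdout` and returns what the sink received.
fn render(args: fmt::Arguments<'_>) -> String {
    let mut out = Stdout {
        sink: BufferSink { bytes: Vec::new() },
    };
    out.write_fmt(args).unwrap();
    String::from_utf8(out.sink.bytes).unwrap()
}

fn main() {
    let line = render(format_args!("[kernel] hello from hart {}\n", 0));
    print!("{}", line);
}
```

A print-style macro in a kernel is usually just a thin wrapper that builds format_args! and passes it to a function like render, with the buffer swapped for the real console.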
Digesting so many new things at once isn’t easy, especially when you think you’ve read the Tutorial clearly enough, but upon opening the code repository you still feel lost and have to return to chewing through the Tutorial again. In this state, I struggled for almost a week to figure out how ch3 should actually be completed - or rather, where exactly the code should be written.
However, after completing the functionality, my code produced an error that was difficult to understand:
Panicked at src/bin/ch3_sleep.rs:16, assertion failed: current_time > 0
Many people encountered precision issues with get_time_ms and get_time_us, but I was quite certain my problem didn’t fall into this category - no matter how fast my computer was, get_time absolutely couldn’t be 0.
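For context, the precision issue other people ran into can be sketched as follows. This is not the bug described here, only an illustration of how integer division affects tick-to-time conversion. The clock frequency of 12,500,000 Hz and all function names are assumptions chosen for the example.

```rust
/// Assumed timer frequency in Hz (illustrative value only).
const CLOCK_FREQ: u64 = 12_500_000;
const MSEC_PER_SEC: u64 = 1_000;
const USEC_PER_SEC: u64 = 1_000_000;

/// Milliseconds: the divisor (12_500) is exact, but any interval shorter
/// than one millisecond truncates to 0.
fn ticks_to_ms(ticks: u64) -> u64 {
    ticks / (CLOCK_FREQ / MSEC_PER_SEC)
}

/// Microseconds, dividing the frequency first: 12.5 ticks per microsecond
/// truncates to 12, so results come out about 4% too large.
fn ticks_to_us_divide_first(ticks: u64) -> u64 {
    ticks / (CLOCK_FREQ / USEC_PER_SEC)
}

/// Microseconds, multiplying first: accurate up to the final truncation,
/// at the cost of overflowing for very large tick counts.
fn ticks_to_us_multiply_first(ticks: u64) -> u64 {
    ticks * USEC_PER_SEC / CLOCK_FREQ
}

fn main() {
    let one_second = CLOCK_FREQ; // ticks elapsed in exactly one second
    println!("ms:                {}", ticks_to_ms(one_second));
    println!("us divide-first:   {}", ticks_to_us_divide_first(one_second));
    println!("us multiply-first: {}", ticks_to_us_multiply_first(one_second));

    let short = 10_000; // 0.8 ms worth of ticks
    println!("short interval ms: {}", ticks_to_ms(short));
    println!("short interval us: {}", ticks_to_us_multiply_first(short));
}
```

Even in the worst case here, a reading of 0 only appears for intervals shorter than one millisecond, which is why truncation alone could not explain a get_time that stayed at 0.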
After repeatedly rewriting several different implementations, I started to notice an alarming fact: the “get_time is 0” error did not occur consistently from one run to the next.
Sometimes, after casually changing a line or two of code, the first build would pass the test, but every build after that would hit the error again.
While exploring late into the night, I discovered that adding a println in get_time could reduce the probability of this error occurring to zero.
Commit, push, CI… Run failed.
Well, let’s try deleting this println. After deletion, it couldn’t pass tests on my machine anymore, but out of curiosity, I still pushed this commit to remote.
This time, the test passed.
This was an error that only appeared on my machine.
I was using QEMU 8, and of course I had also updated to the latest pre-release RustSBI, but this… I indeed didn’t see anyone else encountering this kind of problem, and couldn’t understand it.
Then only one possibility remained… Apple Silicon.
ch4: Hard Mode
Although I had already learned my lesson in ch3, I still didn’t switch to a Linux virtual machine immediately. Time was already running short, so I told myself I would switch to a VM only if the same problem, or an even weirder one, appeared. I still wanted to save those two seconds of compilation time.
Fortunately, that curse from Apple never happened again.
However, ch4’s difficulty rose sharply, especially for me - I knew nothing about page tables. The pressure I felt in ch4, and that sense of being at a loss when facing the code repository, was much stronger than ch3.
At this point I was already thinking about retreating. Both the difficulty of the knowledge itself and the task difficulty in ch4 were a bit overwhelming for me.
However, I eventually figured out how to implement the two functions in ch4, as well as the other functions that needed to be added for them, although it took far longer than expected.
ch5: The Belated Period
After completing ch4, only two days remained until Phase 2’s original deadline. At that time, I didn’t know the passing requirement had been relaxed to ch5. After spending a weekend day chewing through the ch5 documentation, I decided to drop out anyway.
Unfortunately, I couldn’t continue walking with you all.
…Right?
A week after the original deadline, I discovered there were still people pushing through Phase 2.
Then, let’s put a proper period on this.
ch5 was considerably easier, probably in part because I had become more familiar with the code repository. Most of my time on ch5 actually went into cherry-picking my earlier code and manually resolving merge conflicts until everything compiled and passed BASE=1. Implementing the two functions was relatively easy and pleasant by comparison. After all, I now knew where the code should go.
The Ladder
Looking back, I really didn’t write much code in this repository. But for someone who had never understood the essence of operating systems, it opened the door to a new world. Operating systems are no longer mysterious to me, and rCore Tutorial is a textbook worth reading carefully and repeatedly.
Finally, I’m also grateful for the resources and help provided by the teachers.