Fixing a rare Intel Xeon Phi motherboard which forgot how to network

I have a four node Intel Xeon Phi server. I got this roughly four years ago during Covid after reading about Brandon Falk's vectorized emulation endeavors. Brandon also streamed a lot of his development work on YouTube and I thought to myself: "Hey, I can implement this too" and so I did together with a friend, but that's a topic for another day.

Below you can see the case with one of the nodes that can be plugged into the back. I have four Intel Xeon Phi 7230 nodes, two of them with 48GiB of memory for vectorized emulation experiments. The nodes' product designation is HNS7200AP, and each contains an S7200AP motherboard that supports the Xeon Phi x200 Knights Landing CPUs. There are also refreshed variants called HNS7200APR/S7200APR with a slightly modified motherboard that has a different CPU core voltage regulator for the Xeon Phi x205 Knights Mill CPUs. I have two of these as well, but that's also a topic for another blog post.

The four-node Xeon Phi chassis with one of the four nodes sitting on top

The whole chassis is sitting in a datacenter. This is mostly because at the time I got it, I was living in a rented apartment without a real cellar or other place to put an obnoxiously loud server. Every time we wanted to write software or play around, we would connect to the IPMI interface on the BMC, turn on the node, do our stuff and turn it off again, because power costs money, even in a datacenter. Only two of the nodes have actual memory DIMMs and an mSATA SSD installed to boot from. The other two are functional, but only with the 16GiB of on-CPU MCDRAM and by booting from the network. We very creatively call them phi1 to phi4 and we mostly used phi4 for our experiments.
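That power-on, work, power-off routine could be scripted roughly like this. It is only a sketch under assumptions, not necessarily the setup described here: it assumes `ipmitool` is installed, that the BMC accepts IPMI v2.0 over LAN (`lanplus`), and that the password is exported in the `IPMI_PASSWORD` environment variable. The hostname and user name in the usage note are made up for illustration.

```python
import subprocess

# Chassis power actions this sketch allows; ipmitool supports more.
ALLOWED_ACTIONS = {"on", "off", "status"}


def ipmi_power_command(host, user, action):
    """Build the ipmitool argument list for a chassis power action."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",  # IPMI v2.0 over LAN (assumed to be enabled on the BMC)
        "-H", host,       # BMC hostname or IP address
        "-U", user,       # BMC user name
        "-E",             # take the password from the IPMI_PASSWORD env variable
        "chassis", "power", action,
    ]


def run_power_action(host, user, action):
    """Run the command and return ipmitool's textual output."""
    result = subprocess.run(
        ipmi_power_command(host, user, action),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

With a hypothetical BMC hostname, `run_power_action("phi4-bmc.example", "admin", "on")` would power the node up and the same call with `"off"` would shut it down again. Keeping the command construction separate from the `subprocess` call makes it easy to check the arguments without touching real hardware.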

If I remember correctly, we had a power outage once in the datacenter. Anyway, something happened, and after that phi4's BMC wouldn't acquire a DHCP lease anymore and the managed switch it was connected to showed the link as down. We power cycled everything and the state remained. Weird. Maybe something on the motherboard blew up? I didn't have the time to drive by the DC and check it out, so we just used phi3 from then on, because it was equipped identically to the now dead phi4.

Until earlier this week... I drove by, checked out the node and turned it on manually. It booted. But the BMC was still unreachable. I shut it down, pulled out the node a bit from the chassis and slid it back in, just for good measure. Same thing. And then I decided to pull it out, take it with me and just check it out at home.


Reverse engineering the Raspberry Pi RP1

When the Raspberry Pi 5 was announced, so was its new I/O companion chip, the RP1.

Until we were actually able to buy the hardware, more and more details of the RP1 were released. It contains two Cortex-M3 ARM cores, a PIO interface like we already know from the Raspberry Pi Pico and a bunch more things. The only thing released by the Raspberry Pi Foundation that describes this new chip is a datasheet that doesn't deserve its name and explains only a few peripherals. You can use this datasheet as a guideline if you are developing an operating system and want to build drivers that interface with devices on the RP1. Details about the firmware running on the chip and the system-level control registers for clocks, resets etc. are mostly undocumented, and there is no way to load your own code into the dormant Cortex-M3 core on the RP1. That sucks.

After I got my hands on a Raspberry Pi 5, I started taking apart the early boot firmware and found out how it gets loaded by the first (or second?) stage bootloader and started reverse engineering a bit to load my own code into the RP1.

You can read more about this in the GitHub repo of this project.