G33K @ Work | Geeky stuff



31 Jan 2011

Solving Other People’s Problems

I needed a virtualized Mac OS. I want to understand how a special EFI extension works, one that permanently changes the hard disk. I don’t want to do this on my real machine: I would have to reboot it several times and might lose all the data on my HDD if something goes wrong while I’m experimenting with this extension. Live debugging would also be next to impossible.

Luckily, VMware and VirtualBox are able to virtualize Apple’s Mac OS X Server. His holy Steveness decided to allow virtualization only for the Server version of Mac OS. But there is help, so I „fixed“ the DVD to be able to boot a Mac OS X 10.6.3 retail DVD in VMware.

Unfortunately VMware then told me that the CPU was halted by the guest operating system. This happens if a kernel issues a „hlt“ instruction on all currently active cores while interrupts are disabled, so nothing can wake the CPU up again. This is bad because the whole operating system is stuck in this state. But why did this happen with OS X?

The most common causes for this are a) a programming error or b) a panic issued by some code in the kernel. A programming error can be excluded here, so the kernel must panic at a very early stage. I googled the error message I got („The CPU has been disabled by the guest operating system.“) and found this. Great. So the kernel seems to have problems with newer CPUs, and I do have an i5 in my machine.

First I tried to circumvent this issue by several configuration changes in the VM configuration file and using different EFI Loaders but all those attempts failed, because it wasn’t some issue with the loaders. It was the kernel itself which panicked.

So I googled for a way to attach a debugger to the VM and I found one. VMware itself has a GDB stub (see „GDB Server“). I configured the VM, ran it, attached GDB to the VM, continued execution and let the kernel panic. After that I hit CTRL+C in GDB, and the VM was halted in a state where I could inspect and modify it completely from GDB:

andy@geekbook ~/Desktop % gdb mach_kernel
GNU gdb 6.3.50-20050815 (Apple version gdb-1472) (Wed Jul 21 10:53:12 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...

> target remote localhost:8864
[New thread 1]
0x000000003fc40786 in ?? ()

> c
^C
Program received signal SIGINT, Interrupt.
0xffffff800022ddcf in Debugger ()

>

But what to do now? Well, we want to find out where the kernel panics and do something about it.
The VM has already panicked, so we can look at a stack trace to determine which function caused the panic:

> bt
#0  0xffffff800022ddcf in Debugger ()
#1  0xffffff8000204ac3 in panic ()
#2  0xffffff80002cef74 in kernel_trap ()
#3  0xffffff80002e0e9a in return_from_trap ()
#4  0x0000000000002206 in ?? ()
#5  0xffffff80002d6bb1 in cnputc ()
#6  0xffffff800027cf30 in consdebug_putc ()
#7  0xffffff8000206577 in __doprnt ()
#8  0xffffff80002073b8 in kdb_printf ()
#9  0xffffff8000204b2b in panic ()
#10 0xffffff8000228834 in cpuid_set_info ()
#11 0xffffff80002c1c00 in cpuid_extfeatures ()
#12 0xffffff80002c4178 in vstart ()

What can we see here? This is the chain of functions the kernel called until it got into the state where it is now. Especially #9 is interesting. This is the panic() function which apparently lets the kernel panic, hence the name. It is called by #10 at address 0xffffff8000228834 which is the function cpuid_set_info.

This makes total sense. It seems that the kernel tries to detect the CPU very early. Intel (and compatible) CPUs have a special instruction to retrieve information about the CPU the code is running on. Now guess its name: „cpuid“. It seems that the kernel pedantically checks which CPU it is running on and panics if it doesn’t know it.
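
To get a feeling for what „cpuid“ hands back, here is a minimal C sketch. Called with EAX = 0, the instruction returns a 12-character vendor string spread over EBX, EDX and ECX (in that order). The helper names „vendor_from_regs“ and „is_genuine_intel“ are made up for illustration and are not taken from the kernel; the register values are passed in as plain numbers so the sketch does not depend on the CPU it runs on.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuilds the 12-character vendor string from the
 * three registers that CPUID leaf 0 (EAX = 0) returns. The characters are
 * stored in EBX, EDX, ECX, least significant byte first. */
void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

/* Hypothetical helper: returns 1 if the registers spell "GenuineIntel",
 * otherwise 0. */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];

    vendor_from_regs(ebx, edx, ecx, vendor);
    return strncmp(vendor, "GenuineIntel", sizeof vendor) == 0;
}
```

Feeding it 0x756e6547, 0x49656e69 and 0x6c65746e (the bytes of „GenuineIntel“) makes „is_genuine_intel“ return 1; the register values of any other vendor string give 0.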

Let’s fix this behaviour. What do we need for this? We need a good disassembler and our 1337 x86 Assembler skills.

We already got the first one. It’s the GDB:

> disassemble cpuid_set_info
Dump of assembler code for function cpuid_set_info:
0xffffff800022829e <cpuid_set_info+0>:	push   rbp
0xffffff800022829f <cpuid_set_info+1>:	mov    rbp,rsp
0xffffff80002282a2 <cpuid_set_info+4>:	push   r15
0xffffff80002282a4 <cpuid_set_info+6>:	push   r14
0xffffff80002282a6 <cpuid_set_info+8>:	push   r13
0xffffff80002282a8 <cpuid_set_info+10>:	push   r12
0xffffff80002282aa <cpuid_set_info+12>:	push   rbx
0xffffff80002282ab <cpuid_set_info+13>:	sub    rsp,0xa8
0xffffff80002282b2 <cpuid_set_info+20>:	mov    esi,0x198
0xffffff80002282b7 <cpuid_set_info+25>:	lea    rdi,[rip+0x49e6c2]        # 0xffffff80006c6980
0xffffff80002282be <cpuid_set_info+32>:	call   0xffffff80002c17f0 <bzero>
0xffffff80002282c3 <cpuid_set_info+37>:	xor    eax,eax
0xffffff80002282c5 <cpuid_set_info+39>:	cpuid
0xffffff80002282c7 <cpuid_set_info+41>:	mov    DWORD PTR [rbp-0x40],eax
.......
0xffffff8000228ba1 <cpuid_set_info+2307>:	leave
0xffffff8000228ba2 <cpuid_set_info+2308>:	ret
0xffffff8000228ba3 <cpuid_set_info+2309>:	mov    eax,0x6b5a4cd2
0xffffff8000228ba8 <cpuid_set_info+2314>:	mov    DWORD PTR [rip+0x49df4e],eax        # 0xffffff80006c6afc
0xffffff8000228bae <cpuid_set_info+2320>:	jmp    0xffffff8000228834 <cpuid_set_info+1430>
End of assembler dump.

>

I shortened the dump, because it is too long and I will pick out the interesting things by hand and post them separately.

Phew, that’s a lot of code. Where do we start? We have the address of the call to the panic function, the one from the stack trace. Let’s have a look there:

0xffffff80002287d5 <cpuid_set_info+1335>:	add    BYTE PTR [rax],al
0xffffff80002287d7 <cpuid_set_info+1337>:	add    BYTE PTR [rax-0x50000000],dh
0xffffff80002287dd <cpuid_set_info+1343>:	add    BYTE PTR [rax],al
0xffffff80002287df <cpuid_set_info+1345>:	add    BYTE PTR [rax-0x50000000],dh
0xffffff80002287e5 <cpuid_set_info+1351>:	add    BYTE PTR [rax],al
0xffffff80002287e7 <cpuid_set_info+1353>:	add    BYTE PTR [rax-0x50000000],dh
0xffffff80002287ed <cpuid_set_info+1359>:	add    BYTE PTR [rax],al
0xffffff80002287ef <cpuid_set_info+1361>:	add    BYTE PTR [rdi],dh
0xffffff80002287f1 <cpuid_set_info+1363>:	add    al,0x0
0xffffff80002287f3 <cpuid_set_info+1365>:	add    BYTE PTR [rax-0x55ccc6d5],bh
0xffffff80002287f9 <cpuid_set_info+1371>:	jmp    0xffffff8000228ba8 <cpuid_set_info+2314>
0xffffff80002287fe <cpuid_set_info+1376>:	mov    eax,0x73d67300
0xffffff8000228803 <cpuid_set_info+1381>:	jmp    0xffffff8000228ba8 <cpuid_set_info+2314>
0xffffff8000228808 <cpuid_set_info+1386>:	mov    eax,0x426f69ef
0xffffff800022880d <cpuid_set_info+1391>:	jmp    0xffffff8000228ba8 <cpuid_set_info+2314>
0xffffff8000228812 <cpuid_set_info+1396>:	mov    eax,0x78ea4fbc
0xffffff8000228817 <cpuid_set_info+1401>:	jmp    0xffffff8000228ba8 <cpuid_set_info+2314>
0xffffff800022881c <cpuid_set_info+1406>:	mov    DWORD PTR [rip+0x49e2d6],0x0        # 0xffffff80006c6afc
0xffffff8000228826 <cpuid_set_info+1416>:	lea    rdi,[rip+0x348073]        # 0xffffff80005708a0
0xffffff800022882d <cpuid_set_info+1423>:	xor    eax,eax
0xffffff800022882f <cpuid_set_info+1425>:	call   0xffffff8000204939 <panic>

The last instruction is indeed a call to „panic“. But why is it called? There is no conditional branch instruction which decides whether to panic or not, only some weird jumps and, above them, code that looks obviously incorrect. Not necessarily incorrect, but semantically these instructions don’t make sense. We’ll leave them for now. Let’s see where other code could jump to the code that actually calls „panic“. A few instructions above, we find this:

0xffffff800022871f <cpuid_set_info+1153>:	lea    rsi,[rip+0x49e25a]        # 0xffffff80006c6980
0xffffff8000228726 <cpuid_set_info+1160>:	lea    rdi,[rip+0x348163]        # 0xffffff8000570890
0xffffff800022872d <cpuid_set_info+1167>:	call   0xffffff8000224574 <strncmp>
0xffffff8000228732 <cpuid_set_info+1172>:	test   eax,eax
0xffffff8000228734 <cpuid_set_info+1174>:	jne    0xffffff8000228826 <cpuid_set_info+1416>
0xffffff800022873a <cpuid_set_info+1180>:	cmp    BYTE PTR [rip+0x49e28b],0x6        # 0xffffff80006c69cc
0xffffff8000228741 <cpuid_set_info+1187>:	jne    0xffffff800022881c <cpuid_set_info+1406>
0xffffff8000228747 <cpuid_set_info+1193>:	movzx  eax,BYTE PTR [rip+0x49e27f]        # 0xffffff80006c69cd
0xffffff800022874e <cpuid_set_info+1200>:	sub    eax,0xd
0xffffff8000228751 <cpuid_set_info+1203>:	cmp    al,0x21
0xffffff8000228753 <cpuid_set_info+1205>:	ja     0xffffff800022881c <cpuid_set_info+1406>
0xffffff8000228759 <cpuid_set_info+1211>:	movzx  eax,al
0xffffff800022875c <cpuid_set_info+1214>:	lea    rdx,[rip+0x9]        # 0xffffff800022876c <cpuid_set_info+1230>
0xffffff8000228763 <cpuid_set_info+1221>:	movsxd rax,DWORD PTR [rdx+rax*4]
0xffffff8000228767 <cpuid_set_info+1225>:	add    rax,rdx
0xffffff800022876a <cpuid_set_info+1228>:	jmp    rax

What does this code do? At the beginning it seems to load 2 values relative to the instruction pointer into registers rsi and rdi. Then it calls „strncmp“. strncmp usually takes two pointers to strings. So let’s have a look where these pointers actually point to with the GDB:

> print (char*)0xffffff80006c6980
$1 = 0xffffff80006c6980 "GenuineIntel"

> print (char*)0xffffff8000570890
$2 = 0xffffff8000570890 "GenuineIntel"

Both strings are the same. One string seems to be a constant and the other one is retrieved by a „cpuid“ instruction. We could look further into this but it’s not interesting to us.
Since the strings are equal, strncmp returns 0 and the „jne“ instruction after the „test“ does not branch. This is not the place where we are panicking.
The next instruction compares a byte with 6 and branches to the panic code if it differs. After that, another byte is loaded into the eax register, 0xd is subtracted from it and the result is compared with 0x21. The following „ja“ instruction (jump if above, i.e. greater than for unsigned values) also jumps to the panic code. Perhaps this is where we panic? But I’m too lazy to read all this assembly and figure out what those values actually mean.
The solution is easy: I restarted the VM, set a breakpoint at the beginning of the code and single-stepped until I hit the jump to the panic-calling code. It wasn’t the first compare and it also wasn’t the second one. It was the last „jmp rax“.

Huh? What’s happening here? Actually it’s quite easy. Let’s do it step by step.

The „lea    rdx,[rip+0x9]“ loads the address of the current instruction pointer + 9 into register rdx. GDB is nice to us and shows the resulting address as a comment next to the instruction: 0xffffff800022876c.
The following „movsxd rax,DWORD PTR [rdx+rax*4]“ instruction looks weird. It loads a 32 bit value (DWORD) from address rdx+rax*4 and sign-extends it into the register rax. The address in rdx seems to be the start of some kind of table, and the value in rax is the index into this table. The *4 is done because one entry in the table is 4 bytes long and x86 addresses are bytewise.
After that rdx itself is added to rax and then a jmp occurs. And this jump really landed in the panic code. But why? And what’s this table address that is loaded into rdx at the beginning?

Remember the code I mentioned above which didn’t make sense? That’s where the address points to. This is not code at all. It’s a table full of offsets. It looks like this if you don’t try to disassemble it but display the values as 4-byte integers instead:

dd 88h
dd 92h
dd 9Ch
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0A6h
dd 0B0h
dd 0B0h
dd 437h
dd 0B0h
dd 0B0h
dd 0B0h
dd 437h
dd 437h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 0B0h
dd 437h

Which entry of this table was actually used? I reset the VM again, set a proper breakpoint and looked at the registers (GDB command „info reg“) while the address to jump to was calculated. It turned out that on my Core i5 machine index 0x18 was used, which holds the value 0xB0.

Adding this value to the beginning of the table (0xffffff800022876c + 0xB0 = 0xffffff800022881C) leads us directly to the code which calls panic. Cool. At least we now know what exactly happens. How do we fix this?
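
The address arithmetic can be double-checked with a few lines of C. This is only a sketch that mirrors the „movsxd“/„add“ pair from the disassembly; the function name „jump_target“ and the macro „TABLE_BASE“ are made up for illustration.

```c
#include <stdint.h>

/* Address of the offset table, taken from the GDB comment next to
 * "lea rdx,[rip+0x9]". */
#define TABLE_BASE 0xffffff800022876cULL

/* Mirrors "movsxd rax,DWORD PTR [rdx+rax*4]" followed by "add rax,rdx":
 * the 32-bit table entry is sign-extended to 64 bits and added to the
 * address of the table itself. */
uint64_t jump_target(uint64_t table_base, int32_t entry)
{
    return table_base + (uint64_t)(int64_t)entry;
}
```

With „TABLE_BASE“ as the base, the entry 0xB0 gives 0xffffff800022881c (the path into „panic“) and the entry 0x437 gives 0xffffff8000228ba3, which is the „mov eax,0x6b5a4cd2“ near the end of the function dump.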

We just patch this table. We need to find the right value for our CPU, or perhaps jump to a stub of our own which does some initialization work. This is what happens in the 10.6.6 kernel: the offsets are different because the linker put the symbols at different addresses, but they also added one completely new value to the table for all the new CPUs.

I just tried the value 0x437 and it works like a charm. I booted the VM again, set a breakpoint at „cpuid_set_info“, patched the table with a simple „set *0xFFFFFF80002287CC=0x00000437“ and continued execution. The kernel boots, I installed the system, and now I need to do the same shit again to get the kernel on the HDD to boot. After that I can update the OS to 10.6.6, which should solve the issue.
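
Where does the address 0xFFFFFF80002287CC come from? Here is a rough sketch of the arithmetic in C, assuming the table base and the „sub eax,0xd“ / „cmp al,0x21“ bounds from the disassembly above. The helper names are hypothetical, and the model number 0x25 for my i5 is only inferred from table index 0x18 plus 0xd.

```c
#include <stdint.h>

#define TABLE_BASE  0xffffff800022876cULL /* from "lea rdx,[rip+0x9]" */
#define FIRST_MODEL 0x0d                  /* from "sub eax,0xd"       */
#define LAST_INDEX  0x21                  /* from "cmp al,0x21"       */

/* Hypothetical helper: table index for a CPUID model number, or -1 if the
 * model falls outside the table (the "ja" branch in the disassembly). */
int table_index(uint8_t model)
{
    uint8_t idx = (uint8_t)(model - FIRST_MODEL);

    return idx > LAST_INDEX ? -1 : (int)idx;
}

/* Hypothetical helper: address of the 4-byte table slot for an index. */
uint64_t slot_address(int index)
{
    return TABLE_BASE + 4u * (uint64_t)index;
}
```

For model 0x25 „table_index“ yields 0x18, and „slot_address(0x18)“ is 0xffffff80002287cc, the address written to in the GDB „set“ command.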

But why the fuck did I do all this? The problem is that I needed the VM now, not whenever I get a proper DVD which runs on i5/i7 processors. I also didn’t have another Mac here to set up and update the VM.

It also seems that all retail DVDs carry an older build which does not run on i5 and i7 CPUs. Well done, Apple… Hence the name of this article: „Solving Other People’s Problems“. It should just not be my problem to patch a kernel to install an operating system in a virtualized environment. And all this just because Apple decided to „support“ only a handful of CPUs.

Update: I just had a look at the XNU source code and found the C code I patched. Now it should be obvious why there was a branch table.

void
cpuid_set_info(void)
{
	i386_cpu_info_t		*info_p = &cpuid_cpu_info;

	bzero((void *)info_p, sizeof(cpuid_cpu_info));

	cpuid_set_generic_info(info_p);

	/* verify we are running on a supported CPU */
	if ((strncmp(CPUID_VID_INTEL, info_p->cpuid_vendor,
		     min(strlen(CPUID_STRING_UNKNOWN) + 1,
			 sizeof(info_p->cpuid_vendor)))) ||
	   (cpuid_set_cpufamily(info_p) == CPUFAMILY_UNKNOWN))
		panic("Unsupported CPU");

	info_p->cpuid_cpu_type = CPU_TYPE_X86;
	info_p->cpuid_cpu_subtype = CPU_SUBTYPE_X86_ARCH1;

	cpuid_set_cache_info(&cpuid_cpu_info);

	/*
	 * Find the number of enabled cores and threads
	 * (which determines whether SMT/Hyperthreading is active).
	 */
	switch (info_p->cpuid_cpufamily) {
	/*
	 * This should be the same as Nehalem but an A0 silicon bug returns
	 * invalid data in the top 12 bits. Hence, we use only bits [19..16]
	 * rather than [31..16] for core count - which actually can't exceed 8.
	 */
	case CPUFAMILY_INTEL_WESTMERE: {
		uint64_t msr = rdmsr64(MSR_CORE_THREAD_COUNT);
		info_p->core_count   = bitfield32((uint32_t)msr, 19, 16);
		info_p->thread_count = bitfield32((uint32_t)msr, 15,  0);
		break;
		}
	case CPUFAMILY_INTEL_NEHALEM: {
		uint64_t msr = rdmsr64(MSR_CORE_THREAD_COUNT);
		info_p->core_count   = bitfield32((uint32_t)msr, 31, 16);
		info_p->thread_count = bitfield32((uint32_t)msr, 15,  0);
		break;
		}
	}
	if (info_p->core_count == 0) {
		info_p->core_count   = info_p->cpuid_cores_per_package;
		info_p->thread_count = info_p->cpuid_logical_per_package;
	}

	cpuid_cpu_info.cpuid_model_string = ""; /* deprecated */
}

static uint32_t
cpuid_set_cpufamily(i386_cpu_info_t *info_p)
{
	uint32_t cpufamily = CPUFAMILY_UNKNOWN;

	switch (info_p->cpuid_family) {
	case 6:
		switch (info_p->cpuid_model) {
		case 13:
			cpufamily = CPUFAMILY_INTEL_6_13;
			break;
		case 14:
			cpufamily = CPUFAMILY_INTEL_YONAH;
			break;
		case 15:
			cpufamily = CPUFAMILY_INTEL_MEROM;
			break;
		case 23:
			cpufamily = CPUFAMILY_INTEL_PENRYN;
			break;
		case CPUID_MODEL_NEHALEM:
		case CPUID_MODEL_FIELDS:
		case CPUID_MODEL_DALES:
		case CPUID_MODEL_NEHALEM_EX:
			cpufamily = CPUFAMILY_INTEL_NEHALEM;
			break;
		case CPUID_MODEL_DALES_32NM:
		case CPUID_MODEL_WESTMERE:
		case CPUID_MODEL_WESTMERE_EX:
			cpufamily = CPUFAMILY_INTEL_WESTMERE;
			break;
		}
		break;
	}

	info_p->cpuid_cpufamily = cpufamily;
	return cpufamily;
}

At the beginning of „cpuid_set_info“ we can see the strncmp of the vendor name followed by a call to „cpuid_set_cpufamily“ where then the type of CPU is determined. The switch/case is the reason for the branch table.
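
To illustrate how a switch turns into a range check plus a table, here is a hedged sketch in C. „family_switch“ is the readable form and „family_table“ is roughly what a compiler may generate for it. The enum values are stand-ins for the real CPUFAMILY_* constants, and the model numbers 26, 30, 31 and 46 are read off the positions of the 0x437 entries in the table dumped earlier (which has no Westmere cases yet), so treat them as assumptions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the CPUFAMILY_* constants. */
enum family {
    FAM_UNKNOWN = 0, FAM_6_13, FAM_YONAH, FAM_MEROM, FAM_PENRYN, FAM_NEHALEM
};

/* Readable form: a switch over the CPUID model number. */
enum family family_switch(uint8_t model)
{
    switch (model) {
    case 13: return FAM_6_13;
    case 14: return FAM_YONAH;
    case 15: return FAM_MEROM;
    case 23: return FAM_PENRYN;
    case 26: case 30: case 31: case 46: return FAM_NEHALEM;
    default: return FAM_UNKNOWN;
    }
}

/* Hand-lowered form: subtract the smallest case (13), do one unsigned
 * range check (33 = 0x21), then index a 34-entry table. Slots that are not
 * listed are zero, which is FAM_UNKNOWN. */
enum family family_table(uint8_t model)
{
    static const uint8_t table[34] = {
        [0]  = FAM_6_13,    [1]  = FAM_YONAH,   [2]  = FAM_MEROM,
        [10] = FAM_PENRYN,
        [13] = FAM_NEHALEM, [17] = FAM_NEHALEM,
        [18] = FAM_NEHALEM, [33] = FAM_NEHALEM,
    };
    uint8_t idx = (uint8_t)(model - 13);

    if (idx > 33)
        return FAM_UNKNOWN;
    return (enum family)table[idx];
}
```

Both functions return the same result for every model from 0 to 255. Model 37 (index 0x18) ends up as unknown in both, which corresponds to the panic path seen above. The kernel’s table stores code offsets instead of values, but the range check and the indexing work the same way.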
