Cross Platform Machine Code
Jun 12, 2022 @ 12:54 pm
I hate writing if statements.
I’ve been working on a couple different assemblers for Ruby. Fisk is a pure Ruby x86 assembler. You can use it to generate bytes that can be executed on x86 machines. AArch64 is a pure Ruby ARM64 assembler. You can use it to generate bytes that can be executed on ARM64 machines.
Both of these libraries just generate bytes that can be interpreted by their respective processors. Unfortunately you can’t just generate bytes and expect the CPU to execute them. You first need to put the bytes in executable memory before you can hand them off to the CPU for execution. Executable memory is basically the same thing regardless of CPU architecture, so I decided to make a library called JITBuffer that encapsulates executable memory manipulation.
To use the JITBuffer, you write platform specific bytes to the buffer, then give the buffer to the CPU for execution.
Here is an example on the ARM64 platform:
require "aarch64"
require "jit_buffer"
require "fiddle"
asm = AArch64::Assembler.new
# Make some instructions. These instructions simply
# return the value 0xF00DCAFE
asm.pretty do
  asm.movz x0, 0xCAFE
  asm.movk x0, 0xF00D, lsl(16)
  asm.ret
end
# Write the bytes to executable memory:
buf = JITBuffer.new(4096)
buf.writeable!
asm.write_to buf
buf.executable!
# Point the CPU at the executable memory
func = Fiddle::Function.new(buf.to_i, [], -Fiddle::TYPE_INT)
p func.call.to_s(16) # => "f00dcafe"
The example uses the AArch64 gem to assemble ARM64 specific bytes, the JITBuffer gem to allocate executable memory, and the Fiddle gem to point the CPU at the executable memory and run it.
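The minus sign in -Fiddle::TYPE_INT asks Fiddle to treat the return value as an unsigned int. Here is a minimal sketch of why that matters for this particular value. It uses only pack and unpack1 rather than Fiddle itself, and the helper names are made up for illustration. 0xF00DCAFE has its top bit set, so the same 32 bits read as a signed int come out negative.

```ruby
# Hypothetical helpers, for illustration only: reinterpret the same 32 bits
# as an unsigned or a signed C int.
def as_unsigned_int(value)
  [value].pack("L").unpack1("L")
end

def as_signed_int(value)
  [value].pack("L").unpack1("l")
end

raw = 0xF00DCAFE

puts as_unsigned_int(raw).to_s(16) # prints "f00dcafe"
puts as_signed_int(raw).negative?  # prints "true"
```

A small value like 0x2b reads the same either way, so the signedness only matters when the top bit is set.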
Tests are important I guess, so I thought it would be a good idea to write tests for the JITBuffer gem.
My goal for the test is to ensure that it’s actually possible to execute the bytes in the buffer itself.
I’m not a huge fan of stubs or mocks and I try to avoid them if possible, so I wanted to write a test that would actually execute the bytes in the buffer.
I also want the test to be “cross platform” (where “cross platform” means “works on x86_64 and ARM64”).
Writing a test like this would mean writing something like the following:
def test_can_execute
  buf = JITBuffer.new(4096)

  platform = figure_out_what_platform_we_are_on()
  if platform == "arm64"
    # write arm64 specific bytes
    buf.write(...)
  else
    # write x86_64 specific bytes
    buf.write(...)
  end

  # Use fiddle to execute
end
As I said at the start though, I hate writing if statements, and I’d rather avoid it if possible.
In addition, how do you reliably figure out what platform you’re executing on?
I really don’t want to figure that out.
Not to mention, I just don’t think this code is cool.
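For comparison, here is a rough sketch of the kind of platform check being avoided. It assumes RbConfig's host_cpu value is a good enough signal. The helper name and the lists of CPU strings are assumptions for illustration, and the lists are almost certainly incomplete, which is part of the problem.

```ruby
require "rbconfig"

# Hypothetical helper: map a host_cpu string to one of the two platforms
# this post cares about. The string lists are illustrative, not exhaustive.
def platform_for(host_cpu)
  case host_cpu
  when "x86_64", "amd64", "x64" then "x86_64"
  when "arm64", "aarch64"       then "arm64"
  else raise ArgumentError, "unsupported CPU: #{host_cpu}"
  end
end

# What this Ruby reports for the machine it was built for
host_cpu = RbConfig::CONFIG["host_cpu"]
puts "host_cpu reported by Ruby: #{host_cpu}"
```

Calling platform_for(host_cpu) would then feed the if statement in the test above, and any spelling missing from the lists raises.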
My test requirements:
- No if statements
- Self contained (I don’t want to shell out or use other libraries)
- Must have pizzazz
Since machine code is just bytes that the CPU interprets, it made me wonder “is there a set of bytes that execute both on an x86_64 CPU and an ARM64 CPU?” It turns out there are, and I want to walk through them here.
x86_64 Instructions
First let’s look at the x86_64 instructions we’ll execute. Below is the assembly code (in Intel syntax):
.start:
  mov rax, 0x2b ; put 0x2b in the rax register
  ret           ; return from the function
  jmp start     ; jump to .start
This assembly code puts the value 0x2b in the rax register and returns from the current “C” function.
I put “C” in quotes because we’re writing assembly code, but the assembly code is conforming to the C calling convention and we’ll treat it as if it’s a C function when we call it.
The x86 C calling convention states that the value in the rax register is the “return value” of the C function.
So we’ve created a function that returns 0x2b.
At the end of the code there is a jmp instruction that jumps to the start of this sequence.
However, since we return from the function before getting to the jump, the jump is never used (or is it?????)
Machine code is just bytes, and here are the bytes for the above x86 machine code:
0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
0xC3 ; ret
0xEB 0xF6 ; jmp start
x86 uses a “variable width” encoding, meaning that the number of bytes each instruction uses can vary.
In this example, the mov instruction used 7 bytes, and the ret instruction used 1 byte.
This means that the jmp instruction starts at the 9th byte, or offset 8.
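The 0xF6 in the jmp bytes falls out of those offsets. A short x86 jmp is the opcode 0xEB followed by a signed 8-bit displacement, measured from the end of the jmp itself. Here is a minimal sketch of that arithmetic. The short_jmp helper is made up for illustration and is not part of Fisk.

```ruby
# Hypothetical helper: encode a short x86 jmp located at jmp_offset that
# lands on target_offset (both are offsets into the same buffer).
def short_jmp(jmp_offset, target_offset)
  # The displacement is relative to the address of the *next* instruction,
  # and a short jmp is 2 bytes long.
  disp = target_offset - (jmp_offset + 2)
  raise RangeError, "target out of rel8 range" unless (-128..127).cover?(disp)
  [0xEB, disp & 0xFF] # two's complement in a single byte
end

mov = [0x48, 0xC7, 0xC0, 0x2B, 0x00, 0x00, 0x00] # mov rax, 0x2b
ret = [0xC3]                                     # ret

# The jmp sits right after mov and ret (offset 8) and targets offset 0
jmp = short_jmp(mov.size + ret.size, 0)
puts jmp.map { |b| format("0x%02X", b) }.join(" ") # prints "0xEB 0xF6"
```

Jumping from offset 8 back to offset 0 means a displacement of 0 - (8 + 2) = -10, and -10 as a single byte is 0xF6.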
ARM64 Instructions
Below are some ARM64 instructions we can execute:
movz X11, 0x7b7 ; put 0x7b7 in the X11 register
movz X0, 0x2b ; put 0x2b in the X0 register
ret ; return from the function
This machine code puts the value 0x7b7 into the X11 register.
Then it puts the value 0x2b in the X0 register.
The third instruction returns from the function.
Again we are abiding by the C calling convention, but this time on the ARM64 platform.
On the ARM64 platform, the value stored in X0 is the return value.
So the above machine code will return the value 0x2b to the caller just like the x86_64 machine code did.
Here are the bytes that represent the above ARM64 machine code:
0xEB 0xF6 0x80 0xD2 ; movz X11, 0x7b7
0x60 0x05 0x80 0xD2 ; movz X0, 0x2b
0xC0 0x03 0x5F 0xD6 ; ret
ARM64 uses fixed width instructions. All instructions on ARM64 are 32 bits wide.
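Because every instruction is one 32 bit word, the bytes above can be rebuilt with a little bit twiddling. The sketch below hand-encodes the 64 bit movz (no shift) as the base opcode 0xD2800000 with the immediate in bits 5 through 20 and the destination register in bits 0 through 4, then stores the word little-endian. The helper names are made up for illustration; this is not how the AArch64 gem is implemented.

```ruby
# Hypothetical helper: encode "movz Xd, #imm16" (64-bit, no shift).
def movz_x(rd, imm16)
  raise ArgumentError, "register out of range" unless (0..30).cover?(rd)
  raise ArgumentError, "immediate out of range" unless (0..0xFFFF).cover?(imm16)
  word = 0xD2800000 | (imm16 << 5) | rd
  [word].pack("V").bytes # "V" packs a 32-bit little-endian word
end

# Hypothetical helper: "ret" (return to the address in X30) is a fixed word.
def ret_bytes
  [0xD65F03C0].pack("V").bytes
end

hex = ->(bytes) { bytes.map { |b| format("0x%02X", b) }.join(" ") }

puts hex.(movz_x(11, 0x7b7)) # prints "0xEB 0xF6 0x80 0xD2"
puts hex.(movz_x(0, 0x2b))   # prints "0x60 0x05 0x80 0xD2"
puts hex.(ret_bytes)         # prints "0xC0 0x03 0x5F 0xD6"
```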
Cross Platform Machine Code
Let’s look at the byte blocks next to each other:
; x86_64 bytes
0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
0xC3 ; ret
0xEB 0xF6 ; jmp start
; ARM64 bytes
0xEB 0xF6 0x80 0xD2 ; movz X11, 0x7b7
0x60 0x05 0x80 0xD2 ; movz X0, 0x2b
0xC0 0x03 0x5F 0xD6 ; ret
Looking at the bytes, you’ll notice that the first two bytes of the ARM64 code (0xEB 0xF6) are exactly the same as the last two bytes of the x86_64 code.
The first movz instruction in the ARM64 code was specially crafted to have the same bytes as the last jmp instruction in the x86 code.
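The register and immediate in that movz are not arbitrary. The two jmp bytes become the low 16 bits of the ARM64 word, which hold the destination register (bits 0 through 4) and the low 11 bits of the immediate (bits 5 through 15). Here is a minimal sketch that decodes the word to show what 0xEB 0xF6 forces. The decode_movz_x helper is made up for illustration.

```ruby
# Hypothetical helper: pull the destination register and immediate out of a
# 64-bit movz (no shift), given its four little-endian bytes.
def decode_movz_x(bytes)
  word = bytes.pack("C*").unpack1("V")
  unless (word & 0xFFE00000) == 0xD2800000
    raise ArgumentError, "not a 64-bit movz without shift"
  end
  { rd: word & 0x1F, imm16: (word >> 5) & 0xFFFF }
end

# First two bytes are the x86 jmp; the last two make the word a movz
decoded = decode_movz_x([0xEB, 0xF6, 0x80, 0xD2])
puts format("X%d, 0x%x", decoded[:rd], decoded[:imm16]) # prints "X11, 0x7b7"
```

So the x86 jump picks the register X11 and the value 0x7b7; the ARM64 side just has to tolerate a scratch register getting clobbered.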
If we combine these bytes, then tell the CPU to execute starting at a particular offset, then the interpretation of the bytes will change depending on the CPU, but the result is the same.
Here are the bytes combined:
         0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
         0xC3                               ; ret
start -> 0xEB 0xF6 0x80 0xD2                ; (jmp start, or movz X11, 0x7b7)
         0x60 0x05 0x80 0xD2                ; movz X0, 0x2b
         0xC0 0x03 0x5F 0xD6                ; ret
Regardless of platform, we’ll tell the CPU to start executing from offset 8 in the byte buffer.
If it’s an x86 CPU, it will interpret the bytes as a jump, execute the top bytes, return at the ret, and ignore the rest of the bytes in the buffer (as they are never reached).
If it’s an ARM64 machine, then it will interpret the bytes as “put 0x7b7 in the X11 register” and continue, never seeing the x86 specific bytes at the start of the buffer.
Both x86_64 and ARM64 platforms will return the same value 0x2b.
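The combining step can be sketched without executing anything. The splice helper below is made up for illustration. It overlaps the tail of the x86_64 bytes with the head of the ARM64 bytes and reports the shared entry offset. It also checks that the entry offset is a multiple of 4, since ARM64 instructions must be 4-byte aligned; this assumes the buffer itself starts at an aligned address. Conveniently, the 7 byte mov plus the 1 byte ret add up to exactly 8.

```ruby
x86 = [0x48, 0xC7, 0xC0, 0x2B, 0x00, 0x00, 0x00, # mov rax, 0x2b
       0xC3,                                     # ret
       0xEB, 0xF6]                               # jmp back to offset 0
arm64 = [0xEB, 0xF6, 0x80, 0xD2,                 # movz X11, 0x7b7
         0x60, 0x05, 0x80, 0xD2,                 # movz X0, 0x2b
         0xC0, 0x03, 0x5F, 0xD6]                 # ret

# Hypothetical helper: glue two byte sequences together where the last
# `overlap` bytes of head equal the first `overlap` bytes of tail.
# Returns the combined bytes and the offset where the shared bytes start.
def splice(head, tail, overlap)
  unless head.last(overlap) == tail.first(overlap)
    raise ArgumentError, "sequences do not share #{overlap} bytes"
  end
  entry = head.size - overlap
  raise ArgumentError, "entry point is not 4-byte aligned" unless (entry % 4).zero?
  [head + tail.drop(overlap), entry]
end

combined, entry = splice(x86, arm64, 2)
puts entry         # prints 8
puts combined.size # prints 20
```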
Now we can write a test without if statements like this:
def test_execute
  # Cross platform bytes
  bytes = [0x48, 0xc7, 0xc0, 0x2b, 0x00, 0x00, 0x00, # x86_64 mov rax, 0x2b
           0xc3,                                     # x86_64 ret
           0xeb, 0xf6,                               # x86 jmp
           0x80, 0xd2,                               # ARM movz X11, 0x7b7
           0x60, 0x05, 0x80, 0xd2,                   # ARM movz X0, #0x2b
           0xc0, 0x03, 0x5f, 0xd6]                   # ARM ret

  # Write them to the buffer
  jit = JITBuffer.new(4096)
  jit.writeable!
  jit.write bytes.pack("C*")
  jit.executable!

  # start at offset 8
  offset = 8
  func = Fiddle::Function.new(jit.to_i + offset, [], Fiddle::TYPE_INT)

  # Check the return value
  assert_equal 0x2b, func.call
end
So simple!
So cool!
Tons of pizzazz!
This test will execute machine code on both x86_64 and ARM64, and the machine code will return the same value. Not to mention, there’s no way RuboCop or Flay could possibly complain about this code. 🤣
I hope this inspires you to try writing cross platform machine code. This code only supports 2 platforms, but it does make me wonder how far we could stretch this and how many platforms we could support.
Anyway, hope you have a good day!