I hate writing if statements.
I’ve been working on a couple different assemblers for Ruby. Fisk is a pure Ruby x86 assembler. You can use it to generate bytes that can be executed on x86 machines. AArch64 is a pure Ruby ARM64 assembler. You can use it to generate bytes that can be executed on ARM64 machines.
Both of these libraries just generate bytes that can be interpreted by their respective processors. Unfortunately you can’t just generate bytes and expect the CPU to execute them. You first need to put the bytes in executable memory before you can hand them off to the CPU for execution. Executable memory is basically the same thing regardless of CPU architecture, so I decided to make a library called JITBuffer that encapsulates executable memory manipulation.
To use the JITBuffer, you write platform specific bytes to the buffer, then give the buffer to the CPU for execution.
Here is an example on the ARM64 platform:
```ruby
require "aarch64"
require "jit_buffer"
require "fiddle"

asm = AArch64::Assembler.new

# Make some instructions. These instructions simply
# return the value 0xF00DCAFE
asm.pretty do
  asm.movz x0, 0xCAFE
  asm.movk x0, 0xF00D, lsl(16)
  asm.ret
end

# Write the bytes to executable memory:
buf = JITBuffer.new(4096)
buf.writeable!
asm.write_to buf
buf.executable!

# Point the CPU at the executable memory.
# No arguments; a negative TYPE_INT means an unsigned int return value.
func = Fiddle::Function.new(buf.to_i, [], -Fiddle::TYPE_INT)
p func.call.to_s(16) # => "f00dcafe"
```
The example uses the AArch64 gem to assemble ARM64 specific bytes, the JITBuffer gem to allocate executable memory, and the Fiddle gem to point the CPU at the executable memory and run it.
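If it isn’t obvious how `movz` followed by `movk` ends up producing 0xF00DCAFE, here is a rough pure-Ruby model of what the two instructions do to `x0`. The `movz`/`movk` helper methods are hypothetical illustrations and are not part of the AArch64 gem. The `shift` argument stands in for `lsl(16)`.

```ruby
# Hypothetical helpers that model the effect of two ARM64 instructions on a
# 64-bit register value. For illustration only; not part of the AArch64 gem.
MASK64 = (1 << 64) - 1

# movz: write a (shifted) 16-bit immediate and zero every other bit
def movz(imm16, shift = 0)
  (imm16 & 0xFFFF) << shift
end

# movk: replace only the selected 16-bit field, keeping all other bits
def movk(reg, imm16, shift = 0)
  field = 0xFFFF << shift
  ((reg & ~field) | ((imm16 & 0xFFFF) << shift)) & MASK64
end

x0 = movz(0xCAFE)         # x0 = 0x000000000000CAFE
x0 = movk(x0, 0xF00D, 16) # x0 = 0x00000000F00DCAFE
puts x0.to_s(16)          # prints f00dcafe
```

The "k" in `movk` stands for "keep". That is why the second instruction doesn’t wipe out the 0xCAFE written by the first.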
Tests are important I guess, so I thought it would be a good idea to write tests for the JITBuffer library.
My goal for the test is to ensure that it’s actually possible to execute the bytes in the buffer itself.
I’m not a huge fan of stubs or mocks and I try to avoid them if possible, so I wanted to write a test that would actually execute the bytes in the buffer.
I also want the test to be “cross platform” (where “cross platform” means “works on x86_64 and ARM64”).
Writing a test like this would mean writing something like the following:
```ruby
def test_can_execute
  buf = JITBuffer.new(4096)
  platform = figure_out_what_platform_we_are_on()
  if platform == "arm64"
    # write arm64 specific bytes
    buf.write(...)
  else
    # write x86_64 specific bytes
    buf.write(...)
  end
  # Use fiddle to execute
end
```
As I said at the start though, I hate writing if statements, and I’d rather avoid them if possible.
In addition, how do you reliably figure out what platform you’re executing on?
I really don’t want to figure that out.
Not to mention, I just don’t think this code is cool.
My test requirements:
- No if statements
- Self contained (I don’t want to shell out or use other libraries)
- Must have pizzazz
Since machine code is just bytes that the CPU interprets, it made me wonder “is there a set of bytes that execute both on an x86_64 CPU and an ARM64 CPU?” It turns out there are, and I want to walk through them here.
First, let’s look at the x86_64 instructions we’ll execute. Below is the assembly code (in Intel syntax):
```asm
start:
  mov rax, 0x2b ; put 0x2b in the rax register
  ret           ; return from the function
  jmp start     ; jump to start
```
This assembly code puts the value 0x2b in the rax register and returns from the current “C” function.
I put “C” in quotes because we’re writing assembly code, but the assembly code is conforming to the C calling convention and we’ll treat it as if it’s a C function when we call it.
The x86_64 C calling convention states that the value in the rax register is the “return value” of the C function.
So we’ve created a function that returns 0x2b.
At the end of the code there is a jmp instruction that jumps to the start of this sequence.
However, since we return from the function before getting to the jump, the jump is never used (or is it?????)
Machine code is just bytes, and here are the bytes for the above x86 machine code:
```
0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
0xC3                               ; ret
0xEB 0xF6                          ; jmp start
```
x86 uses a “variable width” encoding, meaning that the number of bytes each instruction uses can vary.
In this example, the mov instruction uses 7 bytes and the ret instruction uses 1 byte. This means that the jmp instruction starts at the 9th byte, or offset 8.
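The offset arithmetic, and the reason `0xEB 0xF6` jumps back to the start, can be checked in a few lines of Ruby. This sketch only uses the instruction lengths listed above, plus the fact that a short `jmp` (opcode 0xEB) takes a signed 8-bit displacement measured from the end of the `jmp` itself.

```ruby
# Instruction lengths for the x86_64 snippet (variable width encoding)
lengths = { "mov rax, 0x2b" => 7, "ret" => 1, "jmp start" => 2 }

# Each instruction starts where the previous one ended
offsets = {}
pos = 0
lengths.each do |insn, len|
  offsets[insn] = pos
  pos += len
end
p offsets.values # => [0, 7, 8]

# jmp rel8: the displacement is relative to the address of the *next* instruction
rel8 = [0xF6].pack("C").unpack1("c")                 # 0xF6 as a signed byte: -10
next_ip = offsets["jmp start"] + lengths["jmp start"] # 10
target = next_ip + rel8
p target # => 0, the start of the mov
```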
Below are some ARM64 instructions we can execute:
```asm
movz X11, 0x7b7 ; put 0x7b7 in the X11 register
movz X0, 0x2b   ; put 0x2b in the X0 register
ret             ; return from the function
```
This machine code puts the value 0x7b7 into the X11 register. Then it puts the value 0x2b in the X0 register. The third instruction returns from the function.
Again we are abiding by the C calling convention, but this time on the ARM64 platform.
On the ARM64 platform, the value stored in X0 is the return value. So the above machine code will return the value 0x2b to the caller, just like the x86_64 machine code did.
Here are the bytes that represent the above ARM64 machine code:
```
0xEB 0xF6 0x80 0xD2 ; movz X11, 0x7b7
0x60 0x05 0x80 0xD2 ; movz X0, 0x2b
0xC0 0x03 0x5F 0xD6 ; ret
```
ARM64 uses fixed width instructions. All instructions on ARM64 are 32 bits wide.
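Because every ARM64 instruction is one 32-bit word, these bytes can be reproduced with a bit of shifting. The sketch below assumes the standard A64 encoding of the 64-bit `MOVZ` with no shift, which is `0xD2800000 | imm16 << 5 | Rd`. It also treats `ret` as the constant word `0xD65F03C0` (return via X30). `movz_x` is a hypothetical helper, not an AArch64 gem method.

```ruby
# Hypothetical helper: encode "movz X<rd>, imm16" (64-bit, hw = 0)
def movz_x(rd, imm16)
  0xD2800000 | ((imm16 & 0xFFFF) << 5) | (rd & 0x1F)
end

RET = 0xD65F03C0 # ret (branch to the address in X30)

words = [movz_x(11, 0x7b7), movz_x(0, 0x2b), RET]

# Each instruction is a 32-bit word; "V" packs it as little-endian uint32
bytes = words.pack("V*").bytes
puts bytes.map { |b| format("0x%02X", b) }.join(" ")
# prints 0xEB 0xF6 0x80 0xD2 0x60 0x05 0x80 0xD2 0xC0 0x03 0x5F 0xD6
```

The little-endian packing is why the immediate and register bits show up in the *first* two bytes of each word.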
Cross Platform Machine Code
Let’s look at the byte blocks next to each other:
```
; x86_64 bytes
0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
0xC3                               ; ret
0xEB 0xF6                          ; jmp start
```

```
; ARM64 bytes
0xEB 0xF6 0x80 0xD2 ; movz X11, 0x7b7
0x60 0x05 0x80 0xD2 ; movz X0, 0x2b
0xC0 0x03 0x5F 0xD6 ; ret
```
Looking at the bytes, you’ll notice that the first two bytes of the ARM64 code (0xEB 0xF6) are exactly the same as the last two bytes of the x86_64 code. The first movz instruction in the ARM64 code was specially crafted so that its first two bytes match the bytes of the jmp instruction at the end of the x86 code.
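One way to arrive at `movz X11, 0x7b7` is to work backwards from the `jmp` bytes. This is my reconstruction of the reasoning and not necessarily how the values were originally picked. It assumes the same `MOVZ` layout as before: bits 0 to 4 hold the register number and bits 5 to 20 hold the 16-bit immediate.

```ruby
# The jmp is the little-endian bytes EB F6, i.e. the 16-bit value 0xF6EB
jmp_bytes = [0xEB, 0xF6]
low16 = jmp_bytes.pack("C*").unpack1("v")

# The low 16 bits of a MOVZ word are Rd (5 bits) plus the bottom 11 bits of imm16
rd  = low16 & 0x1F # 11, so the register is X11
imm = low16 >> 5   # 0x7B7

# Rebuild the whole instruction and confirm it begins with EB F6
word = 0xD2800000 | (imm << 5) | rd
first_two = [word].pack("V").bytes.first(2)
p [rd, imm.to_s(16), first_two == jmp_bytes] # => [11, "7b7", true]
```

The top five bits of the immediate don’t affect the first two bytes. Leaving them at zero keeps the remaining bytes at `0x80 0xD2`, and X11 is just a scratch register that the C calling convention lets us clobber.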
If we combine these bytes and tell the CPU to start executing at a particular offset, each CPU will interpret the bytes differently, but the result is the same.
Here are the bytes combined:
```
                  0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00 ; mov rax, 0x2b
                  0xC3                               ; ret
entry (offset 8)  0xEB 0xF6 0x80 0xD2                ; (jmp start, or movz X11, 0x7b7)
                  0x60 0x05 0x80 0xD2                ; movz X0, 0x2b
                  0xC0 0x03 0x5F 0xD6                ; ret
```
Regardless of platform, we’ll tell the CPU to start executing from offset 8 in the byte buffer.
If it’s an x86 CPU, it will interpret the bytes as a jump, execute the top bytes, return at the ret, and ignore the rest of the bytes in the buffer (as they are never reached).
If it’s an ARM64 machine, then it will interpret the bytes as “put 0x7b7 in the X11 register” and continue, never seeing the x86 specific bytes at the start of the buffer.
Both x86_64 and ARM64 platforms will return the same value 0x2b.
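To make the two readings concrete, here is a small simulation of how each CPU walks the combined buffer from offset 8. The ARM64 side just splits the bytes into 32-bit little-endian words. The x86_64 side is a toy decoder, `x86_trace`, a hypothetical helper that only understands the three opcodes used here. It is nowhere near a real disassembler.

```ruby
# The combined buffer, exactly as listed above
code = [0x48, 0xC7, 0xC0, 0x2B, 0x00, 0x00, 0x00, # mov rax, 0x2b
        0xC3,                                     # ret
        0xEB, 0xF6, 0x80, 0xD2,                   # jmp start / movz X11, 0x7b7
        0x60, 0x05, 0x80, 0xD2,                   # movz X0, 0x2b
        0xC0, 0x03, 0x5F, 0xD6]                   # ret
ENTRY = 8

# ARM64 view: fixed 32-bit little-endian words starting at the entry point
arm_words = code[ENTRY..].each_slice(4).map { |w| w.pack("C*").unpack1("V") }
puts arm_words.map { |w| format("%08X", w) }.join(" ")
# prints D280F6EB D2800560 D65F03C0

# x86_64 view: toy decoder that records the offset of each executed instruction
def x86_trace(code, ip)
  trace = []
  loop do
    trace << ip
    case code[ip]
    when 0xEB then ip = ip + 2 + [code[ip + 1]].pack("C").unpack1("c") # jmp rel8
    when 0x48 then ip += 7                                            # mov rax, imm32
    when 0xC3 then return trace                                       # ret
    else raise "unknown opcode at offset #{ip}"
    end
  end
end

p x86_trace(code, ENTRY) # => [8, 0, 7]
```

The x86 trace visits the jump at 8, the mov at 0, and the ret at 7, and never touches offsets 10 and up. The ARM64 side never looks below offset 8.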
Now we can write a test without if statements like this:
```ruby
require "minitest/autorun"
require "jit_buffer"
require "fiddle"

class JITBufferTest < Minitest::Test # test class name is illustrative
  def test_execute
    # Cross platform bytes
    bytes = [0x48, 0xc7, 0xc0, 0x2b, 0x00, 0x00, 0x00, # x86_64 mov rax, 0x2b
             0xc3,                                     # x86_64 ret
             0xeb, 0xf6,                               # x86_64 jmp start / first half of ARM64 movz X11, 0x7b7
             0x80, 0xd2,                               # second half of ARM64 movz X11, 0x7b7
             0x60, 0x05, 0x80, 0xd2,                   # ARM64 movz X0, 0x2b
             0xc0, 0x03, 0x5f, 0xd6]                   # ARM64 ret

    # Write them to the buffer
    jit = JITBuffer.new(4096)
    jit.writeable!
    jit.write bytes.pack("C*")
    jit.executable!

    # Start at offset 8 (the shared jmp / movz bytes)
    offset = 8
    func = Fiddle::Function.new(jit.to_i + offset, [], Fiddle::TYPE_INT)

    # Check the return value
    assert_equal 0x2b, func.call
  end
end
```
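The test also quietly relies on alignment. ARM64 instructions must sit on 4-byte boundaries, so the entry point `jit.to_i + 8` only works if the buffer’s base address is itself 4-byte aligned. The sketch below shows that arithmetic. The base address is a made-up stand-in for `jit.to_i`, and I’m assuming JITBuffer hands back page-aligned memory (as mmap-style allocations typically are) rather than relying on anything documented.

```ruby
# Same 20 bytes as in the test
bytes = [0x48, 0xc7, 0xc0, 0x2b, 0x00, 0x00, 0x00,
         0xc3,
         0xeb, 0xf6, 0x80, 0xd2,
         0x60, 0x05, 0x80, 0xd2,
         0xc0, 0x03, 0x5f, 0xd6]
payload = bytes.pack("C*") # binary string written into the buffer

ENTRY = 8
PAGE_SIZE = 4096

# Hypothetical stand-in for jit.to_i, assuming a page-aligned allocation
base = 7 * PAGE_SIZE
entry_address = base + ENTRY

puts payload.bytesize                # prints 20
puts((entry_address % 4).zero?)      # prints true: ARM64 needs 4-byte aligned instructions
puts((payload.bytesize - ENTRY) % 4) # prints 0: the ARM64 tail is three whole 32-bit words
```

x86_64 has no such alignment requirement, so the odd-length x86 part before offset 8 only has to add up to exactly 8 bytes.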
Tons of pizzazz!
This test will execute machine code on both x86_64 and ARM64, and the machine code will return the same value on each. Not to mention, there’s no way RuboCop or Flay could possibly complain about this code. 🤣
I hope this inspires you to try writing cross platform machine code. This code only supports 2 platforms, but it does make me wonder how far we could stretch this and how many platforms we could support.
Anyway, hope you have a good day!