Tenderlove Making

Cross Platform Machine Code

I hate writing if statements.

I’ve been working on a couple different assemblers for Ruby. Fisk is a pure Ruby x86 assembler. You can use it to generate bytes that can be executed on x86 machines. AArch64 is a pure Ruby ARM64 assembler. You can use it to generate bytes that can be executed on ARM64 machines.

Both of these libraries just generate bytes that can be interpreted by their respective processors. Unfortunately you can’t just generate bytes and expect the CPU to execute them. You first need to put the bytes in executable memory before you can hand them off to the CPU for execution. Executable memory is basically the same thing regardless of CPU architecture, so I decided to make a library called JITBuffer that encapsulates executable memory manipulation.

To use the JITBuffer, you write platform specific bytes to the buffer, then give the buffer to the CPU for execution. Here is an example on the ARM64 platform:

require "aarch64"
require "jit_buffer"
require "fiddle"

asm = AArch64::Assembler.new

# Make some instructions.  These instructions simply
# return the value 0xF00DCAFE
asm.pretty do
  asm.movz x0, 0xCAFE
  asm.movk x0, 0xF00D, lsl(16)
  asm.ret
end

# Write the bytes to executable memory:
buf = JITBuffer.new(4096)
buf.writeable!
asm.write_to buf
buf.executable!

# Point the CPU at the executable memory
func = Fiddle::Function.new(buf.to_i, [], -Fiddle::TYPE_INT)
p func.call.to_s(16) # => "f00dcafe"

The example uses the AArch64 gem to assemble ARM64 specific bytes, the JITBuffer gem to allocate executable memory, and the Fiddle gem to point the CPU at the executable memory and run it.
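The two instructions in that example split 0xF00DCAFE into 16-bit halves. movz writes 0xCAFE into the low 16 bits and zeroes everything else. movk with lsl(16) then drops 0xF00D into bits 16 to 31 without touching the rest. Here's a rough plain-Ruby model of that, just integer arithmetic. The movz and movk methods below are hypothetical helpers written for this illustration, not the AArch64 gem's API.

```ruby
# Hypothetical plain-Ruby model of what the two instructions do to a
# 64-bit register (NOT the AArch64 gem's API, just integer arithmetic).

# movz: zero the register, then place imm16 at bit position `shift`.
def movz(imm16, shift)
  imm16 << shift
end

# movk: keep every other bit, replace only the 16-bit lane at `shift`.
def movk(reg, imm16, shift)
  mask = 0xFFFF << shift
  (reg & ~mask) | (imm16 << shift)
end

x0 = movz(0xCAFE, 0)       # movz x0, 0xCAFE
x0 = movk(x0, 0xF00D, 16)  # movk x0, 0xF00D, lsl(16)
puts x0.to_s(16)           # prints f00dcafe
```

The negative type constant in the Fiddle call (-Fiddle::TYPE_INT) asks Fiddle for an unsigned int return value, which is why the real example prints "f00dcafe" instead of a negative number.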

Tests are important I guess, so I thought it would be a good idea to write tests for the JITBuffer gem. My goal for the test is to ensure that it’s actually possible to execute the bytes in the buffer itself. I’m not a huge fan of stubs or mocks and I try to avoid them if possible, so I wanted to write a test that would actually execute the bytes in the buffer. I also want the test to be “cross platform” (where “cross platform” means “works on x86_64 and ARM64”).

Writing a test like this would mean writing something like the following:

def test_can_execute
  buf = JITBuffer.new(4096)

  platform = figure_out_what_platform_we_are_on()
  if platform == "arm64"
    # write arm64 specific bytes
    buf.write(...)
  else
    # write x86_64 specific bytes
    buf.write(...)
  end

  # Use fiddle to execute
end

As I said at the start though, I hate writing if statements, and I’d rather avoid them if possible. In addition, how do you reliably figure out what platform you’re executing on? I really don’t want to figure that out. Not to mention, I just don’t think this code is cool.

My test requirements:

  • No if statements
  • Self contained (I don’t want to shell out or use other libraries)
  • Must have pizzazz

Since machine code is just bytes that the CPU interprets, it made me wonder “is there a set of bytes that execute both on an x86_64 CPU and an ARM64 CPU?” It turns out there are, and I want to walk through them here.

x86_64 Instructions

First, let’s look at the x86_64 instructions we’ll execute. Below is the assembly code (in Intel syntax):

start:
  mov rax, 0x2b ; put 0x2b in the rax register
  ret           ; return from the function
  jmp start     ; jump to start

This assembly code puts the value 0x2b in the rax register and returns from the current “C” function. I put “C” in quotes because we’re writing assembly code, but the assembly code is conforming to the C calling convention and we’ll treat it as if it’s a C function when we call it. The x86 C calling convention states that the value in the rax register is the “return value” of the C function. So we’ve created a function that returns 0x2b. At the end of the code there is a jmp instruction that jumps to the start of this sequence. However, since we return from the function before getting to the jump, the jump is never used (or is it?????)

Machine code is just bytes, and here are the bytes for the above x86 machine code:

0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00  ; mov rax, 0x2b
0xC3                                ; ret
0xEB 0xF6                           ; jmp start

x86 uses a “variable width” encoding, meaning that the number of bytes each instruction uses can vary. In this example, the mov instruction uses 7 bytes, the ret instruction uses 1 byte, and the jmp instruction uses 2 bytes. This means that the jmp instruction starts at the 9th byte, or offset 8.
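To double check those offsets, here's a small plain-Ruby sketch that pulls the fields out of the byte listing above. It assumes only the encodings shown there: the mov immediate is a little-endian signed 32-bit value in bytes 3 to 6, and the jmp's 0xF6 is a signed 8-bit displacement measured from the end of the 2-byte jmp, so 8 + 2 - 10 lands back at offset 0.

```ruby
x86 = [0x48, 0xC7, 0xC0, 0x2B, 0x00, 0x00, 0x00, # mov rax, 0x2b (7 bytes)
       0xC3,                                     # ret (1 byte)
       0xEB, 0xF6]                               # jmp start (2 bytes)

# mov rax, imm32: the immediate is the 4 little-endian bytes after 48 C7 C0,
# read here as a signed 32-bit value (the CPU sign-extends it to 64 bits).
imm32 = x86[3, 4].pack("C*").unpack1("l<")

jmp_at = 7 + 1                                  # mov + ret => offset 8
rel8 = [x86[jmp_at + 1]].pack("C").unpack1("c") # 0xF6 reinterpreted as a signed byte
target = jmp_at + 2 + rel8                      # relative to the end of the jmp

puts format("imm32 = 0x%X, jmp at %d, rel8 = %d, target = %d", imm32, jmp_at, rel8, target)
# prints imm32 = 0x2B, jmp at 8, rel8 = -10, target = 0
```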

ARM64 Instructions

Below are some ARM64 instructions we can execute:

movz X11, 0x7b7 ; put 0x7b7 in the X11 register
movz X0, 0x2b   ; put 0x2b in the X0 register
ret             ; return from the function

This machine code puts the value 0x7b7 into the register X11. Then it puts the value 0x2b in the X0 register. The third instruction returns from the function. Again we are abiding by the C calling convention, but this time on the ARM64 platform. On the ARM64 platform, the value stored in X0 is the return value. So the above machine code will return the value 0x2b to the caller just like the x86_64 machine code did.

Here are the bytes that represent the above ARM64 machine code:

0xEB 0xF6 0x80 0xD2  ; movz X11, 0x7b7
0x60 0x05 0x80 0xD2  ; movz X0, 0x2b
0xC0 0x03 0x5F 0xD6  ; ret

ARM64 uses fixed width instructions. All instructions on ARM64 are 32 bits wide.
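Since every ARM64 instruction is a single 32-bit word, the ARM64 bytes above are just those words stored little-endian. For the 64-bit MOVZ form with no shift, the word is 0xD2800000 with the 16-bit immediate in bits 5 to 20 and the destination register in bits 0 to 4. The movz_x helper below is a hypothetical sketch of only that one form (not how the AArch64 gem is implemented), but it's enough to reproduce the listing.

```ruby
# Hypothetical helper (not the AArch64 gem's API): encodes only the
# 64-bit "MOVZ Xd, #imm16" form with no shift (hw = 0).
def movz_x(rd, imm16)
  raise ArgumentError, "rd must be 0..30" unless (0..30).cover?(rd)
  raise ArgumentError, "imm16 must fit in 16 bits" unless (0..0xFFFF).cover?(imm16)
  0xD2800000 | (imm16 << 5) | rd
end

# RET (returning through X30) is a fixed 32-bit word.
RET_WORD = 0xD65F03C0

# ARM64 stores each 32-bit instruction word little-endian.
def le_bytes(word)
  [word].pack("L<").bytes
end

def hex(bytes)
  bytes.map { |b| format("0x%02X", b) }.join(" ")
end

puts hex(le_bytes(movz_x(11, 0x7B7))) # prints 0xEB 0xF6 0x80 0xD2
puts hex(le_bytes(movz_x(0, 0x2B)))   # prints 0x60 0x05 0x80 0xD2
puts hex(le_bytes(RET_WORD))          # prints 0xC0 0x03 0x5F 0xD6
```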

Cross Platform Machine Code

Let’s look at the byte blocks next to each other:

; x86_64 bytes
0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00  ; mov rax, 0x2b
0xC3                                ; ret
0xEB 0xF6                           ; jmp start
; ARM64 bytes
0xEB 0xF6 0x80 0xD2  ; movz X11, 0x7b7
0x60 0x05 0x80 0xD2  ; movz X0, 0x2b
0xC0 0x03 0x5F 0xD6  ; ret

Looking at the bytes, you’ll notice that the first two bytes of the ARM64 code (0xEB 0xF6) are exactly the same as the last two bytes of the x86_64 code. The first movz instruction in the ARM64 code was specially crafted to have the same bytes as the last jmp instruction in the x86 code.
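One way to see how that movz was crafted: the x86 jmp pins the low 16 bits of the ARM64 word to 0xF6EB. In the MOVZ encoding, the low 5 bits are the destination register and the next 11 bits are the bottom of the 16-bit immediate, so 0xF6EB forces Rd = 11 (X11) and an immediate of 0x7B7. The remaining two bytes, 0x80 0xD2, select 64-bit MOVZ with no shift and leave the immediate's top 5 bits zero. X11 is a temporary (caller-saved) register in the ARM64 calling convention, so clobbering it is harmless. The sketch below decodes the fields with plain bit shifts; the field names are just labels chosen for this example.

```ruby
# The four bytes at offset 8 of the combined buffer, read as one
# little-endian 32-bit ARM64 instruction word.
word = [0xEB, 0xF6, 0x80, 0xD2].pack("C*").unpack1("L<")

fields = {
  sf:    word >> 31,           # 1 => 64-bit (X registers)
  opc:   (word >> 29) & 0b11,  # 0b10 => MOVZ
  group: (word >> 23) & 0x3F,  # 0b100101 => move wide (immediate)
  hw:    (word >> 21) & 0b11,  # 0 => immediate not shifted
  imm16: (word >> 5) & 0xFFFF, # 0x7B7
  rd:    word & 0x1F           # 11 => X11
}

puts format("word = 0x%08X", word) # prints word = 0xD280F6EB
fields.each { |name, value| puts format("%-5s = 0x%X", name, value) }
```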

If we combine these bytes so that the shared 0xEB 0xF6 appears only once, then tell the CPU to start executing at the offset of those shared bytes, the interpretation of the bytes will change depending on the CPU, but the result is the same.

Here are the bytes combined:

start ->  0x48 0xC7 0xC0 0x2b 0x00 0x00 0x00  ; mov rax, 0x2b
          0xC3                                ; ret
entry ->  0xEB 0xF6 0x80 0xD2                 ; (jmp start, or movz X11, 0x7b7)
          0x60 0x05 0x80 0xD2                 ; movz X0, 0x2b
          0xC0 0x03 0x5F 0xD6                 ; ret
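The combined layout can also be built mechanically: find the longest run of bytes that ends the x86_64 sequence and begins the ARM64 one, append the ARM64 bytes after that overlap, and the entry point is where the overlap starts. Here's a plain-Ruby sketch using only the bytes from the listings above (the variable names are illustrative).

```ruby
x86 = [0x48, 0xC7, 0xC0, 0x2B, 0x00, 0x00, 0x00, # mov rax, 0x2b
       0xC3,                                     # ret
       0xEB, 0xF6]                               # jmp start
arm = [0xEB, 0xF6, 0x80, 0xD2,                   # movz X11, 0x7b7
       0x60, 0x05, 0x80, 0xD2,                   # movz X0, 0x2b
       0xC0, 0x03, 0x5F, 0xD6]                   # ret

# Largest k where the last k x86_64 bytes equal the first k ARM64 bytes.
overlap = [x86.length, arm.length].min.downto(1).find { |k| x86.last(k) == arm.first(k) } || 0

combined = x86 + arm.drop(overlap) # shared bytes appear only once
entry = x86.length - overlap       # both CPUs start executing here

puts overlap         # prints 2
puts entry           # prints 8
puts combined.length # prints 20
puts entry % 4       # prints 0
```

Checking entry % 4 is worth doing because ARM64 instructions must sit on 4-byte boundaries; here the 8 bytes of x86_64 code before the shared jmp happen to satisfy that.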

Regardless of platform, we’ll tell the CPU to start executing from offset 8 in the byte buffer. If it’s an x86 CPU, it will interpret the bytes as a jump back to offset 0, execute the top bytes, return at the ret, and ignore the rest of the bytes in the buffer (as they are never reached). If it’s an ARM64 CPU, it will interpret the same bytes as “put 0x7b7 in the X11 register” and continue, never seeing the x86 specific bytes at the start of the buffer. Offset 8 also keeps the ARM64 instructions 4-byte aligned, which ARM64 requires.

Both x86_64 and ARM64 platforms will return the same value 0x2b.
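To trace both paths without needing two machines, here's a toy interpreter written just for this illustration. It is not a real emulator: it understands only the exact encodings used above (mov rax, imm32 in its 48 C7 C0 form, ret, and jmp rel8 on x86_64; 64-bit movz with no shift and ret on ARM64) and raises on anything else.

```ruby
# Toy interpreter for this post only: NOT a real x86_64 or ARM64 emulator.
# It understands just the encodings used in the combined buffer.
CODE = [0x48, 0xC7, 0xC0, 0x2B, 0x00, 0x00, 0x00, # x86_64 mov rax, 0x2b
        0xC3,                                     # x86_64 ret
        0xEB, 0xF6, 0x80, 0xD2,                   # x86_64 jmp start / ARM64 movz X11, 0x7b7
        0x60, 0x05, 0x80, 0xD2,                   # ARM64 movz X0, 0x2b
        0xC0, 0x03, 0x5F, 0xD6].freeze            # ARM64 ret

def run_x86(code, pc)
  rax = 0
  loop do
    case code[pc]
    when 0x48 # assumed to be 48 C7 C0 imm32: mov rax, imm32
      rax = code[pc + 3, 4].pack("C*").unpack1("l<")
      pc += 7
    when 0xC3 # ret: rax holds the return value
      return rax
    when 0xEB # jmp rel8: signed offset from the end of the 2-byte jmp
      pc += 2 + [code[pc + 1]].pack("C").unpack1("c")
    else
      raise format("unknown x86_64 byte 0x%02X at offset %d", code[pc], pc)
    end
  end
end

def run_arm64(code, pc)
  x = Array.new(31, 0) # X0..X30
  loop do
    word = code[pc, 4].pack("C*").unpack1("L<")
    if word == 0xD65F03C0 # ret: X0 holds the return value
      return x[0]
    elsif (word & 0xFFE00000) == 0xD2800000 # 64-bit movz Xd, #imm16 (hw = 0)
      x[word & 0x1F] = (word >> 5) & 0xFFFF
      pc += 4
    else
      raise format("unknown ARM64 word 0x%08X at offset %d", word, pc)
    end
  end
end

puts run_x86(CODE, 8).to_s(16)   # prints 2b
puts run_arm64(CODE, 8).to_s(16) # prints 2b
```

Starting run_arm64 at offset 0 instead raises in the toy, because the x86_64 bytes there don't match any ARM64 word it knows about.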

Now we can write a test without if statements like this:

require "minitest/autorun"
require "jit_buffer"
require "fiddle"

class JITBufferTest < Minitest::Test
  def test_execute
    # Cross platform bytes
    bytes = [0x48, 0xc7, 0xc0, 0x2b, 0x00, 0x00, 0x00, # x86_64 mov rax, 0x2b
             0xc3,                                     # x86_64 ret
             0xeb, 0xf6,                               # x86_64 jmp start (offset 0)
             0x80, 0xd2,                               # ARM64: with 0xeb 0xf6, movz X11, 0x7b7
             0x60, 0x05, 0x80, 0xd2,                   # ARM64 movz X0, 0x2b
             0xc0, 0x03, 0x5f, 0xd6]                   # ARM64 ret

    # Write them to the buffer, then make it executable
    jit = JITBuffer.new(4096)
    jit.writeable!
    jit.write bytes.pack("C*")
    jit.executable!

    # Start at offset 8: the x86_64 jmp, or the ARM64 movz X11
    offset = 8
    func = Fiddle::Function.new(jit.to_i + offset, [], Fiddle::TYPE_INT)

    # Check the return value
    assert_equal 0x2b, func.call
  end
end

So simple!

So cool!

Tons of pizzazz!

This test will execute machine code on both x86_64 as well as ARM64 and the machine code will return the same value. Not to mention, there’s no way RuboCop or Flay could possibly complain about this code. 🤣

I hope this inspires you to try writing cross platform machine code. This code only supports 2 platforms, but it does make me wonder how far we could stretch this and how many platforms we could support.

Anyway, hope you have a good day!
