YAML f7u12
Feb 6, 2013 @ 11:22 am
YAML seems to be getting a bad rap lately, and I’m not surprised. YAML was used as the attack vector to execute arbitrary code in a Rails process and was even used to steal secrets from rubygems.org.
Let’s try to dissect the attack vector used, and see how YAML fits into the picture.
The Metasploit Exploit
First, let’s cover the most widely known vector. We (the Rails Security Team) have had reports of attempts to use the exploit on several websites, it’s in Metasploit, and a variant was used to attack rubygems.org.
The Troubled Code
I’m going to boil down the code involved to make the attack easier to digest. In Rails, there is a class that basically boils down to this definition:
class Helpers
  def initialize
    @module = Module.new
  end

  def []=(key, value)
    @module.module_eval <<-END_EVAL
      def #{value}(*args)
        # ... other stuff
      end
    END_EVAL
  end
end
This class defines routing helper methods on a module, and later this module is mixed into your views. Let’s take a look at how to use this code to teach Linux Zealot an important lesson in security.
Exploitation
Our attacker knows that this class is defined in the system. Using YAML, along with Psych’s object deserialization, they can inject any object they choose into the system. So how can they use this object? Let’s take a look at the YAML payload for exploiting this code, then talk about how it works:
--- !ruby/hash:Helpers
foo: |-
  mname; end; puts 'hello!'; def oops
We can clearly see the Ruby code in this YAML, but how does it get executed?
When Psych looks at the declared type, !ruby/hash:Helpers, it says “ah, this is a subclass of a Ruby hash with the class of Helpers”. So it allocates a new Helpers instance, then calls the []= method for each of the key/value pairs in the YAML.
In this case, the key value pair is:
['foo', "mname; end; puts 'hello!'; def oops"]
Let’s take the value passed in, and do string substitution in the module_eval part of the code:
def mname; end; puts 'hello!'; def oops(*args)
  # ... other stuff
end
It’s kind of hard for humans to read, so let’s add some newlines:
def mname
end
puts 'hello!'
def oops(*args)
  # ... other stuff
end
And now it should be pretty clear how an attacker can execute arbitrary code.
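To watch the substitution happen without any YAML involved, here is a minimal sketch that calls []= directly with the payload’s value, which is the call described above for each key/value pair. The global $injected_ran is a hypothetical marker added for illustration (the payload in this post uses puts instead), and the elided method body is left empty.

```ruby
# Boiled-down Helpers class from this post, with the elided body left empty.
class Helpers
  def initialize
    @module = Module.new
  end

  def []=(key, value)
    @module.module_eval <<-END_EVAL
      def #{value}(*args)
      end
    END_EVAL
  end
end

# Hypothetical marker so we can observe the injected statement running.
$injected_ran = false

helpers = Helpers.new
# The value closes the first def early, runs a statement, then opens a new def
# so the generated source still parses.
helpers["foo"] = "mname; end; $injected_ran = true; def oops"

puts "injected code ran: #{$injected_ran}"
```

The injected statement runs at the moment the method is defined, not when a helper is later called.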
How do we fix this?
We have a few options for fixing this.
- Replace our module_eval with a define_method
- Change Psych to check super classes and ensure it’s a hash
- Stop using normal Ruby methods to set hash key / value pairs
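The first option might look something like the sketch below. SaferHelpers and helper_names are hypothetical names used only for this illustration, not anything in Rails. The point is that define_method receives the name as data, so the string is never parsed as Ruby source.

```ruby
# Hypothetical variant of the boiled-down Helpers class using define_method.
class SaferHelpers
  def initialize
    @module = Module.new
  end

  def []=(key, value)
    # The name is passed as a symbol; nothing in it is evaluated as code.
    @module.send(:define_method, value.to_sym) do |*args|
      # ... other stuff
    end
  end

  # Hypothetical accessor so the defined method names can be inspected.
  def helper_names
    @module.instance_methods(false)
  end
end

# Hypothetical marker; it stays false because the payload is never evaluated.
$injected_ran = false

safer = SaferHelpers.new
safer["foo"] = "mname; end; $injected_ran = true; def oops"

puts "injected code ran: #{$injected_ran}"
p safer.helper_names
```

The payload ends up as one oddly named method instead of executable code. It still creates a symbol from attacker input, which matters for the symbol attack described later.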
Let’s say we did all of these. Are we safe? No.
Proxy Exploits
This exploit was reported to the Rails Security team by Ben Murphy. It uses “proxy objects” (basically anything with a method_missing implementation that calls to send) to execute arbitrary code on the remote server.
The Troubled Code
Again, this example is boiled down to make it easier to understand. Let’s say our system has a proxy object like this:
class Proxy
  def initialize(server, arg1, meth)
    @server = server
    @arg1 = arg1
    @meth = meth
  end

  def method_missing(mid, *args)
    @server.send(@meth, @arg1, *args)
  end
end
This proxy is used to forward messages to some other object (in this case @server).
Exploitation
Our attacker knows that this class exists in our system. However, merely instantiating this object isn’t enough. The application code must call a method on this object that the object doesn’t already respond to (nearly any method will do) before the system is compromised. In the case of Rails, the attacker knows that the YAML objects will be embedded in the parameters hash, so things like:
params[:foo].empty?
params[:foo].length
# etc
will set this one off. So what does the YAML payload look like?
--- !ruby/object:Proxy
server: !ruby/module 'Kernel'
meth: eval
arg1: puts :omg
This particular form is a normal object load. However, the attacker sets the instance variables on the proxy object to convenient values. When the proxy is instantiated, @server will be Kernel, @meth will be "eval", and @arg1 will be "puts :omg".
When the application calls any method on the object, method_missing will trigger. If we expand out the instance variables in method_missing, it will look like this:
Kernel.send("eval", "puts :omg")
Again we can see that arbitrary code can be executed.
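Here is a rough simulation of that sequence in plain Ruby, with no YAML involved. It approximates an object load by allocating the object and setting its instance variables directly; the global $proxy_fired is a hypothetical marker standing in for the puts in the payload. The trigger has to be a method that Proxy does not inherit from Object, so length is used here.

```ruby
# Boiled-down Proxy class from this post.
class Proxy
  def initialize(server, arg1, meth)
    @server = server
    @arg1 = arg1
    @meth = meth
  end

  def method_missing(mid, *args)
    @server.send(@meth, @arg1, *args)
  end
end

# Approximation of an object load: initialize is skipped and the
# attacker-chosen values are written straight into instance variables.
proxy = Proxy.allocate
proxy.instance_variable_set(:@server, Kernel)
proxy.instance_variable_set(:@meth, "eval")
proxy.instance_variable_set(:@arg1, "$proxy_fired = :omg")

# Innocent-looking application code. Proxy has no length method, so
# method_missing runs and forwards to Kernel.send("eval", ...).
proxy.length

p $proxy_fired
```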
This exploit is arguably more insidious than the previous exploit. In the previous exploit it was obvious that the app code that contained a call to eval could possibly execute arbitrary code. This exploit was able to trick our system into calling eval even though the app code never explicitly eval’d anything!
You might be thinking “this proxy can’t be common”. Unfortunately there are classes similar to this in XMLRPC as well as Rack via a combination of ERB templates.
How do we fix this?
We applied our previous fixes, but our code was still susceptible to attacks. What do we do now? Why don’t we change Psych to:
- Only accept whitelisted classes
- But still allow Ruby “primitives” by default
Are we safe now? I guess that depends on what you mean by “safe”. Assuming that no object ever evals anything that’s set on your whitelisted classes, you may be safe from arbitrary code execution. But are you safe from other attacks?
MOAR ATTACKS!
Here’s a grab bag of attacks you can use even if “primitives” are allowed. Let your friends know how much you really love them with these tricks!
Eating up Object Space
In Ruby, symbols are never garbage collected. To DoS a server, simply send many Ruby symbols in YAML format:
---
- :foo
- :bar
- :baz
# etc
Infinite Loops!
Know a place where someone is doing this?
class Foo
  def some_method
    @ivar.each do |x|
      # ...
    end
  end
end
Teach them about infinite ranges! Psych will deserialize this YAML to a range from 1 to infinity on the object.
--- !ruby/object:Foo
ivar: !ruby/range
  begin: 1
  end: .inf
  excl: false
The @ivar.each call will have a fun time looping forever!
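For a feel of what a bounds check involves, here is a small sketch. It builds the same kind of range by hand rather than loading it from YAML, and unbounded? is a hypothetical helper, not part of Psych or Ruby.

```ruby
# The kind of Range described above: from 1 to infinity.
unbounded = (1..Float::INFINITY)

# Hypothetical guard: true when the range has no finite upper bound.
def unbounded?(range)
  last = range.end
  last.nil? || (last.is_a?(Float) && last.infinite? == 1)
end

puts unbounded?(unbounded)
puts unbounded?(1..10)

# Taking a fixed number of elements is fine; a bare each would never return.
p unbounded.first(3)
```

A check like this has to live in every piece of code that iterates over a deserialized value, which is part of why bolting on safety after the fact gets tedious.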
Infinite Recursion!
Have a friend that recurses data structures like this:
class Foo
  def some_method
    stack = [@foo]
    until stack.empty?
      y = stack.pop
      if Array === y
        # push the array's elements so each one gets visited in turn
        stack.concat y
      else
        process y
      end
    end
  end
end
Send them this little present:
--- !ruby/object:Foo
foo: &70143145831360
- *70143145831360
They’ll be so excited! @foo will result in an array that contains itself. The loop will be processing this stack for quite some time! Good times!
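If your friend wanted to survive this, the traversal would need to remember which arrays it has already seen. The sketch below is one way to do that, assuming identity tracking by object_id; collect_leaves is a hypothetical helper that stands in for the process call above.

```ruby
require "set"

# Hypothetical cycle-safe version of the stack walk shown above.
def collect_leaves(root)
  seen = Set.new
  stack = [root]
  leaves = []
  until stack.empty?
    node = stack.pop
    if node.is_a?(Array)
      # add? returns nil when the id is already present, so a repeated
      # array is skipped instead of being expanded again.
      next unless seen.add?(node.object_id)
      stack.concat(node)
    else
      leaves << node
    end
  end
  leaves
end

# Build an array that contains itself, like the one the payload produces.
cyclic = [1, [2, 3]]
cyclic << cyclic

p collect_leaves(cyclic).sort
```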
Pathological Regular Expressions
Your friend has given up on iterations. Just totally stopped. How can you show them how much you love them? I know! Send a pathological regular expression!
You know your friend is doing string matches:
class Foo
  def initialize
    @match = /ok/
  end

  def ok?
    @match =~ 'aaaaaaaaaaaaaaaaaaaaaaaaadaaaac'
  end
end
So you send a payload like this:
--- !ruby/object:Foo
match: !ruby/regexp /(b|a+)*c/
Loading this will result in a regular expression that takes an extremely long time to match the string. We can show our love by easily making every web process work hard on a bad regular expression!
ENOUGH ENOUGH!
So now we know that custom classes could result in evaled code, so we have to whitelist only “safe” classes. Now that we’ve whitelisted our safe classes, we need to make sure we don’t load symbols. After we make sure symbols are disallowed, we have to ensure loaded ranges are bounded. After we ensure loaded ranges are bounded, we have to check for self referential hashes and arrays, and the list goes on.
We’ve adjusted our code to make all these checks and balances. But we’ve only examined a handful of the Ruby “primitives” that are available. After examining only these few cases, are we sure that loading any other Ruby “primitive” should be considered safe?
I’m not.
YAML.safe_load
People are asking for a safe_load from Psych. But the question is: “what does safe mean?”. Some say “only prevent foreign code from being executed”, but does that mean we’re safe?
To me it doesn’t. To me, “safe” means something that is:
- Easy to understand.
- Conservative.
- Easy to extend.
I propose that “safe load” should load only Null, Booleans, Numerics, Strings, Arrays, and Hashes, and no self-referential data structures. This is easy to understand. You only need to know about 6 data types, not a laundry list of possible classes.
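Psych releases made after this post do ship a safe_load, and its defaults behave much like this proposal. The sketch below assumes such a release is installed; rejection_for is a hypothetical helper that only reports which error class, if any, a document raises.

```ruby
require "yaml"

# Plain data built from the six proposed types loads normally.
data = YAML.safe_load(<<~DOC)
  name: foo
  count: 3
  ratio: 1.5
  enabled: true
  missing: ~
  tags: [a, b]
DOC

# Hypothetical helper: returns the error class a document raises, or nil.
def rejection_for(doc)
  YAML.safe_load(doc)
  nil
rescue Psych::DisallowedClass, Psych::BadAlias => e
  e.class
end

symbol_error = rejection_for("--- :foo\n")                   # a symbol
object_error = rejection_for("--- !ruby/object:Object {}\n") # an arbitrary object
alias_error  = rejection_for("--- &a\n- *a\n")               # an array containing itself

p data
p [symbol_error, object_error, alias_error]
```

Symbols, arbitrary objects, and aliases (which are how self-referential structures get built) are all refused unless the caller opts in.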
I’d prefer to stay conservative, not playing whack-a-mole when someone figures out how to exploit another class. Keeping the number of supported data types low prevents playing whack-a-mole.
If you really need to load other types, just add the class to the whitelist when calling safe_load. It should really be “that easy”. You explicitly know the types that will be loaded, so the possible values returned only grow when you say so.
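In the safe_load that later Psych releases ship, the whitelist is passed with the permitted_classes keyword. The sketch below assumes a Psych version that accepts that keyword, and uses Date purely as an example of a type you might opt in to.

```ruby
require "yaml"
require "date"

doc = "--- 2013-02-06\n"

# By default a date scalar is refused, because Date is not on the whitelist.
begin
  YAML.safe_load(doc)
  default_result = :loaded
rescue Psych::DisallowedClass
  default_result = :rejected
end

# Opting in to exactly one extra class.
date = YAML.safe_load(doc, permitted_classes: [Date])

p default_result
p date
```

The set of types that can come back grows only at the call site that asked for it.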
YAML Postmortem
This section isn’t actually a postmortem (YAML isn’t dead), it’s actually just a postscript section, so you can stop reading now. I just wanted to call it “postmortem” because I think it’s funny when people have postmortems about software.
People seem to be giving YAML a bad rap over these exploits. The truth is that all of these exploits exist in any scheme that allows you to create arbitrary objects in a target system. This is why nobody uses Marshal to send objects around. Think of YAML as a human readable Marshal.
Should we stop using YAML? No. But we probably shouldn’t use it to read foreign data. Can we make Psych safe? As I said earlier, it depends on what you think “safe” means. My opinion of “safe” puts YAML on the same field as JSON as far as “objects that can be transferred” is concerned.
Anyway, I think it’s important to see we have three things going on in these exploits. We have YAML the language, which defines schemes for arbitrary object serialization, Psych which honors those requests, and user land code which is subject to the exploits. YAML the language doesn’t say any of this code should be executed, and in fact Psych won’t eval random input. The problem is that certain YAML documents can be fed to Psych to create objects that interact with user code in unexpected ways.
The user land code is what gets exploited; YAML and Psych are merely a vehicle. But asking users to remove all cases of module_eval or method_missing + send and to require boundary checks, etc. is completely unreasonable.
This is why we need a YAML.safe_load.