seccomp?
seccomp(2) is a Linux system call that filters a process's syscall invocations. Linux 3.5 introduced "seccomp mode 2", which lets a process filter syscalls by syscall number and by their arguments.
It is also one of the basic features behind containers; Docker, for example, uses it.
It uses BPF (Berkeley Packet Filter), which is also used in libpcap, to filter syscalls as fast as possible.
mruby?
mruby is one of the Ruby implementations (FYI, mruby was created by Matz, just as MRI was). It is designed to be embeddable into devices and to be more lightweight than CRuby.
As a side effect of being embeddable, mruby has a very clean and concise C API, so it is easy to write C bindings and do systems programming with it. It is similar to Lua in many respects.
I am going to try seccomp's basic features through libseccomp by writing an mruby gem (mrbgem) that binds libseccomp.
Here is mruby-seccomp. You can build an mruby binary with seccomp access by checking out this repo on a Linux machine and just running make (building mruby itself requires CRuby, bison and some libraries).
Basic usage
At the C level, you create a seccomp context with seccomp_init(3), add filter rules with seccomp_rule_add(3), then load it into the current process with seccomp_load(3).
A seccomp context has a default action:
:kill => SCMP_ACT_KILL, :allow => SCMP_ACT_ALLOW, :trap => SCMP_ACT_TRAP, ...
Then you can add rules that apply a different action to specific syscalls, optionally matching on their arguments.
Use :kill as the default to build a whitelist, and :allow to build a blacklist.
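To make the whitelist/blacklist distinction concrete, here is a minimal pure-Ruby model of how a default action combines with per-syscall rules. It is only a sketch: it does not touch the kernel or libseccomp, and `ToyFilter` with its `rule` and `decide` methods are hypothetical names invented for this illustration, not part of mruby-seccomp.

```ruby
# Toy model of a seccomp policy: a default action plus per-syscall rules.
# ToyFilter is a hypothetical name for illustration; nothing here talks to
# the kernel or to libseccomp.
class ToyFilter
  def initialize(default:)
    @default = default
    @rules = {}
  end

  # Register an action for one syscall name and return self for chaining.
  def rule(action, syscall)
    @rules[syscall] = action
    self
  end

  # Return the action that would apply to the given syscall.
  def decide(syscall)
    @rules.fetch(syscall, @default)
  end
end

# Blacklist: everything is allowed except what is listed.
blacklist = ToyFilter.new(default: :allow).rule(:trap, :uname)

# Whitelist: everything is killed except what is listed.
whitelist = ToyFilter.new(default: :kill).rule(:allow, :read).rule(:allow, :write)

puts blacklist.decide(:uname)  # trap
puts blacklist.decide(:mkdir)  # allow
puts whitelist.decide(:write)  # allow
puts whitelist.decide(:mkdir)  # kill
```

With a real filter, a whitelist has to cover every syscall the runtime itself needs, so a blacklist is usually the easier starting point for experiments.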
This is a blacklist that restricts uname(2) calls. Build it together with mruby-uname.
context = Seccomp.new(default: :allow) do |rule|
  rule.trap(:uname)
end
context.load
Uname.nodename # Really calls `uname(2)` !
$ ./mruby/bin/mruby /tmp/test.rb
Bad system call
"Bad system call" means the process received SIGSYS. You can even trap this signal.
Combination with fork()/exec()
Once loaded, seccomp filters are preserved across fork/clone and execve, as the kernel documentation says.
You can load a seccomp context right after fork() and then call execve() to create a "sandbox" container, which restricts the syscalls the child processes are able to call.
# fork from https://github.com/iij/mruby-process
# exec from https://github.com/haconiwa/mruby-exec
context = Seccomp.new(default: :allow) do |rule|
  rule.kill(:mkdir, Seccomp::ARG(:>=, 0), Seccomp::ARG(:>=, 0))
end
pid = Process.fork do
  context.load
  puts "==== It will be jailed. Please try to mkdir"
  exec "/bin/sh"
end
p(Process.waitpid2 pid)
$ ./mruby/bin/mruby /tmp/jail.rb
==== It will be jailed. Please try to mkdir
sh-4.2$ mkdir /tmp/test1234
Bad system call
This is similar to Linux capabilities, but seccomp controls programs at a finer granularity.
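Part of that finer granularity comes from argument matching: the mkdir rule above attaches one comparator per syscall argument. The following pure-Ruby sketch models the idea that a rule fires only when every comparator holds for the argument in the same position. `ArgCmp` and `rule_matches?` are hypothetical names for this illustration, the positional mapping is an assumption about how `Seccomp::ARG` is meant to be read, and the sample argument values are made up. libseccomp compares arguments as unsigned 64-bit values, which this model ignores.

```ruby
# Toy model of argument matching: a rule fires only if every comparator
# holds for the argument in the same position. ArgCmp and rule_matches? are
# hypothetical names for illustration, not the mruby-seccomp API.
ArgCmp = Struct.new(:op, :operand) do
  # Apply the stored operator (e.g. :>=, :==) to the actual argument value.
  def match?(value)
    value.public_send(op, operand)
  end
end

def rule_matches?(comparators, args)
  comparators.each_with_index.all? { |cmp, i| cmp.match?(args[i]) }
end

# Matches any mkdir call, like two ">= 0" comparators.
any_value = [ArgCmp.new(:>=, 0), ArgCmp.new(:>=, 0)]
# Matches only when the second argument (the mode) is exactly 0777.
only_0777 = [ArgCmp.new(:>=, 0), ArgCmp.new(:==, 0o777)]

# mkdir(path_pointer, mode): the numbers are made-up sample arguments.
puts rule_matches?(any_value, [0x7ffd1000, 0o755])  # true
puts rule_matches?(only_0777, [0x7ffd1000, 0o755])  # false
puts rule_matches?(only_0777, [0x7ffd1000, 0o777])  # true
```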
Advanced features
With seccomp we can catch SIGSYS together with information about which syscall was blocked (via the siginfo_t structure). mruby-seccomp supports this.
context = Seccomp.new(default: :allow) do |rule|
  rule.trap(:uname)
end
Seccomp.on_trap do |syscall|
  puts "Trapped: syscall #{Seccomp.syscall_to_name(syscall)} = ##{syscall}"
end
context.load
begin
  # Then hit `uname(2)`
  p "nodename: " + Uname.nodename
rescue => e
  puts "Catch as error: " + e.message
  puts "Trapping is OK"
end
$ ./mruby/bin/mruby /tmp/trap.rb
Trapped: syscall uname = #63
Catch as error: uname failed
Trapping is OK
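The number 63 in this output is consistent with x86_64, where uname is syscall 63. Syscall numbers are architecture-specific, so turning a number back into a name is conceptually a per-architecture table lookup. Here is a toy version that hard-codes a handful of x86_64 numbers. `X86_64_SYSCALLS` and `toy_syscall_to_name` are hypothetical names for this sketch and say nothing about how `Seccomp.syscall_to_name` is actually implemented.

```ruby
# A few x86_64 syscall numbers, hard-coded for illustration only.
# Other architectures use different numbers for the same syscalls.
X86_64_SYSCALLS = {
  0  => :read,
  1  => :write,
  2  => :open,
  3  => :close,
  63 => :uname,
  83 => :mkdir
}.freeze

# Hypothetical helper: map a syscall number to its name, or :unknown.
def toy_syscall_to_name(number)
  X86_64_SYSCALLS.fetch(number, :unknown)
end

puts "Trapped: syscall #{toy_syscall_to_name(63)} = #63"
```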
NOTE: Signal handlers are reset after exec(), so exec()'ing into bash and then trapping a uname(1) hit is currently unsupported.
Conclusion
seccomp can restrict the syscall invocations of a process and of its whole process tree. We can experiment with it using mruby, mruby-seccomp and a few mrbgems for process control.
BTW seccomp is supported in Docker, LXC and Haconiwa.
Pull requests to mruby-seccomp are welcome!
Original Japanese article
Original article written by me in Japanese: