Ruby Programming/Standard Library/DRb

From Wikibooks, open books for an open world

Distributed Ruby (DRb) allows inter-process communication between Ruby programs by implementing remote procedure calling.

Introduction

Distributed Ruby enables remote method calls for Ruby. It is part of the standard library, so you can expect it to be installed on most systems that use MRI Ruby. Because the underlying object serialization relies on Marshal, which is implemented in C, you can expect good performance.

Let's start with a simple example, so the use of this module becomes clear.

Here's server.rb, where we create a single instance of an object (in this case a Hash) and share it on TCP port 9000.

# Load the DRb library
require 'drb'

# Create the front object
myhash = { counter: 0 }
def myhash.inc(elem)
  self[elem] += 1
end

# Start the service
DRb.start_service('druby://localhost:9000', myhash)

# Make the main thread wait for the DRb thread,
# otherwise the script execution would already end here.
DRb.thread.join

And here's client.rb:

require 'drb'

# We create a DRbObject instance that is connected to our server.
# Every method called on this object is executed on the remote object.
obj = DRbObject.new(nil, 'druby://localhost:9000')

puts obj[:counter]
obj.inc(:counter)
puts obj[:counter]

puts "Last access time = #{obj[:lastaccess]}"
obj[:lastaccess] = Time.now

Start the server in one shell session (or in the background), and in another session run the client a few times:

$ ruby client.rb
0
1
Last access time = 
$ ruby client.rb
1
2
Last access time = Fri Oct 22 22:23:59 BST 2004

The server and the client don't need to be run on the same machine. If you want the server to listen on all interfaces (and therefore also on remote connections) you need to change 'localhost' to '0.0.0.0' in server.rb. The client then needs to be configured to connect to the remote server by replacing 'localhost' with the IP (or hostname) of the server in client.rb.

Even just this simple example is immensely powerful. The above object could be used as a shared data store for session data on a webserver. Each web page request can look up and store information in this shared object. It works whether the web pages are served via standalone CGI scripts, Webrick threads, Apache mod_ruby, or fcgi/mod_fastcgi. It even works if you have a cluster of webservers. Furthermore, the session data is not lost if you restart Apache.

Functionality

DRb is actually rather sophisticated and elegant in its design, but the fundamental principle is very straightforward:

DRb packages up a method call as an array containing the method name and the arguments, converts it into a stream of bytes using the Marshal library, and sends it to the server. The server then executes the call on the front object. The return value, or any exception that was raised, is put into another array, converted into a stream of bytes and sent back to the client.
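
The round trip can be imitated inside a single process. The sketch below is an illustration only and not DRb's actual wire format: the helper names encode_call and dispatch are made up for this example, and no socket is involved.

```ruby
# Illustration only: imitates the request/reply cycle with plain Marshal.
# encode_call and dispatch are hypothetical helpers, not part of DRb.

# "Client" side: pack the method name and the arguments into a byte string.
def encode_call(name, *args)
  Marshal.dump([name, args])
end

# "Server" side: unpack the request, run it on the front object and pack
# either the return value or the raised exception into the reply.
def dispatch(front, request)
  name, args = Marshal.load(request)
  Marshal.dump([true, front.__send__(name, *args)])
rescue StandardError => e
  Marshal.dump([false, e])
end

front = { counter: 41 }

# A call that succeeds: front[:counter]
ok1, value = Marshal.load(dispatch(front, encode_call(:[], :counter)))
puts "succeeded=#{ok1} value=#{value.inspect}"

# A call that fails: Hash#fetch without arguments raises ArgumentError,
# and the exception object itself travels back in the reply.
ok2, error = Marshal.load(dispatch(front, encode_call(:fetch)))
puts "succeeded=#{ok2} error=#{error.class}"
```

Both the arguments and the reply cross the boundary as copies, which is the behaviour the section on uncopyable objects comes back to.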

Since DRb is written in Ruby, you can look at the code, which contains lots of comments and examples. You can find it on your system at a location like /usr/local/lib/ruby/2.1/drb/drb.rb, or you can read the rendered documentation and examples online.

Security

If you are using a DRb object to store session data, make sure that only the webserver can contact your DRb object, and that it is not directly accessible from the outside; otherwise unwelcome guests could directly manipulate its contents. You can bind it to localhost (127.0.0.1) if all clients are on the same machine; otherwise you can put it on a separate private network, or use firewall rules or DRb ACLs to block access from unwanted clients. If you use an ACL, it is important to install it before calling DRb.start_service.

Example usage of ACL:

require 'drb'
require 'drb/acl'

acl = ACL.new(%w{deny all
                allow localhost
                allow 192.168.1.*})
DRb.install_acl(acl)

DRb.start_service('druby://localhost:9000', obj)

Beware that every object contains methods which could be very dangerous if called by a hostile party. Some of these are private (e.g. exec, system) and DRb prevents these from being called, but there are other public methods which are equally dangerous (e.g. send, instance_eval, instance_variable_set). Consider for example obj.instance_eval("`rm -rf /*`").

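So sharing an object with the whole Internet is a risky business. If you're going to do this, you should start your object from a blank slate without these dangerous methods included. On Ruby versions before 2.7 you should also run with at least $SAFE=1; the $SAFE mechanism was deprecated in Ruby 2.7 and removed in Ruby 3.0, so it offers no protection on current versions. You can build a blank slate like this: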

class BlankSlate
  safe_methods = [:__send__, :__id__, :object_id, :inspect, :respond_to?, :to_s]
  (instance_methods - safe_methods).each do |method|
    undef_method method
  end
end

class MyService < BlankSlate
  def increase_count
    @count ||= 0
    @count += 1
  end
end

DRb.start_service('druby://localhost:9000', MyService.new)

Note that this example doesn't use initialize() for setting @count to 0. If it did this, clients would be able to reset @count by calling the initialize method.

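Here's an alternative implementation from Evil-Ruby. It relies on that library's extensions (such as obj.class= and superclass=), so it does not run on plain Ruby.

# You can derive your own classes from this class
# if you want them to have no preset methods.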
#
#   klass = Class.new(KernellessObject) { def inspect; end }
#   klass.new.methods # raises NoMethodError
#
# Classes that are derived from KernellessObject
# won't call #initialize from .new by default.
#
# It is a good idea to define #inspect for subclasses,
# because Ruby will go into an endless loop when trying
# to create an exception message if it is not there.
class KernellessObject
  class << self
    def to_internal_type; ::Object.to_internal_type; end

    def allocate
      obj = ::Object.allocate
      obj.class = self
      return obj
    end

    alias :new :allocate
  end

  self.superclass = nil
end

Additionally, rather than sharing your original object, you may wish to build a wrapper object and share that instead. The wrapper object can have a limited set of methods (just the ones you really want to share), validate the parameters of incoming data, and delegate to another object when the data has been sanitised.
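
A minimal sketch of such a wrapper follows. All names in it (SessionStore, SessionFront, the key format) are hypothetical and chosen for this example. The wrapper still inherits the public methods of Object, so it complements the blank-slate approach rather than replacing it.

```ruby
# SessionStore stands for the object we do not want to expose directly;
# SessionFront is the wrapper that would be shared instead.
class SessionStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end
end

class SessionFront
  # Assumed rule for this example: short, lower-case identifiers only.
  KEY_FORMAT = /\A[a-z0-9_]{1,32}\z/

  def initialize(store)
    @store = store
  end

  # Only get and set are meant for remote callers.
  def get(key)
    @store.read(checked(key))
  end

  def set(key, value)
    raise ArgumentError, 'value must be a String' unless value.is_a?(String)

    @store.write(checked(key), value)
  end

  private

  # Validate the key before it reaches the real store.
  def checked(key)
    key = key.to_s
    raise ArgumentError, "bad key: #{key.inspect}" unless KEY_FORMAT.match?(key)

    key
  end
end

front = SessionFront.new(SessionStore.new)
front.set('user_42', 'alice')
puts front.get('user_42')
```

To share it, the wrapper instance would be passed to DRb.start_service in place of the original object.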

Thread-safety

Each incoming method call which hits the object you've shared by DRb is executed in a new thread. This is pretty essential if you think about it; there may be many clients, and the server can't control when the clients decide to send method calls to it. DRb does not serialise the requests, so that one client can't block out the other clients.

However, this means you have to take the same care with your DRb object as you would in any other threaded application. Consider what happens, for example, if two clients both decided to run

obj[:counter] = obj[:counter] + 1

at the same time. It might happen that both clients would retrieve obj[:counter] and see the same value (say 100), then independently add 1, and then both write back 101. That's probably not what you want, if :counter is supposed to generate unique sequence numbers.

Even the method myhash.inc shown at the top of this page suffers the same problem, because two clients could decide to call inc(:counter) at the same time, causing two threads on the server to suffer the same race condition. The solution is to protect the increment operation with a Mutex:

require 'drb'
require 'thread'

class MyStore
  def initialize
    @hash = { :counter=>0 }
    @mutex = Mutex.new
  end
  def inc(elem)
    # If another thread is already running the block given to synchronize,
    # the current thread waits until that thread has left the block
    # before it runs the block itself.
    @mutex.synchronize do
      self[elem] = self[elem].succ
    end
  end
  def [](elem)
    @hash[elem]
  end
  def []=(elem,value)
    @hash[elem] = value
  end
end

mystore = MyStore.new
DRb.start_service('druby://localhost:9000', mystore)
DRb.thread.join
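
The locking can be checked without any network traffic. In the sketch below, plain threads stand in for the per-request threads that DRb would start; the thread count and the number of iterations are arbitrary, and the class is a trimmed copy of MyStore.

```ruby
# Stand-alone check: several threads increment the same counter.
class MyStore
  def initialize
    @hash = { counter: 0 }
    @mutex = Mutex.new
  end

  # The read-modify-write sequence runs under the mutex, so two
  # threads can never both read the same old value.
  def inc(elem)
    @mutex.synchronize { @hash[elem] = @hash[elem].succ }
  end

  def [](elem)
    @hash[elem]
  end
end

store = MyStore.new
workers = Array.new(8) do
  Thread.new { 1_000.times { store.inc(:counter) } }
end
workers.each(&:join)

puts store[:counter] # 8 threads x 1000 increments = 8000
```

On MRI an unprotected version may often produce the right total as well, because the window for the race is small; the mutex is what turns "usually" into a guarantee.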

Uncopyable objects

Why would a client need to run DRb.start_service, as the client later in this section does?

A very good question, which leads us on to another interesting aspect of DRb.

In normal operation, DRb will use Marshal to send the arguments to a method call; when they are unmarshalled at the server side, it will have a copy of those objects. The same applies to the result returned from the method; it will be marshalled, sent back, and the client will have a copy of that object.

In many simple cases this copying of objects is not a problem, but there are several cases where it might be:

  • If the server makes a change to the local copy it received, then the client won't see that change.
  • The argument or response objects could be extremely large (such as an object which holds references to other objects, forming a tree), and you might not want to send them back and forth.
  • Some types of objects cannot be marshalled at all: they include files, sockets, procs/blocks, objects with singleton methods, and any object which contains those objects indirectly, e.g. in an instance variable.

In these cases, DRb can instead send over a 'proxy object' containing contact details to allow the original object to be called via DRb: that is, the hostname and port where the original object can be found. This is done automatically for any object which cannot be marshalled, or you can force it by including DRbUndumped in your object.
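
Which objects fall into the "cannot be marshalled" group can be checked with Marshal alone. The sketch below tries a few candidates and reports the ones for which Marshal.dump raises TypeError; the labels and the with_singleton object are invented for this example.

```ruby
# An object with a singleton method, like myhash in the first example.
with_singleton = Object.new
def with_singleton.hello
  'hi'
end

candidates = {
  'proc'                    => proc { |x| x + 1 },
  'IO object'               => STDOUT,
  'singleton method'        => with_singleton,
  'array containing a proc' => [1, 2, proc {}],
  'plain array'             => [1, 2, 3]
}

# Try to dump each candidate and record what happened.
results = candidates.map do |label, object|
  outcome = begin
    Marshal.dump(object)
    'can be copied'
  rescue TypeError => e
    "cannot be marshalled (#{e.message})"
  end
  [label, outcome]
end

results.each { |label, outcome| puts "#{label}: #{outcome}" }
```

Only the plain array can be copied; for the others DRb would have to pass a proxy.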

How can we demonstrate this? Well, consider the class defined in the following file, foo.rb

class Foo
  def initialize(x)
    @x = x
  end
  def inc
    @x = @x.succ
  end
end

Now, let's have a server which accepts an object and calls 'inc' on it:

require 'drb'
require './foo'

class Server
  def update(obj)
    obj.inc
  end
end

server = Server.new
DRb.start_service('druby://localhost:9001', server)
DRb.thread.join

Here's the corresponding client:

require 'drb'
require './foo'

DRb.start_service
obj = DRbObject.new(nil, 'druby://localhost:9001')
a = Foo.new(10)
b = Foo.new(20)
# p prints the result of inspect, which includes the instance variables
p a
p b
obj.update(a)
obj.update(b)
p a
p b

Now, here's what happens if we run it:

$ ruby client2.rb
#<Foo:0x817e760 @x=10>
#<Foo:0x817e74c @x=20>
#<Foo:0x817e760 @x=10>
#<Foo:0x817e74c @x=20>

Oops. We passed across our objects 'a' and 'b', but because they were copied onto the server, only the server's copies were updated by 'inc'. The objects on the client are unaffected.

Now try modifying the definition of Foo like this:

class Foo
  include DRbUndumped
  
  # ... same as before

Or alternatively you can modify the client program like this:

a = Foo.new(10)
b = Foo.new(20)
a.extend DRbUndumped
b.extend DRbUndumped

# ... same as before

And now the result is what we'd hope for:

$ ruby client2.rb
#<Foo:0x817e648 @x=10>
#<Foo:0x817e634 @x=20>
#<Foo:0x817e648 @x=11>
#<Foo:0x817e634 @x=21>

So what's happened is, instead of marshalling across an instance of Foo, we have marshalled across the information needed to build a proxy object: it contains the client's hostname, port, and object id which can be used to talk to the original object. When we pass across the proxy object for 'a' to the server, and it calls obj.inc, the 'inc' method call is made back over DRb to the client machine where object 'a' actually lives. You have effectively built a remote 'reference' to the object which can be passed around much like a normal object reference, except it can be handed from machine to machine. Method calls via this reference hit the same object.

Now, this is why the client program needs to run DRb.start_service - even though it's a "client" from our point of view, there might be method call arguments which generate these DRb proxy 'references', at which point the client also becomes a server for those objects.

We didn't specify a host or port here, so DRb chooses any spare TCP port on the system, and the host is whatever the system hostname is according to the 'gethostname' call - e.g. if the machine is called server.example.com then DRb might choose druby://server.example.com:45123

These two-way method calls can be a problem though when there is a firewall between the two machines. You can choose a fixed port on the client side in DRb.start_service instead of having one chosen dynamically; that lets you open up a hole in the firewall for DRb. However, if you are behind a NAT firewall, it almost certainly won't work at all.
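
The address that a client advertises for callbacks can be inspected with DRb.uri. The sketch below binds to 127.0.0.1 and uses port 0, which asks the operating system for any free port; putting a fixed number there instead (9000 would only be an example) pins the callback port so that a firewall rule can be written for it.

```ruby
require 'drb'

# Start the client-side service. No front object is needed, because the
# service only exists to receive callbacks on proxied objects.
DRb.start_service('druby://127.0.0.1:0')

# DRb.uri reports the address that proxy objects created by this
# process will carry, i.e. the address the server must be able to reach.
callback_uri = DRb.uri
puts callback_uri

DRb.stop_service
```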

Running DRb over SSH

One way to solve the problem with two-way method calls through a firewall is to run DRb over SSH. Not only do you get two-way operation with just a single outbound TCP connection through the firewall; you also have your method calls securely encrypted!

Here's how to set it up.

  1. Choose one port for the client end (say 9000) and one for the server end (say 9001)
  2. Establish an ssh connection with a pair of tunnels: port 9001 at the client side is redirected to port 9001 at the server side, and port 9000 at the server side is redirected to port 9000 at the client side.
    $ ssh -L9001:127.0.0.1:9001 -R9000:127.0.0.1:9000 server.example.com
    The -L flag requests that connections to port 9001 at the local (client) side are redirected through the ssh tunnel, and reconnected to 127.0.0.1:9001 at the server side. The -R flag requests that connections to port 9000 at the remote (server) side are redirected back down the ssh tunnel, and connected to 127.0.0.1:9000 at the client side.
  3. At the server side, do DRb.start_service('druby://127.0.0.1:9001', a) as you would normally
  4. At the client side, do DRb.start_service('druby://127.0.0.1:9000') instead of just DRb.start_service. This gives us a fixed port number to work from.
  5. At the client side, connect to the remote object as:
obj = DRbObject.new(nil, 'druby://127.0.0.1:9001')

Voila, you are up and running. You can try the DRbUndumped example from above, with the client behind a NAT firewall. Also notice that the ssh -L and -R options bind to 127.0.0.1 by default, so people on other machines cannot connect to the tunnel endpoints (although of course, other people on the same machine can do so).

An alternative to establishing an SSH connection from the command line is to use Net::SSH, a pure-Ruby implementation of SSH. If you haven't already, install Net::SSH using gem install net-ssh. To create a connection, execute the following before using DRb:

require 'net/ssh'
require 'thread'

channel_ready = Queue.new
Thread.new do
  Net::SSH.start('ssh.example.com','username',:port=>22) do |session|
    session.forward.local( 9001, '127.0.0.1', 9001)
    session.forward.remote( 9000, '127.0.0.1', 9000 )
    
    session.open_channel do |channel|
    end
    channel_ready << true

    session.loop
  end
end
channel_ready.pop

Following this, you can execute DRb code in the main thread as you would in the previous SSH example. The channel_ready Queue simply forces the main thread to wait for the channel to open.

NOTE: Do not use 'localhost' in place of '127.0.0.1' when using SSH and DRb; it can cause connections to be refused.

Running DRb over SSL

SSL is another way to secure and encrypt your connections (note: SSL and SSH are *not* the same thing!)

Online tutorial: http://segment7.net/projects/ruby/drb/DRbSSL/

Running DRuby through firewalls - Ruby-only solution

Source: http://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/89976

Often a client has a firewall installed, so standard DRb cannot make callbacks, which makes block, IO and DRbUndumped arguments useless. To make sure DRb operates as normal, you can use DRbFire: http://rubyforge.org/projects/drbfire and http://drbfire.rubyforge.org/classes/DRbFire.html

From the DRbFire documentation:

  1. Start with require 'drb/drbfire'.
  2. Use drbfire:// instead of druby:// when specifying the server url.
  3. When calling DRb.start_service on the client, specify the server's uri as the uri (as opposed to the normal usage, which is to specify *no* uri).
  4. Specify the right configuration when calling DRb.start_service, specifically the role to use. Server: DRbFire::ROLE => DRbFire::SERVER and client: DRbFire::ROLE => DRbFire::CLIENT

Simple server:

require 'drb/drbfire'
front = ['a', 'b', 'c']
DRb.start_service('drbfire://some.server.com:5555', front, DRbFire::ROLE => DRbFire::SERVER)
DRb.thread.join

And a simple client:

require 'drb/drbfire'
DRb.start_service('drbfire://some.server.com:5555', nil, DRbFire::ROLE => DRbFire::CLIENT)
DRbObject.new(nil, 'drbfire://some.server.com:5555').each do |e|
  puts e
end

Alternative tutorials on the use of DRb:

Official Ruby documentation of the DRb module