Dwi Wahyudi

Senior Software Engineer (Ruby, Golang, Java)


Overview

Ruby on Rails is famous for its conveniences and out-of-the-box solutions: whatever we need to develop a web application, Rails has it. Still, it carries a stigma like “Rails can’t scale”, largely because a single Ruby process cannot run its threads on multiple processors in parallel.

Golang, for example, is quite different for a web application server: a Go HTTP server is concurrent by design, and with its default GOMAXPROCS setting it runs on multiple processors at once, with no configuration.

With Ruby on Rails, we must use Puma and configure it so that multiple Rails server processes can run on multiple processors. But sometimes even that is not enough: certain tasks can take a long time to process. In such cases we need a concurrency mechanism, so that Ruby on Rails can run those tasks concurrently.
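A typical Puma configuration for this looks roughly like the sketch below; the worker and thread counts are illustrative assumptions, not recommendations:

```ruby
# config/puma.rb — illustrative values; tune for your hardware.
# Each worker is a separate OS process, so workers can run on separate processors.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))

# Each worker runs this many Ruby threads (still limited by the GIL within a process).
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count

# Load the app before forking workers so memory is shared via copy-on-write.
preload_app!
```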

Can Ruby on Rails run concurrent tasks?

The answer is yes.

Ruby has threads, but unlike Golang, a Ruby (and Rails) app cannot run them on multiple processors in parallel because of the GIL limitation. Ruby can do multithreading, but not parallel execution: if our code needs to run CPU-intensive computations, we can’t spread them across multiple processors with Ruby threads. One thread will block the others; only one Ruby thread can run at a time, on one processor.
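A quick way to see the GIL in action is to time a CPU-bound task sequentially and with threads; on MRI the threaded version is not meaningfully faster (the workload below is an arbitrary example):

```ruby
# Demonstrates that CPU-bound work does not speed up with MRI threads (GIL).
require "benchmark"

def cpu_work
  (1..2_000_000).reduce(:+) # pure computation, no I/O
end

sequential = Benchmark.realtime { 2.times { cpu_work } }
threaded   = Benchmark.realtime do
  2.times.map { Thread.new { cpu_work } }.each(&:join)
end

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
# On MRI both timings are roughly the same: only one thread holds the GIL at a time.
```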

Even so, concurrency can still deliver faster results if done right, especially for blocking operations like I/O (database calls, API calls, sleep, file reads, etc.). While one thread waits for a response, it goes to sleep and releases the lock, so another thread can acquire it and run. Ruby then does its own scheduling to resume the sleeping thread when its I/O response arrives.
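The same timing experiment with a blocking operation shows the opposite result; here `sleep` stands in for a database or API call:

```ruby
# Demonstrates that I/O-bound (blocking) work does speed up with threads,
# because a waiting thread releases the GIL for the others.
require "benchmark"

def blocking_call
  sleep 0.2 # stands in for a database query or API call
end

sequential = Benchmark.realtime { 3.times { blocking_call } }
threaded   = Benchmark.realtime do
  3.times.map { Thread.new { blocking_call } }.each(&:join)
end

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
# sequential is ~0.6s; threaded is ~0.2s, because the waits overlap.
```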

Let’s say we have a Rails app that calculates monthly orders for several branches, and the database call takes quite a long time when there are many branch_ids.

What we can do is split branch_ids into multiple chunks and make concurrent calls to the database. We limit the number of chunks with concurrent_count, so that we don’t drain app and database resources. If there are 20 branch IDs and concurrent_count is 4, there will be 4 chunks, each with 5 branch IDs. Each chunk is passed as a parameter to the CalculateMonthlyOrders class, which means ParallelCalculateMonthlyOrders makes 4 database calls at the same time.
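The chunking step on its own looks like this in plain Ruby (`each_slice` here; ActiveSupport’s `in_groups_of`, used in the class below, behaves similarly):

```ruby
# Split 20 branch IDs into concurrent_count chunks of equal size.
branch_ids       = (1..20).to_a
concurrent_count = 4

# Round up so we never produce more than concurrent_count chunks.
chunk_size = (branch_ids.length.to_f / concurrent_count).ceil
chunks     = branch_ids.each_slice(chunk_size).to_a

puts chunks.length        # => 4
puts chunks.first.inspect # => [1, 2, 3, 4, 5]
```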

Thread A calls the database and, while waiting for the response, sleeps for a while; thread B then runs and does the same, then thread C, and so on.

When the database response arrives for thread A, the Ruby interpreter schedules it to continue running; the same applies to the other threads. If more than one thread is ready to run, Ruby picks one of them (the order is not guaranteed).

We then wait for all of those threads to complete their respective tasks and join their results (into values) with ActiveSupport::Dependencies.interlock.permit_concurrent_loads, wrapped in Rails’ special method Rails.application.executor.wrap. This wrapper is mostly there so that Rails checks database connections out of (and back into) the pool correctly. That also means we must plan the available database connection pool size against the concurrency we intend to run.
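As a rough sizing guide, the pool per process should cover the server threads plus the extra concurrent futures; a sketch with illustrative numbers:

```yaml
# config/database.yml — illustrative values.
production:
  adapter: postgresql
  # Roughly RAILS_MAX_THREADS server threads + concurrent_count futures,
  # so 4 extra concurrent database calls can still find a free connection.
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5).to_i + 4 %>
```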

Concurrent::Future.execute is part of concurrent-ruby (https://github.com/ruby-concurrency/concurrent-ruby). We can use many of the other features and utilities it provides, as long as we wrap them with Rails.application.executor.wrap in our Rails app.

class ParallelCalculateMonthlyOrders
  attr_reader :start_date, :end_date, :branch_ids, :concurrent_count
  private :start_date, :end_date, :branch_ids, :concurrent_count

  def initialize(start_date:, end_date:, branch_ids:, concurrent_count: 4)
    @start_date = start_date
    @end_date = end_date
    @branch_ids = branch_ids
    @concurrent_count = concurrent_count
  end

  def call
    # Round up so we never produce more than concurrent_count chunks; `false`
    # disables in_groups_of's nil padding, so no compact is needed later.
    chunk_size = (branch_ids.length.to_f / concurrent_count).ceil
    branch_ids_chunks = branch_ids.in_groups_of(chunk_size, false)
    values = nil

    Rails.application.executor.wrap do
      futures = branch_ids_chunks.map do |chunk|
        Concurrent::Future.execute do
          Rails.application.executor.wrap do
            CalculateMonthlyOrders
              .new(start_date: start_date, end_date: end_date, branch_ids: chunk)
              .call
          end
        end
      end

      values = ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
        futures.map(&:value)
      end
    end

    # Aggregate values from all chunks.
    values_numbers = values.map do |values_chunk|
      values_chunk.map(&:to_d)
    end

    values_numbers.transpose.map(&:sum).map(&:to_s)
  end
end

With this concurrency in place, we expect the processing to be roughly 4 times faster, but it also requires about 4 times the resources at the same time. We must balance these things out so that our concurrency does not drain resources and disrupt the app and database.