Dwi Wahyudi
Senior Software Engineer (Ruby, Golang, Java)
Overview
Ruby on Rails is famous for its conveniences and out-of-the-box solutions: whatever we need to develop a web application, Rails has it. But it also carries a stigma ("Rails can't scale") because a single Ruby process cannot run Ruby code on multiple processors at once.
Golang, for example, is quite different as a web application server: by design, a Golang HTTP server handles requests concurrently, and with its default GOMAXPROCS setting it runs on multiple processors at once, with no extra configuration.
With Ruby on Rails, we must use Puma and configure it so that multiple Rails server processes run on multiple processors. But sometimes even that is not enough: certain tasks take a long time to process. In those cases we need a concurrency mechanism, so that Rails can run such tasks at the same time.
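As a sketch, a Puma configuration along these lines runs several Rails processes side by side (the worker and thread counts here are illustrative assumptions, not recommendations):

```ruby
# config/puma.rb (illustrative values; tune for your hardware)

# Each worker is a separate OS process, so workers can use separate processors.
workers ENV.fetch("WEB_CONCURRENCY") { 4 }

# Threads within one worker still share that worker's GIL.
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads threads_count, threads_count

# Load the app before forking workers to share memory via copy-on-write.
preload_app!
```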
Can Ruby on Rails run concurrent tasks?
The answer is yes.
Ruby has a thread mechanism, but unlike Golang, a Ruby (and Rails) app cannot run Ruby code on multiple processors in parallel because of the GIL (Global Interpreter Lock). Ruby does have threads, but only one of them can execute Ruby code at any given moment. This means CPU-intensive computations cannot be spread across processors with Ruby concurrency: one thread will block the others, and only one Ruby thread runs at a time, on one processor.
Even so, concurrency can still deliver faster results if done right, especially for blocking operations like I/O (database queries, API calls, sleeping, file reading, etc.). While one thread waits for a response it releases the lock and sleeps, and another thread can acquire the lock and run. Ruby then does its own scheduling to resume the sleeping thread once its I/O response has arrived.
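A minimal, self-contained sketch of this effect, using sleep as a stand-in for I/O latency:

```ruby
# Simulate three I/O-bound calls, each taking ~0.2 seconds.
def fake_io_call(id)
  sleep 0.2 # stands in for a database or API round trip
  "result-#{id}"
end

# Sequential: the sleeps add up, ~0.6 seconds total.
sequential_start = Time.now
sequential_results = (1..3).map { |id| fake_io_call(id) }
sequential_time = Time.now - sequential_start

# Concurrent: each thread releases the GIL while sleeping, ~0.2 seconds total.
concurrent_start = Time.now
threads = (1..3).map { |id| Thread.new { fake_io_call(id) } }
concurrent_results = threads.map(&:value)
concurrent_time = Time.now - concurrent_start

puts "sequential: #{sequential_time.round(2)}s, concurrent: #{concurrent_time.round(2)}s"
```

The speedup comes entirely from overlapping the waiting; a CPU-bound loop in place of `sleep` would show no improvement under the GIL.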
Let's say we have a Rails app that calculates monthly orders for several branches, and the database call is quite long when there are plenty of branch_ids.
What we can do is split branch_ids into multiple chunks and make concurrent calls to the database. We limit the level of concurrency with concurrent_count, so that it won't drain app and database resources. If there are 20 branch IDs and concurrent_count is 4, there will be 4 chunks of 5 branch IDs each. Each chunk is passed as a parameter to the CalculateMonthlyOrders class, which also means ParallelCalculateMonthlyOrders makes 4 database calls at the same time.
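The chunking step can be sketched in plain Ruby; `each_slice` is the stdlib equivalent of ActiveSupport's `in_groups_of`, minus the nil padding:

```ruby
branch_ids = (1..20).to_a
concurrent_count = 4

# Split into concurrent_count chunks of roughly equal size.
chunk_size = (branch_ids.length / concurrent_count.to_f).ceil
chunks = branch_ids.each_slice(chunk_size).to_a

puts chunks.length        # 4
puts chunks.first.inspect # [1, 2, 3, 4, 5]
```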
Thread A calls the database and, while waiting for the response, sleeps; thread B then runs and does the same, waiting for its response and sleeping; then thread C, and so on.
When the database response for thread A arrives, the Ruby interpreter schedules it to continue running, and the same applies to the other threads. If more than one thread is ready to run, Ruby picks one of them (the exact order is up to the thread scheduler).
We then wait for all of those threads to complete their respective tasks and join their results (into values) with ActiveSupport::Dependencies.interlock.permit_concurrent_loads. All of this is wrapped in the special Rails method Rails.application.executor.wrap, which, among other things, lets Rails manage database connection pools correctly. That means we must plan the available database connection pool size against the concurrency we intend to use.
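The pool planning reduces to simple arithmetic (the numbers below are illustrative assumptions, matching the Puma defaults and the concurrent_count used in this article):

```ruby
# Per Puma worker: each request thread may hold its own connection
# while its futures each check out another, so budget for both.
puma_threads     = 5 # e.g. RAILS_MAX_THREADS
concurrent_count = 4 # futures spawned by one request, worst case

required_pool = puma_threads + concurrent_count

puts required_pool # 9
```

The `pool` value in `config/database.yml` should be at least this large, otherwise the futures will queue waiting for connections and can raise ActiveRecord::ConnectionTimeoutError under load.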
Concurrent::Future.execute is part of concurrent-ruby (https://github.com/ruby-concurrency/concurrent-ruby). We can use the many other features and utilities it provides, as long as we wrap them with Rails.application.executor.wrap in our Rails app.
class ParallelCalculateMonthlyOrders
  attr_reader :start_date, :end_date, :branch_ids, :concurrent_count
  private :start_date, :end_date, :branch_ids, :concurrent_count

  def initialize(start_date:, end_date:, branch_ids:, concurrent_count: 4)
    @start_date = start_date
    @end_date = end_date
    @branch_ids = branch_ids
    @concurrent_count = concurrent_count
  end

  def call
    # Guard against a chunk size of zero when there are fewer IDs than workers.
    chunk_size = [branch_ids.length / concurrent_count, 1].max
    branch_ids_chunks = branch_ids.in_groups_of(chunk_size).compact

    values = nil
    Rails.application.executor.wrap do
      futures = branch_ids_chunks.collect do |chunk|
        Concurrent::Future.execute do
          Rails.application.executor.wrap do
            CalculateMonthlyOrders
              .new(start_date: start_date, end_date: end_date, branch_ids: chunk.compact)
              .call
          end
        end
      end

      # Block until every future has resolved, collecting their results.
      values = ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
        futures.collect(&:value)
      end
    end

    # Aggregate values from all chunks.
    values_numbers = values.map do |values_chunk|
      values_chunk.map(&:to_d)
    end
    values_numbers.transpose.map(&:sum).map(&:to_s)
  end
end
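The same split/run/join shape can be exercised without Rails or concurrent-ruby, using plain threads. Here `fake_calculate` is a hypothetical stand-in for CalculateMonthlyOrders#call, returning per-chunk totals as strings the way a report service might:

```ruby
require "bigdecimal"
require "bigdecimal/util"

branch_ids = (1..20).to_a
chunks = branch_ids.each_slice(5).to_a

# Stand-in for CalculateMonthlyOrders#call: [order_count, revenue] per chunk.
fake_calculate = lambda do |ids|
  [ids.length.to_s, (ids.sum * 10).to_s]
end

# Split: one thread per chunk. Join: Thread#value blocks until done.
threads = chunks.map { |chunk| Thread.new { fake_calculate.call(chunk) } }
values = threads.map(&:value)

# Aggregate across chunks: convert to BigDecimal, then sum column-wise.
values_numbers = values.map { |chunk_values| chunk_values.map(&:to_d) }
totals = values_numbers.transpose.map(&:sum)

puts totals.map(&:to_i).inspect # [20, 2100]
```

In the Rails version, Concurrent::Future plays the role of Thread.new here, with the executor wrapper handling connection pool checkout around each future.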
With this concurrency in place, we can expect the processing to be up to 4 times faster, but it also consumes up to 4 times the resources at the same time. We must balance these things out so that our concurrency does not drain resources and disrupt the app and the database.