Resque is a lightweight, fast, and very powerful message queuing system used to run Ruby jobs asynchronously in the background. A background job can be any Ruby class that responds to a perform method. In a high-end tech stack built for web scraping, Resque does a great job of scheduling jobs and keeping everything tidy. It is generally used when a system needs better scalability and quick response times.
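A code snippet for such a job is displayed below (the class name MyJob and the record_id argument are placeholders):
class MyJob
  @queue = :myqueue # Jobs will be placed on and picked up from this queue.
  def self.perform(record_id)
    # Work that Resque defers until a worker picks the job up.
  end
end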
Resque is fully featured, which shows especially when the number of jobs is high. It keeps track of pending jobs and lets them be modified in place for fast processing. Jobs can also be prioritized, with fast pushing to and popping from the designated queues. Apart from these, it constantly monitors what the workers (explained later) are doing, so that workers running for too long can be killed.
How queuing in Redis works:
The real power of Resque comes from its queue backend, Redis. Redis is a NoSQL key-value store. Unlike key-value stores that accept only strings as values, Redis also supports lists, sets, hashes, and sorted sets as values, and operates on them atomically. Resque is built on top of the Redis list datatype: the queue name acts as the key and a list is the value.
Resque jobs are enqueued on a Redis list, and the designated workers dequeue them for processing. These operations are atomic, so enqueuers and workers do not have to worry about locking or synchronized access. Every element of a list (or set, hash, etc.) must be a string, so data structures cannot be nested in Redis.
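To make the list-as-queue idea concrete, here is a minimal sketch in plain Ruby. FakeListStore is a hypothetical in-memory stand-in, not the redis gem; its rpush, lpop, and llen methods only imitate the Redis commands of the same names. It models first-in, first-out ordering and the strings-only rule. The atomicity described above comes from Redis itself and is not reproduced here.

```ruby
# Hypothetical in-memory stand-in for a Redis list store (not the redis gem).
class FakeListStore
  def initialize
    # Each key maps to an Array that plays the role of a Redis list.
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # Imitates RPUSH: append a string to the tail and return the new length.
  def rpush(key, value)
    raise TypeError, "list elements must be strings" unless value.is_a?(String)
    @lists[key].push(value)
    @lists[key].length
  end

  # Imitates LPOP: remove and return the head, or nil when the list is empty.
  def lpop(key)
    @lists[key].shift
  end

  # Imitates LLEN: number of elements currently in the list.
  def llen(key)
    @lists[key].length
  end
end

store = FakeListStore.new
store.rpush("resque:queue:myqueue", "job-1")
store.rpush("resque:queue:myqueue", "job-2")
first = store.lpop("resque:queue:myqueue") # the oldest element comes out first
```

Pushing at the tail and popping from the head is what gives the queue its first-in, first-out behaviour.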
Redis is a very fast in-memory data store. It can persist to disk (configurable by time or number of operations) or append operations to a log file for recovery after a restart, and it supports master-slave replication.
Redis has its own command set for reading and manipulating keys, so no SQL is needed to inspect its data.
How queuing with Resque works:
Resque stores each job queue in a Redis list named “resque:queue:name”, where name is the queue name and each element of the list is a hash serialized to a JSON string. Resque also maintains its own management structures in Redis, including an additional “failed job” list. Resque has a general tendency to keep all operations light and easy to track. For the Resque workers to do their work, we don’t need to pass a lot of data in the job hash. Instead, we just pass the metadata of the records, files, etc. (for example their IDs or paths) in the job hash.
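As an illustration of how small such a job entry can be, the sketch below builds one by hand with Ruby's standard JSON library. The helpers queue_key and build_payload and the job name ScrapePage are hypothetical names made up for this example. The "class" and "args" fields follow the payload layout Resque uses, but treat this as a sketch rather than Resque's own code.

```ruby
require "json"

# Hypothetical helper: the Redis key under which a named queue is stored.
def queue_key(name)
  "resque:queue:#{name}"
end

# Hypothetical helper: serialize a job hash into the string stored in the list.
# Only lightweight metadata (such as a record ID) goes into args.
def build_payload(class_name, *args)
  JSON.generate({ "class" => class_name, "args" => args })
end

key = queue_key(:myqueue)
payload = build_payload("ScrapePage", 42) # 42 stands in for a record ID
```

Passing an ID instead of the record itself keeps the entry small and lets the worker load fresh data when the job actually runs.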
When a job is popped from the queue, Resque looks up the job class named in the job hash and calls its perform method, passing the additional parameters as arguments.
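The dispatch step can be sketched in a few lines of plain Ruby. The dispatch method and the ScrapePage class are hypothetical and exist only for this illustration. A real Resque worker also handles failures, hooks, and forking, none of which is shown here.

```ruby
require "json"

# Hypothetical job class used only for this illustration.
class ScrapePage
  @queue = :myqueue

  # Records which page IDs have been handled, so the effect is observable.
  def self.processed
    @processed ||= []
  end

  def self.perform(page_id)
    processed << page_id
  end
end

# Hypothetical dispatcher: decode the popped string, find the class it names,
# and call that class's perform method with the stored arguments.
def dispatch(raw_payload)
  payload = JSON.parse(raw_payload)
  job_class = Object.const_get(payload["class"])
  job_class.perform(*payload["args"])
end

dispatch('{"class":"ScrapePage","args":[42]}')
```

Because perform is called on the class itself, a job needs nothing more than a class-level perform method and a queue name.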
Calling external systems with Resque:
There are ports of Resque to other languages such as Python, C, Java, .NET, Node.js, PHP, and Clojure. If your external system is written in one of these languages, you can start workers there that listen to the same queues. Since you are not talking to a Ruby class with arguments, you can set up a placeholder class with the proper queue name. This allows Resque plugins to fire on enqueue.
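A placeholder class of this kind could look like the sketch below. The class name ExternalParser and the queue name external_parser are assumptions made for illustration. The commented-out Resque.enqueue call needs the resque gem and a reachable Redis server, so it is not executed here.

```ruby
# Hypothetical placeholder: the real work is done by a non-Ruby worker
# that listens on the "external_parser" queue.
class ExternalParser
  @queue = :external_parser

  # Never meant to run in Ruby; it fails loudly if a Ruby worker picks it up.
  def self.perform(*_args)
    raise NotImplementedError, "handled by an external, non-Ruby worker"
  end
end

# With the resque gem loaded and Redis reachable, enqueueing would look like:
#   Resque.enqueue(ExternalParser, 42)

# The queue name is read from the class-level @queue instance variable.
queue_name = ExternalParser.instance_variable_get(:@queue)
```

Raising in perform is a design choice for this sketch: it makes a misrouted job visible instead of letting it pass silently.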
Definitely, the major pitfall of Resque is that it is mostly Ruby-centric, but given its performance and its many available features, it is well worth using for production-quality systems.