I was recently working on a PHP-based project for a client. I needed to pull in content from 10 different URLs, then process that data into something useful.
I have a growing love affair with cURL, so it was my natural selection for the project.
In my first approach, I looped over the list of URLs: send the cURL request, wait for the response, store the data, repeat until done.
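The loop looked roughly like this (a sketch, not the client's actual code; the URLs are placeholders):

```php
<?php
// Hypothetical sketch of the original, sequential approach.
$urls = [
    'https://api.example.com/feed/1',
    'https://api.example.com/feed/2',
    // ...ten in total
];

$results = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    // Blocks here until this request completes -- the next URL
    // cannot even start until the current one finishes.
    $results[$url] = curl_exec($ch);
    curl_close($ch);
}
```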
That worked, but it was a very slow process. The page was taking between 30 and 45 seconds to load. That doesn't sound long, but in terms of web applications, it's an eternity.
I tried several tricks to speed the application up, but the bottleneck was the cURL calls. Each call was made synchronously: every request had to complete before the next one could begin.
After doing some research on how to speed this up, I came across PHP's curl_multi family of functions: curl_multi_init(), curl_multi_add_handle(), curl_multi_exec(), curl_multi_getcontent(), curl_multi_remove_handle(), and curl_multi_close().
Using these functions allows cURL to send requests concurrently, solving my previous problem.
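The basic pattern with curl_multi looks something like this (again a sketch with placeholder URLs, not the project's real code): register every handle up front, then drive all the transfers at once instead of one at a time.

```php
<?php
// Rough sketch of the curl_multi approach; $urls is a placeholder list.
$urls = [
    'https://api.example.com/feed/1',
    'https://api.example.com/feed/2',
];

$mh = curl_multi_init();
$handles = [];

// Register every request up front; nothing blocks yet.
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers concurrently until every one has finished.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // sleep until there is network activity
} while ($running > 0);

// Collect the responses and clean up.
$results = [];
foreach ($handles as $url => $ch) {
    $results[$url] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```

The total wait is now roughly the time of the slowest single request, rather than the sum of all of them.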
A little more searching, and I found an awesome wrapper library that takes the guesswork out of using the curl_multi functions: "ParallelCurl" https://github.com/petewarden/ParallelCurl
The sample code provided with the library was very straightforward and super easy to use.
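Usage looks roughly like the following. This is a sketch modeled on the library's bundled example, not code from my project; check the repository's README for the exact API in your copy, as signatures may differ between versions.

```php
<?php
// Sketch based on ParallelCurl's bundled example -- verify the exact
// constructor and callback signatures against the repo's README.
require_once 'parallelcurl.php';

// Callback fired as each request completes.
function on_request_done($content, $url, $ch, $user_data) {
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($http_code !== 200) {
        print "Fetch of $url failed with code $http_code\n";
        return;
    }
    // Process $content here...
}

$curl_options = [
    CURLOPT_FOLLOWLOCATION => true,
];

// Allow up to 10 requests in flight at once.
$parallel_curl = new ParallelCurl(10, $curl_options);

// $urls stands in for the list of ten endpoints.
foreach ($urls as $url) {
    $parallel_curl->startRequest($url, 'on_request_done');
}

// Blocks until every outstanding request has finished.
$parallel_curl->finishAllRequests();
```

The nice part is that the curl_multi bookkeeping from the previous snippet disappears entirely; you just queue requests and handle each response in a callback.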
After switching to concurrent cURL calls via the ParallelCurl library, I was able to reduce the page load time from 45 seconds to 15 seconds. That's still slow, but it's a HUGE improvement: it makes the application usable and reduces load on my server. It's a win-win-win situation!