For a PHP developer, asynchronicity is the most puzzling aspect of the Node.js runtime. It's simply a new way to write programs. And once you pass the first learning steps, event-driven programming opens a world of possibilities PHP programmers would never dream of. I'll try to explain you how it works, but first, let's talk about pasta.
I love good pasta. But it's very easy to cook bad pasta, so I have a motto: Do Not Eat Pasta Cooked by People You Don't Trust. Since you're a person I trust, I'll give you a simple recipe that you can't screw up, so that when we meet in real life, we can eat together.
The recipe is spaghetti with fresh tomato sauce. Put a large quantity of water in a pan, together with a pinch of salt - it will take approximately 10 minutes to boil. In the meantime, peel a bunch of fresh tomatoes and chop them roughly. Also peel and mince an onion, a carrot, some garlic, and a stalk of celery. This mix is called a mirepoix, or, as the Italians say, a soffritto. At that time, the water should be boiling, so you can put the spaghetti in the pan. While it boils, put some olive oil in a second pan and heat the mirepoix for a few minutes. Then add the tomatoes, some herbs, salt and pepper, and stirr. You should heat the tomatoes for five minutes at most, because after that they create a certain acidity that only disappears after an hour of heating. Taste the spaghetti regularly to be sure to take it out of the water when it's cooked al dente. Drain it, serve in a warm plate with the sauce, add some basil cut in a chiffonade, and some parmiggiano cheese. Voilà.
If you're not starving after reading this, you can probably follow the next exercise: how would you ask a computer to cook my spaghetti with fresh tomato sauce? In PHP, that would look like the following:
There is a big problem with this program: the whole recipe takes 29 minutes, and the spaghetti arrives in the plate already cold, because it waits for more than ten minutes until the sauce is ready. Cold spaghetti is an offense, you should avoid it by all means. When I cook this spaghetti, I boil the water at the same time as I chop the vegetables. Read again the recipe: it contains indications such as "in the meantime", "while it boils", and "taste regularly". The PHP program doesn't do that, no wonder the spaghetti it produces is inedible.
Let's teach PHP to cook properly. For this purpose, the best design pattern is event-driven programming. The program now relies on a central
EventLoop class, which represents a service that loops, incrementing an internal tick at each loop. Other services can add callbacks to be executed at a given tick. When all added callbacks are executed, the loop stops.
Taking advantage of this event loop, a new kind of service class can be created: asynchronous services. For instance, the asynchronous pan:
EventLoop and the
AsynchronousPan basically requires passing a callback to
warm(), and this callback can be a simple anonymous function:
When you execute this script, it produces the following output:
Starting to warm Tick is 1 Tick is 2 Tick is 3 Tick is 4 Tick is 5 Tick is 6 Tick is 7 Tick is 8 Tick is 9 Tick is 10 Now it's cooked
Now everything is ready for another spaghetti sauce. This time, can the sauce and the spaghetti be ready at the same time? Since the spaghetti takes 18 minutes and the sauce 11 minutes, the program should wait about 7 minutes to start cooking the sauce.
Ask PHP to execute the program, and asynchronicity starts to shine:
pastaPan: Starting to boil water Tick is 1 ... Tick is 7 saucePan: Starting to warm olive oil Tick is 8 Tick is 9 saucePan: Olive oil is warm saucePan: Starting to cook the Mirepoix Tick is 10 pastaPan: Water is boiling pastaPan: Starting to boil spaghetti Tick is 11 .... Tick is 14 saucePan: Mirepoix is ready to welcome tomato saucePan: Starting to cook tomato Tick is 15 ... Tick is 18 pastaPan: Spaghetti is ready saucePan: Tomato sauce is ready
Awesomesauce! Everything is ready at the same time after a short 18 minutes.
One side note: When you compare the first script and the last one, you immediately recognize a visual difference. In the first script, all lines of code align vertically; if you want to understand what happens sequentially, you just need to read from top to bottom. In the last script, on the other hand, it's indentation that means that statements get executed one after the other. In order to follow the course of the program, you must read from left to right. In event-driven programming, indentation means sequence.
There is just one problem. The plate gets served at the end of the script because the event loop runs out of callbacks. But what if the script had to schedule more callbacks for later execution, like washing the dishes for instance? The plate would be served very late, with cold spaghetti. That's something you should never, ever do. It is sacrilege.
So the script must not rely on the end of the event loop. That means no code should appear after the
$eventLoop->start() line. But how to synchronize the fact that the plate gets the contents of both pans, since they run in separate sequences?
You could wait until the tick is 18 and then serve the plate, but that would be betting on chance. Although the asynchronous services of this example are based on timing, this is seldom the case. Most asynchronous callbacks are triggered by the return of a third-party service, like a database query, the opening of a file on disk, or the response from an HTTP request. You never really know when this is going to happen. So you have to rely on new techniques for resynchronization.
On the case of the spaghetti with tomato sauce, it's quite easy:
The plate needs to know what it expects, and check whether it is complete each time it receives the contents of a pan. The example doesn't really matter, but you must remember that you are in charge of synchronization in event-driven programming, not the runtime.
The first script takes 29 minutes to cook bad pasta, the second one takes only 18 minutes for simple yet delicious pasta. And yet, there is still only one cook - or, in programming terms, one execution thread. You could spawn a new thread to cook faster, but they are things that can't be accelerated: If you use a flamethrower to warm up the tomato, it won't be ready sooner: it will be burned.
Let me rephrase this: This is not parallel programming. The code executed in the last script executes sequentially, however the event loop gives the illusion that code gets executed by "someone else". Also, this last script packs more actions in a shorter time. Seen from a CPU, that means less cycles wasted waiting for blocking operations, and more cycles actually spent computing. Event-driven programming allows you to use more of your CPU.
In sequential computing, the wasted CPU cycles by a given thread are salvaged by other, parallel threads. That's why, when using PHP with Apache and mod_php, you define the number of concurrent server processes. The trouble is, each time you spawn a new thread to recuperate wasted CPU cycles by another thread, it consumes a lot of memory. Conclusion: to use the CPU as efficiently as an event-driven program, a sequential program needs a lot more memory. And memory is the most expensive part of web servers.
How does Node.js compare to PHP? The same way the first script compares to the last one. Node.js promotes event driven programming, and facilitates it. First, it relies on JavaSript, which provides an advanced event system. Second, Node provides an EventLoop object, and starts it for you at the end of the main script. Last, Node provides new (native) implementations for most blocking I/O operations that takes advantage of asynchronicity.
For example, to delete a file on the disk using Node, the code looks like this:
Node starts by executing the main script until the end. The
unlink() function tells the disk to look for a particular file, remove it, and call back node when it is done. In the meantime, Node can continue executing more code. The 'deletion script' message is therefore logged before the deletion actually occurs. When it reaches the end of the script, it starts the event loop. It is now ready to receive a signal from the filesystem signaling the actual deletion, which will trigger the execution of the anonymous function that logs the 'successfully deleted' message.
Lastly, Node allows to write dynamic servers, while PHP needs a server on its own (like Apache) to handle the raw HTTP request. That means that a Node server stays in memory and is compiled only once. Compared to mod_php, which loads the entire PHP runtime and all the required scripts for every single HTTP request, that's a big boost.
this keyword. I already mentioned it in a previous post.
Overall, Node.js is faster than PHP, and consumes less memory. However, Node programming becomes increasingly difficult as the server codebase increases, because you then need to synchronize manually. That makes Node suitable for small to medium size servers doing simple operations - REST APIs, for instance, are a good fit.
To me, Node's true value, besides the event loop, is the new asynchronous implementation of database queries, file operations, HTTP queries, etc. This is pure gold, especially when you realize that other implementations basically waste your CPU cycles.
Now that you're aware of the most distinctive feature of Node, I hope you're convinced to give it a try. I also hope you're convinced that cooking your own pasta is easy and that you will try my open-source recipe.
Next in this series: Node.js for PHP programmers #2: Modules, Packages, And The Strawberry House.
Published on 23 Jan 2012