A Testing Odyssey

To quote myself:

 “A smart person learns from his mistakes. A truly wise person learns from other people’s mistakes”.

Today I am taking you on an odyssey of discovery. I recently did some PHP performance testing to measure the effectiveness of some of the tips in Website Performance: PHP. I made some mistakes and I learned a few things. Perhaps there is something new here for you, too.

In my last article, I discovered that results can vary between servers. It’s not a huge revelation – sort of obvious, really – but it does have its consequences.

Lessons Learned #1: We cannot accept published performance metrics as gospel truth because the results on our own production server may be different. And where the rubber meets the road, our production server is the one that matters. That’s the server we need to use for tip testing.

And so I continued to use the test harness provided in Test Harness for PHP to examine other performance tips.

Is the Variable Initialized?

One PHP tip advises us to use isset() rather than comparing a variable to NULL, and to compare a variable to NULL instead of using is_null(). Any of these three options will tell us whether or not a variable has been initialized, but the tip claims that the first is the fastest. The three options being tested are:

Option #1:
   isset($abc);

Option #2:
   $abc === NULL;

Option #3:
   is_null($abc);
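
Before timing anything, it may help to see what each option actually returns. The following is a minimal sketch, not part of the original tests; describeOptions() is a hypothetical helper name, and the @ operator is used only to silence the undefined-variable diagnostic that options #2 and #3 raise when $abc does not exist.

```php
<?php
// Hypothetical helper (not from the article): reports what each of the
// three options returns when $abc is, or is not, initialized.
function describeOptions(bool $initialize): array
{
    if ($initialize) {
        $abc = 25;
    }
    return [
        'isset'   => isset($abc),
        // @ suppresses the undefined-variable diagnostic when $abc is unset.
        'null_eq' => @$abc === NULL,
        'is_null' => @is_null($abc),
    ];
}

var_dump(describeOptions(false)); // $abc never declared
var_dump(describeOptions(true));  // $abc = 25
```

Note that isset() answers the opposite question: it returns true when the variable is initialized, while options #2 and #3 return true when it is not. The three are interchangeable only once that inversion is taken into account.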

The options were iterated 10,000,000 times instead of my customary 1,000,000 because the results were too small (mostly 0, sometimes 1). Increasing the number of iterations multiplied the results by 10, so I divided them by 10 in the charts below. I do this so everything I present to you will be normalized to 1,000,000 iterations.

In all three cases, $abc has not been declared or initialized.
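
A test like this could be sketched roughly as follows. This is not the harness from Test Harness for PHP, which is not reproduced here; the function names timeUninitialized() and normalizeTime() are made up for illustration, the use of hrtime() assumes PHP 7.3 or later, and reporting in seconds is also an assumption.

```php
<?php
// Sketch only: hypothetical names, not the article's actual harness.

// Scale a time measured over $iterations runs to a reference run count,
// so that different iteration counts can be compared.
function normalizeTime(float $seconds, int $iterations, int $reference = 1000000): float
{
    return $seconds * $reference / $iterations;
}

// Time the three options against a variable that is never declared.
function timeUninitialized(int $iterations): array
{
    // Silence the undefined-variable diagnostics from options #2 and #3.
    $previousLevel = error_reporting(0);
    $times = [];

    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        isset($abc);
    }
    $times['isset'] = (hrtime(true) - $start) / 1e9;

    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $abc === NULL;
    }
    $times['null_compare'] = (hrtime(true) - $start) / 1e9;

    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        is_null($abc);
    }
    $times['is_null'] = (hrtime(true) - $start) / 1e9;

    error_reporting($previousLevel);
    return $times;
}

$iterations = 100000; // raise toward 10,000,000 for a real run
foreach (timeUninitialized($iterations) as $option => $seconds) {
    printf("%-12s %.4f\n", $option, normalizeTime($seconds, $iterations));
}
```

With 10,000,000 iterations, normalizeTime() divides each measurement by 10, which is the normalization described above.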

These are the results from server #1, which has an older version of PHP:

Option 1   Option 2   Option 3   Winner
--------   --------   --------   ------
   0.0        0.7        0.8        1
   0.1        0.7        1.0        1
   0.1        0.7        0.9        1
   0.0        0.7        0.8        1
   0.1        0.6        0.8        1
   0.1        0.8        1.0        1
   0.1        0.6        0.8        1
   0.1        0.7        0.8        1
   0.1        0.6        0.9        1
   0.1        0.7        0.8        1
   0.1        0.6        0.8        1
   0.0        0.7        0.8        1
   0.1        0.8        0.9        1
   0.1        0.7        1.1        1
   0.1        0.6        0.9        1
   0.1        0.7        0.9        1
   0.1        0.7        0.9        1
   0.1        0.7        0.9        1
   0.1        0.6        0.8        1
   0.1        0.6        0.9        1

I originally planned to run this test ten times on this server, but when option #1 kept showing up as the winner, I wanted to see if the trend would continue. And it did. Not only that, but option #2 kept showing up as being faster than option #3. [I know from my statistics classes that increasing the number of trials after seeing the results is a bad thing to do, but I didn’t care. Curiosity got the better of me.]

According to these results, this tip is confirmed. However, having learned lesson #1, I knew that test results from a single server are meaningless (unless it’s my production server), so I went ahead and ran the tests on the other server, which has a newer version of PHP. Here are the results:

Option 1   Option 2   Option 3   Winner
--------   --------   --------   ------
   0.1        0.3        0.6        1
   0.0        0.4        0.6        1
   0.1        0.3        0.6        1
   0.1        0.3        0.6        1
   0.0        0.4        0.5        1
   0.1        0.3        0.7        1
   0.0        0.3        0.6        1
   0.1        0.3        0.6        1
   0.0        0.4        0.5        1
   0.1        0.3        0.6        1
   0.0        0.4        0.6        1
   0.0        0.4        0.6        1
   0.0        0.4        0.6        1
   0.0        0.4        0.6        1
   0.0        0.4        0.5        1
   0.0        0.4        0.6        1
   0.0        0.4        0.5        1
   0.1        0.3        0.6        1
   0.1        0.3        0.5        1
   0.0        0.4        0.6        1

Well, that certainly tells a story, doesn’t it? It seems this tip is valid on these two servers with these two versions of PHP. We should use isset() to find out whether or not a PHP variable has been initialized.

Some people will point to the very small numbers in the charts above and throw the word “micro-optimization” on the table. They’re not wrong. However, if option #1 is fastest in every case, why shouldn’t we get into the habit of always using it? Develop the habit. Make it part of your coding style, then forget about optimization and micro-optimization.

By the way, that last sentence had the word if in it. We still haven’t proven that isset() is faster in every case on every server. But we do now have evidence pointing in that direction.

But What If $abc Is Initialized?

Hmm, that’s a good question. $abc was not initialized before running the above tests. Would initializing it make any difference? Let’s try it. The options are the same as above. The only difference is that we’ll include $abc = 25; before starting the clock. Here are the results for server #1 (the one with the older version of PHP):

Option 1   Option 2   Option 3   Winner
--------   --------   --------   ------
   0.2        0.0        0.3        2
   0.1        0.1        0.2       1/2
   0.2        0.0        0.3        2
   0.1        0.0        0.3        2
   0.1        0.0        0.3        2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2
   0.2        0.0        0.2        2
   0.1        0.1        0.2       1/2
   0.2        0.0        0.2        2
   0.2        0.1        0.2        2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2
   0.2        0.0        0.2        2
   0.2        0.0        0.3        2
   0.2        0.1        0.2        2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2

and here are the results for server #2 (the one with the more recent version of PHP):

Option 1   Option 2   Option 3   Winner
--------   --------   --------   ------
   0.1        0.0        0.2        2
   0.1        0.0        0.3        2
   0.1        0.0        0.2        2
   0.1        0.0        0.3        2
   0.1        0.0        0.3        2
   0.0        0.1        0.2        1
   0.1        0.0        0.2        2
   0.0        0.1        0.2        1
   0.1        0.0        0.2        2
   0.0        0.1        0.2        1
   0.0        0.1        0.2        1
   0.1        0.0        0.3        2
   0.0        0.1        0.2        1
   0.0        0.1        0.2        1
   0.0        0.1        0.2        1
   0.0        0.1        0.2        1
   0.0        0.1        0.2        1
   0.0        0.1        0.2        1
   0.1        0.0        0.2        2
   0.1        0.0        0.3        2

What happened to our clear and decisive results? Option #1 is no longer the clear winner. Option #2 beats it or equals it every time on one server and they seem to be neck-and-neck on the other server. I smell another lesson coming on!

First we tested the case where $abc was not initialized, then we tested the case where $abc was initialized. We got different results. The three options perform differently depending on whether or not the variable is initialized. Ignoring this difference can lead to a bad decision.

Lessons Learned #2: Test all use cases, not just one or some. [When drawing conclusions, it’s good to consider which use cases your webapp uses most often.]
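
One way to act on the bracketed note is to weight each use case by how often it occurs. The sketch below is only an illustration: weightedCost() is a hypothetical helper, the timings are the first rows of the two server #2 tables above (single rows, not averages), and the 90/10 traffic mix is an assumption, not a measurement.

```php
<?php
// Hypothetical helper: combines per-use-case timings into one expected
// cost per option, given how often each use case occurs.
function weightedCost(array $timesByCase, array $weights): array
{
    $cost = [];
    foreach ($timesByCase as $case => $times) {
        foreach ($times as $option => $time) {
            $cost[$option] = ($cost[$option] ?? 0.0) + $weights[$case] * $time;
        }
    }
    return $cost;
}

// First row of each server #2 table above.
$timesByCase = [
    'uninitialized' => ['option1' => 0.1, 'option2' => 0.3, 'option3' => 0.6],
    'initialized'   => ['option1' => 0.1, 'option2' => 0.0, 'option3' => 0.2],
];

// Assumed mix: the variable is initialized in 90% of the checks.
$weights = ['uninitialized' => 0.1, 'initialized' => 0.9];

print_r(weightedCost($timesByCase, $weights));
```

With this assumed mix, the weighted costs come out at roughly 0.10, 0.03, and 0.24, so option #2 looks best. Reverse the weights and option #1 takes the lead, which is exactly why the use-case mix matters.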

Did the Optimizer Mess Up My Tests?

The more I looked at the three options, the more I wondered. The three options are all do-nothing statements. They evaluate an expression, then do nothing with the result. There aren’t even any side-effects. A good optimizer could easily discard these statements, which means I could be measuring nothing.

Lessons Learned #3: Don’t simplify the options so much that the optimizer eliminates them.

I had to see whether or not this was happening, so I changed the options:

Option #1:
   if (isset($abc)) {$x=0;} else {$x=1;}

Option #2:
   if ($abc === NULL) {$x=0;} else {$x=1;}

Option #3:
   if (is_null($abc)) {$x=0;} else {$x=1;}

In each case the option is the condition of an if statement that actually does something, so the option has to be evaluated. The optimizer can’t discard the expressions this time. In all cases, $x is initialized to 25 before the clock starts running. Here are the results from running the same four tests as above, in the same order (server #1 and then server #2 with $abc uninitialized, followed by server #1 and then server #2 with $abc initialized):

Option 1   Option 2   Option 3   Winner
--------   --------   --------   ------
   0.1        0.8        1.2        1
   0.2        0.9        1.0        1
   0.1        0.7        1.0        1
   0.2        0.6        0.9        1
   0.2        0.8        0.9        1
   0.2        0.7        1.0        1
   0.1        0.7        0.9        1
   0.2        0.7        0.9        1
   0.1        0.8        0.9        1
   0.1        0.8        1.1        1

Option 1   Option 2   Option 3   Winner
--------   --------   --------   ------
   0.1        0.4        0.6        1
   0.1        0.4        0.7        1
   0.0        0.4        0.8        1
   0.1        0.4        0.6        1
   0.0        0.4        0.8        1
   0.0        0.4        0.7        1
   0.0        0.4        0.7        1
   0.0        0.4        0.7        1
   0.1        0.4        0.6        1
   0.1        0.4        0.7        1

Option 1   Option 2   Option 3   Winner
--------   --------   --------   ------
   0.2        0.1        0.3        2
   0.2        0.1        0.3        2
   0.2        0.1        0.3        2
   0.2        0.1        0.4        2
   0.3        0.1        0.3        2
   0.2        0.2        0.3       1/2
   0.2        0.1        0.4        2
   0.2        0.1        0.4        2
   0.2        0.1        0.3        2
   0.2        0.1        0.3        2

Option 1   Option 2   Option 3   Winner
--------   --------   --------   ------
   0.1        0.0        0.3        2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.2       1/2
   0.1        0.0        0.3        2
   0.1        0.1        0.2       1/2
   0.1        0.1        0.3       1/2
   0.1        0.0        0.3        2

The results in these four tables are remarkably similar to the results in the previous tables. The actual execution time is slightly higher (as expected, since the assignment statements add to the CPU’s workload), but the differences between the three options are almost identical. The winner/loser conclusions are the same in all four.

It appears the optimizer didn’t throw away the expressions after all. However, since the optimizer may get smarter in the future, we still need to be aware of this possibility, and we need to test our results to make sure it doesn’t happen.
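
One possible check, sketched here under my own assumptions rather than taken from the tests above, is to make the work observable: count which branch ran and compare the timing against an empty loop. The helper names timeIssetBranches() and timeEmptyLoop() are hypothetical, and hrtime() assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: times option #1 inside an if statement and counts
// which branch ran, so the caller can confirm the condition was evaluated.
function timeIssetBranches(int $iterations, bool $initialize): array
{
    if ($initialize) {
        $abc = 25;
    }
    $setCount = 0;
    $unsetCount = 0;

    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        if (isset($abc)) {
            $setCount++;
        } else {
            $unsetCount++;
        }
    }
    $seconds = (hrtime(true) - $start) / 1e9;

    return ['seconds' => $seconds, 'set' => $setCount, 'unset' => $unsetCount];
}

// Hypothetical baseline: the same loop with an empty body.
function timeEmptyLoop(int $iterations): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
    }
    return (hrtime(true) - $start) / 1e9;
}

$n = 100000;
$result = timeIssetBranches($n, false);
$baseline = timeEmptyLoop($n);
printf("set=%d unset=%d\n", $result['set'], $result['unset']);
printf("loop with test: %.6f s, empty loop: %.6f s\n", $result['seconds'], $baseline);
```

If the two counters do not add up to the iteration count, or if the timed loop is indistinguishable from the empty loop, that is a hint that the statement under test may not be doing what we think it is.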

What About Option #3?

Oops! I was so focused on the winner, I forgot to even think about the loser, so I reviewed the data again. I noticed that option #3 is the slowest in every case on both servers. It shares last place a few times, but it is in last place every single time. I’m sure that says something. I think I’ll stop using is_null() until I see evidence that it’s faster than the other options.

Lessons Learned #4: Look at the data in different ways. Don’t get so wrapped up in one viewpoint that you miss the other information that is waiting to be discovered.
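
Looking at the data from the loser's side can be automated in a few lines. This is a rough sketch with hypothetical helper names (rankRow() and tallySlowest()); the three sample rows are copied from the server #1 table for the initialized case.

```php
<?php
// Hypothetical helper: for one table row (times for options 1..3),
// returns the option numbers that are fastest and slowest, ties included.
function rankRow(array $row): array
{
    $min = min($row);
    $max = max($row);
    $fastest = [];
    $slowest = [];
    foreach ($row as $index => $time) {
        if ($time == $min) {
            $fastest[] = $index + 1;
        }
        if ($time == $max) {
            $slowest[] = $index + 1;
        }
    }
    return ['fastest' => $fastest, 'slowest' => $slowest];
}

// Hypothetical helper: counts how often each option is in last place.
function tallySlowest(array $rows): array
{
    $counts = [1 => 0, 2 => 0, 3 => 0];
    foreach ($rows as $row) {
        foreach (rankRow($row)['slowest'] as $option) {
            $counts[$option]++;
        }
    }
    return $counts;
}

// Three rows from the server #1 table where $abc is initialized.
$rows = [
    [0.2, 0.0, 0.3],
    [0.1, 0.1, 0.2],
    [0.2, 0.0, 0.2],
];

print_r(tallySlowest($rows));
```

For these three rows, option #3 is in last place every time (once tied with option #1), and option #2 never is. Running the same tally over every table is a quick way to spot patterns that the Winner column alone hides.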

Conclusions

There is much more to calculating performance metrics than merely finding the fastest option. Improper test design can lead to meaningless results. Relying on those results can positively or negatively impact performance. If the test isn’t designed properly, there’s just no way to know which.

There are many more pitfalls waiting to be discovered, so I’ll keep my eyes open. In the meantime, I hope the above helps you avoid these four.

About Warren Gaebel

Warren wrote his first computer program in 1970 (yes, it was Fortran).  He earned his Bachelor of Arts degree from the University of Waterloo and his Bachelor of Computer Science degree at the University of Windsor.  After a few years at IBM, he worked on a Master of Mathematics (Computer Science) degree at the University of Waterloo.  He decided to stay home to take care of his newborn son rather than complete that degree.  That decision cost him his career, but he would gladly make the same decision again. Warren is now retired, but he finds it hard to do nothing, so he writes web performance articles for the Monitor.Us blog.  Life is good!