PHP’s include_once() Is Insanely Expensive

I’ve always heard the include_once() and require_once() functions were computationally expensive in PHP, but I never knew how much. I tested the following out on my i7 2010 MacBook Pro using PHP 5.3.4 as shipped by Apple.

This first test relies on include_once() to keep track of whether a file has already been included:

$includes = array();
$file = 'benchmarkinclude';

for($i=0; $i < 1000000; $i++){
    include_once($file.'.php');
}

Took: 10.020140171051 sec

This second example uses include() and uses in_array() to keep track of whether I already loaded the include:

$includes = array();
$file = 'benchmarkinclude';

for($i=0; $i < 1000000; $i++){
    if(!in_array($file, $includes)){
        include($file . '.php');
        $includes[] = $file;
    }
}

Took: 0.27652382850647 sec

For both, the include had the following computation:

$x = 1 + 1;

Lesson learned: Avoid using _once if you can avoid it.

Update: That means something like this will theoretically be faster:

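$rja_includes = array();
function rja_include_once($file){
    global $rja_includes;
    // Note: a file included here runs in this function's scope,
    // so variables it defines are local unless it declares them global.
    if(!in_array($file, $rja_includes)){
        include($file);
        $rja_includes[] = $file;
    }
}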

DBSlayer + Node.js

Lately it seems the rage among developers is to take node.js and combine it with something else unusual. So here’s my contribution.

DBSlayer is a project the NYTimes released a few years ago that seems to be somewhat forgotten but is pretty cool. It’s another MySQL proxy, but with a slight twist. Rather than use a binary protocol, or XML, they went with JSON. It supports things like connection pooling, round-robin distribution to slaves, and automatic failover, and it’s multithreaded. It’s pretty fast and easy to work with. It’s almost like turning a MySQL database into a REST API. You pass a SQL query as a query argument and it gives you a JSON response.

Once you start it you can run a query using a request like:

http://localhost:9090/db?%7B%22SQL%22%3A%22SELECT%20*%20FROM%20facts%20WHERE%20id%3D1%3B%22%7D%20

That will give you a JSON object containing the result of your query.

So doing that in node.js is roughly:

var http = require('http');

var sql = '{"SQL":"SELECT * FROM facts WHERE id=7;"}';

// http.createClient() is no longer available in current Node.js, so this uses http.get().
// encodeURIComponent() replaces the deprecated escape().
var options = { host: 'localhost', port: 9090, path: '/db?' + encodeURIComponent(sql) };

http.get(options, function (response) {
  console.log('STATUS: ' + response.statusCode);
  console.log('HEADERS: ' + JSON.stringify(response.headers));
  response.setEncoding('utf8');
  response.on('data', function (chunk) {
    console.log('BODY: ' + chunk);
  });
});

Running that looks like this:

$ node test.js
STATUS: 200
HEADERS: {"date":"Fri, 27 May 2011 02:02:27 GMT","server":"dbslayer/beta-12","content-type":"text/plain; charset=utf-8","content-length":"290","connection":"close"}
BODY: {"RESULT" : {"HEADER" : ["id" , "fact" , "author" , "ip" , "timestamp"] , "ROWS" : [[7 , "1+1=2" , "raccettura" , "127.0.0.1" , 123456]] , "TYPES" : ["MYSQL_TYPE_LONG" , "MYSQL_TYPE_VAR_STRING" , "MYSQL_TYPE_VAR_STRING" , "MYSQL_TYPE_VAR_STRING" , "MYSQL_TYPE_LONGLONG"]} , "SERVER" : "db"}

You could obviously clean that up and create a little library to hide the HTTP parts.
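A minimal sketch of what such a helper might look like follows. The function names (buildDbslayerUrl, rowsToObjects, query) are made up for illustration, and the host and port simply mirror the example above. It assumes the response always has the RESULT/HEADER/ROWS shape shown in the sample output; anything else is passed to the callback as an error.

```javascript
const http = require('http');

// Build the request URL for a SQL string (host/port default to the example above).
function buildDbslayerUrl(sql, host = 'localhost', port = 9090) {
  const payload = JSON.stringify({ SQL: sql });
  return 'http://' + host + ':' + port + '/db?' + encodeURIComponent(payload);
}

// Turn the HEADER/ROWS arrays of a parsed response into one object per row.
function rowsToObjects(body) {
  const header = body.RESULT.HEADER;
  return body.RESULT.ROWS.map(function (row) {
    const obj = {};
    header.forEach(function (name, i) { obj[name] = row[i]; });
    return obj;
  });
}

// Run a query and pass the rows to a Node-style callback(err, rows).
function query(sql, callback) {
  http.get(buildDbslayerUrl(sql), function (response) {
    let raw = '';
    response.setEncoding('utf8');
    response.on('data', function (chunk) { raw += chunk; });
    response.on('end', function () {
      let rows;
      try {
        rows = rowsToObjects(JSON.parse(raw));
      } catch (err) {
        return callback(err);
      }
      callback(null, rows);
    });
  }).on('error', callback);
}
```

With that in place, a caller would only write something like query('SELECT * FROM facts WHERE id=7;', function (err, rows) { ... }) and never touch the HTTP details.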

It’s an interesting JS-centric way to abstract your database while maintaining SQL-level control. JSON is becoming the new XML.

PHP 5.3.4 Changes rand(), Filled My Error Log, Spikes Load

I ran into a peculiar situation with a PHP web application that went from working for several years without incident to suddenly resulting in timeouts and spiking the load on my server. Some investigation traced it back to a seemingly benign and obscure change to PHP’s rand() implementation between 5.3.3 and 5.3.4.

To summarize several hundred lines of code: it gets a value from an array where the index is a random number between X and Y. X and Y are highly unpredictable by nature of the application. It keeps trying with different values until something is returned. Something like:

function random($a, $b){
   if(MT_RAND){
       return mt_rand($a, $b);
   }
   return rand($a, $b);
}
 
$x = 0;
while($x == 0){
    $x = $arr[random($x, $y)];
}
return $x;

See it? If you don’t, you shouldn’t feel bad; I didn’t see it initially either.

Prior to PHP 5.3.4, rand() and mt_rand() did not check whether max was smaller than min. This changed as a result of bug 46587. That four-line change made quite an impact.

Take this example code:

print "right: " . mt_rand(1, 5) . "\n";
print "wrong: " . mt_rand(5, 1) . "\n";

In PHP 5.3.3 you’d get:

$ php test.php
right: 3
wrong: 4

$ php test.php
right: 2
wrong: 5

Despite the incorrect order of max/min it actually worked just fine. It had done so at least since PHP 4.3 (circa 2003) as far as I’m aware.

In PHP 5.3.4:

$ php test.php
right: 2
PHP Warning:  mt_rand(): max(1) is smaller than min(5) in /test/test.php on line 4
wrong: 

As a result, no usable random number came back whenever the arguments were in the wrong order, and this while(){} loop never terminated until the timeout was reached.

The solution is obviously trivial once you actually trace this bug back:

function random($a, $b){

    // If a is greater, flip order
    if($a > $b){
        $tmp = $a;
        $a = $b;
        $b = $tmp;
    }

    if(MT_RAND){
        return mt_rand($a, $b);
    }
    return rand($a, $b);
}

Before the fix, this bug produced several GB worth of warnings in my error log in a matter of hours. You can also see how it (the brown area) dropped off once the fix was deployed, as measured by % of wall clock time:

[Graph: Transaction %]

It’s the little things sometimes that cause all the trouble.

Budget Calculator Dice

When I was in high school, dice somehow became popular for the 2000th time since the dawn of humanity. For some strange reason I felt the desire to build a little dice game on my TI-83 calculator in TI-BASIC. I had long forgotten about this until I stumbled upon it recently. Since I don’t have one of those cables, I transcribed it to my computer by hand and decided to just throw it out there on the web.

Looking back on it, I could have written this a lot better 😉. Regardless, it’s a fun trip down memory lane. I believe I wrote most of this little gem while lifeguarding on a Sunday morning. Nobody drowned in the production of this code, I swear.

My favorite part may be the about text “DEPRIVING A VILLAGE SOMEWHERE OF AN IDIOT”. Someone at some point somewhere said that about me, and it stuck for years.


From Blog Post To Dictionary

A 2008 blog post of mine with one of my favorite titles ever, “Object Oriented Masturbation”, has led to someone creating an Urban Dictionary entry. My official stance on this is that I find it amusing, and want to accelerate the adoption of this new term.

object-oriented masturbation
ob-jikt | o·ri·ent-ed | mas⋅tur⋅ba⋅tion

  1. the stimulation or manipulation of one’s own ego by way of using object-oriented code in places where it has no advantage instead resulting in unnecessary complication and bloat.

Spread the word and be sure to visit the Urban Dictionary term and give it the thumbs up. They sell merchandise with dictionary entries on it too if you want a mug.

Another Brick In The Facebook Wall

I ran into a problem recently while trying to write to a user’s wall using the Facebook API. The Facebook documentation is hardly sane as it’s a mix of languages, not entirely up to date, and lacks good examples. The error messages are hardly ideal either. “A session key is required” at least leads me in the right direction. “Invalid parameter” is just unacceptable and makes me stabby.

So here’s some cleaned up pseudocode I pulled together that will hopefully be of use to others who bang their heads against the wall. This “works for me” in my limited testing over several days:

require_once('./facebook-platform/php/facebook.php');

$facebook = new Facebook($apiKey, $appSecret);

// This gets us the uid
$canvasUser = $facebook->get_canvas_user();

// And the session key
$sessionKey = $facebook->api_client->session_key;

// You need both of these permission bits
$user = $facebook->require_login($required_permissions = 'publish_stream,offline_access');

// You'll likely have an application sitting here and at
// some point in your application be doing the following

// Here's where we actually set the status
$facebook->api_client->call_method("facebook.status.set", array(
    'uid' => $canvasUser,
    'status' => "All in all it's just another brick in the wall.",
    'session_key' => $sessionKey
));

Getting the right permissions is key.

The thing that ends up being the most confusing is the session_key. After reading the docs, I was inclined to do:

$token = $facebook->api_client->auth_createToken();
$sessionKey = $facebook->api_client->auth_getSession($token);

What you really want is:

$sessionKey = $facebook->api_client->session_key;

You can also adapt this to use stream.publish if you’d like.

Debugging Sherlock Holmes and Dr. House Style

The psychology of computer programmers is interesting stuff. The classic view of a programmer is someone who sits around creating all day like an artist or writer. It’s a creative job, trying to be the next Bill Gates. The reality is that most spend a significant amount of their time diagnosing problems and debugging. That includes analyzing bug reports, tracing bugs, finding solutions, implementing, and testing. In practice this part is much more detective than artist.

After years of observation, I’ve come to the conclusion that most good programmers bear a striking resemblance to both Dr. House and Sherlock Holmes (on whom the Dr. House character is partly based).

Overlapping Traits

  • Reluctance to accept cases they don’t find interesting – Both Dr. House and Sherlock Holmes hate cases that they aren’t interested in. Programmers always gravitate towards bugs that interest them, and shun bugs that don’t. Very few will even attempt to dispute this.
  • Otherwise lazy – When not solving something that interests them, they are what most would call lazy.
  • “Rubik’s complex” – Obsession with puzzles.
  • Reliance on a related science – Dr. House and Sherlock Holmes rely heavily on Psychology. Programmers gravitate towards user experience which Wikipedia defines as incorporating “psychology, anthropology, computer science, graphic design, industrial design and cognitive science”. Coincidence?
  • Substance dependence – Dr. House prefers Vicodin and morphine; Sherlock Holmes went for cocaine and morphine. For programmers it’s often an extraordinary dependency on caffeine that keeps them going.
  • Overconfidence to the point of arrogance – I don’t think any further explanation is necessary. Programmers are as arrogant and defensive about their work as Dr. House and Sherlock Holmes are about their diagnoses and solutions.
  • Introvert – Both Dr. House and Sherlock Holmes are introverts. So are many/most programmers.
  • Strong deductive reasoning skills – The best programmers are the ones who can analyze a bug report and using knowledge of the application and related technologies can diagnose the problem with accuracy that surprises even those with many years more experience.
  • Use of alternate names – Holmes and Dr. House call people by their last names. Programmers have this habit of using network names, usernames, IRC nicknames.
  • Showmanship for their skills – Dr. House diagnosed a waiting room full of patients in about a minute with surprising accuracy without meeting with each patient. Sherlock Holmes has a love for elaborate traps to show off. Programmers love to show off. That’s why so many blog. Accomplishments are the rare things they willingly document.

Slightly higher occurrence of the following personality traits may apply: “moody”, “bitter”, “antagonistic”, “misanthropic”, “cynical”, “grumpy”, “maverick” and “curmudgeon”.

Sherlock Holmes

The most distinct Holmes trait is that he refused to guess or theorize before having the necessary clues or data to reach a conclusion and solve the case.

Programmers who fall in the Holmes party refuse to make guesses without seeing some evidence; they immediately try to reproduce the bug and start tracing to gather data. Once there is a mountain of data they start to deduce the problem and the solution. Once they reach a conclusion they will break it down into a very concise deductive argument.

Pipe smoking is optional.

Dr. House

The most distinct part of the House approach is the willingness to make an educated guess based on limited information.

Programmers who fall in the House party are willing to make guesses early on and immediately start debugging in very calculated parts of their code base. As they come to realizations and learn new information they are willing to adjust or completely abandon their former approach and go with a new hunch.

They often find themselves soliciting ideas from others and using the good ones; however, they almost enjoy shooting down ideas as invalid based on their skills and knowledge.

Conclusion

I’m pretty sure there is no “better” approach. It’s just two different ways of going after a problem. It’s completely possible to be a hybrid. I think it’s more of a spectrum of personality and technique.

The Jetpack Debate

I’ve generally found Jetpack to be pretty cool. It’s easier to develop with, and I’m fairly familiar with both “traditional” extension development and jQuery, so it seems natural to me. However, I generally agree with Daniel Glazman’s blog post on Jetpack. I’ll even agree that closures can make code more difficult to read, though I think I’ve mostly adapted to them at this point.

Jetpack reminds me more of building JS “widgets” than extensions. I’m not sure I see the advantage of moving away from XUL which really isn’t “hard” for 98% of things (though XUL <wizard/> has admittedly made me say WTF a few times) to HTML unless some sort of portability were gained, but that doesn’t seem very likely at least right now. I haven’t seen any indication of intent either. XUL has the advantage of making good UI seemingly easy while HTML really doesn’t, though I’ll admit HTML5 is changing that.

The biggest problem I see with Jetpack is that too much of it is designed around existing needs. The problem with this process is that it’s always playing catch-up. The best extensions are disruptive and do things nobody ever thought of, or even thought possible. Looking at the Jetpack JEP list I see pagemods and toolbar. The kicker is these are “implementing” and “planning” respectively right now.

Things like jetpack.slideBar, jetpack.music and especially jetpack.lib.twitter make me feel a bit concerned. Why? Because they encourage too much conformity, and too many Twitter client Jetpacks.

When developers are given such a sterile environment that’s intended to promote experience and stability, it ends up inadvertently creating monotony and stalling innovation. If you want proof, look at the iPhone. There are indeed some great apps, and I say that as an iPhone user myself, but for each great application there are 1,000 that aren’t worth the price (which is often free). Many are just cookie-cutter apps with a company’s logo on them. Google used one undocumented API for a feature Apple didn’t think of providing a documented API for, and it was newsworthy. While Jetpack distribution isn’t limited in the same way that iPhone apps are with the App Store, the design questions still remain.

To quote Adblock Plus author Wladimir Palant:

…Jetpack has to support Adblock Plus, not the other way around. As it is now, Jetpack isn’t suitable for complicated extensions.

That’s the wrong order.

The Programmer, Like The Poet

So poetic in itself…

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures.

– Frederick P. Brooks, The Mythical Man-Month (Ch. 1)

Z2k9 Bug Strikes The Zune

From the company that brought you Windows ME and Windows Vista: Microsoft Corporation today introduced the world to the Z2K9 bug. Apparently all 30GB Zunes reboot and freeze due to a bug in the date/time drivers. Classic. Microsoft’s solution is to simply wait until 2009 (a few more hours). Even more classic.

This does bring up one of every programmer’s biggest pet peeves: date/time code. I’ve mentioned my hatred of time before. It’s one of the most obnoxiously complicated things to work with due to all of the complexities from leap seconds to leap years. If you need to do something involving old dates, it gets even more complicated. Remember, Julian Thursday, 4 October 1582 was followed by Gregorian Friday, 15 October 1582. Yes, you read that right. Also don’t forget that only certain countries (mostly those under strict influence of the Pope) switched on that date. There was dual dating for some time. Then you have timezones, which ideally would be geographically correct and 15° of longitude apart, but instead zigzag, and not even along territorial borders. Worst of all is daylight saving time. Not everyone participates in that, and sometimes just not every year, or not at the same time. Even states are split; just check out the chaos in Indiana.

Griping aside, none of these likely caused the Zune bug. Since it’s a freeze, I’d guess it’s nothing more than an infinite loop or some other trivial programming error on a leap year.
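To make that guess concrete, here is a hypothetical sketch of how a loop that converts a day count into a year can hang on the last day of a leap year. It is not the Zune’s actual driver code; the function names, the 1-based day counting, and the maxSteps guard (there only so the demo terminates) are all assumptions for illustration.

```javascript
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Buggy: `days` is a 1-based day count starting on Jan 1 of startYear.
// When `days` is exactly 366 in a leap year, no branch changes `days`,
// so the while condition stays true forever.
function yearFromDaysBuggy(days, startYear, maxSteps = 1000) {
  let year = startYear;
  let steps = 0;
  while (days > 365) {
    steps += 1;
    if (steps > maxSteps) return { year: year, days: days, stuck: true };
    if (isLeapYear(year)) {
      if (days > 366) {
        days -= 366;
        year += 1;
      }
    } else {
      days -= 365;
      year += 1;
    }
  }
  return { year: year, days: days, stuck: false };
}

// Fixed: compare against the real length of the current year.
function yearFromDaysFixed(days, startYear) {
  let year = startYear;
  for (;;) {
    const length = isLeapYear(year) ? 366 : 365;
    if (days <= length) return { year: year, days: days };
    days -= length;
    year += 1;
  }
}

// Counting from Jan 1, 1980, Dec 31, 2008 is day 10593.
console.log(yearFromDaysBuggy(10592, 1980)); // Dec 30, 2008: finishes normally
console.log(yearFromDaysBuggy(10593, 1980)); // Dec 31, 2008: stuck is true
console.log(yearFromDaysFixed(10593, 1980)); // { year: 2008, days: 366 }
```

A loop like this works for every day of every year except day 366 of a leap year, which is why a bug of this kind would only show up on December 31 and clear itself the next day.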

Everyone remembers the infamous Y2K bug. Many uneducated folks still claim it was nothing to worry about and overblown, but it still cost between $300 and $600 billion depending on whose estimates you believe ($3.596 billion from the US military alone). Since a large portion of the cost was in the private sector, there’s no true tally.

The next big day to keep in mind is January 19, 2038 at 3:14:07 GMT. That’s when 32-bit computing will officially freak out, since most Unix-like computers store time as a signed 32-bit integer counting the seconds since Jan 1, 1970 (the Unix epoch). One second after that we go back to 1901. There will likely be some 32-bit computing left in 2038 considering how long embedded systems can be ignored while silently slaving away in the background. For reference, the B-52 Stratofortress entered operation in 1955 (they were built until 1962). They are expected to be taken out of service in 2040. This is the exception for US military aircraft, but don’t think this is the only old hardware out there. The Hubble Space Telescope has a 32-bit 486 processor and launched in 1990, and assuming the backup computer is functional it will be serviced soon to extend its life by another few years, making its service life 20+ years. It’s unlikely Hubble will make it to 2038, but Hubble shows how long expensive systems can survive in active use. This date is only 30 years away. This will cost the world some serious cash.
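The rollover is easy to reproduce. The sketch below is only an illustration: the helper names are made up, and JavaScript’s `| 0` is used to imitate the two’s-complement wraparound a signed 32-bit counter typically shows.

```javascript
const INT32_MAX = 2 ** 31 - 1; // 2147483647, the largest signed 32-bit value

// Format a Unix timestamp (in seconds) as an ISO 8601 UTC string.
function unixSecondsToIso(seconds) {
  return new Date(seconds * 1000).toISOString();
}

// Add seconds the way a wrapping signed 32-bit counter would.
function addSecondsInt32(timestamp, delta) {
  return (timestamp + delta) | 0; // | 0 truncates to a signed 32-bit integer
}

const lastGoodSecond = INT32_MAX;
const oneSecondLater = addSecondsInt32(lastGoodSecond, 1);

console.log(unixSecondsToIso(lastGoodSecond)); // 2038-01-19T03:14:07.000Z
console.log(oneSecondLater);                   // -2147483648
console.log(unixSecondsToIso(oneSecondLater)); // 1901-12-13T20:45:52.000Z
```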

On the upside according to Wikipedia 64 bit systems will be good until Sunday, December 4, 292,277,026,596. Odds are that won’t be a concern for most people alive today.
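As a rough sanity check on that year, the arithmetic can be sketched with BigInt. This is an approximation that assumes every year has the average Gregorian length of 365.2425 days, so it estimates only the year, not the exact day; the constant and function names are invented for the example.

```javascript
// Largest value of a signed 64-bit seconds counter.
const INT64_MAX = 2n ** 63n - 1n;

// Average Gregorian year: 365.2425 days * 86400 seconds per day.
const SECONDS_PER_AVERAGE_YEAR = 31556952n;

// Year in which a signed 64-bit count of seconds since 1970 runs out.
function lastYearForSigned64() {
  // BigInt division truncates, giving whole years elapsed since 1970.
  return 1970n + INT64_MAX / SECONDS_PER_AVERAGE_YEAR;
}

console.log(lastYearForSigned64().toString()); // 292277026596
```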

Reassuring? Yes. But your Zune is still fried for a few more hours.

Update [1/5/2009]: Here’s some pretty detailed confirmation that it was indeed an infinite loop error. I know my crashes 😉.