Readers, tell me: How did you spend your leap second?
Update 2 July 2012
Well, looks like the Leap Second of 2012 crashed part of the Internet:
Yesterday’s leap second killed half the Internet, including Pirate Bay, Reddit, LinkedIn, Gawker Media and a host of other sites. Even an airline. Any Linux user process that depends on kernel threads had a high chance of failing. That includes MySQL and many Java servers like webapps, Hadoop, Cassandra, etc. The symptom was the user process spinning at 100% CPU even after being restarted. A quick fix seems to be setting the system clock, which apparently resets the bad state in the kernel (we hope).
The underlying cause seems to be that the way the kernel handled the extra second broke the futex locks used by threaded processes. Here’s a very detailed analysis of the failing code, but I’m not sure it’s correct. According to this analysis the bug was introduced in 2008, then fixed in March 2012. But it may be that the March fix is part of the problem. Patch here. OTOH, most of the systems that failed will be running kernels older than March, so the problem must go further back. Time is hard, let’s go shopping.
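For the curious, here is a rough sketch of what the "set the system clock" quick fix might look like in Python. It reads the current wall-clock time and writes the same value straight back. That writing back an unchanged time is enough to clear the bad kernel state is an assumption based on the reports above, not something verified here. The function name `reset_wall_clock` and its parameters are made up for illustration; `time.clock_settime` is a real standard-library call on Unix and needs root.

```python
import time


def reset_wall_clock(now=time.time, set_clock=None):
    """Write the current wall-clock time back to the system clock.

    Hypothetical helper: `now` and `set_clock` are injectable so the
    logic can be exercised without root privileges.
    """
    if set_clock is None:
        # Real call: sets CLOCK_REALTIME; raises PermissionError without root.
        def set_clock(seconds):
            time.clock_settime(time.CLOCK_REALTIME, seconds)

    current = now()     # seconds since the epoch, as a float
    set_clock(current)  # assumption: re-setting the clock clears the stuck state
    return current


# Usage on an affected Linux box, as root:
#     reset_wall_clock()
```

Spinning processes may still need a restart afterwards; the reports above only say the clock reset "apparently" helps.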