The HipHop Virtual Machine (HHVM) and the Hack programming language haven’t really gotten the coverage they deserve. The biggest news in DevOps culture over the past few years had to do with the performance of PHP 7 and the Zend Engine 3. The quieter developments in HHVM may end up being a more influential trend in the long run. I wanted to find out more about what drove these new technologies and where they’re headed, so I sat down with Drew Paroski, co-creator of HHVM and Hack. Here are some highlights from our wide-ranging discussion.
What Are Hack and HHVM?
You really can’t define either one without talking about the other. They were developed around the same time to improve application performance and developer productivity. HHVM is a virtual machine (now open source) designed to execute Facebook’s upgraded version of PHP, which became known as the Hack programming language. The power of HHVM derives from its purpose-built just-in-time (JIT) compiler, which was more flexible and delivered better performance than the existing PHP runtime.
How HHVM Began at Facebook
Paroski is now a senior architect and engineering manager at MemSQL, but in 2009 Facebook hired him away from Microsoft to help solve a massive problem. That was the year that Facebook’s membership doubled from January to September, reaching 300 million. As the company’s dependence on PHP expanded, it needed more servers running more code, and it was clear where that was going. Bugs, code maintenance and infrastructure expenses were tracking right alongside growth forecasts. A step change in the speed and performance of PHP had to happen.
It all started as HipHop for PHP, which was open sourced in 2010. Paroski explained, “HipHop for PHP was an ahead-of-time compiler, and the centerpiece of its optimization strategy was to leverage type inference to generate more efficient type-specialized code. Unfortunately, it’s often impossible to statically infer the type of a given PHP variable with certainty. That was arguably the biggest technical challenge that inspired the creation of HHVM.” Paroski continued, “HHVM had the major advantage of being a just-in-time (JIT) compiler which doesn’t have to generate all the native code for your application up front. It could do it in pieces and observe details about how your program behaves as it actually runs. We were able to capitalize on those details to perform type inference and generate type-specialized code on the fly while the program was running.”
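To make the idea of type-specialized code concrete, here is a minimal sketch in C. It is not HHVM's actual code: the `value` struct, the `poly_*` functions, and the expression `x*x + 3*x + 1` are all hypothetical names chosen for illustration. The generic version re-checks type tags on every operation, while the specialized version checks the observed type once (a guard) and then runs plain integer arithmetic, falling back to the generic path if the guard fails.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged value standing in for a dynamically typed variable. */
typedef enum { T_INT, T_DOUBLE } type_tag;

typedef struct {
    type_tag tag;
    union { int64_t i; double d; } as;
} value;

static value make_int(int64_t i)  { value v; v.tag = T_INT;    v.as.i = i; return v; }
static value make_double(double d) { value v; v.tag = T_DOUBLE; v.as.d = d; return v; }

/* Convert either representation to double for mixed-type arithmetic. */
static double to_double(value v) {
    return v.tag == T_INT ? (double)v.as.i : v.as.d;
}

/* Generic operations: each one inspects the tags of both operands. */
static value add_generic(value a, value b) {
    if (a.tag == T_INT && b.tag == T_INT) return make_int(a.as.i + b.as.i);
    return make_double(to_double(a) + to_double(b));
}

static value mul_generic(value a, value b) {
    if (a.tag == T_INT && b.tag == T_INT) return make_int(a.as.i * b.as.i);
    return make_double(to_double(a) * to_double(b));
}

/* Generic evaluation of x*x + 3*x + 1: four tag-dispatching operations. */
static value poly_generic(value x) {
    value three = make_int(3), one = make_int(1);
    return add_generic(add_generic(mul_generic(x, x), mul_generic(three, x)), one);
}

/* Roughly what a JIT could emit after observing that x is an int at run time:
 * one guard on entry, then untagged integer arithmetic.
 * Simplification: integer overflow is ignored here. */
static value poly_int_specialized(value x) {
    if (x.tag != T_INT) return poly_generic(x); /* guard failed: fall back */
    int64_t i = x.as.i;
    return make_int(i * i + 3 * i + 1);
}
```

Calling `poly_int_specialized(make_int(4))` takes the fast path, while `poly_int_specialized(make_double(0.5))` fails the guard and is handled by `poly_generic`. An ahead-of-time compiler can only emit the fast path when it can prove the type statically; a JIT can emit it whenever the running program shows it is worthwhile.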
Metrics and Wider Applications
HHVM turned out to be a big win for performance after launching to production in early 2013. “Each new release of HHVM delivers improvements in terms of memory usage and CPU utilization per request. Improving these metrics ultimately translates to being able to serve more requests per second with a single machine, meaning that you don’t need as many web servers to serve the same amount of traffic,” said Paroski.
Up to that point, Paroski and his fellow engineers had been tuning HHVM’s performance exclusively for Facebook’s PHP codebase. “We realized that it would be useful to people outside of Facebook after it launched to production. It was no longer this toy of ours but something that real large websites would want. We wanted it to be something that the community could contribute to and other people could use. That’s important for the long term success and viability of a language execution engine.”
The Impact of PHP 7
Just as HHVM was making a name for itself in the DevOps community, PHP finally addressed many of its performance issues with PHP 7, which went into development in 2014 and had its first stable release a year later. “PHP 7 made great strides in terms of performance over various PHP 5 releases,” Paroski said. “Even going from PHP 5.3 to 5.4, there were noticeable improvements. In PHP 7, they definitely put the development focus and energy on improving performance. I like to think that this was motivated in part by some healthy competition from HHVM, but I don’t mean to take focus away from the hard work of all those who contributed to building PHP 7.”
What PHP 7 didn’t bring, though, was a JIT compiler. Like many other dynamic language execution engines, Zend Engine 3 still uses an interpreter-based approach to execute PHP applications. Paroski pointed out, “They have a very advanced and highly tuned interpreter, but the point is that they are not actually compiling PHP to native code. That makes the performance improvements of PHP 7 even more impressive.”
For developers who are devoted to the finer points of performance optimization, what PHP 7 developers actually did is a fascinating lesson. Paroski explained, “A lot of the performance improvements in PHP 7 came from reducing memory usage and improving memory access and allocation patterns. This includes a lot of the classic things that people who are tuning for performance pay attention to – things like making common data structures smaller, reducing the number of indirections and allocations, etc. Reducing memory usage is particularly important for larger applications, because it reduces the rate of cache misses which improves application speed.”
Details on Performance Optimization
We covered some of the most important details on PHP 7 improvements in an earlier blog post, but I wanted to know what Paroski found most valuable. He responded, “One important detail, for example, has to do with how the engine represents a variable. They have this data structure called a ‘zval’, and before PHP 7 every variable slot was a pointer to a zval which had to be dynamically allocated. The zval in turn would contain a type tag and often a pointer or handle to some other block of memory such as an object or an array.”
“So every time the engine needed to get at the contents of some object, it had to dereference the pointer that’s in the variable slot to get to the zval, and then it had to dereference the pointer or handle inside the zval to get to the contents of the object. In PHP 7, they got rid of this two-hop scheme. This improvement by itself likely had a huge impact on reducing the number of memory allocations, reducing memory usage, and improving memory access patterns.”
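The two layouts Paroski describes can be sketched in C. This is a simplified illustration, not the real Zend Engine definitions: `zval_sketch`, `slot_php5`, `slot_php7`, and the helper functions are hypothetical names, and the real zval carries more fields than are shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins; these are NOT the real Zend Engine structures. */
typedef struct { int64_t prop; } object;

typedef enum { TAG_LONG, TAG_OBJECT } value_tag;

typedef struct {
    value_tag type;                        /* type tag */
    union { int64_t lval; object *obj; } u; /* inline scalar or pointer to payload */
} zval_sketch;

/* Pre-PHP 7 style: the variable slot is a pointer to a heap-allocated zval. */
typedef struct { zval_sketch *zv; } slot_php5;

/* PHP 7 style: the zval is embedded directly in the variable slot. */
typedef struct { zval_sketch zv; } slot_php7;

/* Two hops: slot -> zval, then zval -> object. */
static int64_t read_prop_php5(const slot_php5 *s) {
    const zval_sketch *z = s->zv; /* hop 1 */
    return z->u.obj->prop;        /* hop 2 */
}

/* One hop: the zval is already in the slot, so only zval -> object remains. */
static int64_t read_prop_php7(const slot_php7 *s) {
    return s->zv.u.obj->prop;
}

/* Old layout: even a plain integer needs a heap allocation for its zval.
 * The caller owns the allocation and must free() it. */
static slot_php5 make_long_php5(int64_t n) {
    slot_php5 s;
    s.zv = malloc(sizeof *s.zv);
    if (s.zv) { s.zv->type = TAG_LONG; s.zv->u.lval = n; }
    return s;
}

/* New layout: a plain integer lives in the slot itself, no allocation. */
static slot_php7 make_long_php7(int64_t n) {
    slot_php7 s;
    s.zv.type = TAG_LONG;
    s.zv.u.lval = n;
    return s;
}
```

In this sketch the embedded layout removes one pointer dereference on every object access and one heap allocation for every scalar variable, which is the kind of saving the quote attributes to PHP 7.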
Wikipedia’s HHVM Conversion
While PHP was addressing issues like these, companies with large-scale PHP applications, including Box and Wikimedia, turned to HHVM. I was especially interested in what convinced these large-scale operations to take the plunge and adopt something that was essentially a brand new technology.
Paroski recalled, “We started talking with some of the folks at the Wikimedia Foundation, which hosts wikipedia.org. Naturally, there were concerns around how much effort the transition would take. A site like that couldn’t afford to go down for any amount of time. HHVM at the time wasn’t as mature as it is now, so it was almost certain they were going to run into a few problems in implementation. They knew that up front, and they were going to have to push through those problems to make switching to HHVM a success.”
Facebook ended up virtually loaning a developer to help with converting Wikipedia to HHVM. “One of the engineers on the HHVM team spent a great deal of time with Wikimedia, helping them work through their problems. Longer term, they wanted to be sure that HHVM remained a healthy project and isn’t something that stops being supported down the road. That’s a fear that everyone has when they consider a technology that’s relatively young. I think the HHVM team has done an excellent job demonstrating that they genuinely care about the success of all the different companies out there using HHVM. For most companies, compilers and runtimes are not a central focus of their business. They’re trying to do something else. How they execute PHP is very secondary to their mission.”
Handling PHP Extensions
That led me to question why people would jump to HHVM when there have been so many extensions created for PHP over the years. I knew that Wikipedia had been using many of those extensions and asked how they addressed it. Paroski agreed that the team had spent a great deal of time thinking about that and working on solutions. He said, “HHVM inherited a lot of its extension implementations from HipHop for PHP, which was developed to solve problems at Facebook. HipHop for PHP had its own extension framework, and custom versions of each extension were coded up manually as needed. As HHVM went open-source and people expressed interest, support for extensions was one of the key areas of concern.”
They ended up addressing it in two ways. “One was to make writing extensions for HHVM as painless as possible,” he said. “You are able to write most of your extension implementation in PHP and only the parts that need to be native, you can write in C++. The second route is an experimental compatibility layer. In 2013, we thought, ‘Wouldn’t it be cool if we could just take the source code of an extension written for the Zend engine and get it to talk to HHVM?’ The idea was that you would recompile the original extension from source in a special way that would bind it to HHVM. A few developers on the team ran with that idea. It ended up being used a fair amount to help accelerate Wikimedia’s transition to HHVM.”
Now, HHVM and PHP are both seen as viable options for any website.
The Next Tuning Challenge
Where HHVM goes next depends on the open-source community and the next generation of DevOps expectations. Paroski summed up, saying, “In the past, HHVM was only being performance-tuned for one application, so there’s a lot more tuning that needs to happen in both directions.” The companies and organizations that take up the HHVM challenge over the next few years will certainly have a substantial impact on how the HHVM community and the technology evolve from here.