How I sped up Smarty by 5x on Lighttpd

December 23rd, 2008 § 0

In this article I will discuss how I obtained a 5x speedup on a Smarty-driven website running on a Lighttpd web server. This is achieved by enabling Lighttpd to access and serve the cached files directly from the file system, rather than calling into Smarty.

Why is this approach so much faster?

Smarty’s caching engine does a great job of recompiling the templates at the correct interval, and this creates a drastic speedup compared to recompiling the template on each page request.

However, even when Smarty is serving up cached pages, there is a lot of overhead added to each request when compared to the Web server directly serving the cached page. This is because PHP is still being loaded, the Smarty library is being included, and a small amount of logic is being performed within Smarty before the cached page is finally being passed along.

Is this caching technique right for every situation?

In cases where the cache life must be very short due to frequent changes to the data being rendered, the approach I will explain in this article may not be viable. However, in these cases it would still be possible to use a push method that removes the cached copy when the data changes, rather than adding the overhead of checking for changes on each page request. In my opinion, this push approach to caching reduces the cost to the lowest possible value, so if performance is of the utmost importance, it makes sense to implement this approach.

In my case, the data changes occur infrequently and a combination of clearing the cache on a scheduled interval plus a method to manually force a recompilation of a specific page is adequate, and a worthwhile trade-off for the performance increase. Also, I am a performance junkie.

Implementation Notes

Since Smarty is a flexible library, every implementation is unique. So while I cannot give imperative instructions on how to implement this lighttpd cache, I can explain how I did it.

This was my lighttpd configuration for the site prior to implementing the cache. It has a few basic rewrite rules so that a request for an .htm or .html file gets passed to index.php. This file acts as a handler to determine the actual Smarty template to load, enabling the use of friendly URLs.

$HTTP["host"] == "www.site.com" {
    server.document-root = "/var/www/site/html"
    url.rewrite = (
        "/(.*)\.(htm|html)(\?.*)??$" => "/index.php?p=$1",
        "/(.*)/(\?.*)??$" => "/index.php?p=$1",
        "/(.*)/(.*)/$" => "/index.php?p=$1/$2"
    )
}

And after implementing the cache, this is the lighttpd host configuration:

$HTTP["host"] =~ "www.site.com" {
    server.document-root = "/var/www/site/html"
    magnet.attract-physical-path-to = ("/var/www/site/html/rewrite.lua")
}

Below is the code for rewrite.lua (referenced in the lighttpd conf above) which implements lighttpd mod_magnet to handle the rewrite rules. It checks the file system to determine if a cached file exists, and if so, it serves that file. Otherwise, it rewrites to the index.php handler so smarty can generate the new cache file.

The cache_path variable in the rewrite.lua script is my Smarty cache dir ($smarty->cache_dir) plus the $smarty->cache_id. Mine is blank; if yours is not, be sure to append it to the cache_path variable as a subdir of your Smarty cache_dir.
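To make the lookup logic concrete, here is a minimal sketch of the decision described above, written in Python rather than Lua. It is not the rewrite.lua script itself: the function name `resolve_request`, the use of `index.html` for directory-style URLs, and the return values are all assumptions made for illustration. Only the overall idea (serve the cached file if it exists, otherwise rewrite to index.php with a `p` parameter) comes from the article.

```python
import os


def resolve_request(uri_path, cache_path):
    """Decide whether a request can be served from the cache directory.

    Returns ("static", file_path) on a cache hit, or
    ("rewrite", "/index.php?p=...") on a miss.
    """
    # Assumption: directory-style URLs ("/news/") are cached as index.html.
    relative = uri_path.lstrip("/")
    if relative == "" or relative.endswith("/"):
        relative += "index.html"

    # Normalise the path and only accept files that stay inside cache_path.
    root = os.path.normpath(cache_path)
    candidate = os.path.normpath(os.path.join(root, relative))
    inside_cache = candidate.startswith(root + os.sep)
    if inside_cache and os.path.isfile(candidate):
        return ("static", candidate)

    # Cache miss: mimic the original rewrite rules by stripping the
    # .htm/.html extension or trailing slash to build the p parameter.
    page = uri_path.strip("/")
    for ext in (".html", ".htm"):
        if page.endswith(ext):
            page = page[: -len(ext)]
            break
    return ("rewrite", "/index.php?p=" + page)
```

On a miss the request falls through to the PHP handler exactly as it did with the old rewrite rules, so the first visitor pays the Smarty cost and later visitors get the static file.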

At this point, lighttpd is rewriting all requests to my index.php handler, since it will not find a cached copy in the cache_path dir. I now have to work with the index.php handler file so Smarty will save the compiled HTML files into the cache_path defined in rewrite.lua. To do this, I wrote a custom cache handler function for Smarty and put it in the index.php file. So far, my Smarty setup is looking like this:

The main thing here is that the caching is enabled, each request will recompile the template, the compile_id is blank, and the cache_dir matches what is set for cache_path in rewrite.lua.

The function server_rewrite_cache_handler() overrides the default Smarty cache read/write/clear logic so that each file is saved in a directory structure that matches the request path that lighttpd rewrote.
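The shape of such a handler can be sketched as follows, again in Python rather than PHP. This is not the server_rewrite_cache_handler() function from the site: the names `cache_file_for` and `cache_handler`, the `index.html` convention for directory URLs, and the temp-file-then-rename write are assumptions chosen for illustration. The point it shows is that the cache file location is derived from the request path, so the web server can find it later without any Smarty metadata.

```python
import os


def cache_file_for(cache_path, request_path):
    """Map a request path to the file the web server will look for."""
    relative = request_path.lstrip("/")
    # Assumption: directory-style URLs are stored as index.html.
    if relative == "" or relative.endswith("/"):
        relative += "index.html"
    return os.path.join(cache_path, *relative.split("/"))


def cache_handler(action, cache_path, request_path, content=None):
    """Handle the three cache actions: write, read and clear."""
    target = cache_file_for(cache_path, request_path)
    if action == "write":
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Write to a temporary name first, then rename, so the web
        # server never serves a half-written page.
        tmp = target + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.replace(tmp, target)
        return True
    if action == "read":
        if os.path.isfile(target):
            with open(target, encoding="utf-8") as fh:
                return fh.read()
        return None
    if action == "clear":
        if os.path.isfile(target):
            os.remove(target)
        return True
    raise ValueError("unknown action: " + action)
```

A real handler would also need to reject request paths containing `..` before writing; that check is left out here to keep the sketch short.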

One last thing is to disable two lines of code in smarty/internals/core.write_cache_file.php, because by default Smarty adds some serialized data to the top of the cache data it passes to our custom cache handler function. The changes are shown below and occur around lines 65 and 66 of the core.write_cache_file.php file.

    #$_cache_info = serialize($smarty->_cache_info);
    #$params['results'] = strlen($_cache_info) . "\n" . $_cache_info . $params['results'];

That is all. You should see a drastic speedup at this point.


AgileGallery OpenLaszlo Source Code

December 5th, 2008 § 1

I just released the source code for the OpenLaszlo AgileGallery Flash photo gallery on GitHub. Enjoy!


Python CSV to Fixed Sized Text Tables

December 5th, 2008 § 0

Here is a quick and simple Python class I hacked up to take comma-separated values and reformat them into a fixed-width text table. It supports multi-line rows and column width limits, and creates a header row automatically using the data from the first row of the CSV input.
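As a rough illustration of the technique, here is a minimal sketch of a CSV-to-text-table function. It is not the class from this post: the name `csv_to_table`, the `max_width` parameter and the border style are assumptions. It does cover the three features listed above: the first row becomes the header, cells wider than `max_width` wrap onto extra lines, and every column is padded to a fixed width.

```python
import csv
import io
import textwrap


def csv_to_table(csv_text, max_width=20):
    """Render CSV text as a bordered, fixed-width text table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    # Pad short rows so every row has the same number of columns.
    ncols = max(len(row) for row in rows)
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    # Wrap each cell into a list of lines no wider than max_width.
    wrapped = [[textwrap.wrap(cell, max_width) or [""] for cell in row]
               for row in rows]
    # A column is as wide as its longest wrapped line.
    widths = [max(len(line) for row in wrapped for line in row[col])
              for col in range(ncols)]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def render(row):
        # A row is as tall as its tallest cell; shorter cells get blanks.
        height = max(len(cell) for cell in row)
        out = []
        for i in range(height):
            parts = [(cell[i] if i < len(cell) else "").ljust(w)
                     for cell, w in zip(row, widths)]
            out.append("| " + " | ".join(parts) + " |")
        return out

    lines = [rule] + render(wrapped[0]) + [rule]
    for row in wrapped[1:]:
        lines += render(row)
    lines.append(rule)
    return "\n".join(lines)
```

Calling `print(csv_to_table("id,name\n1,Ann\n2,Bob"))` prints a three-row table with `id` and `name` as the header.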

» Read the rest of this entry «


PHP Array to Text Tables

October 31st, 2008 § 1

I needed this for a little project so I coded it up. I haven’t done a lot of tests but it works just fine for formatting the associative arrays I have run through it.

The class supports multi-line rows, limiting the width of the column, and automatically creating a heading based on the keys from the associative array.

Usage and Output Example

Text Table Formatted Output of above example:

+----------+----+---------------------+
| COMPANY  | ID |       BALANCE       |
+----------+----+---------------------+
| AIG      | 1  | -$99,999,999,999.00 |
| Wachovia | 2  | -$10,000,000.00     |
| HP       | 3  | $555,000.000.00     |
| IBM      | 4  | $12,000.00          |
+----------+----+---------------------+

Full class source code after the jump…
» Read the rest of this entry «


Accurate Web Application Benchmarking Methodology

October 1st, 2008 § 0

I was recently searching for a benchmark comparing the performance of the PDO and ADOdb database abstraction libraries for PHP as used in a web application, and came up with nothing satisfactory. There were several benchmarks floating around, but I noticed a problem with the methodology they used.

A Flawed Methodology

  1. Create a separate script for each library to be benchmarked
  2. Within that script, create a time marker at the start and end of the script
  3. After the first time marker, include the library
  4. Between the time markers, execute X (example: X=500) iterations of a block of code which calls into the method(s) of the library

The benchmarkers then executed each script and calculated the time difference between the start/end time markers of each script to determine the winning library.

Why this Benchmarking Methodology is less Accurate

When benchmarking libraries to be included at runtime in a PHP driven web application, there will be varying overhead for the actual inclusion of the library. I would assume this applies not only to PHP, but to languages such as Python, Perl, Ruby, etc.

Thus any benchmarking methodology which fails to factor in the library inclusion cost in a realistic proportion to the calls to that library will be skewed, sometimes badly. Never in my experience have I needed to expose performance-critical code that iterates through 500 calls to a database abstraction library in a single hit. That would be a ratio of 1 library inclusion per 500 method calls into that library.

A more accurate ratio of library includes to library method calls is 1 to 3. So on average, for one hit to an application where the library is included, we call methods of that library three times.
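The effect of that ratio can be shown with a little arithmetic. The sketch below uses made-up timings (they are not measurements of PDO, ADOdb or any real library) for a library that is cheap to include and one that is expensive to include, and compares them at 500 calls per include and at 3 calls per include. The names `per_hit_cost` and `slowdown` are hypothetical.

```python
def per_hit_cost(include_ms, call_ms, calls_per_hit):
    """Total time for one script run: one include plus N method calls."""
    return include_ms + call_ms * calls_per_hit


# Hypothetical timings in milliseconds, for illustration only.
LIGHT = {"include_ms": 0.1, "call_ms": 0.4}   # cheap to include
HEAVY = {"include_ms": 10.0, "call_ms": 0.5}  # expensive to include


def slowdown(calls_per_hit):
    """How many times slower the heavy library is at a given call count."""
    light = per_hit_cost(LIGHT["include_ms"], LIGHT["call_ms"], calls_per_hit)
    heavy = per_hit_cost(HEAVY["include_ms"], HEAVY["call_ms"], calls_per_hit)
    return heavy / light


for calls in (500, 3):
    print(f"{calls} calls per include: heavy library is "
          f"{slowdown(calls):.2f}x slower")
```

With these invented numbers the gap between the two libraries is far larger at 3 calls per include than at 500, because the 500-call loop spreads the include cost so thinly that it almost disappears from the result.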

A More Accurate Benchmarking Methodology

We are benchmarking a web application, so we need our 500 iterations on the client side, not inside a high-count loop inside the application.

Each of our client side iterations will be a separate request causing our library to be loaded once, and methods of that library to be called in a realistic ratio that would simulate real application calls.

To achieve this, we use a tool such as ApacheBench on the client side and make 500 requests (example below). We still have a script for each library we wish to benchmark, but we model the method calls within that script to a more realistic number, such as three.

# Library A results
ab -n 500 http://localhost/library-a.php

# Library B results
ab -n 500 http://localhost/library-b.php

The Result

In the case of benchmarking PDO vs ADOdb, I saw benchmarks using the flawed methodology which put PDO at only a 125% speedup over the ADOdb library.

When I benchmarked (see full benchmark here), the PDO library provided as much as a 2840% speedup over the ADOdb library.

My conclusion: library inclusion time in web languages makes a huge difference.
