While talking about high performance web applications, the question of usability vs speed vs security vs scalability always arises. As is usually seen, what is fast is not scalable and may be more vulnerable to the numerous security threats. Still, it is possible to design secure, scalable web applications without giving up on the performance front. We will outline a few methods to improve site performance without affecting security and scalability.
DB optimizations
Usually, the single biggest contributor to a poorly performing site is the database. Even then, the culprit is normally not the database itself but the SQL queries the developer wrote against it.
Query optimization
Poorly written queries are the leading cause of performance deterioration. The first thing to check is whether adding an index to one or more columns gives a better result. MySQL's EXPLAIN and ANALYZE TABLE statements are great tools for identifying and solving performance bottlenecks.
If the query and its indexes are well designed, MySQL can use an index to satisfy an ORDER BY clause without any extra sorting. If they are not, MySQL may have to store the selected rows in a temporary table and make an extra sorting pass, both of which are very expensive. All of this can be identified by running EXPLAIN on the query: in these cases the Extra column of the output reports "Using temporary" and "Using filesort".
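As a rough sketch of how the EXPLAIN check could be automated, the helper below scans EXPLAIN rows for the two warning signs mentioned above. The function name findExpensiveSteps() is our own invention, the rows are assumed to be associative arrays as returned by mysqli_result::fetch_assoc(), and the sample rows (including the table names) are written by hand for illustration, not real server output.

[php]
// Sketch: flag expensive steps in the rows of an "EXPLAIN SELECT ..." result.
// Assumes each row is an associative array keyed by EXPLAIN column name.
function findExpensiveSteps(array $explainRows): array
{
    $warnings = [];
    foreach ($explainRows as $row) {
        $table = $row['table'] ?? '(unknown)';
        $extra = (string) ($row['Extra'] ?? '');   // Extra may be NULL

        if (stripos($extra, 'Using temporary') !== false) {
            $warnings[] = "{$table}: temporary table";
        }
        if (stripos($extra, 'Using filesort') !== false) {
            $warnings[] = "{$table}: extra sorting pass (filesort)";
        }
        if (($row['key'] ?? null) === null) {
            $warnings[] = "{$table}: no index used";
        }
    }
    return $warnings;
}

// Hand-written sample rows for illustration only (hypothetical tables).
$sampleRows = [
    ['table' => 'orders', 'key' => null, 'Extra' => 'Using temporary; Using filesort'],
    ['table' => 'users', 'key' => 'PRIMARY', 'Extra' => null],
];

$warnings = findExpensiveSteps($sampleRows);
foreach ($warnings as $warning) {
    echo $warning, "\n";
}
[/php]

A check like this is only a starting point; the full EXPLAIN output still has to be read to decide which index to add.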
GROUP BY clauses are usually expensive. This is because MySQL generally satisfies a GROUP BY by scanning the whole table and creating a new temporary table where all rows from each group are consecutive. MySQL then uses this temporary table to discover groups and apply aggregate functions (if any) to them. With a well-written query and a suitable index, MySQL can use index access instead and avoid creating the temporary table altogether. From the MySQL manual:
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order. Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.
MySQL treats DISTINCT as a special case of GROUP BY. Hence, all the optimizations that apply to GROUP BY can also be applied to DISTINCT.
Using persistent connections to MySQL helps, but be very careful that it does not cause any undesired behaviour. Refer to the PHP manual pages on persistent database connections.
The MySQL manual has a section dedicated to optimization.
Avoid SELECT * FROM .. queries.
Select only those columns which you actually intend to use. Selecting all the columns from a table in order to use just one is a huge waste of server, database and processing power. So always select only those columns that you actually need.
Avoid running SQL queries in a loop.
Do not run SQL queries in a loop. Every query costs a round trip to the database server, so this results in slower scripts. Try to combine the queries into a single query. This is easiest for queries of the type INSERT INTO …, which accept several rows in one VALUES list. Repeated SELECT lookups can often be merged as well, for example with an IN (...) list or a join.
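To make the INSERT case concrete, here is a minimal sketch that builds one multi-row INSERT with placeholders instead of issuing one query per row. The function name buildBulkInsert(), the users table and its columns are hypothetical, and the destructuring syntax assumes PHP 7.1 or later.

[php]
// Sketch: build a single multi-row INSERT for prepared-statement use.
// $table and $columns must come from trusted code, never from user input.
function buildBulkInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert');
    }

    // One "(?, ?, ...)" group per row.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $placeholders = implode(', ', array_fill(0, count($rows), $rowPlaceholder));

    $sql = 'INSERT INTO ' . $table
         . ' (' . implode(', ', $columns) . ') VALUES ' . $placeholders;

    // Flatten the values in the same order as the placeholders.
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column];
        }
    }

    return [$sql, $params];
}

// Hypothetical data for illustration.
$rows = [
    ['name' => 'Alice', 'email' => 'alice@example.com'],
    ['name' => 'Bob', 'email' => 'bob@example.com'],
];

[$sql, $params] = buildBulkInsert('users', ['name', 'email'], $rows);
echo $sql, "\n";
[/php]

The resulting statement can then be prepared once and executed once, with $params bound to the placeholders, instead of sending one INSERT per row.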
Avoid sub-queries, use joins
As suggested by the MySQL manual, try to use joins in place of inner queries where the two are equivalent. For the same job, inner queries often take much longer to execute than their join counterparts, although it is worth confirming this with EXPLAIN on your own queries.
PHP: Use MySQLi
If you are using PHP, using the MySQLi extension instead of the old MySQL extension can help a lot with performance. MySQLi is the extension PHP strongly recommends for MySQL server versions 4.1.3 and later, and the old MySQL extension was deprecated in PHP 5.5 and removed in PHP 7.0. Besides the performance gain, a number of great features favour this extension: an object-oriented interface, support for prepared statements, multiple statements and transactions, enhanced debugging capabilities, and so on. Refer to the PHP MySQLi overview for more details.
Code Optimization
Code optimization is not an afterthought. It has to be done from the very first stage of the development process. Performance and security are two things that the software development process must revolve around. It may be difficult, or even impossible, to optimize software after development is complete. Still, there are areas that can be optimized even after the code is in production, and these can significantly improve performance.
Loop variables
Calling a function like count() in the condition of a loop causes it to be evaluated on every iteration. For example:
[php]
for ($i = 0; $i < count($array); $i++)
{
// loop body
}
[/php]
Here, count($array) is evaluated on every iteration of the loop. Instead, use:
[php]
$count = count($array);
for ($i = 0; $i < $count; $i++)
{
// loop body
}
[/php]
Now the count() function will be evaluated only once.
Initialize and unset variables
PHP, JavaScript and similar languages are weakly typed, and they do not require a variable to be initialized before it is used. This is a nice and convenient feature, especially for beginners. But operations on uninitialized variables are more expensive than those on initialized variables. Also remember to unset variables that you are done with. This frees memory and thus improves performance. Unsetting is especially important for large arrays, which may be taking up a lot of memory.
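The effect of unsetting a large array can be observed with memory_get_usage(). The snippet below is only an illustration: the array size is arbitrary and the exact byte counts depend on the PHP version and platform.

[php]
// Sketch: measure memory before and after releasing a large array.
$before = memory_get_usage();

$numbers = range(1, 200000);        // a large array (size chosen arbitrarily)
$withArray = memory_get_usage();

$total = array_sum($numbers);       // use the data...
unset($numbers);                    // ...then release it as soon as we are done

$afterUnset = memory_get_usage();

printf(
    "array cost: %d bytes, still held after unset: %d bytes\n",
    $withArray - $before,
    $afterUnset - $before
);
[/php]

In a short script the memory is released at the end of the request anyway, so this matters most in long loops and long-running scripts.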
Further, many programmers have the habit of creating duplicate variables just for convenience. For example, there is little point in an assignment like
[php]$name = $_POST['name'];[/php]
Instead, use the original variable. The downside is that this may reduce code readability. Providing comments can help offset that.
Use built-in functions
Don’t reinvent the wheel. A proper understanding of the built-in language constructs can save programmers precious time, effort and money. Many people have the habit of writing the code themselves when one or more language constructs are already available for the job. This is especially true of beginners.
Always try to use simpler, less expensive functions that do the same job. For example, do not use regular expression functions like preg_replace (or the older ereg_replace, which was deprecated in PHP 5.3 and removed in PHP 7.0) when you can do the same operation with simpler string functions like str_replace or substr_replace. Regular expressions are invaluable when we need complex search and replace rules, but don’t use them just because you can when a simpler, less expensive string function is available.
PHPBench has some interesting benchmarks showing how simple changes can create huge performance differences while accomplishing the same task.
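A small sketch of the difference: for a literal search and replace, str_replace gives the same result as preg_replace without the regular expression machinery. The template text is made up for illustration.

[php]
// Hypothetical template text.
$template = 'Hello {name}, welcome back {name}!';

// Literal search: a plain string function is enough.
$plain = str_replace('{name}', 'Alice', $template);

// Same result with a regular expression; note the braces must be escaped.
$regex = preg_replace('/\{name\}/', 'Alice', $template);

var_dump($plain === $regex);

// A rule that genuinely needs a pattern: collapse runs of whitespace.
$collapsed = preg_replace('/\s+/', ' ', "too   many\t spaces");
echo $collapsed, "\n";
[/php]

The last call shows the kind of job where a regular expression is the right tool, since no simple string function expresses "one or more whitespace characters".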
Optimize images
Converting GIFs to PNGs can often save space. There are also many useful tools for analysing and optimizing images, such as ImageMagick and pngcrush. Do not resize images in HTML: use properly sized images instead of taking a big image and scaling it down to fit the space. Also avoid CSS expressions for similar tasks.
Minify JavaScript and CSS
Minifying JavaScript and CSS reduces the size of the files. Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. During minification, comments, unneeded whitespace and similar characters are removed, shrinking the code. JSMin, Closure Compiler and YUI Compressor are some of the popular tools for minification. Google’s Closure Compiler can be used from the browser and is easy to use. It also has an API.
The downside of minification is that it renders the code unreadable, so it becomes difficult, if not impossible, to make changes to the minified script or stylesheet. Minifying during the development phase may not be very practical, as the code may need to change often. But it can be done once development is over and the code base is fairly stable.
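To show what a minifier actually does, here is a deliberately naive CSS minifier. The function name minifyCssNaive() is our own, and the approach is a simplification: it ignores quoted strings and other edge cases that real tools handle, so treat it as an illustration rather than a replacement for the tools named above.

[php]
// Naive sketch: strips comments and whitespace from simple CSS.
// It does not protect quoted strings or unusual selectors.
function minifyCssNaive(string $css): string
{
    $css = preg_replace('!/\*.*?\*/!s', '', $css);          // remove /* ... */ comments
    $css = preg_replace('/\s+/', ' ', $css);                // collapse runs of whitespace
    $css = preg_replace('/\s*([{};:,])\s*/', '$1', $css);   // trim around punctuation
    $css = str_replace(';}', '}', $css);                    // drop the last semicolon in a block
    return trim($css);
}

$source = "/* main heading */\nh1 {\n    color: red;\n    margin: 0 auto;\n}\n";
$minified = minifyCssNaive($source);

echo $minified, "\n";
printf("%d bytes before, %d bytes after\n", strlen($source), strlen($minified));
[/php]

Keeping the readable source under version control and minifying only the deployed copy avoids the readability problem described above.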
Reduce the number of components
Reduce the number of HTTP requests made to serve the web page. The fewer the components, the less time the page will take to load. Try combining scripts so that multiple requests are consolidated into a single request. Also see if you can use CSS sprites to combine images. Reducing the number of DOM elements helps too: it results in less complex pages that browsers can render more easily.
Add proper headers
Add a far-future Expires header so that browsers will cache static content like images, scripts and stylesheets. Add an appropriate Expires header for dynamic content as well. If you are not behind a load-balancing setup, configure ETags properly. ETags help browsers with cache validation, preventing out-of-date content from being served from the cache. Sending the proper headers can be done in Apache instead of in code. We can configure Apache to set an expiry date based on content type, i.e. it is possible to set an expiry of one year for all images and other resources that do not change.
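Where the headers have to be sent from code, for example for generated content, the idea can be sketched as below. The function names buildCacheHeaders() and isNotModified() are hypothetical, the ETag is simply an MD5 hash of the body, and the one-year lifetime is the example figure used above.

[php]
// Sketch: build Expires, Cache-Control and ETag headers for a response body.
function buildCacheHeaders(string $body, int $lifetimeSeconds, int $now): array
{
    return [
        'Expires'       => gmdate('D, d M Y H:i:s', $now + $lifetimeSeconds) . ' GMT',
        'Cache-Control' => 'public, max-age=' . $lifetimeSeconds,
        'ETag'          => '"' . md5($body) . '"',
    ];
}

// True when the ETag sent by the browser (If-None-Match) matches ours.
function isNotModified(array $headers, ?string $ifNoneMatch): bool
{
    return $ifNoneMatch !== null && trim($ifNoneMatch) === $headers['ETag'];
}

$body = 'body { color: #333; }';            // hypothetical stylesheet content
$oneYear = 365 * 24 * 60 * 60;
$headers = buildCacheHeaders($body, $oneYear, time());

foreach ($headers as $name => $value) {
    header($name . ': ' . $value);
}

if (isNotModified($headers, $_SERVER['HTTP_IF_NONE_MATCH'] ?? null)) {
    http_response_code(304);                // cached copy is still valid, send no body
} else {
    echo $body;
}
[/php]

For plain static files, letting the web server set these headers remains the simpler option.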
Put scripts at the bottom and stylesheets at the top
The original HTTP/1.1 specification (RFC 2616) suggested that browsers use no more than two parallel connections per hostname. Current browsers typically allow more, but the limit still applies per hostname. If you serve your images from multiple hostnames, you can get more downloads to occur in parallel. While a script is downloading, however, the browser won’t start any other downloads, even on different hostnames. It may not always be possible to move every script to the bottom, so move as many scripts to the bottom as possible.
Similarly, moving all the stylesheets to the top may help improve page-load times. The reason is that, when stylesheets are at the top, the page can be rendered progressively. Loading stylesheets in the HEAD section of the page is also what the HTML specification calls for.
It also helps to avoid loading duplicate scripts, i.e. the same script included twice or more in the same page. This avoids unnecessary HTTP requests and the delays associated with loading the same content twice. Different browsers behave differently on encountering duplicate scripts.
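One simple way to guard against duplicate scripts on the server side is to route every script tag through a helper that remembers what it has already emitted. The helper name scriptTagOnce() and the script paths are hypothetical.

[php]
// Sketch: emit each script tag at most once per page.
function scriptTagOnce(string $src): string
{
    static $included = [];      // remembers sources already emitted in this request

    if (isset($included[$src])) {
        return '';              // duplicate: emit nothing
    }
    $included[$src] = true;

    return '<script src="' . htmlspecialchars($src, ENT_QUOTES) . '"></script>' . "\n";
}

echo scriptTagOnce('/js/app.js');
echo scriptTagOnce('/js/menu.js');
echo scriptTagOnce('/js/app.js');   // second request for the same file prints nothing
[/php]

This is useful when several templates or modules each ask for the same library without knowing about one another.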
Gzip components
All the major browsers support Gzip compression. Gzip can reduce the size of text content by up to 70%. Apache enables Gzip through mod_deflate (Apache 2.x) or mod_gzip (Apache 1.3). Servers can be configured to choose which content to Gzip. It is best to Gzip HTML, stylesheets and any other text content, including XML and JSON. Images are already compressed, so Gzipping them won’t help.
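The saving is easy to observe with PHP's zlib functions. The sketch below assumes the zlib extension is available and uses made-up, highly repetitive markup, so it compresses far better than a typical real page would.

[php]
// Sketch: compare the size of a text payload before and after Gzip.
// Requires the zlib extension. The markup is artificial sample data.
$html = str_repeat('<li class="item">Some repeated markup</li>' . "\n", 500);

$compressed = gzencode($html, 6);    // level 6: a middle-of-the-road setting
$restored = gzdecode($compressed);   // what the browser gets back after decoding

printf(
    "original: %d bytes, gzipped: %d bytes\n",
    strlen($html),
    strlen($compressed)
);
[/php]

In production the compression is normally left to the web server, as described above, rather than done by hand in the script.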
Server setup/configurations
Using CDNs to deliver content can help bring down page load times. Having multiple servers in multiple locations helps serve the page faster, as content is delivered from the server closest to the end user.
Always avoid 404s. These waste resources and are expensive. Showing a helpful 404 page to the user makes sense in most cases, but returning one for a script or stylesheet wastes server resources and may block parallel downloads. Further, the browser may parse the 404 response body as if it were code.
Use separate servers to serve static content. Browsers limit the number of parallel downloads from the same hostname (the original HTTP/1.1 specification suggested two), so if we move the static content to a separate hostname, browsers can download more in parallel. Thus it helps to split components across multiple domains.
Reduce DNS lookups.
Pre-loading and post-loading components
Initially, load only those components that are necessary for the page. The rest of the components can be loaded later. JavaScript can be used for this differential loading, and tools like YUI Image Loader can help.
Pre-loading means loading content in anticipation: while the browser is idle, we can load scripts and images that may be used on the next page, so that when the browser actually loads that page, the content is already available in the browser cache.
Profile, profile, profile..
Further, use proper profiling and benchmarking tools to measure the performance of your site. Websites like Pingdom and tools like strace, Callgrind, Xdebug, xhprof and even Firebug can provide valuable information. There are also tons of awesome browser extensions and tools that make the web developer’s life easier.
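Before reaching for a full profiler, a quick measurement with microtime() is often enough to compare two alternatives. The helper name averageRuntimeMs() is our own, the array size and run count are arbitrary, and the numbers it prints will vary from machine to machine.

[php]
// Sketch: time a callable over several runs, returning the average in milliseconds.
function averageRuntimeMs(callable $task, int $runs = 5): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $task();
    }
    return (microtime(true) - $start) * 1000 / $runs;
}

$data = range(1, 50000);

// count() evaluated on every iteration.
$countInLoop = averageRuntimeMs(function () use ($data) {
    for ($i = 0; $i < count($data); $i++) {
        // loop body
    }
});

// count() evaluated once, before the loop.
$countOnce = averageRuntimeMs(function () use ($data) {
    $count = count($data);
    for ($i = 0; $i < $count; $i++) {
        // loop body
    }
});

printf("count() in loop: %.3f ms, count() once: %.3f ms\n", $countInLoop, $countOnce);
[/php]

Measurements like this are noisy, so repeat them a few times and treat small differences with caution.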
Happy coding 🙂