Dropping the BOM! or: Who's the BOM King?

01-Jul-2012

OK, advance warning - things are about to get a little geeky...

I've been working lately on a PHP project - I've written my fair share of PHP in the past, but for many years I've been lucky enough to develop almost exclusively in C#. So my (temporary) return to PHP has nudged me out of my comfort zone (no bad thing), and I'm encountering lots of new and interesting problems to solve. One particular problem had me stumped for a while, so I thought I'd share the solution in case it helps anyone else.

Quick description of the problem - when including external files (using include, include_once, require, etc), I was getting a bunch of extraneous characters spat out to the page:

[Screenshot: extraneous characters output to the page]

The screenshot above is a bit difficult to see, but it's basically a repeated sequence of ï»¿ (the three bytes of a UTF-8 BOM, EF BB BF, displayed as ordinary characters).

These little tinkers are Byte Order Marks, or BOMs, which get added to the start of Unicode text files by certain text editors. I've been developing my PHP code on a Windows box, using the NetBeans IDE, and running the site and MySQL database locally using XAMPP. Things are fine on my dev box, but the problem occurs once the files get FTP'd to the live (Linux) server. Those pesky BOMs are creeping in somewhere! This can be avoided by modifying the NetBeans project settings, but since I edit my PHP files using a variety of editors, the BOMs sneak back in. And PHP doesn't like that: anything in a file before the opening <?php tag is treated as output, so the BOM bytes get sent to the page. How that shows up depends on how PHP is configured, hence my dev/live discrepancy.
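To make the BOM a bit more concrete, here is a minimal C# sketch (BomDemo is just a name made up for illustration) that prints the bytes .NET writes at the start of a UTF-8 file, with and without the BOM option switched on:

```csharp
using System;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // GetPreamble() returns the bytes an encoding places at the start of a file.
        byte[] withBom = new UTF8Encoding(true).GetPreamble();
        byte[] withoutBom = new UTF8Encoding(false).GetPreamble();

        Console.WriteLine(BitConverter.ToString(withBom)); // EF-BB-BF
        Console.WriteLine(withoutBom.Length);              // 0
    }
}
```

Those three bytes, EF BB BF, are what end up on the page when a BOM-prefixed PHP file is included.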

You can get rid of these troublesome BOMs using a suitable text editor - Notepad++ does the trick (Encoding > Convert to UTF-8 without BOM). However, doing this one file at a time is somewhat painstaking, so I wrote a quick little tool to batch convert all files prior to deploying to the live server. Here's the useful bit (with error checking removed for readability):

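using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        try
        {
            if (args.Length < 1)
                throw new ArgumentException("Must pass directory to be processed as command line argument.");

            DirectoryInfo directoryInfo = new DirectoryInfo(args[0]);

            // UTF-8 encoding that does not emit a BOM when writing
            UTF8Encoding utf8WithoutBom = new UTF8Encoding(false);

            foreach (FileInfo file in directoryInfo.GetFiles("*.php", SearchOption.AllDirectories))
            {
                string content;

                // StreamReader detects a leading BOM by default and leaves it out of the text
                using (StreamReader reader = new StreamReader(file.FullName))
                {
                    content = reader.ReadToEnd();
                }

                // Overwrite the file (append = false) using the BOM-less encoding
                using (StreamWriter writer = new StreamWriter(file.FullName, false, utf8WithoutBom))
                {
                    writer.Write(content);
                }
            }
        }
        catch (Exception ex)
        {
            HandleException(ex);
        }
    }

    // Placeholder standing in for the error handling removed from this listing
    static void HandleException(Exception ex)
    {
        Console.Error.WriteLine(ex.Message);
    }
}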
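One thing to watch: reading a file as text and writing it back re-encodes the whole thing as UTF-8, so a file saved in some other encoding could come out changed. As a rough alternative sketch (BomStripper and StripUtf8Bom are names invented for illustration, not part of the tool above), you could work on the raw bytes and only touch files that actually start with EF BB BF:

```csharp
using System;
using System.IO;

static class BomStripper
{
    // Hypothetical helper: removes a leading UTF-8 BOM (EF BB BF) from the file.
    // Returns true if a BOM was found and removed, false if the file was left alone.
    public static bool StripUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);

        bool hasBom = bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        if (!hasBom)
            return false;

        // Copy everything after the first three bytes and write it back unchanged.
        byte[] withoutBom = new byte[bytes.Length - 3];
        Array.Copy(bytes, 3, withoutBom, 0, withoutBom.Length);
        File.WriteAllBytes(path, withoutBom);
        return true;
    }
}
```

Calling BomStripper.StripUtf8Bom(file.FullName) inside the foreach loop would be the obvious way to slot it in. It only handles the UTF-8 BOM, and it reads each file fully into memory, which should be fine for typical PHP source files.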

Hope that helps you become your own BOM King! And remember, as with any other code you find on the internet, USE AT YOUR OWN RISK!