Archive for category Website Development
Filtering Array Elements in PHP Using an Anonymous Function
Posted by aaron in Website Development on February 9th, 2009
Using an anonymous function in PHP’s array_filter as a way to search a 2D array.
Automatically linking Twitter @usernames in PHP
Posted by aaron in Website Development on October 29th, 2008
I keep seeing people writing scripts that embed their Twitter feed into their websites. The “easy” way is to use JavaScript, which means you don’t need PHP installed on your server. The catch is that your tweets will not be visible to visitors with JavaScript disabled.
Really, nobody has JavaScript disabled in their browser anymore; the web is pretty much inaccessible without it at this point. However, there are still some very important “visitors” that crawl the web without JavaScript: search engines. Few, if any, search engines actually execute JavaScript on your site when crawling it for content, so anything generated inside a <script> tag is invisible to them. If you want your tweets to be indexed as part of your page, you’ll need PHP or another server-side scripting language to embed them. As a bonus, your page will also load faster for regular visitors.
The below diagram should help illustrate the benefit of using PHP to embed your tweets.

Now that you’re convinced that you want to use PHP to embed your Twitter posts, you’ll quickly run into a problem: usernames in the RSS feed are plain @username text rather than links. You probably want these usernames linked back to twitter.com.
I have seen some solutions that split the tweet into individual words and check whether each one begins with an @ sign. This takes a lot of code, and generally looks something like this:

All of that can be replaced with a single line and a regular expression!
$tweet = preg_replace('/(^|[^a-z0-9_])@([a-z0-9_]+)/i', '$1<a href="http://twitter.com/$2">@$2</a>', $tweet);
This regular expression is actually pretty simple. The key part is “(^|[^a-z0-9_])@([a-z0-9_]+)”, which is a lot less scary than it first looks. The ( ) capture whatever they match so that you can use it later in the replacement (as $1 and $2 above). The [ ] match a single character from a set, which can be defined as a range or a list of characters; here the set is letters, numbers, and the underscore. A caret at the start of the brackets, as in [^a-z0-9_], inverts the set, so it matches any character that is not a letter, number, or underscore. The + says “one or more” of the preceding item. The vertical bar | matches either what’s on its left or what’s on its right. A caret outside square brackets matches the beginning of the string. Finally, the i after the closing slash makes the match case-insensitive, so a-z also covers A-Z.
So in English, this regular expression looks for either the start of the string or a character other than a letter, number, or underscore (captured as $1), followed by an @ sign, followed by one or more letters, numbers, or underscores (captured as $2). The whole match is then replaced with the HTML code you see above: $1 puts back the character that preceded the @, and $2 supplies the username. Requiring a non-word character before the @ is also what keeps an email address like bob@example.com from being linked.
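To see what $1 and $2 actually hold, you can run the same pattern through preg_match_all and print each capture. This is just a quick sketch; the sample tweet is made up for illustration.

```php
<?php
// Same pattern as the preg_replace() call above; the tweet text is invented.
$pattern = '/(^|[^a-z0-9_])@([a-z0-9_]+)/i';
$tweet   = '@alice thanks! Mail me at bob@example.com (cc @Carol_99)';

// PREG_SET_ORDER groups the results per match: [0] = full match, [1] = $1, [2] = $2.
preg_match_all($pattern, $tweet, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the character before the @ (empty when the @ starts the string),
    // $m[2] is the username that ends up in the link.
    printf("before: '%s'  username: '%s'\n", $m[1], $m[2]);
}
```

Only two matches come back: “alice” (with an empty $1, because it sits at the very start) and “Carol_99” (with a space in $1). The “@example” inside the email address is skipped because the character before its @ is a letter.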
Now that you understand the regular expression above, let me further complicate things by showing you how to make text that begins with http:// into a real hyperlink.
$search = array('|(http://[^ ]+)|', '/(^|[^a-z0-9_])@([a-z0-9_]+)/i');
$replace = array('<a href="$1">$1</a>', '$1<a href="http://twitter.com/$2">@$2</a>');
$tweet = preg_replace($search, $replace, $tweet);
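Trust me, it isn’t really that bad. The new regular expression is actually simpler than the first: instead of @, it looks for http:// followed by everything up to the next space. You may also have noticed that I switched from using // to || as the delimiters. PHP lets you use almost any character to delimit a regular expression (anything that is not a letter, digit, backslash, or whitespace). The advantage of | here is that the bar never appears inside the URL pattern. If I used / as the delimiter and then had http:// inside, I’d have to escape the forward slashes (it would look like http:\/\/, which is kind of ridiculous). The order of the two patterns matters too: preg_replace applies them one after another, so the plain URLs are linked before the Twitter links are added. If the username links came first, the URL pattern would then match the http://twitter.com/... inside their href attributes and mangle them.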
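If you want to convince yourself that the delimiter is purely cosmetic, here is a small sketch that runs the URL pattern once with | delimiters and once with / delimiters (slashes escaped). The sample tweet is made up.

```php
<?php
// The URL pattern from the post, written with two different delimiters.
$withBars    = '|(http://[^ ]+)|';
$withSlashes = '/(http:\/\/[^ ]+)/'; // same pattern, forward slashes escaped

$tweet = 'New post: http://example.com/blog @aaron';

$a = preg_replace($withBars,    '<a href="$1">$1</a>', $tweet);
$b = preg_replace($withSlashes, '<a href="$1">$1</a>', $tweet);

var_dump($a === $b); // bool(true): the delimiter does not change what is matched
echo $a, "\n";
```

Both versions link http://example.com/blog and stop at the space, leaving “@aaron” for the second pattern in the array to handle.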
You might want to check out http://www.regular-expressions.info to learn more about regular expressions. Regular expressions are an extremely powerful tool you will want to add to your arsenal when learning PHP.
Redundant web & database servers on a budget using Virtual Private Servers
Posted by aaron in Linux, Server Software, Website Development on September 22nd, 2008
Background
First let me just say that I have been struggling with this problem for quite some time now. The problem is to provide redundancy for a website so that the website continues to run even if there is a problem with one of the servers it’s running on.
In a typical simple server setup, a single machine runs both the web server and MySQL. The machine can be either a dedicated server or, as I have been using, a VPS. I have been running my websites off VPSs for several years now, with minimal trouble. This works most of the time, but having a site on just one machine means a single point of failure: if something is wrong with that server, the websites are down until it is fixed. The trouble I have run into falls under a few categories:
- A problem with the physical host
- A problem at the VPS level (operating system, Apache or MySQL errors, etc.)
- A problem at the network level
Problems with the physical host
Problems with the physical host do occur. With a VPS, these are completely out of your control, and are the responsibility of the hosting provider. Some examples of things I’ve encountered include a failed RAID array, a corrupted filesystem on the host, requiring a several-hour-long fsck, or an unplugged power cord. The worst issue I’ve had was when the provider said they had lost 2 drives in a RAID 5 array, and all the VPSs on that host were completely gone. Luckily I had a backup of the system and was up and running on a new VPS within a couple hours.
Problems at the application level
I haven’t actually run into many problems at the VPS level compared to the other types of problems. However, the latest issue I’m having does fall into this category. Apache periodically starts crashing partway through serving a page with the error “[notice] child pid 21106 exit signal Segmentation fault (11)”, so visitors sometimes see a completely blank page.
Problems at the network level
By far the most frequently occurring problem I encounter is network-related. These problems are usually out of both my and the hosting provider’s control. If there is a problem with the network, the downtime can vary greatly, anywhere from 5 minutes to 12 hours. It can be caused by a Denial of Service attack on a completely different server in the same datacenter, or it could just be a routing issue somewhere along the path from me to the server.
A typical redundant setup covers the first two categories, host problems and VPS-level problems. Typical setups include one or more load balancers in front of multiple application servers. If a machine goes down, the load balancers can stop sending requests to it. This works great if you’re trying to protect against servers failing. However, if the load balancers are all on the same network, then unless that network has multiple redundant paths, the whole system is still vulnerable to network issues.
My Solution
Since I most frequently encounter network issues, I can’t get away with just a typical load-balancing solution. What I really need is a copy of the entire website in a geographically different location. Here is my solution:
One VPS in Dallas, TX (called triton), and another VPS in Newark, NJ (called proteus). (Yes, I name my servers after Greek mythology.) Triton holds the master copies of the websites’ PHP files, and proteus gets a copy of them via rsync. If I ever need to update the site, I edit the files on triton and then rsync them to proteus. Here is where the redundancy comes in: my DNS entries point the domain to both IP addresses, so during normal operation visitors are split roughly evenly between the two hosts. If one server goes down, I can remove its address from DNS, and the worst that will happen is some dead pages for as long as the TTL on the domain.
This works as long as the content only changes when I rsync it. Serving dynamic content, such as from a database, gets a little more complicated. MySQL’s NDB clusters are apparently only effective on a high-speed network, with at least a 10 MBPS connection between the nodes. Replication turns out to be more along the lines of what I’m looking for.
Replication to the rescue!
Replication is designed for a one-way sync between a master and slave. However, it is possible to configure two servers to be both a master and a slave. They will both notify each other of changes made to their databases. There is one trick you need to do in order to prevent primary key conflicts if rows are written to both databases while the link is down. It involves setting the auto_increment offset and increment, so that one server will only create even keys, and the other creates only odd keys.
/etc/my.cnf:
[mysqld]
auto_increment_increment = 2
auto_increment_offset = 1
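Here is a rough simulation of why the increment/offset trick avoids collisions. It is a simplified model, not MySQL’s actual code: it assumes the next key is the smallest value of the form offset + N × increment (N ≥ 0) that is larger than the biggest key already in the table. The function name is hypothetical, and proteus is assumed to use auto_increment_offset = 2, which is what gives it the even keys.

```php
<?php
// Simplified model (an assumption, not MySQL source code): the next
// AUTO_INCREMENT value is the smallest offset + N * increment (N >= 0)
// that is larger than the biggest key already in the table.
function nextAutoIncrement(int $maxExisting, int $increment, int $offset): int
{
    if ($maxExisting < $offset) {
        return $offset;
    }
    return $offset + (intdiv($maxExisting - $offset, $increment) + 1) * $increment;
}

// Simulate the replication link being down: each server only sees its own rows.
$triton  = [];  // auto_increment_offset = 1, as in the my.cnf above
$proteus = [];  // auto_increment_offset = 2 (assumed setting for the second server)

for ($i = 0; $i < 3; $i++) {
    $triton[]  = nextAutoIncrement($triton  ? max($triton)  : 0, 2, 1);
    $proteus[] = nextAutoIncrement($proteus ? max($proteus) : 0, 2, 2);
}

echo 'triton:  ', implode(', ', $triton), "\n";
echo 'proteus: ', implode(', ', $proteus), "\n";
```

Triton ends up with 1, 3, 5 and proteus with 2, 4, 6, so when the link comes back and each server replays the other’s inserts, no primary key is used twice.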
Here’s some dry reading on replication from the MySQL manual. Here is a slightly clearer guide to replication which sums everything up pretty nicely. Overall, replication was pretty easy to set up. It seems to be pretty robust as well. I simulated network problems by adding firewall rules to block the servers from each other. I was able to continue to interact with each database, and the changes were all carried over when the link came back up.
Feel free to comment if you have any experience or insights into configuring web and database servers! I’m curious to hear what other people have done.
Hiding text from non-registered users in MediaWiki
Posted by aaron in Website Development on September 2nd, 2008
In a semi-private wiki I maintain, we use whitelisting to make some pages visible to the public but not others. Authentication is done against an IMAP server, and only registered users have full access to the wiki. We still run into the occasional page where most of the info should be public, but certain bits should not. Things like server passwords or phone numbers should be hidden from everyone except registered users. I created this plugin so that we can hide bits and pieces of pages from non-registered users while making the rest of the page public. Just add a <private> tag around the text you want hidden, and you’re good to go.
Copy this code to “extensions/PrivateBlocks.php”, and add the following line to LocalSettings.php:
require_once( 'extensions/PrivateBlocks.php' );
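As a standalone illustration of the idea only (this is not the PrivateBlocks extension, whose code isn’t reproduced here), the sketch below shows the behavior the <private> tag is meant to have: registered users see the text, everyone else gets nothing. The function name is hypothetical. A real MediaWiki extension would register the tag with the parser instead (via Parser::setHook), and would presumably also have to make sure MediaWiki’s parser cache doesn’t serve a registered user’s rendering of the page to anonymous readers.

```php
<?php
// Hypothetical helper, not the PrivateBlocks extension itself: it only shows the
// idea of removing <private>...</private> sections for anonymous readers.
function hidePrivateBlocks(string $text, bool $isRegistered): string
{
    if ($isRegistered) {
        // Registered users see everything; just drop the tags themselves.
        return preg_replace('#</?private>#i', '', $text);
    }
    // Everyone else: drop each tag together with its contents.
    // The "s" flag lets . match newlines; ".*?" keeps each match non-greedy.
    return preg_replace('#<private>.*?</private>#is', '', $text);
}

$page = "Server: db1\nPassword: <private>hunter2</private>\nPhone: <private>555-0100</private>";
echo hidePrivateBlocks($page, false), "\n";
echo hidePrivateBlocks($page, true), "\n";
```

The non-greedy match matters: with a plain .* the pattern would swallow everything from the first <private> to the last </private>, including the public “Phone:” label in between.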
Sorting a query in MySQL ignoring the word “The”
Posted by aaron in Website Development on April 19th, 2008
When you have a database of books or movies, some of the titles begin with “The.” If you do a regular ORDER BY on the table, all the titles that start with “The” get clumped together. One option is running an unsorted query and sorting in PHP, but it would be better to sort at the database level. Here is a query I came up with to do that, using MySQL’s IF function:
SELECT * FROM movies ORDER BY IF(SUBSTRING(title,1,4)='The ',SUBSTRING(title,5),title)
Now create a custom function for that, and it’s even easier to use!
CREATE FUNCTION SORTNAME (name VARCHAR(255)) RETURNS VARCHAR(255) DETERMINISTIC
RETURN IF(LCASE(SUBSTRING(name,1,4)) = 'the ', SUBSTRING(name,5), name);
Now you can use it in a query like this:
SELECT * FROM movies ORDER BY SORTNAME(title)
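For comparison, here is a sketch of the “sort it in PHP” option mentioned at the top, using the same rule as SORTNAME(). The titles are made up, and strcasecmp() only roughly mimics a case-insensitive MySQL collation, so the two orderings could differ for accented or unusual characters.

```php
<?php
// PHP version of the SORTNAME() idea, for sorting results in PHP instead of MySQL.
// Note the offsets: MySQL's SUBSTRING() counts from 1, PHP's substr() from 0.
function sortName(string $title): string
{
    return strtolower(substr($title, 0, 4)) === 'the ' ? substr($title, 4) : $title;
}

$titles = ['The Matrix', 'Alien', 'Theodore Rex', 'The Birds', 'Zodiac'];

usort($titles, function (string $a, string $b): int {
    // Compare the sort keys case-insensitively.
    return strcasecmp(sortName($a), sortName($b));
});

print_r($titles);
```

“The Birds” and “The Matrix” now sort under B and M, while “Theodore Rex” stays under T, because the rule checks for “the” followed by a space rather than just the letters “the”.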
The Ultimate Web Developer Setup
Posted by aaron in Apple/os x, Website Development on January 10th, 2008
Using VMware Fusion and Multiple IEs, I set up my iMac to run the four major browsers simultaneously: IE 7, IE 6, Firefox and Safari. With a 1920×1200 display, I can fit all four on the screen at once.

I no longer need to keep an extra Windows machine around for debugging websites! “Multiple IEs” can also install IE 5.5, 5.1, 4 and 3, although I’m pretty sure those are almost completely irrelevant at this point.
One problem I did run into was that VMware in “Unity” mode didn’t seem to play well with Spaces. For example, sometimes after switching spaces a Windows app would lose focus and I wouldn’t be able to click it anymore. Switching to “Single Window” mode and then back again seemed to fix it.
Useful software:
Stating the obvious: IE is a pain in the butt
Posted by aaron in Troubleshooting, Website Development on August 27th, 2007
Thank you Brandon K! You just saved me 6 hours apparently:
Brandon K [ brandonkirsch uses gmail ]
26-Apr-2007 01:04
I just lost six hours of my life trying to use the following method to send a PDF file via PHP to Internet Explorer 6:
header('Content-type: application/pdf');
header('Content-Disposition: attachment; filename="downloaded.pdf"');
readfile('original.pdf');
When using SSL, Internet Explorer will prompt with the Open / Save dialog, but then says “The file is currently unavailable or cannot be found. Please try again later.” After much searching I became aware of the following MSKB article titled “Internet Explorer file downloads over SSL do not work with the cache control headers” (KBID: 323308).
php.ini by default uses the setting session.cache_limiter = nocache, which makes sessions send Cache-Control and Pragma headers with “no-cache” options. You can eliminate the IE error by changing “nocache” to “public” or “private” in php.ini. This changes the Cache-Control header and removes the Pragma header completely. If you cannot or do not want to modify php.ini for a site-wide fix, you can send the following two headers to overwrite the defaults:
header('Cache-Control: max-age=3600'); // adjust max-age as appropriate
header('Pragma: public');
You will still need to set the content headers listed above for this to work. Please note this problem ONLY affects Internet Explorer; Firefox does not exhibit this flawed behavior.
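Putting Brandon’s two fixes together, a download script might look roughly like the sketch below. Both function names are hypothetical, the file names are placeholders, and the Content-Length line is an extra I added rather than part of the original comment.

```php
<?php
// Hypothetical helper: the headers from the comment above, collected in one place.
// 'max-age' is the standard spelling of the Cache-Control directive.
function pdfDownloadHeaders(string $downloadName, int $maxAge = 3600): array
{
    return [
        'Content-Type: application/pdf',
        'Content-Disposition: attachment; filename="' . $downloadName . '"',
        'Cache-Control: max-age=' . $maxAge, // overrides the session "nocache" value
        'Pragma: public',                     // overrides "Pragma: no-cache"
    ];
}

// Hypothetical wrapper. If the script uses sessions, call this after session_start()
// so these headers replace the ones the session cache limiter queued.
function sendPdf(string $path, string $downloadName): void
{
    foreach (pdfDownloadHeaders($downloadName) as $line) {
        header($line); // by default header() replaces an earlier header with the same name
    }
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
```

Usage would be something like sendPdf('original.pdf', 'downloaded.pdf'); followed by exit; so nothing else is appended to the file.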
Why can’t IE just play nice?

