Marcos Placona Blog

Programming, technology and the taming of the web.

Category: Technology (page 1 of 7)

All of the general techie / technology talk that will not fit on any of the other specific categories

Varnish with Apache and WordPress on CentOS

Reading time: 12 – 20 minutes

Varnish Cache

Varnish is wicked! It sits in front of your web server as a reverse proxy and caches HTTP responses. According to their website:

Varnish Cache is really, really fast. It typically speeds up delivery with a factor of 300 – 1000x, depending on your architecture

I have only just installed it on this very website, and have already seen an improvement of about 500 times without actually having to do much. I spent about 2 hours configuring it all, but could probably attribute 1 hour of that to dumbness on my side while trying to get it all configured.

I have now come up with a configuration that works perfectly (so far) on this website, and have used a mix and match of resources such as this and this. They are both great, but I found that they didn’t particularly cater for what I was looking for as I have very particular needs.

In my VPS, I have a few domains running, but only really wanted to have this website cached, since the other domains either get their content updated too often, or get too few hits to actually justify caching.

I also didn’t want to cache any of my sub-domains, as most of them are actually running on the cloud and being proxied by Apache via mod_rewrite. It turns out those didn’t really wanna play when the HTTP requests were cached, and because they are mostly dynamic applications, I didn’t think it was worth spending time and energy configuring them to get the cache purged.

Installing Varnish

Start by making sure you have all the necessary stuff to install it. You will have to do this in your terminal either by logging in to your server or SSH’ing to it.

sudo yum install gcc make automake autoconf libtool ncurses-devel libxslt groff pcre-devel pkgconfig libedit libedit-devel

We now install Varnish. At the moment, the latest stable version of Varnish is 3.0.4, and I’m currently only interested in stable versions, but because I want this post to be timeless, I will give you the Installation on RedHat link, which will always give you the latest version.

Grab the link that corresponds to your CentOS version (5 or 6) and paste it in your terminal:

sudo rpm --nosignature -i http://repo.varnish-cache.org/redhat/varnish-3.0/el6/noarch/varnish-release/varnish-release-3.0-1.el6.noarch.rpm

Now all that is left to install Varnish is to run

sudo yum install varnish

We want Varnish to run every time we restart our server, and we want it to run automatically, so let’s add it in

sudo chkconfig --level 345 varnish on

Being able to listen to connections

You are probably running your website on port 80 (the default port for HTTP). That is fine, and you obviously already have it working. But because Varnish will sit in front of our web server and take over port 80, the web server itself needs to move to another port, which means we need to make sure that port also accepts TCP connections. We will be using port 8080 here, which is fine if you’re not running Tomcat (it normally defaults to this port), but you can use any other port you want really, as long as it’s not already in use by anything else. We will end up with the following architecture:

Varnish + Apache

To be able to use this port, we need to make sure our firewall actually allows it to receive HTTP connections. Luckily I have already written an article about this, so give it a read to understand a little better why we’re doing this. A tl;dr version of it would be as such:

sudo vim /etc/sysconfig/iptables

Find where port 80 is being opened, and add a new line under it with the following:

-A INPUT -p tcp -m tcp --dport 8080 -j ACCEPT

Restart iptables

sudo service iptables restart
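
The manual edit above can also be scripted. This is a minimal sketch, assuming your rules file already contains a rule with --dport 80 and -j ACCEPT; the function name add_port_8080_rule is made up for illustration. It only prints the modified rules, so you can review them before replacing the real file:

```shell
# Hypothetical helper: print the rules file given as $1 with an ACCEPT rule
# for port 8080 added directly under the first rule that opens port 80.
add_port_8080_rule() {
    awk '
        { print }
        !added && /--dport 80 / && /-j ACCEPT/ {
            print "-A INPUT -p tcp -m tcp --dport 8080 -j ACCEPT"
            added = 1
        }
    ' "$1"
}

# Usage: write the result to a scratch file and compare before copying it back.
#   add_port_8080_rule /etc/sysconfig/iptables > /tmp/iptables.new
#   diff /etc/sysconfig/iptables /tmp/iptables.new
```

Printing instead of editing in place is deliberate, since a mistake in this file can leave the firewall in a bad state.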

Configuring Varnish

Once you have installed Varnish, it will create a file at /etc/sysconfig/varnish. This file contains 4 alternatives of pre-configured settings for Varnish. It’s very good to get you going, but you will probably find there will be things you’re going to want to change. I have used “Alternative 2” as I found it to be the one that suits me the most. Feel free to read through all the other alternatives though, and choose the one that takes your fancy. Make sure you comment the other alternatives out so you end up with only one.

One thing you should definitely do, is set how much memory you will allow Varnish to take up. Depending on how much memory you have on your server, you will want to configure it accordingly.

Varnish Options

I have configured mine as follows:

DAEMON_OPTS="-a :80 \
-T localhost:6082 \
-f /etc/varnish/default.vcl \
-u varnish -g varnish \
-S /etc/varnish/secret \
-s file,/var/lib/varnish/varnish_storage.bin,256m"

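On the first line, I’m telling Varnish to listen on port 80, and on the last line I’m specifying how much storage I want Varnish to take up. Note that -s file keeps the cache in a file on disk, which the operating system maps into memory as needed; if you would rather keep the cache strictly in RAM, -s malloc,256m is the equivalent option. 256 MB is quite a lot to be honest, but you can allow more in case you have some to spare.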

The VCL Configuration

Put it this way… you configure your websites here, so you will want to pay some attention to this file. Getting it wrong will give you a lot of grief, and is likely to take your website down for a few minutes until you fix whatever you got wrong. The configuration you will see here is my suggested configuration, and again, is an amalgam of a few configurations I found plus a few extra things I wanted to add myself.

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 600s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
    .max_connections = 800;
}
sub vcl_recv {
    # all domains in here will return a "pass" which means they won't be cached
    if (req.http.host ~ "(www\.)?(site1.com|site2.co.uk|site3.me)") {
        return (pass);
    }
    # all sub-domains listed here will also return a pass, so no caching either
    else if(req.http.host ~ "(ads|cfaday|cffunctionaday|cftagaday|wallpapers|examples|coinconverter|langithub|top40)(\.placona.co.uk)"){
        return (pass);
    }
    # now this is cached
    else if(req.http.host == "placona.co.uk"){
        set req.backend = default;
    }
    else {
        set req.backend = default;
    }
    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "");
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    if( req.url ~ "^/wp-(login|admin)" || req.http.Cookie ~ "wordpress_logged_in_" ){
        return (pass);
    }
    if (req.request == "PURGE") {
        return (lookup);
    }
    if (req.url ~ "^/phpmyadmin") {
        return (pass);
    }
    if( req.url ~ "\?s=" ){
        return (pass);
    }
    if ( req.request == "POST" || req.http.Authorization ) {
        return (pass);
    }
    unset req.http.Cookie;
    return (lookup);
}
#
# accept purges from w3tc and varnish http purge
sub vcl_hit {
    if (req.request == "PURGE") { purge; }
    return (deliver);
}
#
# accept purges from w3tc and varnish http purge
sub vcl_miss {
    if (req.request == "PURGE") { purge; }
    return (fetch);
}
#
#
sub vcl_fetch {
    # allow phpmyadmin
    if (req.url ~ "^/phpmyadmin") {
        return (hit_for_pass);
    }
    #
    # remove some headers we never want to see
    unset beresp.http.Server;
    unset beresp.http.X-Powered-By;
    #
    # only allow cookies to be set if we're in admin area - i.e. commenters stay logged out
    if( beresp.http.Set-Cookie && req.url !~ "^/wp-(login|admin)" ){
        unset beresp.http.Set-Cookie;
    }
    #
    # don't cache response to posted requests or those with basic auth
    if ( req.request == "POST" || req.http.Authorization ) {
        return (hit_for_pass);
    }
    #
    # only cache status ok
    if ( beresp.status != 200 ) {
        return (hit_for_pass);
    }
    #
    # don't cache search results
    if( req.url ~ "\?s=" ){
        return (hit_for_pass);
    }
    #
    # else ok to cache the response
    set beresp.ttl = 24h;
    return (deliver);
}
#
#
sub vcl_deliver {
    # add debugging headers, so we can see what's cached
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    }
    else {
        set resp.http.X-Cache = "MISS";
    }
    # remove some headers added by varnish
    unset resp.http.Via;
    unset resp.http.X-Varnish;
    remove resp.http.Age;
    remove resp.http.X-Powered-By;
    remove resp.http.X-CF-Powered-By;
}
#
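sub vcl_hash {
    hash_data( req.url );
    # returning hash below skips the built-in vcl_hash, so add the host here
    # to keep different domains from sharing cache entries
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    # ensure separate cache for mobile clients (WPTouch workaround)
    if( req.http.User-Agent ~ "(iPod|iPhone|incognito|webmate|dream|CUPCAKE|WebOS|blackberry9\d\d\d)" ){
        hash_data("touch");
    }
    return (hash);
}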

A few important things here happen in the very beginning of the file. I will repeat it down here to be able to explain it better:

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 600s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
    .max_connections = 800;
}

We are telling Varnish that our backend lives on localhost, so it should proxy requests to it, and we are also telling it to proxy them through port 8080. If you have chosen a different port, this is where you should change it to that port.

In sub vcl_recv we then specify which domains we don’t want to cache by returning “pass”. According to the documentation:

When you return pass the request and subsequent response will be passed to and from the backend server. It won’t be cached. pass can be returned from vcl_recv

# all domains in here will return a "pass" which means they won't be cached
if (req.http.host ~ "(www\.)?(site1.com|site2.co.uk|site3.me)") {
    return (pass);
}
# all sub-domains listed here will also return a pass, so no caching either
else if(req.http.host ~ "(ads|cfaday|cffunctionaday|cftagaday|wallpapers|examples|coinconverter|langithub|top40)(\.placona.co.uk)"){
    return (pass);
}

So Varnish is stepping aside here. The request still travels through it, but it is handed straight to Apache and nothing is cached, which in this case is exactly what we want.
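
One thing worth knowing about these patterns is that the ~ operator does an unanchored regular expression match, and an unescaped dot matches any character, so they match a little more than the domains listed. A minimal sketch to illustrate, using grep -E as a stand-in for Varnish’s regex engine (the helper names and the test hosts are made up):

```shell
# Hypothetical helper: succeeds when the host matches the "pass" pattern
# exactly as written in vcl_recv (unanchored, dots not escaped).
host_is_passed() {
    printf '%s\n' "$1" | grep -Eq '(www\.)?(site1.com|site2.co.uk|site3.me)'
}

# Stricter variant for comparison: anchored at both ends, literal dots.
host_is_passed_strict() {
    printf '%s\n' "$1" | grep -Eq '^(www\.)?(site1\.com|site2\.co\.uk|site3\.me)$'
}

# Compare both variants on a listed domain, the cached domain,
# and a made-up lookalike host.
for host in www.site1.com placona.co.uk site1xcom.example; do
    if host_is_passed "$host"; then loose=match; else loose=no-match; fi
    if host_is_passed_strict "$host"; then strict=match; else strict=no-match; fi
    printf '%-20s loose=%s strict=%s\n' "$host" "$loose" "$strict"
done
```

If that looseness bothers you, the anchored pattern can be used in the VCL as well, bearing in mind that a Host header can also carry a port number.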

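We then move on to the domain we actually want to cache. This branch only selects the backend; the request then carries on through the rest of vcl_recv, which decides whether it ends up in the cache

# now this is cached
else if(req.http.host == "placona.co.uk"){
    set req.backend = default;
}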

Bonus Varnish trick

Great, you have made all your changes. If you somehow messed up, Varnish is likely to not even start, and if it does, it will cause you a lot of heartache. So what you should do is check that your configuration is correct. You can do so by running:

varnishd -C -f /etc/varnish/default.vcl

If everything is OK, you should see that your configuration got compiled correctly (Varnish prints the generated C code) and nothing like “Running VCC-compiler failed, exit 1” was returned. If it was, you will need to go back to your file and edit it. The compiler is pretty helpful though, and will tell you on which line the error happened.
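
If you want to make this check a habit, it can be wrapped in a small guard. A minimal sketch, assuming varnishd exits with a non-zero status when compilation fails; the function names and the VARNISHD override are made up for illustration:

```shell
# Hypothetical helper: compile the VCL file given as $1 without starting
# Varnish. VARNISHD can be set to point at a different varnishd binary.
vcl_compiles() {
    "${VARNISHD:-varnishd}" -C -f "$1" > /dev/null 2>&1
}

# Hypothetical helper: restart Varnish only when the VCL compiles.
safe_restart() {
    if vcl_compiles "$1"; then
        echo "VCL OK, restarting Varnish"
        sudo service varnish restart
    else
        echo "VCL failed to compile, leaving Varnish untouched" >&2
        return 1
    fi
}

# Usage:
#   safe_restart /etc/varnish/default.vcl
```

When the guard refuses to restart, run the plain varnishd -C command again to see the compiler’s error message, since the helper throws that output away.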

Configuring Apache

If you’re still with me, you’re just a few moments away from being pretty pleased with your new setup, but bear with me for another moment as we still need to make a couple of changes in Apache.

Remember back in the last image where we discussed Apache would now listen to port 8080? Well there we go, it’s time to change this.

Let’s open our Apache configuration file by running the following:

sudo vim /etc/httpd/conf/httpd.conf

We will then change

Listen 80
NameVirtualHost *:80

To be

Listen 8080
NameVirtualHost *:8080

This is now effectively telling Apache to listen to port 8080, which in our case is the port Varnish will be communicating with.

Configuring Virtual Hosts

This step can be considered optional, as not everyone uses virtual hosts. I use quite a few of them, so in my case, I had to go to each and every virtual host and also modify them to listen to port 8080 as opposed to 80. You would do this as follows:

<VirtualHost *:8080>
DocumentRoot /var/www/awesome
ServerName awesome.placona.co.uk
ServerAlias awesome.placona.co.uk
</VirtualHost>

Bonus Apache trick

Wanna check your Apache configurations are all working before you go on restarting it? Just run:

/usr/sbin/httpd -t

If you get “Syntax OK” you’re laughing!

Guess what?

Your website is just about to be faster than about 70% of the web, which is absolutely incredible if you consider the amount of time we spent together getting this done.

But we need to turn it on. Apache goes first, so that it lets go of port 80 before Varnish tries to bind to it:

sudo service httpd restart
sudo service varnish start

Bonus browser trick

Check that your content is really being cached by opening an incognito window (or private browsing if you’re in Firefox) and pressing F12. This should show your developer toolbar (or Firebug).

Now click on the Network tab (or Net tab) and expand the first GET request (this should be the same as the URL of your website). Also pay attention to how long this item took to complete.

Look in the response headers, and if this is the first request to that page you should see something like:

HTTP/1.1 200 OK
Vary: Accept-Encoding,Cookie
X-Pingback: http://www.placona.co.uk/xmlrpc.php
Content-Encoding: gzip
Expires: Thu, 15 Apr 2015 20:00:00 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 14529
Accept-Ranges: bytes
Date: Wed, 23 Apr 2014 08:17:33 GMT
Connection: keep-alive
X-Cache: MISS

If you’re seeing this, it means your cache is actually working, but because you’re the first one to hit this page, you’ve got a cache miss, which basically means Varnish still didn’t have that entry cached.

If you refresh the page, you should see two things here:

  1. Your page load is now much faster
  2. X-Cache now says HIT

Because you’re now caching your request, your page loads will be much faster in general. Obviously this won’t account for page loads on external resources, but you can use other methods to cache those resources (see my CDN post here).

Every subsequent request you make to that page will also come from cache. The more hits you get on different pages, the better the results your users will get. Spiders also work in your favour here, as by hitting pages they are also warming up your cache for you.
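
The same check can be done from the terminal without opening a browser. A minimal sketch, assuming curl is installed; cache_status is a made-up helper name and the URL is a placeholder for your own site:

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the X-Cache header that vcl_deliver adds.
cache_status() {
    # Strip the carriage returns HTTP uses, then match the header name
    # case-insensitively and print its value.
    tr -d '\r' | awk -F': *' 'tolower($1) == "x-cache" { print $2 }'
}

# Usage: on a page that is not cached yet, the first call would normally
# report a miss and the second a hit.
#   curl -sI http://www.example.com/ | cache_status
#   curl -sI http://www.example.com/ | cache_status
```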

But… IPs…

That’s right, you have now noticed every new comment you get on your blog is coming from 127.0.0.1. Remember when we changed Apache to only listen to Varnish? We have also made every single request internal, which means everything now appears to come from localhost. There is a very simple plugin to correct this, which uses the X-Forwarded-For header (set in vcl_recv above) to get the user’s real IP. You can check it out here.
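
What such a plugin does boils down to preferring the forwarded address over the connection address. A rough sketch of that logic in shell (this is not the plugin’s actual code, and client_ip is a made-up name):

```shell
# Hypothetical helper: $1 is the X-Forwarded-For header value and $2 is the
# address of the connection itself (127.0.0.1 when Varnish is in front).
client_ip() {
    forwarded_for=$1
    remote_addr=$2
    # The header may hold a comma-separated list; keep the first entry
    # and drop any spaces around it.
    first=$(printf '%s' "${forwarded_for%%,*}" | tr -d ' ')
    if [ -n "$first" ]; then
        printf '%s\n' "$first"
    else
        printf '%s\n' "$remote_addr"
    fi
}

# Usage:
#   client_ip "203.0.113.7" "127.0.0.1"
```

The header is only worth trusting here because vcl_recv removes whatever X-Forwarded-For the client sent and replaces it with client.ip.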

Final thoughts

I am super happy with Varnish installed on my server. I have already seen a real benefit to my server’s performance, and the CPU is pretty much always running at about 25%, which is really good considering my traffic. Memory usage is always slightly above what it would have been before, but that is mainly because I’m now effectively using it, instead of just throwing stuff at it and leaving it to be disposed of.

I have also installed a plugin called Varnish HTTP Purge, which manages my cache and clears it every time I post a new entry. It also allows me to purge my cache manually, or control some other things such as which kind of requests to cache. You can find it here.

Make sure you benchmark your requests and report back on how much of an improvement Varnish has made to your web server.

A first look at Dart

Reading time: 7 – 11 minutes

A few weeks ago, I went to a Google-sponsored event called Dart Flight School. The aim is to promote the language by doing a road trip and presenting use-cases and samples. The presentations were brief, and mainly focused on discussing the language’s functionalities, and its seamless integration with AngularJS (also maintained by Google).

I had a chance to look at Dart before, and was interested in finding out more things about it. Turns out the language (and platform) are pretty slick, and the development tool-set is pretty complete. The IDE is pretty good (and free), and their package management system is pretty similar to NPM’s in NodeJS.

Dart is also very (very, very) fast, and according to the project’s own benchmarks its VM is in fact faster than V8 (the VM used by NodeJS). A performance comparison can be seen below.

Dart vs V8 Engine

Data gathered from https://www.dartlang.org/performance/

It comes bundled with an out-of-the-box converter to JS, which means you can write your entire application in Dart, and then convert it to JavaScript. I must say I was initially sceptical about this conversion, but upon looking at the performance page, it seems that even after conversion, the JS generated by Dart still manages to be faster than V8 (purple line above). The JS converter is a Dart application written in Dart, which means you could then convert it to a JS application using itself… bewildering eh?

Language

From a language point of view, Dart seemed to be very readable, and in line with contemporary languages such as Java and C#. It is a class-based language that allows you to fully use object orientation, and has some very nifty functionalities embedded in it, where you can for example define a method using shorthand syntax as such:

Data Structures

Its core library also provides you with Lists, Sets and Maps, which basically means no imports, as you get that straight out-of-the-box. So for example if you wanted to create a Map and iterate through it, you could simply do:

Unit Tests

A big part of writing great code lies in the ability to provide unit tests that will make sure your code remains awesome even after refactoring. With Dart, you can just as easily create unit tests by importing unittest

And obviously I could have just as easily grouped my tests in a single group to have them maybe organized in smaller units. I’ve also used shorthand syntax to define my tests here

Reflection

Is supported by a library called Mirrors (enough said?). Though I feel I haven’t played with that enough to give you any better example than this.

Futures

According to Dart’s own website, a Future represents a means for getting a value sometime in the future. When a function that returns a Future is invoked, two things happen:

  1. The function queues up work to be done and returns an uncompleted Future object immediately.
  2. Later, when a value is available, the Future object completes with that value (or with an error).

Why would I use Futures instead of simply calling my expensive processes and waiting for them to complete?

I like to think that the people who read my blog know better than asking the question above, as it makes me feel fuzzy and warm. However if you thought of asking this question but were ashamed to actually do so, I will take you through it, and we will pretend this never happened.

It turns out, that if you do that, you will lock the thread until your application becomes responsive again, which can range from a few milliseconds to God forbid a few seconds. Meaning your users will stay put (or most likely leave) until you finish processing their request and show them some meaningful content. Think of it as “waiting ’till Friday is upon us”.

Seth Ladd gives a great example of the power a Future can have in your application

You can read more here.

Generics

Need I say anything? Need I?

Libraries

In this day and age, you want to be able to work with a language that offers you integrated package management. I have worked with numerous languages in the past, and managing third party packages has always been the pain of my life. The first time I looked at Ruby, I immediately fell in love with its package management system. Granted some languages try to accommodate for this by adding capabilities to builders such as Gradle, Ant or Maven. But I digress….

Dart comes with a package manager called Pub, which means all you need is a yaml file inside your project where you can specify any libraries your project needs, as well as which version you would like to be locked to. That way, you only need to package your application with what it really needs, and all the external libraries will be downloaded at the time you deploy your project. This makes your application lean and easy to maintain.

A pubspec.yaml file would look something like:

Then run pub get from the terminal to download all dependencies.

Cool Factor

Dart is a cool language and very simple to pick up, but as with all the things in life (although you don’t always like to admit it), the known is always a lot simpler, and Dart strives to offer simplicity, which means if you have done any proper language in the past, you’ll be able to read and write Dart code with ease. Dart themselves state

“We did throw in some nice syntactic features such as this. constructor args and => for one-line functions, but we’d agree that Dart chooses familiarity over excitement”

Final Verdict

It’s very exciting to see such a fresh language being supported and built by Google. In my opinion, the language offers everything the “cool kids platforms” offer and more. It also has capabilities that allow you to run Dart on the server side, client side and even natively on the browser. According to their documentation, the engineers behind Dart have Android and Google App Engine integration in the back of their minds, and even though they say it’s not completely down to them, they mention in their FAQ that you’d need to ask the team. But I’d say the fact they have thought of it is already half the battle won.

From a language perspective, I found nothing that would put me off writing code in it (and I’m pretty fussy about semantics). Instead I found I genuinely enjoyed writing code in Dart, and was left with a nice aftertaste after attending Dart Flight School, even though I understand we had a much cut-down version of the event here in the UK.

From Here

New TeamCity agents the right way

Reading time: 2 – 2 minutes


At work, I’m gradually moving our CI server from Hudson to TeamCity.

Nothing against Hudson really, but I feel that TeamCity is a much more robust CI server when it comes to integrating with .NET. It allows you to publish artifacts from your builds, and has a killer integration between developers’ IDEs and itself, which is amazingly helpful in making sure developers are not going to break the build… well before they break it.

But anyway, one thing that was slightly annoying me with TeamCity is the fact that the build agents would often get disconnected, and all my builds would stay in a queue until I went and manually restarted the agents.

The “Build Agent Disconnected” quickly became very annoying, and by quickly looking up on Google, I found lots of people had the same issue, and while there were lots of responses or people claiming they found a solution to it, I never actually found anything of much use other than the screenshot this guy posted.

When you add build agents on TeamCity, you get the option of adding them as a Windows service, or simply as an agent that runs with TeamCity. I had tried to add multiple build agents as Windows services before, but for a very strange reason, I would always end up with only one agent no matter what I did. TeamCity’s documentation wasn’t much help to be honest, and I ended up figuring this out after a couple of hours of trial and error. So here’s how you do it properly.


Easy unsubscribe with GoUnsubscribe.me

Reading time: 2 – 3 minutes

I’ve created a new website over the last weekend and would like to share it here with you.

I spent an entire day without checking my emails on Saturday, and when I finally got around to doing it, there was a ton of useless crap in there. My spam filters are pretty good, but won’t pick up on things I (un)intentionally subscribed to.

The very act of purchasing something online will sometimes auto-subscribe you to the store’s monthly, weekly and daily newsletter until the end of era.

By osmosis, I will normally delete all the email from my inbox and just get on with my life.

This time though, I decided I’d also unsubscribe from some of them. Some were easy enough, as every good citizen knows to add an unsubscribe link to the bottom of the email.

Some others though, will make the assumption you love to get their newsletters about their huge selection of garden hoses.

While unsubscribing, I thought maybe some other people may want to make use of the unsubscription links, so I decided to collate them all, and put them together on a collaborative website.

The idea is very simple, find what you would like to unsubscribe from, and the link will take you straight to their unsubscription page.

What if my link isn’t there?

You can just add it. Simply go ahead to the GitHub page, fork the project, make your changes, and send me a pull request.

Adding a new URL is as simple as:

Adding a new URL to GoUnsubscribe.me

And you can also collaborate on the project itself, by changing any of the files, adding new functionalities or making the layout look a bit prettier.

So check GoUnsubscribe.me out!

And fork me on GitHub

UK Top 40 albums & singles JSON

Reading time: 1 – 2 minutes


So I had this idea for a little application and wanted to get the UK’s Top 40 singles to use in it. I started by writing something that would scrape Radio 1’s Top 40 chart and return me a list of songs, since I couldn’t find any feeds that would give me that.

I then thought this could be of use to somebody else, so turned it into a little service (built using Ruby and Sinatra) that returns a JSON object with all the singles and some useful information about the number of weeks each has been there, how much it has moved, and which direction (up or down) it’s gone.

On the root of it, it returns the chart date as the current date and time, and the retrieval date to indicate which date it’s been last retrieved. I am caching the feed to play nice with Radio 1, so I’m only making one HTTP request a day to their website.

Check it out at http://top40.placona.co.uk/

Also feel free to fork it, and collaborate by adding your country’s top 40
