Some observations of the GoogleBot

Recently, rather than using cookies to gather data on people using some of my sites, I’ve started using the newish HTML5 localStorage and generating a unique code on the first visit. This has a few advantages: it’s a bit more persistent than cookies, I already use localStorage to store customization data for the user client-side, it works seamlessly with PhoneGap/Cordova mobile apps, and I don’t have to worry about anything on the server side (i.e. setting, sending and tracking cookies). I use roughly the following code (assuming localStorage is available in the browser, which in 99% of cases it is):
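As a minimal sketch of the idea (not the exact code from my sites): the key name `visitorId` and the function names `generateId` and `getVisitorId` are made up for illustration, and the storage object and random source are passed in as parameters only so the snippet can be tried outside a browser.

```javascript
// Sketch only: VISITOR_KEY, generateId and getVisitorId are hypothetical names.
const VISITOR_KEY = 'visitorId';

// Build a version-4-style UUID string from a random source.
// `random` defaults to Math.random, so the result is only as random as that is.
function generateId(random = Math.random) {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = Math.floor(random() * 16);
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Return the stored identifier, creating and saving one on the first visit.
function getVisitorId(storage = globalThis.localStorage, random = Math.random) {
  let id = storage.getItem(VISITOR_KEY);
  if (!id) {
    id = generateId(random);
    storage.setItem(VISITOR_KEY, id);
  }
  return id;
}
```

In a browser, calling `getVisitorId()` with no arguments uses the real localStorage. Note that if the storage does not persist and the random source is deterministic, every fresh visit ends up with the same identifier.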

I noticed that across several of my sites the GoogleBot was generating exactly the same UUID (over multiple access IPs), but it seemed that other localStorage preferences etc. were not being saved. From this it seems that the GoogleBot doesn’t support saving anything in localStorage (not a surprise, given there are probably 10k computers running the GoogleBot scraper and it’s easier for them not to share site state). However, it also appears that they are using the random number generator with a fixed seed, so that any random numbers generated by the site are the same across all their scraper servers.

Conclusions? Don’t expect bots (or even some clients, e.g. in incognito mode) to actually save localStorage between sessions, even if they support it as an interface. The Modernizr test for localStorage basically tests that the interface works, not that it is persistent between sessions. Also, if you want truly random output when run in a bot, it looks like you’ll have to write your own pseudo-random number generator function with some changing seed, perhaps based on output. It doesn’t look like JavaScript’s Math object supports a seed for the .random() function; whilst I can understand this design, it means that you basically have to code your own random generator stack if you want to get truly random output for bots.
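A rough sketch of what such a generator could look like, using mulberry32, a small and widely circulated 32-bit seeded generator. The `makeSeed` helper is hypothetical, and it assumes the clock (and whatever extra string you feed in) actually varies inside the bot, which I haven’t verified. This is not suitable for anything security-related.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
// The same seed always produces the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical seed helper: start from the clock and mix in any extra
// string that varies between visits (user agent, URL, earlier output...).
function makeSeed(extra = '') {
  let h = Date.now() >>> 0;
  for (let i = 0; i < extra.length; i++) {
    h = Math.imul(h ^ extra.charCodeAt(i), 0x01000193) >>> 0;
  }
  return h;
}
```

Usage would be something like `const random = mulberry32(makeSeed(navigator.userAgent));` and then calling `random()` wherever `Math.random()` was used before.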

Getting WordPress posting to Twitter with hashtags

In trying to publicise some of my articles on this blog a bit further afield, I recently opened a Twitter account. However, in order to reach a wider audience on Twitter you need to use hashtags. Unfortunately WordPress’s excellent Jetpack extension, whilst allowing you to post to Twitter and other social networks, doesn’t automatically include hashtags in your posts. There have been a few attempts to add this functionality as an extension in this thread; however, none of them are very well coded and they don’t work properly. Here is what I am now using on this blog:

Facebook Graph API Page post changes

So about a month back it looks like Facebook changed their Graph API to prevent posting links to pages using the method we had always used, which was simply a POST to //feed with my access token and the message and link parameters. Posting just a message was still working fine, but when I tried to add a link I just got access denied.

After spending an hour or two bashing my head against the wall I discovered that you had to first access a list of all your pages with your user access token, then from that you would figure out the page’s special access token, and only then could you post.
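The sequence of calls described above can be sketched in JavaScript as follows. This is an illustration under assumptions, not the code I run: the function name `postLinkToPage` and the injectable `fetchFn` parameter are made up, the unversioned `https://graph.facebook.com` base URL is assumed, and paging through a long list of pages is ignored.

```javascript
const GRAPH = 'https://graph.facebook.com';

// Post a message plus link to a page, using the page's own access token.
// `fetchFn` is only a parameter so the flow can be exercised without network access.
async function postLinkToPage(userToken, pageId, message, link, fetchFn = fetch) {
  // Step 1: list the pages this user manages, using the user access token.
  const listUrl = `${GRAPH}/me/accounts?access_token=${encodeURIComponent(userToken)}`;
  const listRes = await fetchFn(listUrl);
  const pages = (await listRes.json()).data || [];

  // Step 2: find the page and pick out its own access token.
  const page = pages.find((p) => p.id === pageId);
  if (!page) throw new Error(`Page ${pageId} not found for this user`);

  // Step 3: post to the page feed with the page token, not the user token.
  const body = new URLSearchParams({ message, link, access_token: page.access_token });
  const postRes = await fetchFn(`${GRAPH}/${pageId}/feed`, { method: 'POST', body });
  return postRes.json();
}
```

The important part is step 3: the same request made with the user token is the one that was being refused once a link was included.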

So the resulting (somewhat messy) Perl code looks like: