Thursday, June 14, 2018

Twitter Tracking, or why the EU Cookie Warnings are Retarded

After the EU overstepped its jurisdiction by mandating that all sites, not just European ones, place an obtrusive and annoying warning bar somewhere telling users that they use cookies and requiring it to be dismissed (or deleted via browser console, hidden via ad-blocker rule, etc.), it comes as no surprise that sites started using the other available methods to track their users without their consent.

Twitter uses a part of the JavaScript Web Storage API, called Local Storage, to do precisely this, even if you have all of the tracking preferences turned off.  You can look at it quite easily in your browser's developer console (usually opened by pressing F12; you may have to click a tab titled "Console" after that) by typing localStorage and pressing Enter.  After quite a while of using Twitter and clicking on things like trends, random tweets, and whatnot, the page's Local Storage becomes a rather large object.  Twitter uses Local Storage for a lot of its functionality, but the tracking portion of it can grow to a truly massive size if not manually kept in check.
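
If you'd rather see which entries are the big ones than scroll through the whole object, something like the following sketch may help.  The helper name summarizeStorage is made up for this post (it is not a browser API), and it only relies on the standard length, key() and getItem() members of a Storage object.  Since it returns its result instead of logging it, the console will show the array even while console.log is disabled.

```javascript
// Sketch: list every key in a Storage-like object with the length of its value,
// largest first. `summarizeStorage` is a hypothetical helper, not a browser API.
function summarizeStorage(storage) {
  var rows = [];
  for (var i = 0; i < storage.length; i++) {
    var key = storage.key(i);
    var value = storage.getItem(key) || '';
    rows.push({ key: key, chars: value.length });
  }
  // Sort descending by value length so the heaviest entries come first.
  rows.sort(function (a, b) { return b.chars - a.chars; });
  return rows;
}

// In the browser console on a Twitter page, type:
//   summarizeStorage(localStorage)
```

The numbers are character counts, not exact byte sizes, so treat them as a rough guide.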

To more easily parse what they've got on you, you can run this code:
    // re-enable the console
    window.console = (function(){
        var a = document.createElement( 'iframe' );
        a.style.display = 'none';
        document.body.appendChild( a );
        return Object.seal( a.contentWindow.console );
    })();

    // see what they've got on you
    Object.keys( localStorage ).filter( function( element ){
        return element.startsWith( '__XHRNotes__' );
    } ).forEach( function( current ){
        console.log( 'localStorage["' + current + '"]: ' + localStorage[current] );
    } );
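Note that we have to re-enable the browser's console logging function, since Twitter helpfully disables it.  We do that by borrowing a fresh console object from a hidden iframe.  Twitter disables it repeatedly, so the code has to make the borrowed console harder to tamper with.  This is what Object.seal() is for: a sealed object's properties can no longer be deleted or reconfigured.  Sealing does not make them read-only, though, so if your console goes quiet again, Object.freeze() is the stricter option.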
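
For reference, here is a small sketch of what Object.seal() does and does not block, compared with Object.freeze().  It uses a plain stand-in object rather than a real console, and the names tryOverwrite and original are made up for the illustration.  How Twitter actually goes about disabling the console is not something this sketch claims to know.

```javascript
// Sketch: compare how sealed and frozen objects react to having a method overwritten.
// `tryOverwrite` and `original` are illustrative names only.
function original() { return 'original'; }

function tryOverwrite(obj) {
  try {
    obj.log = function () { return 'replaced'; };
  } catch (e) {
    // Strict-mode code throws a TypeError when assigning to a frozen property;
    // sloppy-mode code silently ignores the assignment.
  }
  return obj.log();
}

var sealedResult = tryOverwrite(Object.seal({ log: original }));
var frozenResult = tryOverwrite(Object.freeze({ log: original }));

// A sealed object still accepts the new value; a frozen one keeps the original.
```

After running it, sealedResult is 'replaced' and frozenResult is 'original'.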

Once you've taken a look at the rather cryptic data they've got on you, you'll probably want to delete it.  That's easy enough; just run this:

    // erase it all
    Object.keys( localStorage ).filter( function( element ){
        return element.startsWith( '__XHRNotes__' );
    } ).forEach( function( current ){
        delete localStorage[current];
    } );
Note that normally I would package this up into a nice user script, but doing so here requires a lot of work.  It needs to let its user see the data in a convenient manner, and allow them to clear individual pieces of it as well as nuking the whole thing from orbit.  This means inserting stuff into the page, and mimicking Twitter's page structure to make it look like the rest of the page, which is very time-consuming to implement.  It can be done, and I might even do it, but I really just wanted to get the barebones code out there.

In a pinch, you can paste the code from above that deletes the data into a user script that runs on every Twitter page load.  You may also consider running it in a 'storage' event handler (PROTIP: window.addEventListener( 'storage', function(){/*code here*/} );), so that it doesn't only happen when you refresh the page.  Be aware, though, that the storage event only fires in other tabs or windows of the same site, not in the page that made the change, so on its own it won't catch everything.  Some entries that will be deleted by that code seem to be more benign than others, so you may run into issues.
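
As a starting point for such a user script, here is a minimal sketch of the deletion step as a reusable function.  The name purgeByPrefix and the one-minute interval in the usage comment are my own assumptions, not anything Twitter or a user script manager provides, and the function only uses the standard length, key() and removeItem() members of a Storage object.

```javascript
// Sketch: remove every entry whose key starts with `prefix` and report what went.
// `purgeByPrefix` is a hypothetical helper name.
function purgeByPrefix(storage, prefix) {
  var doomed = [];
  // Collect matching keys first; removing while iterating would shift the indexes.
  for (var i = 0; i < storage.length; i++) {
    var key = storage.key(i);
    if (key !== null && key.indexOf(prefix) === 0) {
      doomed.push(key);
    }
  }
  doomed.forEach(function (key) { storage.removeItem(key); });
  return doomed;
}

// Possible use inside a user script (the interval length is an arbitrary choice):
//   purgeByPrefix(localStorage, '__XHRNotes__');
//   setInterval(function () { purgeByPrefix(localStorage, '__XHRNotes__'); }, 60000);
```

A periodic sweep is cruder than reacting to each write, but it doesn't depend on the storage event reaching the current page.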

Local Storage is basically the tracking company's wet dream: it does what cookies do (persist between sessions), but with no expiration date, and it's a lot less cumbersome to set up and manage.  The page's code is (in theory) the sole arbiter of how long stuff gets stored there.  Just like with cookies, though, the user can clear it at any time, but sites rely on it being reasonably obscure and hidden, since it's not talked about nearly as often as cookies are.  Also, just like with cookies, there are just as many good uses as there are nefarious ones.  Local Storage has a little brother called Session Storage, which is kept per tab and gets cleared when you close that tab or window.  Twitter uses Session Storage as well, but the data stored there is comparatively boring.  Any good ad blocker and script blocker combination will prevent things from happening based on Twitter's usage of Session Storage.
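
To illustrate the point about the page's code deciding how long things live: since Local Storage has no built-in expiry, a site that wants one has to roll its own, for example by storing a timestamp next to the value.  The sketch below shows one way that could look.  The function names and the record layout are invented for this example and are not how Twitter (or anyone in particular) does it.

```javascript
// Sketch: a do-it-yourself expiry on top of a Storage-like object.
// `setWithExpiry`, `getWithExpiry` and the { value, expires } layout are hypothetical.
function setWithExpiry(storage, key, value, ttlMs, now) {
  var record = { value: value, expires: now + ttlMs };
  storage.setItem(key, JSON.stringify(record));
}

function getWithExpiry(storage, key, now) {
  var raw = storage.getItem(key);
  if (raw === null) { return null; }
  var record;
  try {
    record = JSON.parse(raw);
  } catch (e) {
    return null; // not something this helper wrote
  }
  if (!record || typeof record.expires !== 'number') { return null; }
  if (now >= record.expires) {
    storage.removeItem(key); // expired: the page itself has to clean up
    return null;
  }
  return record.value;
}

// In a browser:
//   setWithExpiry(localStorage, 'demo', 'hello', 60000, Date.now());
//   getWithExpiry(localStorage, 'demo', Date.now());
```

Nothing forces a site to do any of this, which is exactly why entries can sit there indefinitely.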