The Technical SEO Renaissance: The Whys and Hows of SEO’s Forgotten Role in the Mechanics of the Web

October 25, 2016
seounity

Web technologies and their adoption are advancing at a frenetic pace. Content is a game that every type of team and agency plays, so we’re all competing for a piece of that pie. Meanwhile, technical SEO is more complicated and more important than ever before and much of the SEO discussion has shied away from its growing technical components in favor of content marketing.

As a result, SEO is going through a renaissance wherein the technical components are coming back to the forefront and we need to be prepared. At the same time, a number of thought leaders have made statements that modern SEO is not technical. These statements misrepresent the opportunities and problems that have sprouted on the backs of newer technologies. They also contribute to an ever-growing technical knowledge gap within SEO as a marketing field and make it difficult for many SEOs to solve our new problems.

That resulting knowledge gap that’s been growing for the past couple of years influenced me to, for the first time, “tour” a presentation. I’d been giving my Technical SEO Renaissance talk in one form or another since January because I thought it was important to stoke a conversation around the fact that things have shifted and many organizations and websites may be behind the curve if they don’t account for these shifts. A number of things have happened that prove I’ve been on the right track since I began giving this presentation, so I figured it’s worth bringing the discussion to continue the discussion. Shall we?

An abridged history of SEO (according to me)

It’s interesting to think that the technical SEO has become a dying breed in recent years. There was a time when it was a prerequisite.

Image via PCMag

Personally, I started working on the web in 1995 as a high school intern at Microsoft. My title, like everyone else who worked on the web then, was “webmaster.” This was well before the web profession splintered into myriad disciplines. There was no Front End vs. Backend. There was no DevOps or UX person. You were just a Webmaster.

Back then, before Yahoo, AltaVista, Lycos, Excite, and WebCrawler entered their heyday, we discovered the web by clicking linkrolls, using Gopher, Usenet, IRC, from magazines, and via email. Around the same time, IE and Netscape were engaged in the Browser Wars and you had more than one client-side scripting language to choose from. Frames were the rage.

Then the search engines showed up. Truthfully, at this time, I didn’t really think about how search engines worked. I just knew Lycos gave me what I believed to be the most trustworthy results to my queries. At that point, I had no idea that there was this underworld of people manipulating these portals into doing their bidding.

Enter SEO.

Image via Fox

SEO was born of a cross-section of these webmasters, the subset of computer scientists that understood the otherwise esoteric field of information retrieval and those “Get Rich Quick on the Internet” folks. These Internet puppeteers were essentially magicians who traded tips and tricks in the almost dark corners of the web. They were basically nerds wringing dollars out of search engines through keyword stuffing, content spinning, and cloaking.

Then Google showed up to the party.

Image via droidforums.net

Early Google updates started the cat-and-mouse game that would shorten some perpetual vacations. To condense the last 15 years of search engine history into a short paragraph, Google changed the game from being about content pollution and link manipulation through a series of updates starting with Florida and more recently Panda and Penguin. After subsequent refinements of Panda and Penguin, the face of the SEO industry changed pretty dramatically. Many of the most arrogant “I can rank anything” SEOs turned white hat, started software companies, or cut their losses and did something else. That’s not to say that hacks and spam links don’t still work, because they certainly often do. Rather, Google’s sophistication finally discouraged a lot of people who no longer have the stomach for the roller coaster.

Simultaneously, people started to come into SEO from different disciplines. Well, people always came into SEO from very different professional histories, but it started to attract a lot more more actual “marketing” people. This makes a lot of sense because SEO as an industry has shifted heavily into a content marketing focus. After all, we’ve got to get those links somehow, right?

Image via Entrepreneur

Naturally, this begat a lot of marketers marketing to marketers about marketing who made statements like “Modern SEO Requires Almost No Technical Expertise.”

Or one of my favorites, that may have attracted even more ire: “SEO is Makeup.”

Image via Search Engine Land

While I, naturally, disagree with these statements, I understand why these folks would contribute these ideas in their thought leadership. Irrespective of the fact that I’ve worked with both gentlemen in the past in some capacity and know their predispositions towards content, the core point they’re making is that many modern Content Management Systems do account for many of our time-honored SEO best practices. Google is pretty good at understanding what you’re talking about in your content. Ultimately, your organization’s focus needs to be on making something meaningful for your user base so you can deliver competitive marketing.

If you remember the last time I tried to make the case for a paradigm shift in the SEO space, you’d be right in thinking that I agree with that idea fundamentally. However, not at the cost of ignoring the fact that the technical landscape has changed. Technical SEO is the price of admission. Or, to quote Adam Audette, “SEO should be invisible,” not makeup.

Changes in web technology are causing a technical renaissance

In SEO, we often criticize developers for always wanting to deploy the new shiny thing. Moving forward, it’s important that we understand the new shiny things so we can be more effective in optimizing them.

SEO has always had a healthy fear of JavaScript, and with good reason. Despite the fact that search engines have had the technology to crawl the web the same way we see it in a browser for at least 10 years, it has always been a crapshoot as to whether that content actually gets crawled and, more importantly, indexed.

When we’d initially examined the idea of headless browsing in 2011, the collective response was that the computational expense prohibited it at scale. But it seems that even if that is the case, Google believes enough of the web is rendered using JavaScript that it’s a worthy investment.

Over time more and more folks would examine this idea; ultimately, a comment from this ex-Googler on Hacker News would indicate that this has long been something Google understood needed conquering:

This was actually my primary role at Google from 2006 to 2010.

One of my first test cases was a certain date range of the Wall Street Journal’s archives of their Chinese language pages, where all of the actual text was in a JavaScript string literal, and before my changes, Google thought all of these pages had identical content… just the navigation boilerplate. Since the WSJ didn’t do this for its English language pages, my best guess is that they weren’t trying to hide content from search engines, but rather trying to work around some old browser bug that incorrectly rendered (or made ugly) Chinese text, but somehow rendering text via JavaScript avoided the bug.

The really interesting parts were (1) trying to make sure that rendering was deterministic (so that identical pages always looked identical to Google for duplicate elimination purposes) (2) detecting when we deviated significantly from real browser behavior (so we didn’t generate too many nonsense URLs for the crawler or too many bogus redirects), and (3) making the emulated browser look a bit like IE and Firefox (and later Chrome) at the some time, so we didn’t get tons of pages that said “come back using IE” er “please download Firefox”.

I ended up modifying SpiderMonkey’s bytecode dispatch to help detect when the simulated browser had gone off into the weeds and was likely generating nonsense.

I went through a lot of trouble figuring out the order that different JavaScript events were fired off in IE, FireFox, and Chrome. It turns out that some pages actually fire off events in different orders between a freshly loaded page and a page if you hit the refresh button. (This is when I learned about holding down shift while hitting the browser’s reload button to make it act like it was a fresh page fetch.)

At some point, some SEO figured out that random() was always returning 0.5. I’m not sure if anyone figured out that JavaScript always saw the date as sometime in the Summer of 2006, but I presume that has changed. I hope they now set the random seed and the date using a keyed cryptographic hash of all of the loaded javascript and page text, so it’s deterministic but very difficult to game. (You can make the date determistic for a month and dates of different pages jump forward at different times by adding an HMAC of page content (mod number of seconds in a month) to the current time, rounding down that time to a month boundary, and then subtracting back the value you added earlier. This prevents excessive index churn from switching all dates at once, and yet gives each page a unique date.)

Now, consider these JavaScript usage statistics across the web from BuiltWith:

JavaScript is obviously here to stay. Most of the web is using it to render content in some form or another. This means there’s potential for search quality to plummet over time if Google couldn’t make sense of what content is on pages rendered with JavaScript.

Additionally, Google’s own JavaScript MVW framework, AngularJS, has seen…

Get more information about this news here: Read More