Monthly Archives: August 2016

Latest discoveries about deindexations and Search Engine crawling

Every time a blog gets deindexed on EBN, we save all the information we have about it so we can analyze it later. We then run batch analyses to see if there are any patterns or footprints that we can report to the community.

In the last few weeks, we found a few interesting things. We’re still discussing how to build these into Blog Health to improve deindexation prevention, but they’re useful insights nonetheless.

Here’s what we found.

Search Engine Crawler is visiting old URLs all the time

If you buy an expired domain with history, its old URLs get crawled often and for long periods of time. This can go on for months, even if the URLs return a 404 error.
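If you want to see this on your own blog, your server's access log is the place to look. The snippet below is a minimal sketch, not the analysis we run on EBN: it assumes your log is in the common "combined" format and that matching the token "Googlebot" in the user agent is good enough to spot the crawler. The function name and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_404_hits(lines, agent_token="Googlebot"):
    """Count, per path, the crawler requests that ended in a 404."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        is_crawler = agent_token.lower() in match["agent"].lower()
        if is_crawler and match["status"] == "404":
            hits[match["path"]] += 1
    return hits

# Hypothetical sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [12/Aug/2016:06:25:24 +0000] "GET /old-post/ HTTP/1.1" 404 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [13/Aug/2016:07:01:02 +0000] "GET /old-post/ HTTP/1.1" 404 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [13/Aug/2016:08:00:00 +0000] "GET /missing/ HTTP/1.1" 404 512 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for path, count in crawler_404_hits(sample).most_common():
    print(path, count)
```

Paths that show up here week after week are the old URLs the crawler still remembers, which makes them natural candidates for rebuilding with relevant content.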

Unrelated content and/or language can cause deindexation

We found that rebuilding an old domain with unrelated content and/or in a different language can cause deindexation.

Comments increase Search Engine Crawler visits

The comment feed is checked daily. If there are no comments, the blog is crawled less often.

Blocking crawlers can cause deindexation

We did not find any issues with users of Spider Blocker. However, a lot of users add more than one plugin and block additional crawlers. Do NOT do this. Use one blocker and block as few crawlers as possible.
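One way this goes wrong is a block rule that is broader than intended and ends up matching a search engine crawler too. The snippet below is a rough sketch of a sanity check, not how Spider Blocker or any other plugin works internally: it assumes blockers match case-insensitive substrings of the user agent, and the block lists and the function name are hypothetical examples.

```python
# User agent strings for the crawlers you want to keep.
SEARCH_ENGINE_AGENTS = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
}

def accidentally_blocked(blocked_tokens, agents=SEARCH_ENGINE_AGENTS):
    """Return {crawler name: [block tokens matching its user agent]}.

    Assumption: a token blocks an agent if it appears anywhere in the
    user agent string, ignoring case.
    """
    problems = {}
    for name, agent in agents.items():
        matches = [t for t in blocked_tokens if t.lower() in agent.lower()]
        if matches:
            problems[name] = matches
    return problems

# Hypothetical block lists, for illustration only.
narrow_list = ["AhrefsBot", "MJ12bot", "SemrushBot"]
broad_list = narrow_list + ["bot"]  # e.g. added by a second plugin

print(accidentally_blocked(narrow_list))
print(accidentally_blocked(broad_list))
```

The narrow list leaves both search engine crawlers alone, while the extra catch-all token in the broad list matches both of them.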

Some domains are permanently penalized

Some penalized domains never get any crawler traffic and will therefore never get indexed. Unfortunately, we don’t yet have data on how long this penalty can persist or what its root cause is (email spam, malware, phishing etc.).

Search Engine Crawler still visits the blog after deindexation (!)

When a domain gets removed from the SERP (deindexed), the old URLs still get crawled regularly, and that stops only after 5-7 days. This could mean you can still save a deindexed blog by rebuilding its URLs with relevant content.

Because we use a passive indexation check, our indexation status can lag by 7-14 days (even though Blog Health is checked daily).

Summary

Here’s a quick recap:

  • Rebuild URLs with relevant content that would fit on the old domain. Use the same language.
  • Check domains in spam and malware databases before buying them.
  • Use only one spider blocker. We recommend our free Spider Blocker plugin. Block only the most important crawlers.

None of this is a complete surprise, but we can now confirm it with data rather than speculation.

In the future, we’re going to start collecting even more information about domains – from social metrics to backlinks and blacklist databases. Once we have that, our analysis and deindexation prevention will greatly improve.