Another translation of an article from BlueHatSeo. I thought this post would be the fourth part of the SEO Empire series that Eli announced in February, but it has not appeared on his blog yet. So let's talk about getting sites indexed. I think you will like it 😉
(The translation keeps the original's first-person voice.)
After you build a chameleon site or a link-bleaching site, you will likely face the problem of getting it indexed by the search engines. You may never have worked with large sites of 20k+ pages, but believe me, getting them indexed is not that hard. The only caveat: the methods below work best for sites of roughly 20 to 200 thousand pages. Sites much larger or smaller call for somewhat different methods.
Two important aspects matter here. The first is site structure.
Organize the site so that it is as easy as possible for bots to crawl. To do this, create so-called "hub" pages — pages whose links point to internal pages. On a catalog site, for example, a hub page is one carrying a link block like "Pages 1, 2, 3, 4, …" at the bottom. The sole purpose and value of a hub page is to help get the other pages of the site indexed: if you need a large site indexed, get the hub pages indexed first and the rest of the pages will follow.
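The "Pages 1, 2, 3, 4, …" block above can be sketched as a small helper. This is a minimal illustration only — the page size, URL pattern, and function name are my assumptions, not anything from the original article:

```python
# Sketch: generating the paginated link block for a "hub" page whose only
# job is to expose crawlable links to a site's internal pages.
# PAGE_SIZE and the /catalog?page=N URL scheme are hypothetical.

def hub_page_links(total_items: int, page_size: int = 100) -> list[str]:
    """Return the 'Pages 1, 2, 3, ...' anchor tags for a catalog hub page."""
    pages = (total_items + page_size - 1) // page_size  # ceiling division
    return [f'<a href="/catalog?page={n}">{n}</a>' for n in range(1, pages + 1)]

print(hub_page_links(250, 100))
```

Every listing then becomes reachable within two hops of the hub page, which is the whole point of the structure.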
The second important factor is the volume of search-bot traffic the site receives.
The more bot traffic, the faster the pages get indexed. Volume alone is not enough, though: what use are bots that hammer the home page and never visit the internal ones? You have to send the bots where you need them, and that is where the structure described above becomes crucial.
This is a screenshot of the stats for one of my chameleon sites, which is only 10 days old. It has very few backlinks, but that did not stop bots from indexing 10k+ pages in 6 days.
As I said, none of this is difficult. We will start with the basics, move on to more advanced techniques, and finish with what I call "indexing sites." How far you go is up to you: if you are not ready to apply a given technique, or don't yet have the knowledge for it, stick with the simpler options.
This is the simplest approach imaginable. Let's go back to our chameleon site with affiliate links to a dating site. Each landing page targets a particular city, and each page carries (or can carry) links to nearby cities. (You can build these from a sample of ZIP codes, or simply pull the database rows immediately before and after the row for the given city.) This lets search bots hop from one landing page to the next until every page on the site is indexed.
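The "rows before and after" variant can be sketched in a few lines. The table name, columns, and city data below are hypothetical stand-ins, assuming landing pages are stored one row per city:

```python
# Sketch of the "nearby cities" cross-link idea: for each landing page,
# link to the cities stored just before and after its row in the database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, zip TEXT)")
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", [
    (1, "Springfield", "62701"),
    (2, "Shelbyville", "62565"),
    (3, "Capital City", "62702"),
    (4, "Ogdenville", "62704"),
])

def nearby_cities(city_id: int, radius: int = 1) -> list[str]:
    """Cities whose rows sit within `radius` rows of the given row id."""
    rows = conn.execute(
        "SELECT name FROM cities WHERE id BETWEEN ? AND ? AND id != ? ORDER BY id",
        (city_id - radius, city_id + radius, city_id),
    ).fetchall()
    return [name for (name,) in rows]

print(nearby_cities(2))  # → ['Springfield', 'Capital City']
```

Rendering these as links on each landing page gives the bots a chain to walk through the whole table.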
You have surely already bolted a simple sitemap onto your site, and it links to every page of the resource. Search bots are said to treat sitemaps differently from other pages in terms of how many links they are willing to follow, but when you have 20k pages that all need indexing, a standard map can be ineffective.
If we accept the premise that a bot only follows a certain number of links from the sitemap, we need to make sure it eventually covers them all. With a small resource of 5k pages this is hardly worth worrying about, but with a 30k+-page chameleon site a standard sitemap can be a waste of time. The problem is that links from the home page lead to the internal pages with the lowest ids in the database, and the sitemap is built the same way: first it emits the first rows of the database, then the last ones. In that situation the bots keep walking the same pages.
To solve the problem, flip the sitemap 180 degrees. Replace ORDER BY 'id' with ORDER BY 'id' DESC in the query (the DESC attribute means the last pages are output first and the first pages last). Pages that normally languish at the tail of the map now get the bots' immediate attention, and they are indexed quickly. If internal linking is in order, the bots will index pages from both ends of the database and eventually meet in the middle. Full indexation happens much faster this way than if the bots crawled gradually from the first pages to the last.
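The flipped map is a one-line change in the sitemap query. A minimal sketch, with an assumed `pages` table and slug-based URL scheme:

```python
# Sketch of the reversed ("flipped 180°") sitemap: emit pages in
# descending id order so the deepest database rows surface first.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, slug TEXT)")
conn.executemany("INSERT INTO pages VALUES (?, ?)",
                 [(i, f"page-{i}") for i in range(1, 6)])

# ORDER BY id DESC puts the last rows of the database first in the map.
urls = [f"/{slug}" for (slug,) in
        conn.execute("SELECT slug FROM pages ORDER BY id DESC")]
print(urls)  # → ['/page-5', '/page-4', '/page-3', '/page-2', '/page-1']
```

The home page keeps feeding bots the low-id pages, the sitemap feeds them the high-id ones, and the two crawl fronts converge.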
Better still, make the sitemap rotate. Suppose you have 30k pages. In the first week the map lists them as 30,000–1. Then you take the first 5k entries and move them to the end, so the map reads 25,000–1 : 30,000–25,001. At the start of the third week you rotate again, and the map becomes 20,000–1 : 30,000–20,001. And so on to the end. This method is very effective.
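The weekly rotation above can be expressed as a simple list rotation. The function name and the scaled-down numbers are mine; the 30k pages and 5k weekly block are the article's figures:

```python
# Sketch of the rotating sitemap: start with ids in reverse order
# (30,000..1), then each week move the leading block to the end of the map.

def rotate_sitemap(ids: list[int], week: int, block: int) -> list[int]:
    """Return the sitemap order for the given week (week 0 = reversed ids)."""
    order = sorted(ids, reverse=True)
    shift = (week * block) % len(order)
    return order[shift:] + order[:shift]

ids = list(range(1, 7))           # stand-in for 30k pages
print(rotate_sitemap(ids, 0, 2))  # → [6, 5, 4, 3, 2, 1]
print(rotate_sitemap(ids, 1, 2))  # → [4, 3, 2, 1, 6, 5]
```

With the real figures, `rotate_sitemap(range(1, 30001), week, 5000)` reproduces the 25,000–1 : 30,000–25,001 layout for week one.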
This method can also substantially increase bot traffic on a site's internal pages. It lets you rub the bots' noses in exactly the areas they stubbornly refuse to index. For it to work, point links at the hub pages. The links can come from your own resources as well as third-party ones.
This is the most effective indexing tactic in this post. An indexing site is a dedicated site that pulls content from your other sites, gets its own internal pages indexed, and then updates itself — retiring the pages that have been indexed and picking up others that have not. Building one does not take long, but you do need to understand code.
First create a home page linking to 50–100 internal pages. Fill each internal page with the content of the pages you need indexed from the database of a large site (a chameleon site, for example). To keep search bots coming to the indexing site's home page, point a stream of link weight at it from your link-bleaching sites.
Then set up a cron job that daily pulls results from Google, Yahoo, and MSN for the query site:yourdomain.com. Write a script that parses the results and compares them against the list of pages on the indexing site. Once a page is indexed in all three engines, the script puts a 301 redirect on it pointing to its twin (the landing page on the large site) and marks it in the database as indexed.
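The core bookkeeping of that script can be sketched as a pure function. Actually scraping the `site:` results from each engine is out of scope here; `indexed_by` stands in for those parsed results, and all names and sample data below are hypothetical:

```python
# Sketch of the self-updating indexer logic: once a page is reported as
# indexed by ALL engines, mark it done and schedule a 301 to its twin
# (the matching landing page on the large site).

ENGINES = ("google", "yahoo", "msn")

def update_indexer(pages: dict[str, bool], indexed_by: dict[str, set[str]],
                   twin: dict[str, str]) -> dict[str, str]:
    """Return {indexer_url: redirect_target} for newly finished pages;
    mutates `pages` to record them as indexed."""
    redirects = {}
    for url, done in pages.items():
        if done:
            continue  # already redirected on a previous run
        if all(url in indexed_by[engine] for engine in ENGINES):
            pages[url] = True           # mark as indexed in the "database"
            redirects[url] = twin[url]  # serve a 301 to the real page
    return redirects

pages = {"/p1": False, "/p2": False}
indexed = {"google": {"/p1", "/p2"}, "yahoo": {"/p1"}, "msn": {"/p1"}}
twins = {"/p1": "https://bigsite.example/city-1",
         "/p2": "https://bigsite.example/city-2"}
print(update_indexer(pages, indexed, twins))
# → {'/p1': 'https://bigsite.example/city-1'}
```

Here /p2 stays live because Yahoo and MSN have not picked it up yet; on a later cron run it would be retired the same way.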
From then on the indexing site ignores that page, and when creating new internal pages it picks up only pages that have not yet been indexed by some (or all) of the search engines. The site runs entirely on autopilot, and that is its real value.
The indexing site keeps working until every page of the large site is indexed in the major search engines. The method is so effective that even with no external links to the large site at all, you can still get its internal pages indexed. All it takes is a few indexing sites.