Traditional SEO is not especially complicated: it deals primarily with links and content. In contrast, almost any webmaster who has worked on a modern site will tell you that there are many more things you need to know.
Some of them deal with all kinds of Google announcements (which can even contradict each other), and some with technical SEO, which usually concerns elements of SEO that live in, or are influenced by, the site's code.
So here are answers to questions you may not have known you needed to ask yourself.
Can the canonical tag also be used to prevent content duplication between different sites?
A canonical tag is intended for cases where we have very similar pages within a site and we want Google to address only the main one.
A classic example is a store category: there are few differences between the first and second pages of "Kitchen Cabinets", so we want to promote the first page. Without the canonical tag, however, the second page and beyond could compete with the first, because they are so similar to it (especially if the same text repeats in the opening paragraph, etc.).
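In practice, the tag is a single line in the `<head>` of each secondary page. A minimal sketch, with hypothetical URLs:

```html
<!-- Placed in the <head> of page 2 of the category (URLs are hypothetical) -->
<link rel="canonical" href="https://example.com/kitchen-cabinets/" />
```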
After adding the tag on the secondary pages, pointing to the first page as the canonical version, we can promote the first page without worrying too much. The question then arises: can this tag also be used to prevent duplication between different domains? A classic case is publishing an article on our site and on another site at the same time, when we do not want the other site to outrank ours for that content.
Yes, it is possible and even recommended. Google initially did not support this (which is perhaps the reason for the confusion), but in fact it has supported cross-domain canonical tags since 2009, as you can see in the video:
However, the tag should be used only when the content of the pages is very similar: not only in terms of text but also in terms of links, headings, etc. The interface and overall layout of the sites, of course, do not have to be the same. In addition, if you need to redirect one domain to another site, it is better to do so with a 301 redirect rather than a canonical tag.
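When the goal is to move a whole domain rather than consolidate near-duplicates, the server-side 301 is the right tool. A minimal sketch for an Apache server (assuming mod_alias is enabled; the target domain is a placeholder):

```apacheconf
# .htaccess on the old domain: permanently redirect everything to the new site
Redirect 301 / https://new-site.example/
```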
What is a Soft 404 Error?
The "Coverage" report in the Search Console shows us all the pages Google found through its crawler, including those it added to the index. So, if a particular page was not indexed even though Google saw it, we can find out why. One of the errors we may see is "soft 404", but what exactly does that mean?
Technically, a page that is not found should return a 404 code to the browser; this is also how Google knows that the page should not be added to the index, or that it should be removed (usually Google crawls it a few more times and only then removes it).
But in some cases, a page can return a valid code (for example, 200) yet actually be empty of content. This can happen, for instance, because someone created a blank page to check something and forgot to delete it, or because of issues with the content management system (say, a tag that has no linked post). So, if you see a page flagged with this error, first check what it actually displays. If it is empty and you do not need it, delete it so the server returns a proper 404 code.
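The difference is easiest to see at the HTTP level: a hard 404 announces itself in the status line, while a soft 404 hides behind a 200 (responses abbreviated for illustration):

```http
HTTP/1.1 404 Not Found
(a real "hard" 404: Google knows to drop the page)

HTTP/1.1 200 OK
Content-Length: 0
(status says the page is fine, but the body is empty: a soft 404 candidate)
```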
What is the difference between 404 and 410?
In simple terms, if 404 means "not found", then 410 means "gone". Although the two look the same to the visitor, 410 is intended to signal to the people who edit the site that a particular page has been deliberately deleted, and that all links to it should be removed.
For Google, there is currently almost no difference, except that its algorithm may remove pages returning code 410 slightly faster. Logically this makes sense, for the simple reason that a page returning 404 may have been accidentally deleted and could be re-uploaded.
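On an Apache server, returning 410 instead of 404 for a deliberately removed page takes a single directive (assuming mod_alias; the path is hypothetical):

```apacheconf
# .htaccess: tell browsers and crawlers this page is gone for good (HTTP 410)
Redirect gone /old-sale-page.html
```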
Sitemap – How critical is it, and what does it have to do with the crawl budget?
Every site has a "crawl budget": the amount of crawling Google devotes to it. A site with a low crawl budget is one that Google crawls only every few days. Because Google does not typically crawl the entire site in one pass, a low crawl budget combined with a site with lots of pages can be problematic. This is especially true for new sites that Google does not yet really know.
The crawl budget is usually affected by the site’s update rate.
In this respect, a sitemap with a crawl priority and the last-change date for each page can help in situations where the site as a whole may not be very dynamic, but some of its pages are updated at a higher rate.
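A minimal sitemap entry with these fields looks like this (the URL and date are placeholders; note that Google has stated it largely ignores `<priority>`, so `<lastmod>` is the field that matters most):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/kitchen-cabinets/</loc>
    <lastmod>2021-06-15</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
```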
In any case, since the sitemap is usually updated automatically, it is advisable to scan the site from time to time and remove duplicate or irrelevant pages. This may increase the crawl budget, because Google will crawl the site more efficiently instead of wasting time on problematic pages.
I secured the site with HTTPS, and now I do not see data in the Search Console. Why?
After the change, the Search Console no longer counts visits to the old version of the site, or counts far fewer, for the simple reason that most visitors are redirected to the new version. To solve the problem, we have two options:
1. Verify the HTTPS version as a new property (and lose sight of the old data in this view)
2. Switch to domain-level verification that covers all versions
The recommended method is the second, but it requires us to add a dedicated DNS record (a TXT or CNAME entry) in the domain registrar's admin panel or the hosting control panel (depending on whether the hosting and the domain belong to the same company).
Domain-level verification will let us see all the data together without losing information retroactively. It also covers the site versions with and without www.
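The record itself is added at the DNS provider. For a Domain property, Google's verification wizard typically hands you a TXT value to paste in; a sketch of what the zone entry looks like (the token below is a made-up placeholder):

```text
; DNS zone entry for Search Console domain verification (token is fictional)
example.com.  3600  IN  TXT  "google-site-verification=AbCdEf123456"
```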
How is it better to block pages from Google: with the noindex tag or the robots.txt file?
When it comes to blocking entire directories, it is usually best to do so through robots.txt, as long as you have access to the server's root directory.
However, if you want to block specific pages, or you do not have access to the server, it is better to use a noindex meta tag. Sometimes it is also easier to do this through the content management system, for example with the popular Yoast plugin in WordPress.
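The two mechanisms look like this in practice (the directory name is hypothetical). robots.txt blocks crawling of a whole directory:

```text
# robots.txt at the site root: block crawling of an entire directory
User-agent: *
Disallow: /internal-search/
```

while the meta tag lets Google crawl a specific page but asks it to keep the page out of the index:

```html
<!-- In the <head> of a specific page: crawlable, but excluded from the index -->
<meta name="robots" content="noindex" />
```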
If Google has already crawled a particular page and it was blocked through robots.txt only after the fact, Google may still display the page in the results, but without a meta description: