Whenever I trigger a reindex through the DatoCMS API (/reindex), the build trigger log reports success, but the number of indexed pages is always 1.
Any explanation for why this happens?
Hi @mohamed.younes, welcome to the forum and sorry about that! This is the kind of thing we'd need to investigate on a case-by-case basis. Could you please email us at support@datocms.com with the site URL you're trying to index (or just post it here if you don't mind sharing it with the public)?
OK, I have now been informed by email (thanks for the quick support) that the origin of the issue was a subtle redirect.
In fact, I set https://www.my-website.com/en and it was redirected to https://www.my-website.com/en/. That trailing slash required a 301, which "blocked" the crawler from working.
My suggestion is:
Couldn't the crawler be improved so that it ignores a 301 as long as the domain stays the same?
In any case, I think it would be nice to include this detail in the documentation.
thanks again
Sorry, it looks like I got back to this thread late and Marcelo already helped you there, so I'm not sure how it was set up previously. Do you mean our crawler failed to follow a legitimate 301 that should've taken it to the right place?
…or did you mean that the crawler successfully followed a 301, but that took it somewhere else and it couldn't find your sitemap after that?
I don't think "ignore 301s to the same domain" is a sensible rule, because sometimes people do use that to redirect within the same site (just a different page).
The devs looked into it and said that we do follow redirects, as long as it's within the same domain/subdomain. They believe that in your case, it was redirecting from my-example.com (no www) to www.my-example.com (with the www), which is a bit different than what you said in post #3 (adding the www would be different than just modifying the trailing slash).
Can you please confirm if that was indeed the case (i.e., whether you also redirected to the www subdomain)?
Technically, my-example.com would be a different host from www.my-example.com in most implementations, including ours, and that's probably not something we would change, because there are some cross-site security concerns here. However, if you redirect from www.my-example.com/page1 to www.my-example.com/page2, we should be able to follow that.
I hope that clarifies this behavior? If we are mistaken and you weren't redirecting across hosts, please let us know!
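To make the host distinction concrete, here is a minimal Python sketch of a same-host check. It is only an illustration of the rule described above, not our crawler's actual code; the function name `same_host` is hypothetical and it simply compares hostnames exactly.

```python
from urllib.parse import urlsplit


def same_host(source_url: str, target_url: str) -> bool:
    """Return True when both URLs have exactly the same hostname.

    Hypothetical helper for illustration: a redirect is treated as
    "within the same site" only if the hostname is unchanged.
    """
    source = urlsplit(source_url).hostname
    target = urlsplit(target_url).hostname
    return source is not None and source == target


# Same host, different path: a redirect like this stays on one host.
print(same_host("https://www.my-example.com/page1",
                "https://www.my-example.com/page2"))  # True

# Bare domain to www subdomain: a different host under an exact comparison.
print(same_host("https://my-example.com/",
                "https://www.my-example.com/"))  # False

# Trailing-slash redirect only: the host is unchanged.
print(same_host("https://www.my-website.com/en",
                "https://www.my-website.com/en/"))  # True
```

Under this kind of comparison, a trailing-slash redirect alone stays on the same host, while adding or removing the www changes it.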