Hi @mohamed.younes, welcome to the forum and sorry about that! This is the kind of thing we’d need to investigate on a case-by-case basis. Could you please email us at email@example.com with the site URL you’re trying to index (or just post it here if you don’t mind sharing it with the public)?
OK, I’ve now been told by email (thanks for the quick support) that the issue was caused by a subtle redirect.
Specifically, I set https://www.my-website.com/en, which was redirected to https://www.my-website.com/en/. That trailing slash required a 301, which “blocked” the crawler from working.
My suggestion is:
Couldn’t the crawler be improved so that it ignores 301 as long as the domain stays the same?
In any case, I think it would be nice to include this detail in the documentation.
Sorry, it looks like I got back to this thread late and Marcelo already helped you there, so I’m not sure how it was set up previously. Do you mean our crawler failed to follow a legitimate 301 that should’ve taken it to the right place?
…or did you mean that the crawler successfully followed a 301, but that took it somewhere else and it couldn’t find your sitemap after that?
I don’t think “ignore 301s to the same domain” is a sensible rule, because sometimes people do use that to redirect within the same site (just a different page).
The devs looked into it and said that we do follow redirects, as long as they stay within the same domain/subdomain. They believe that in your case, it was redirecting from my-example.com (no www) to www.my-example.com (with the www), which is a bit different from what you described in post #3 (adding the www is not the same as just modifying the trailing slash).
Can you please confirm if that was indeed the case (i.e., whether you also redirected to the www subdomain)?
Technically, my-example.com would be a different host from www.my-example.com in most implementations, including ours, and that’s probably not something we would change, because there are some cross-site security concerns here. However, if you redirect from www.my-example.com/page1 to www.my-example.com/page2, we should be able to follow that.
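To illustrate the distinction, here is a minimal Python sketch of a same-host check. This is not our crawler's actual code; the function name `same_host` is hypothetical, and the sketch simply assumes that "same host" means the two URLs have identical hostnames.

```python
from urllib.parse import urlsplit


def same_host(source_url: str, target_url: str) -> bool:
    """Return True when both URLs have the same hostname.

    Hypothetical helper for illustration only. It assumes a redirect is
    "same host" when the hostnames match exactly, so adding or removing
    "www" counts as a change of host.
    """
    source_host = urlsplit(source_url).hostname  # lowercased by urlsplit
    target_host = urlsplit(target_url).hostname
    return source_host is not None and source_host == target_host


# Path-only redirect on the same host: would be followed under this rule.
print(same_host("https://www.my-example.com/page1",
                "https://www.my-example.com/page2"))

# Trailing-slash redirect: the host is unchanged.
print(same_host("https://www.my-website.com/en",
                "https://www.my-website.com/en/"))

# Bare domain to www: a different host under this rule.
print(same_host("https://my-example.com/",
                "https://www.my-example.com/"))
```

Under that assumption, the first two redirects stay on the same host, while the bare-domain-to-www redirect does not.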
I hope that clarifies the behavior. If we are mistaken and you weren’t redirecting across hosts, please let us know!