We are in the process of building a site search, and we would like more control over the indexing. I have seen that a robots.txt file can be used to control indexing. In our robots.txt we would like to allow specific pages and subpages and nothing else, but the examples I found only show disallowed pages. Is this possible? Or how else could we achieve this?
Thanks a lot.
Hello @fabrice.tobler, sorry for the delay here.
What we do is crawl all the links present on your home page (the URL that you set in the build trigger) and proceed page by page recursively, without leaving your domain.
If you want to add pages that are not linked directly in the HTML, you can pass a robots.txt file with the URLs. If you want to exclude pages, you can do so with the patterns described in the documentation.
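For the allow-list approach from the original question, a common pattern is to disallow everything and then explicitly allow the paths you want crawled. This is a minimal sketch with hypothetical paths (`/products/` and `/about` are placeholders); note that the `Allow` directive is an extension honored by major crawlers such as Googlebot, and that more specific rules take precedence over the blanket `Disallow`:

```
User-agent: *
# Block everything by default
Disallow: /
# Explicitly allow specific sections (example paths, adjust to your site)
Allow: /products/
Allow: /about
```

Whether this works for the site-search crawler depends on whether it honors `Allow`, so it's worth testing against your own index.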
I hope this helps, otherwise let me know!