Massive Increase in API Usage with No Reporting?

We went live with our Next.js website last week and out of nowhere our API requests jumped by 100K a day. This is strange, as we only have about 100 visitors a day to our website and around 100 pages, so how can there be 100K API requests? Unfortunately, the dashboard doesn’t provide any way to see where these API requests are coming from so they can be debugged. Where can we access this data? We need to see what is being requested, when, on what days, etc. Thanks.

Hey, I’ve replied to you over email, but I’ll paste the same answer here in case anyone else is having the same issue:

When it comes to tracking where the API calls are coming from or what API calls are being made (the body of the API call), we can only do so from these three pieces of information:

⁃ If a “referrer” header is included in the request, we can count how many requests had that header and display it in the project usages tab; if the “referrer” header is not included, we can’t count the request on that tab.
⁃ We always count which API token the request came from.
⁃ We always display the IP the API request was made from, if provided.

So the best way to keep track of where the API requests are coming from is to create separate API tokens with custom names and delegate them to specific project sections. Then you will be able to tell, from the number of requests per token, which project sections are responsible for the most requests.
You can also adjust the granularity by creating more API tokens and using them in smaller sections of the project (or even in single requests if you want).
Additionally, to be sure that the API tokens are only being used where they are intended, you can click “Regenerate API Token” at the top right of the Token page to invalidate the previous token and generate a new one.
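A minimal sketch of what the per-section-token approach could look like in a Next.js data-fetching helper. The section names and env var names (`DATO_TOKEN_BLOG`, `DATO_TOKEN_SHOP`) are placeholders, not anything DatoCMS prescribes — name your tokens however you like in the dashboard:

```javascript
// Sketch: route each site section through its own DatoCMS API token so
// the per-token counters in the usage dashboard tell you which section
// generates the traffic. Env var names below are placeholders.
const SECTION_TOKENS = {
  blog: process.env.DATO_TOKEN_BLOG,
  shop: process.env.DATO_TOKEN_SHOP,
};

// Pure helper so the header shape is easy to verify.
function buildDatoHeaders(token) {
  return {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };
}

// Requests made with the "blog" token are then counted separately from
// "shop" requests in the usage dashboard.
async function queryDato(section, query) {
  const token = SECTION_TOKENS[section];
  if (!token) throw new Error(`No token configured for section "${section}"`);
  const res = await fetch('https://graphql.datocms.com/', {
    method: 'POST',
    headers: buildDatoHeaders(token),
    body: JSON.stringify({ query }),
  });
  return res.json();
}
```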

I have the same problem. Site went live and usage went through the roof on all levels… Which is kind of strange, since the APIs should only be called in the build phase. I am not making any API calls on our pages.

So I looked through all our data and I still cannot make sense of the API calls. On the days we build and deploy, the API calls obviously go up a lot, but after that we don’t make many calls at all, as the content is cached. Is Next.js revalidation making these calls behind the scenes? Do you recommend just turning off revalidation? I’m at a loss to explain how the website can possibly generate so many API calls from cached data.

Just looked at our data for today, and it is literally impossible that we had so many API requests in a day. We don’t have many visitors at all. The API requests imply a huge surge in traffic, which is simply false. My only conclusion is that Next.js makes a lot of revalidation calls on the backend, even if there are no visitors to a page. We have revalidation set to every 3 minutes on our site, so I guess in theory Next.js revalidates the entire site 20 times an hour? That is the only way to explain the API numbers. We will have to turn off revalidation to see if that is the cause. Problem is, we don’t want to redeploy the site because of the huge API usage as pages are built.

@osseonews @primoz.rome The only way we can really know what is going on is to take a look at the code and how the requests are being made. You can schedule a call with us at Calendly - DatoCMS so we can discuss this.

Thanks for the offer. Our codebase is quite large and mostly proprietary, so it will be difficult to have you look at the code. We will do some refactoring on our own and investigate further to see what is going on. My suspicion is that Vercel/Next.js runs a lot of revalidation in the background, even if it’s not even a live site.

My only question right now is that I still don’t understand how to set things up so that we can see which tokens are doing the requesting. We do have a development token and a live token, but I don’t see any place in the dashboard that shows which API key is actually making a specific request.

@osseonews The way to do so is to add an identifier (such as the referrer header I mentioned in my first answer) to your respective GraphQL requests, so they can be counted according to that identifier. So add a “Homepage” referrer for the request made on your homepage, a “FAQ” referrer for the request made on your FAQ page, and so on. This way the requests are identifiable, and you can see from the referrers which requests are being made most frequently and driving the higher volume of API calls.
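As a sketch, the per-page identifier could be attached like this. The header spelling follows the standard HTTP `Referer` header; note that browsers refuse to let scripts set it, so this only works from server-side code such as `getStaticProps` (the `DATO_API_TOKEN` env var name is a placeholder):

```javascript
// Sketch: tag each page's GraphQL request with a page identifier
// ("Homepage", "FAQ", ...) so it shows up in the usage breakdown.
function withReferrer(headers, pageName) {
  return { ...headers, Referer: pageName };
}

// Server-side usage sketch (e.g. inside getStaticProps); browsers strip
// a manually-set Referer, so this must run on the server.
async function queryDatoFromPage(pageName, query) {
  const res = await fetch('https://graphql.datocms.com/', {
    method: 'POST',
    headers: withReferrer(
      {
        Authorization: `Bearer ${process.env.DATO_API_TOKEN}`, // placeholder
        'Content-Type': 'application/json',
      },
      pageName
    ),
    body: JSON.stringify({ query }),
  });
  return res.json();
}
```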

So, looking at our logs, it seems like a lot of the API calls are generated by “bots” that are trying to access pages that don’t exist on the website. As our app calls the API to check the slug of a requested URL to see if it exists (that’s how we build a page: query the slug and get the data), I’m not quite sure how to solve this. This is kind of how Next.js ISR works. Is there a way to block certain requests to the API to prevent this, or is the only alternative to start making a list of requested pages that don’t exist and just 404 them right away, before the API is queried with a fake slug? Honestly, Next.js ISR is a bit flawed because it leaves open this kind of window for totally fake requests, and then Next.js tries to build the page…

@osseonews The only idea that I have at the moment is to get the full list of supported slugs and 404 automatically, without calling us, if the requested slug doesn’t match. Or maybe you can do something a bit more fine-grained by doing so just in certain subfolders?
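A minimal sketch of that allowlist gate in a Next.js page, assuming an in-memory `Set` of slugs purely for illustration (in practice the list could live in a static file, KV store, or Redis):

```javascript
// Sketch: gate ISR page builds on an allowlist of slugs, so bot requests
// for nonexistent pages 404 before any CMS API call is made.
const ALLOWED_SLUGS = new Set(['home', 'faq', 'about']); // placeholder data

function isAllowedSlug(slug, allowed = ALLOWED_SLUGS) {
  return typeof slug === 'string' && allowed.has(slug);
}

// In a Next.js page like pages/[slug].js (export this from the page module):
async function getStaticProps({ params }) {
  if (!isAllowedSlug(params.slug)) {
    // notFound renders the 404 page without ever querying the CMS.
    return { notFound: true };
  }
  // const data = await queryDato(...); // only reached for real slugs
  return { props: { slug: params.slug }, revalidate: 180 };
}
```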

Hey @primoz.rome, you should try to follow Marcelo’s advice above by setting the referrer header with the page name or by splitting the calls across different API tokens, so you can start understanding a bit more where the calls are coming from and optimize your usage, OK?

Yep, that’s exactly our plan. Going to store a full list of allowed slugs and only call your API if it’s in that list.

BTW, we did a lot more testing today and the problem with API requests is certainly a Next.js issue with regard to ISR (Incremental Static Regeneration). Next.js, at least when deployed to Netlify (and we’ve seen very similar results with Vercel), simply doesn’t respect the revalidation seconds set in getStaticProps, and it routinely rebuilds pages out of nowhere, causing these large and unnecessary API requests when the data should be cached. We would switch to just on-demand revalidation, but this doesn’t work on Netlify yet. However, I assume that just getting rid of ISR would solve the problem for anyone else facing this issue. I did see a few unsolved issues on GitHub about this too. Also, what helps is to keep a list of allowed slugs, per the above, if you use ISR, because these bots just hit your site like crazy and cause API requests for pages that don’t even exist, and Next.js tries to build them. So it’s key to intercept these pages before the API is used. Hope this helps others.
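For readers on platforms where on-demand revalidation is supported, this is roughly what the endpoint looks like in the Next.js pages router (12.2+). `REVALIDATE_SECRET` and the query parameter names are placeholders; the idea is that pages only rebuild when content actually changes (e.g. via a CMS webhook), not on a timer:

```javascript
// Sketch of a Next.js on-demand revalidation endpoint
// (pages/api/revalidate.js). Pure helper kept separate so the request
// validation is easy to verify.
function checkRevalidateRequest(query, secret) {
  if (!secret || query.secret !== secret) return { ok: false, status: 401 };
  if (!query.path) return { ok: false, status: 400 };
  return { ok: true, path: query.path };
}

async function handler(req, res) {
  const check = checkRevalidateRequest(req.query, process.env.REVALIDATE_SECRET);
  if (!check.ok) {
    return res.status(check.status).json({ revalidated: false });
  }
  // res.revalidate() rebuilds just this one path on demand.
  await res.revalidate(check.path);
  return res.json({ revalidated: true });
}
// In a real project: `export default handler;`
```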

hi @osseonews,

Curious to know what you’ll use to store the allowed slugs and query it before generating the pages (internal static file, 3rd-party database, etc.). We’ll potentially be running into the same issue, as we also use on-demand ISR.

edit: I wonder if using something like Cloudflare’s firewall to challenge bad bots wouldn’t be helpful here

We use Cloudflare Workers. You can also use a Redis-type cache; upstash.com has some tutorials about rate limiting with Cloudflare Workers.
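A sketch of the fixed-window rate-limit logic such a Worker could run in front of the site. In production the counters would live in a shared store like Upstash Redis (`INCR` plus `EXPIRE`); a local `Map` is used here purely for illustration, and the window and limit values are placeholders:

```javascript
// Sketch: fixed-window rate limiting per client IP. Counters reset each
// window because the key embeds the window number.
const WINDOW_MS = 60_000; // 1-minute windows (placeholder)
const LIMIT = 30;         // max requests per IP per window (placeholder)
const counters = new Map();

function isRateLimited(ip, nowMs, limit = LIMIT, windowMs = WINDOW_MS) {
  const windowKey = `${ip}:${Math.floor(nowMs / windowMs)}`;
  const count = (counters.get(windowKey) || 0) + 1;
  counters.set(windowKey, count);
  return count > limit;
}
```

In a Worker’s `fetch` handler you would return a 429 when `isRateLimited(...)` is true and pass the request through otherwise, so bot floods never reach the origin or the CMS API.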
