Now that Next.js 16 cacheTag is live (are 128 cache tags enough for DatoCMS?)

Now that Next.js 16 is live with support for cacheTag, which allows up to 128 tags, is this model compatible with DatoCMS cache tags?

The previous solution of storing the DatoCMS cache tags in a database is cumbersome, and since the Next.js cacheTag lets you tag a cache entry right after a fetch, this would be a perfect fit.

So I guess the question is: how many cache tags can DatoCMS return? (Please be ≤128.)
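Roughly what I'm hoping to do is something like this (just a sketch; I'm assuming Next 16's cacheTag() export from next/cache and DatoCMS's X-Cache-Tags response header here, so please correct me if either assumption is off):

```ts
// Sketch only: assumes cacheTag() is exported from "next/cache" in Next 16
// and that DatoCMS returns its cache tags in the "x-cache-tags" response
// header when asked to (check the current docs for both).
import { cacheTag } from "next/cache";

export async function fetchDato(query: string) {
  "use cache";
  const res = await fetch("https://graphql.datocms.com/", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.DATOCMS_API_TOKEN}`, // made-up env var name
      "X-Cache-Tags": "true", // ask DatoCMS to include cache tags in the response
    },
    body: JSON.stringify({ query }),
  });

  // Tag this cache entry with every DatoCMS cache tag it depends on;
  // this is exactly where the 128-tag limit would bite.
  const tags = res.headers.get("x-cache-tags")?.split(" ") ?? [];
  cacheTag(...tags);

  return res.json();
}
```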


Hey @k.sprengers,

Thanks for the heads-up!

Hmm, I wonder when they increased the cache tags limit. It used to be 64 at some point, but their docs now say 128, and that applies back to v14 at least.

I’ll bring this up with the devs and see if that new limit is high enough. I know we currently return more than 64 sometimes, but not sure if it goes up to 128. I’ll report back as soon as I find out!


@k.sprengers,

Sorry, we can still sometimes exceed 128 cache tags :frowning:

Our defined limit is the lesser of 500 cache tags or 14 KB of header size, due to some upstream CDN limits. But both of those are still more than Next’s 128-tag limit, so I’m afraid an external tag cache is still the way to go for now :frowning: (or another approach, like ISR, time-based fetch revalidation, “on publish” webhooks, etc.)
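(By time-based fetch revalidation I just mean Next’s built-in revalidate option on fetch, e.g. something like this sketch, with a made-up env var name:)

```ts
// Time-based alternative: let the cached fetch expire on a schedule instead
// of tracking cache tags at all (trades freshness for simplicity).
export async function fetchDatoTimeBased(query: string) {
  const res = await fetch("https://graphql.datocms.com/", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.DATOCMS_API_TOKEN}` }, // made-up env var name
    body: JSON.stringify({ query }),
    next: { revalidate: 60 }, // re-fetch from DatoCMS at most once per minute
  });
  return res.json();
}
```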

It’s also possible that simpler use cases with few records and a simple enough schema may never reach 128 cache tags in a response, but we cannot guarantee that. And even if it works in the beginning, a project set up that way might eventually break as it grows… not a great idea, since that often happens long after the original devs and editors are gone, and the new people are left figuring out why some pages refresh correctly and others don’t. It’s probably better to go with something more deterministic and safer.

Hey @roger, just out of curiosity: what if we compress the DatoCMS tag set (on the Next.js side) into a small, deterministic set of “bucket tags” that always stays within Next’s limits?

I don’t quite understand, sorry, could you please explain and maybe provide a hypothetical example?

It would make revalidating caches in Next.js much easier if we could either limit the total number of cache tags somehow, or if Next.js (or, in my case, Vercel) could increase the limits (I asked Vercel this very question, but I’m still waiting for a response).

But for now, the “500 cache tags OR 14 KB in header size” limit might be worth mentioning in the Dato docs.

Yes, that would be great :frowning: Next.js/Vercel is unfortunately one of the frameworks that has this issue, and it’s not something we are able to change on our end (it’s up to Vercel and their CDNs).

In our Next starter kit, we have an example of how to map Dato cache tags to specific Next queries and then store that in an external KV (in this case, Turso): https://github.com/datocms/nextjs-with-cache-tags-starter/blob/main/lib/database.ts

This other discussion also has an example of how to modify that implementation to work with a redis-compatible store: NextJS + Vercel + Dato Cache Tags not always working - #5 by roger. At the time, that example targeted Vercel KV, a now-defunct, whitelabeled version of Upstash. Vercel deprecated that and now just links to third-party redis implementations, but the basic logic should still be the same and it shouldn’t be too hard to modify it e.g. for Valkey or similar.
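Very roughly, the mapping boils down to something like this (illustrative only, not the starter kit’s actual code; the ioredis client, env var, and key names are made up for the example):

```ts
// Sketch of the external tag→query mapping with a generic Redis-compatible
// client. Each cached query gets a single Next.js tag of its own (so we stay
// far below the 128-tag limit), and Redis remembers which DatoCMS cache tags
// each query depends on.
import Redis from "ioredis";
import { revalidateTag } from "next/cache";

const redis = new Redis(process.env.REDIS_URL!); // made-up env var name

// Call after fetching a query that was cached with next: { tags: [`query:${queryId}`] }.
export async function storeQueryTags(queryId: string, datoTags: string[]) {
  await Promise.all(
    datoTags.map((tag) => redis.sadd(`dato-tag:${tag}`, queryId))
  );
}

// Call from the webhook route handler that receives DatoCMS's invalidated
// cache tags: look up the affected queries and revalidate only those.
export async function invalidateDatoTags(invalidatedTags: string[]) {
  for (const tag of invalidatedTags) {
    const queryIds = await redis.smembers(`dato-tag:${tag}`);
    for (const id of queryIds) {
      revalidateTag(`query:${id}`);
    }
    await redis.del(`dato-tag:${tag}`);
  }
}
```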

This sort of mapping is not ideal, but is necessary as long as Next/Vercel has these small cache tag limits :frowning:

I know this is frustrating, but as far as we can tell, our hands are kind of tied here due to Next’s limits. This is something we’ve given a significant amount of thought and experimentation to in the past, and could not find a good solution to, hence our workarounds and all the mapping needed. If anyone has a better idea about how we could or should do this (like @martin.palma “compressed buckets” idea…? which I still don’t quite understand, sorry)… please let us know and we can definitely investigate!

Hey @roger, what I meant with “deterministic bucketing”:

  • assume we have B buckets (where B = 120 to fit into Next.js limits) labeled with bk:...:0 → bk:...:119
  • each DatoCMS tag is then run through a deterministic hash that picks one bucket index from 0…B-1
  • so a page/component is tagged with the set of buckets it hits, not with the raw DatoCMS cache tags

So instead of attaching up to 500 DatoCMS cache tags, you attach at most B buckets.
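Just to make it concrete, a minimal sketch of what I have in mind (the names are made up):

```ts
// Deterministic bucketing: hash each opaque DatoCMS cache tag into one of a
// fixed number of bucket tags, and use the buckets as the Next.js cache tags.
import { createHash } from "node:crypto";

const BUCKETS = 120; // stays under Next.js's 128-tag limit

// The same DatoCMS tag always lands in the same bucket.
export function bucketFor(datoTag: string): string {
  const digest = createHash("sha1").update(datoTag).digest();
  return `bk:${digest.readUInt32BE(0) % BUCKETS}`;
}

// Tag a cached fetch with the (deduplicated) buckets its DatoCMS tags hit...
export function bucketsFor(datoTags: string[]): string[] {
  return [...new Set(datoTags.map(bucketFor))];
}

// ...and on a DatoCMS cache-tag invalidation webhook, revalidate the buckets
// instead of the raw tags, e.g.:
//   invalidatedDatoTags.forEach((t) => revalidateTag(bucketFor(t)));
```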

The attentive reader will clearly spot the main problem with this approach:

Multiple DatoCMS cache tags will land in the same bucket, therefore revalidating a bucket can cause revalidation of content which wasn’t changed.

I’m thinking I’ll test this approach on my project, since:

  • I don’t like having to manage another system (a database), which introduces additional costs and potential points of failure
  • I don’t care if more content gets revalidated than necessary for the kind of project I work on (better too much than too little)

Hope this makes sense.

I see, @martin.palma, thank you for explaining.

Below I’ll provide my 2c, but I’ll also ping our cache tags developer to see if they have any thoughts they’d like to share.

My opinion only (I’m not the person who developed them):

I suppose that could work, but it’d have to be something you implement on your own, for your particular frontend(s). For example, for a set of cache tags like the ones we return:

```
$o )>l "s3ze_ "l-hv6 ##<-k~ @i(yu\p^k#*-&qmv&t@)6x# #cy4~z "{[.@4 "{[.@j )._6s)m )._6rk/ )._6r{j )._6s"d )._6s$4 )._6s%! )._6s&n )._6s&y )._6s@* )._6s@+ )._6s@0 )._6s@: )._6s@h )._6s@{ )._6s@~ )._6s'* )._6s4$ )._6s5( )._6s9= )._6s9_ )._6s>< )._6sc\ )._6sc] )._6sdd )._6sz+ @gti4%[s0@&%h_6>\>%rwks "{[.@z )._6sck )._6rw$ )._6rk[ )._6rl[ )._6s)d (plc>?'
```

You’d have to figure out a way to hash the different parts into several shared buckets. But that won’t be easy, because the tags are opaque: the hash function doesn’t necessarily know what )._6s references, even though it appears as a prefix several times. You can’t necessarily guarantee an orderly or sensible mapping of schema/record/metadata → cache tag → related CDA queries → your hash → frontend components → one or more URLs.

So it would be deterministic in the sense of “these certain characters will always be hashed to the same bucket”, but not in the sense of “this model will always go to this bucket, this relationship will always go to that other bucket, etc.”

The cache tags are already “compressed” in the sense that they are encodings of larger chunks of metadata, e.g. “published record #627049417” becomes )._6sxz[ (just an example).

The reason there’s so many of them is because of relationships in the CMS (uploads, locales, linked records, etc.); our backend analyzes these relationships and generates the proper cache tags for all the related items too. It’s like a diff of a relationship graph.

In simpler setups with only a few isolated records, you won’t have that many cache tags to begin with, but in more complex projects, each change can trigger a cascade of relationship updates, which in turn trigger their own cascades. It’s exactly as you said, except that the problem will scale as the complexity of the content model(s) and queries scales.

If you erase that sort of relationship encoding by hashing/truncating cache tags into fewer buckets, then yes, you can squeeze them into smaller headers, but then you’re essentially saying “whenever this undeciphered blob updates, also invalidate all of these other random things, even if we have no idea what any of the things are”. I suppose that would work, but it would be quite difficult to reason about or maintain/troubleshoot.

If the specificity of the invalidations is important, bucketization would lose that. If specificity isn’t important, I think “on publish” webhooks that invalidate too much would have the same basic effect while being easier to reason about. For in-between needs, probably time-based ISR or fetch caching would still be much simpler…? It is also possible to cache Next.js’s outputs behind a CDN and have the CDN apply page-level cache tag invalidations instead.

Regardless, you are certainly free to try this if you’d like, but it seems maybe a tad over-engineered, and even more opaque than the cache tags already are?

I’ll ping the dev who wrote this functionality and let you know if they have any further thoughts!