I see, @martin.palma, thank you for explaining.
Below I'll provide my 2c, but I'll also ping our cache tags developer to see if they have any thoughts they'd like to share.
My opinion only (I'm not the person who developed them):
I suppose that could work, but it'd have to be something you implement on your own, for your particular frontend(s). E.g., for a response where we return tags like:
$o )>l "s3ze_ "l-hv6 ##<-k~ @i(yu\p^k#*-&qmv&t@)6x# #cy4~z "{[.@4 "{[.@j )._6s)m )._6rk/ )._6r{j )._6s"d )._6s$4 )._6s%! )._6s&n )._6s&y )._6s@* )._6s@+ )._6s@0 )._6s@: )._6s@h )._6s@{ )._6s@~ )._6s'* )._6s4$ )._6s5( )._6s9= )._6s9_ )._6s>< )._6sc\ )._6sc] )._6sdd )._6sz+ @gti4%[s0@&%h_6>\>%rwks "{[.@z )._6sck )._6rw$ )._6rk[ )._6rl[ )._6s)d (plc>?'
You'd have to figure out a way to hash the different parts into several shared buckets. But that won't be easy, because the tags are opaque: the hash function doesn't necessarily know what )._6s references, even though it appears as a prefix several times. You can't necessarily guarantee an orderly or sensible mapping of schema/record/metadata : cache tag : related CDA queries : your hash : frontend components : one or more URLs.
So it would be deterministic in the sense of "this exact string will always hash to the same bucket", but not in the sense of "this model will always go to this bucket, this relationship will always go to that other bucket, etc."
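Just to make the idea concrete, here's a minimal sketch of what that bucketing could look like on your side. Everything here is a placeholder of my own choosing (the bucket count, the "cms-bucket-" naming, FNV-1a as the hash); it's not something we provide or endorse:

```typescript
// Purely illustrative: deterministically map opaque cache tags into a fixed
// number of shared buckets. BUCKET_COUNT and the "cms-bucket-" prefix are
// arbitrary placeholders, and FNV-1a is just one possible hash.
const BUCKET_COUNT = 32;

function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Splits a space-separated tags header and maps every tag to its bucket.
// The same tag always lands in the same bucket, but the hash has no idea
// what the tag actually refers to.
function bucketizeTags(tagsHeader: string): string[] {
  const buckets = tagsHeader
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((tag) => `cms-bucket-${fnv1a(tag) % BUCKET_COUNT}`);
  return [...new Set(buckets)];
}
```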
The cache tags are already "compressed" in the sense that they are encodings of larger chunks of metadata, e.g. "published record #627049417" becomes )._6sxz[ (just an example).
The reason there are so many of them is the relationships in the CMS (uploads, locales, linked records, etc.): our backend analyzes these relationships and generates the proper cache tags for all the related items too. It's like a diff of a relationship graph.
In simpler setups with only a few isolated records, you won't have that many cache tags to begin with, but in more complex projects, each change can trigger a cascade of relationship updates, which in turn trigger their own cascades. It's exactly as you said:
Except the problem will scale as the complexity of the content model(s) and queries scales.
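To illustrate the cascade idea only (this is a toy model, not how our backend actually works, and all the names and shapes are made up):

```typescript
// Toy model of the cascade: each entity knows its related entity IDs, and a
// single change collects tags for everything reachable from it.
type Entity = { id: string; cacheTag: string; relatedIds: string[] };

function collectCacheTags(changedId: string, graph: Map<string, Entity>): Set<string> {
  const tags = new Set<string>();
  const queue = [changedId];
  const seen = new Set<string>();

  while (queue.length > 0) {
    const id = queue.shift()!;
    if (seen.has(id)) continue;
    seen.add(id);

    const entity = graph.get(id);
    if (!entity) continue;

    tags.add(entity.cacheTag);
    // Every related record, upload, locale, etc. contributes its own tag,
    // and its own relations get visited in turn, hence the cascade.
    queue.push(...entity.relatedIds);
  }
  return tags;
}
```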
If you erase that sort of relationship encoding by hashing/truncating cache tags into fewer buckets, then yes, you can squeeze them into smaller headers, but then you're essentially saying "whenever this undeciphered blob updates, also invalidate all of these other random things, even if we have no idea what any of them are". I suppose that would work, but it would be quite difficult to reason about, maintain, or troubleshoot.
If the specificity of the invalidations is important, bucketization would lose that. If specificity isn't important, I think "on publish" webhooks that invalidate too much would have the same basic effect while being easier to reason about. For in-between needs, time-based ISR or fetch caching would probably still be much simpler…? It is also possible to cache Next.js's outputs behind a CDN and have the CDN apply page-level cache tag invalidations instead.
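For the coarse "on publish" route, the sketch can be very small. This assumes the Next.js App Router; the route path, the WEBHOOK_SECRET env var, and the "cms-content" tag are all placeholders you'd pick yourself:

```typescript
// app/api/cms-webhook/route.ts (placeholder path)
// Coarse "on publish" invalidation: every CMS-backed fetch is tagged with one
// broad tag, and the webhook simply revalidates that tag.
import { revalidateTag } from "next/cache";
import { NextRequest, NextResponse } from "next/server";

export async function POST(request: NextRequest) {
  // Reject calls that don't carry the shared secret (placeholder env var name).
  if (request.nextUrl.searchParams.get("secret") !== process.env.WEBHOOK_SECRET) {
    return NextResponse.json({ ok: false }, { status: 401 });
  }

  // Invalidates anything fetched with e.g.
  // fetch(url, { next: { tags: ["cms-content"] } })
  revalidateTag("cms-content");
  return NextResponse.json({ ok: true });
}
```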
Regardless, you are certainly free to try this if you'd like, but it seems maybe a tad over-engineered, and even more opaque than the cache tags already are?
I'll ping the dev who wrote this functionality and let you know if they have any further thoughts!