Webhooks fail the majority of the time with "Net::ReadTimeout"

We have a webhook set up to fire every time a record changes. The hook is connected to an AWS Lambda function.

While the hook fires on each save, it rarely completes. Instead it returns an unintuitive Net::ReadTimeout. Manually resending the webhook works every time.

Can anybody provide insight into Net::ReadTimeout and what it means?

Any ideas why the webhooks fail to complete when dispatched by a save event but always complete if invoked manually?

Hello @troy.forster, the Net::ReadTimeout error is not very clear, and we are going to improve it soon.

But the error means that your webhook call is timing out. We enforce a timeout of 5 seconds on webhooks. Could that be your issue? Is your endpoint taking longer than that?

If the Lambda is cold-starting, it could be taking a bit more than 5 seconds. It has to fetch all the templates from S3, then the data from Dato, and then build and deploy. In our local dev environment it’s always under 1 second, and in AWS it’s typically 2-3 seconds, but we have seen some cold starts take upwards of 5 seconds.

Is it possible to alter the timeout on the webhook?

Hello @troy.forster,

You cannot configure webhook timeouts at the moment, but we increased the timeout to 8 seconds. Let us know if this works for you.

Please also consider using a tool to keep your Lambda “warm”, like this one for instance: https://github.com/FidelLimited/serverless-plugin-warmup
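For reference, warming usually boils down to a scheduled ping that keeps the function loaded. A minimal sketch of a `serverless.yml` fragment for that plugin, assuming the Serverless Framework is in use (exact option names vary between plugin versions, so check the plugin’s README):

```yaml
plugins:
  - serverless-plugin-warmup

custom:
  warmup:
    # Hypothetical warmer name; the plugin schedules a periodic
    # invocation so the Lambda container stays resident.
    default:
      enabled: true
      events:
        - schedule: rate(5 minutes)
```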

Thank you @faber, I will let you know after we do some testing today.

I want to avoid keeping the Lambda warm, as that defeats the goal of a serverless architecture. In production, this particular pilot site, and others like it, will see updates monthly at most, implying only monthly Lambda executions to rebuild and redeploy to AWS Amplify.

Our bottleneck at the moment is the step where the Lambda function fetches the site template files from AWS S3; that is by far the slowest operation. I will look at optimising that step to see if we can keep the total build time below 5 seconds. Meanwhile, the increase to 8 seconds is greatly appreciated.
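One possible optimisation for that step is downloading the template files concurrently instead of one after another, so the fetch phase takes roughly as long as the slowest single object. A sketch of the pattern, where `fetchOne` stands in for an S3 `GetObject` call (e.g. via `@aws-sdk/client-s3`) and is injected here so the control flow is clear:

```javascript
// Hypothetical sketch: fetch all template files from S3 in parallel.
// `fetchOne(key)` stands in for a real S3 GetObject call.
async function fetchTemplates(keys, fetchOne) {
  // Start every download before awaiting any of them.
  const bodies = await Promise.all(keys.map((key) => fetchOne(key)));
  // Pair each key with its downloaded body.
  return Object.fromEntries(keys.map((key, i) => [key, bodies[i]]));
}
```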


@faber Just letting you know that the increased timeout is working nicely.

One thought I had, though, was the possibility of returning a response to the webhook immediately and letting the Lambda process asynchronously from there. The downside is that if the Lambda fails, it cannot return an appropriate response to the webhook.
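That acknowledge-then-process idea can be sketched like this, assuming an API Gateway-style handler. The `invokeAsync` parameter stands in for an AWS SDK Lambda `Invoke` call with `InvocationType: 'Event'` (fire-and-forget), and the function name is made up for illustration:

```javascript
// Hypothetical sketch: acknowledge the webhook immediately, then hand
// the real build work off to a second, asynchronous Lambda invocation.
// `invokeAsync(functionName, payload)` stands in for an AWS SDK
// Lambda Invoke call with InvocationType: 'Event'.
async function webhookHandler(event, invokeAsync) {
  // Queue the slow build without waiting for it to finish.
  await invokeAsync('site-builder', event.body);
  // Respond well within the webhook timeout.
  return { statusCode: 202, body: 'accepted' };
}
```

As noted above, the catch is that any failure in the second invocation happens after the webhook has already received its 202, so errors have to be reported through some other channel.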

Has there been any thought to allowing inbound webhooks to DatoCMS? I’m thinking out loud about the feasibility of a disconnected callback mechanism so that longer-running processes can update the UI. I could have the Lambda POST a new item that corresponds to some sort of messaging content type, but I’m wondering if there’s a more elegant approach.

I’m just chiming in as we’ve discussed this before internally.

Webhooks are meant to deliver quick updates, so they shouldn’t be long-lived. If you want to provide long-lived feedback, what’s your use case for slow webhooks? Have you already considered deployment environments?

For inbound webhooks, what are you thinking about that you cannot do with the Content Management API?

Just looked at deployment environments and they look closer to what we need. However, we are taking advantage of versions to allow our content managers to preview saved records before publishing. The deployment environment hooks look like they’re for production only.

> DatoCMS will send a POST request to the specified endpoint on every publish request.

Is there a way to send a POST request on a save request?

As far as inbound hooks are concerned, I was kinda musing out loud when I suggested POSTing back to a Dato model. The more I think about it the more sense it makes. All the pieces are already there. I will mock something up in the next day or two to try that out.
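For the mock-up, something like this could assemble the Content Management API request for a status record. The "build status" model, its field names, and the item type ID are all hypothetical, and the JSON:API body shape should be checked against the current API docs:

```javascript
// Hypothetical sketch: build the request body for creating a "build
// status" record via the DatoCMS Content Management API (JSON:API
// format). The field names here are made up for illustration.
function buildStatusItem(itemTypeId, status, detail) {
  return {
    data: {
      type: 'item',
      attributes: { status, detail },
      relationships: {
        item_type: { data: { type: 'item_type', id: itemTypeId } },
      },
    },
  };
}

// The Lambda would then POST this to the API, roughly:
//   fetch('https://site-api.datocms.com/items', {
//     method: 'POST',
//     headers: {
//       Authorization: 'Bearer <API_TOKEN>', // full-access API token
//       'Content-Type': 'application/json',
//       Accept: 'application/json',
//     },
//     body: JSON.stringify(buildStatusItem('123456', 'failed', 'S3 timeout')),
//   });
```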

We have webhooks for that. If you have long-running processes, I think you should let them run in the background and manage errors separately, maybe with a third-party system? What do you think?