Webhooks fail the majority of the time with "Net::ReadTimeout"

We have a webhook set up to fire every time a record changes. The hook is connected to an AWS Lambda function.

While the hook fires on each save, it rarely completes. Instead it returns an unintuitive Net::ReadTimeout. Manually resending the webhook works every time.

Can anybody provide insight into Net::ReadTimeout and what it means?

Any ideas why the webhooks fail to complete when dispatched by a save event but always complete if invoked manually?

Hello @troy.forster, the Net::ReadTimeout error is not very clear, and we are going to improve it soon.

But the error means that your webhook call is timing out. We enforce a timeout of 5 seconds on webhooks. Could that be your issue? Is your endpoint taking longer than that?

If the Lambda is cold-starting, it could be taking a bit more than 5 seconds. It has to fetch all the templates from S3, then the data from Dato, and then build and deploy. In our local dev environment it’s always under 1 second, and in AWS it’s typically 2-3 seconds, but we have seen some cold starts take upwards of 5 seconds.

Is it possible to alter the timeout on the webhook?

Hello @troy.forster,

You cannot configure webhook timeouts at the moment, but we increased the timeout to 8 seconds. Let us know if this works for you.

Please also consider using a tool to keep your Lambda “warm”, like this one for instance: https://github.com/FidelLimited/serverless-plugin-warmup
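For reference, warming usually boils down to a scheduled ping that keeps the function loaded. A minimal sketch of a `serverless.yml` fragment for that plugin, assuming the Serverless Framework is in use (exact option names vary between plugin versions, so check the plugin’s README):

```yaml
plugins:
  - serverless-plugin-warmup

custom:
  warmup:
    # Hypothetical warmer name; the plugin schedules a periodic
    # invocation so the Lambda container stays resident.
    default:
      enabled: true
      events:
        - schedule: rate(5 minutes)
```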

Thank you @faber, I will let you know after we do some testing today.

I want to avoid keeping the Lambda warm, as that defeats the goal of a serverless architecture. In production, this particular pilot site, and others like it, will see updates monthly at most, implying only monthly Lambda executions to rebuild and redeploy to AWS Amplify.

Our bottleneck at the moment is the step where the Lambda function fetches the site template files from AWS S3; that is by far the slowest operation. I will look at optimising that step to see if we can keep the total build time below 5 seconds. Meanwhile, the increase to 8 seconds is greatly appreciated.
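One possible optimisation for that step is downloading the template files concurrently instead of one after another, so the fetch phase takes roughly as long as the slowest single object. A sketch of the pattern, where `fetchOne` stands in for an S3 `GetObject` call (e.g. via `@aws-sdk/client-s3`) and is injected here so the control flow is clear:

```javascript
// Hypothetical sketch: fetch all template files from S3 in parallel.
// `fetchOne(key)` stands in for a real S3 GetObject call.
async function fetchTemplates(keys, fetchOne) {
  // Start every download before awaiting any of them.
  const bodies = await Promise.all(keys.map((key) => fetchOne(key)));
  // Pair each key with its downloaded body.
  return Object.fromEntries(keys.map((key, i) => [key, bodies[i]]));
}
```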


@faber Just letting you know that the increased timeout is working nicely.

One thought I had, though, was the possibility of returning a response to the webhook immediately and letting the Lambda process asynchronously from there. The downside is that if the Lambda fails, it cannot return an appropriate response to the webhook.
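That acknowledge-then-process idea can be sketched like this, assuming an API Gateway-style handler. The `invokeAsync` parameter stands in for an AWS SDK Lambda `Invoke` call with `InvocationType: 'Event'` (fire-and-forget), and the function name is made up for illustration:

```javascript
// Hypothetical sketch: acknowledge the webhook immediately, then hand
// the real build work off to a second, asynchronous Lambda invocation.
// `invokeAsync(functionName, payload)` stands in for an AWS SDK
// Lambda Invoke call with InvocationType: 'Event'.
async function webhookHandler(event, invokeAsync) {
  // Queue the slow build without waiting for it to finish.
  await invokeAsync('site-builder', event.body);
  // Respond well within the webhook timeout.
  return { statusCode: 202, body: 'accepted' };
}
```

As noted above, the catch is that any failure in the second invocation happens after the webhook has already received its 202, so errors have to be reported through some other channel.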

Has there been any thought to allowing inbound webhooks to DatoCMS? I’m thinking out loud about the feasibility of a disconnected callback mechanism so that longer-running processes can update the UI. I could have the Lambda POST a new item that corresponds to some sort of messaging content type, but I’m wondering if there’s a more elegant approach.

I’m just chiming in as we’ve discussed this before internally.

Webhooks are meant to deliver quick updates, so they shouldn’t be long-lived. If you want to provide long-lived feedback, what’s your use case for slow webhooks? Have you already considered deployment environments?

For inbound webhooks, what are you thinking about that you cannot do with the Content Management API?

Just looked at deployment environments and they look closer to what we need. However, we are taking advantage of versions to allow our content managers to preview saved records before publishing. The deployment environment hooks look like they’re for production only.

> DatoCMS will send a POST request to the specified endpoint on every publish request.

Is there a way to send a POST request on a save request?

As far as inbound hooks are concerned, I was kinda musing out loud when I suggested POSTing back to a Dato model. The more I think about it the more sense it makes. All the pieces are already there. I will mock something up in the next day or two to try that out.
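For the mock-up, something like this could assemble the Content Management API request for a status record. The "build status" model, its field names, and the item type ID are all hypothetical, and the JSON:API body shape should be checked against the current API docs:

```javascript
// Hypothetical sketch: build the request body for creating a "build
// status" record via the DatoCMS Content Management API (JSON:API
// format). The field names here are made up for illustration.
function buildStatusItem(itemTypeId, status, detail) {
  return {
    data: {
      type: 'item',
      attributes: { status, detail },
      relationships: {
        item_type: { data: { type: 'item_type', id: itemTypeId } },
      },
    },
  };
}

// The Lambda would then POST this to the API, roughly:
//   fetch('https://site-api.datocms.com/items', {
//     method: 'POST',
//     headers: {
//       Authorization: 'Bearer <API_TOKEN>', // full-access API token
//       'Content-Type': 'application/json',
//       Accept: 'application/json',
//     },
//     body: JSON.stringify(buildStatusItem('123456', 'failed', 'S3 timeout')),
//   });
```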

We have webhooks for that. If you have long-running processes, I think you should let them run in the background and manage errors separately, maybe with a third-party system? What do you think?