Image Upload from user-facing webpage to DatoCMS, without revealing API key

Describe the issue:

Hi, I’m trying to upload an image to Dato using the Content Management API with Next.js and the App Router.
Everything works fine when I run it on the client side; however, I’m afraid I’m exposing my API key, as shown below. I’m using the “Browser: Create an upload from a File or Blob object” example from the docs:

const client = buildClient({
  apiToken: process.env.NEXT_DATOCMS_API_TOKEN,
});

I’m probably being a noob here, but can you give me some insight on how to protect the API key, or point me in the right direction on how to handle this?

If it helps, this is my createUpload function:

function createUpload(file) {
  return client.uploads.createFromFileOrBlob({
    fileOrBlob: file,
    filename: file.name,
    skipCreationIfAlreadyExists: true,

    onProgress: (info: any) => {
      console.log("Phase:", info.type);
      console.log("Details:", info.payload);
    },
  });
}

And I’m getting the file from an input:

 <input type="file" accept="image/*" />

Thanks in advance!

Hi @ivo,

Just to make sure I understand the use case here, are you trying to make it so that your website users can upload images directly to your Dato instance?

If so, I would look into using a Next.js Route Handler (the equivalent of API routes in the Pages router) to act as middleware between the client and your DatoCMS instance. The Route Handlers live on the Next.js backend (like Vercel, if you’re hosting there) and can read your API keys in env vars without the client being able to.

Your env vars should NOT be prefixed with NEXT_PUBLIC_, since in this case only the Node server needs them, not the clients’ browsers.
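To make that concrete, here’s a minimal sketch of such a Route Handler (the file path, env var name, and header details here are my assumptions, not tested code; adapt them to your setup):

```javascript
// app/api/upload-permission/route.js (hypothetical path).
// In a real route file this would be `export async function POST(request)`.
async function POST(request) {
  const { filename } = await request.json();

  // The token comes from a server-only env var (no NEXT_PUBLIC_ prefix),
  // so it never ships to the browser
  const res = await fetch("https://site-api.datocms.com/upload-requests", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.NEXT_DATOCMS_API_TOKEN}`,
      Accept: "application/json",
      "X-Api-Version": "3",
      "Content-Type": "application/vnd.api+json",
    },
    body: JSON.stringify({
      data: { type: "upload_request", attributes: { filename } },
    }),
  });

  // Forward only the presigned-URL payload to the browser; the API token
  // itself is never part of the response
  return Response.json(await res.json());
}
```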


However, since you’re essentially building a proxy here, it might get a little complicated :frowning: The simplest approach would be to POST from the user’s browser to your Route Handler, then have your Route Handler use our lib to handle the rest of the upload (to AWS and Dato).

However, I’m not sure whether Route Handlers have a filesize or resource-consumption limit (i.e., whether they will time out for bigger files). You could try it a few times with files at your expected maximum size and see whether it works reliably.

If it doesn’t work, you might have to break the upload down into a few manual HTTP steps :frowning: On the same docs page, there’s an HTTP tab that details this:

But it might have to go something like this:

  1. User chooses a file in the browser to upload
  2. Browser sends a request to a getUploadPermission route handler you build, which performs Step 1: Request upload permission. In the background, it waits for a response, specifically the AWS data.attributes.url parameter and any associated headers
  3. The browser gets back that AWS URL and PUTs the file directly to AWS as binary, using the signed URL from the previous step. This part bypasses DatoCMS completely and doesn’t need your API token, since it goes directly to AWS and authenticates with the signed URL.
  4. Once that’s done, the browser can finally send a request to another route handler, which will tell DatoCMS to grab the file from the AWS bucket using the data.id parameter from step 2. (Or you can substring the AWS URL; you just need the path to the file, e.g. /205/1565776891-image.png.)
  5. That should only take a few seconds. If you want to ensure success, you could poll the job on a timer and wait for it to return successful, all while still showing a spinner.
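The steps above might sketch out in browser code roughly like this (the /api/... route names are made up; they stand for backend Route Handlers that hold your API token, and only the S3 PUT happens directly from the browser):

```javascript
// Hypothetical browser-side orchestration of the manual upload steps
async function uploadToDato(file) {
  // Step 2: ask our backend for a presigned S3 URL (API token stays server-side)
  const permission = await fetch("/api/upload-permission", {
    method: "POST",
    body: JSON.stringify({ filename: file.name }),
  }).then((r) => r.json());

  // Step 3: PUT the raw file straight to S3 with the presigned URL (no API token needed)
  await fetch(permission.data.attributes.url, {
    method: "PUT",
    body: file,
    headers: { "Content-Type": file.type },
  });

  // Step 4: ask our backend to tell DatoCMS to ingest the file from the bucket
  const job = await fetch("/api/finalize-upload", {
    method: "POST",
    body: JSON.stringify({ uploadId: permission.data.id }),
  }).then((r) => r.json());

  // Step 5: poll this job until DatoCMS reports success
  return job;
}
```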

I know it’s a lot of manual work, sorry :frowning: Normally that’s something that our JS client handles for you, but in this case you end up basically recreating its functionality split across a user-facing part and a backend proxy, all to better protect your API key.

Does that help at all? I hope I didn’t misunderstand your request… it’s not a common use case I’ve seen, so please let me know if I made any mistakes!

I should also mention that it’s important for you to have some sort of authentication and authorization process set up between the browser and your API route so that you don’t get random trolls or bots uploading junk files to your Dato instance.

If you already have some sort of auth provider set up for your users, I’d try to tie it into that system. If not, this becomes a lot harder, because you end up having to create your own login and tokens system =/

If you can share with us a bit more about your use case here (i.e. what kind of users are going to be uploading files to your Dato instance, and why) maybe we can help explore that part?

Hi @roger
Thanks so much for your help!
And wow, this is a lot of info to delve into!

Yes, we have a login system where certain logged-in users can upload images to a Dato instance. So only those users can see the env var, right? Because the page won’t be accessible to others.

I’ll keep this in mind, thanks for the tip!

I’ll try the Route Handlers; maybe that will work. If not, I’ll fall back to this solution, which I already did some testing on, but it didn’t work.
When passing the body: <YOUR_FILE_BINARY_CONTENT>, it threw an error. Do you know how I can retrieve the binary content of a File object?

To pass the uploaded files to the server side, I’m using FormData.

The client side:

 onDrop: async (acceptedFiles) => {

      const formData = new FormData();

      for (const file of acceptedFiles) {
        formData.append("files", file);
      }
      await uploadTest(formData);

 },

The server side:

export async function uploadTest(formData) {
  const files = await formData.getAll("files");

  for (let file of files) {
   //This is what I'm doing to get the binary data
    const buffer = Buffer.from(await file.arrayBuffer());


    const uploadPermission = await fetch(
      "https://site-api.datocms.com/upload-requests",
      {
        method: "POST",
        headers: header, // Authorization + JSON:API headers, defined elsewhere
        body: JSON.stringify({
          data: {
            type: "upload_request",
            attributes: {
              filename: file.name,
            },
          },
        }),
      },
    );
    const uploadPermissionData = await uploadPermission.json();

    const uploadFileToBucket = await fetch(
      uploadPermissionData.data.attributes.url,
      {
        method: "PUT",
        body: buffer,
      },
    );

    // Note: a successful S3 PUT response has an empty body, so calling
    // .json() on it throws; checking uploadFileToBucket.ok is enough
    if (!uploadFileToBucket.ok) {
      throw new Error(`S3 upload failed: ${uploadFileToBucket.status}`);
    }
  }
}

It’s missing the rest of the upload steps, but I think this illustrates where I get the error.
If you need more info, please let me know, and thanks for the insights you already provided me!

Yes, it is! Sorry, this isn’t a request we run into very often (since many of our customers don’t have their own customers uploading to their CMS). What I proposed is just one method of handling this… it’s not something we have a standard drop-in solution for, I’m afraid.

If you expose your env vars to the browser using NEXT_PUBLIC_, then anybody who can view that webpage can also see your env vars. That isn’t just your users (the humans), but also their browsers (the software)… including any malware they may have in their browser extensions, or on their computers, etc. Even if you trust your users, do you trust them to have up-to-date and secure machines?

It isn’t a very good practice and I would avoid that unless you absolutely have to, since it just takes one API key leak to really wreak havoc on your project.

However, if you don’t use the NEXT_PUBLIC_ prefix, then only your route handler script should be able to access the env var. The script, running on your host (let’s say Vercel), just reads it off its own environment, processes your input, and then returns some output. It will not send the env var to your users’ browsers (unless you specifically tell it to, but you really shouldn’t do that).

EDIT: Made a demo instead.

You can use fetch() to directly send the file to AWS with the presigned URL from the browser, like this:

const file = fileInput.current.files[0]; // fileInput is a useRef() on the file input
const response = await fetch(s3url, {
  body: file,
  method: "PUT",
  headers: {
    "Content-Type": file.type,
  },
});

Full example: s3-presigned-file-upload-demo/src/app/page.tsx at main · arcataroger/s3-presigned-file-upload-demo · GitHub

Demo here: https://s3-presigned-file-upload-demo.vercel.app/


I know this is a lot of work just to upload a file :frowning: I’ll check with the devs too to see if they have any better ideas…

@m.finamor and @mat_jack1, any better ideas here? (They just want to be able to safely upload a file from a user’s browser to their Dato instance, without exposing their API key in the process)

@ivo , just in case you saw my post before I edited it, I updated it to use fetch() and included a simple demo in Next/React.

To be clear, this demo assumes:

@roger
Just finished working on a demo based on yours, and the upload part works perfectly!

I’m facing another problem now, though.

  1. When getting the job result, I need to wrap it in a timeout; otherwise it won’t find the file.
  2. Then, I want to update a record (named Location Record) to place the image I just uploaded.
    I’m doing that in the function updateLocationRecord(), which is throwing the error below.
    This function was working fine in the other version, so maybe the jobResult is still running when I update, and there’s no image to add yet.

The Error

Uncaught (in promise) Error: PUT https://site-api.datocms.com/items/AgxGiQhTSFOQxH1hZgIajw: 422 Unprocessable Entity

[
  {
    "id": "a51de5",
    "type": "api_error",
    "attributes": {
      "code": "INVALID_FIELD",
      "details": {
        "field": "images",
        "field_id": "kGKs1SU1Qn6kSM5ORyEWoA",
        "field_label": "Images",
        "field_type": "gallery",
        "errors": [
          "Upload ID is invalid"
        ],
        "code": "INVALID_FORMAT",
        "message": "Value not acceptable for this field type",
        "failing_value": [
          {
            "alt": null,
            "title": null,
            "custom_data": {},
            "focal_point": null,
            "upload_id": "c867cb3c973d8b74e61e6ed8"
          }
        ]
      }
    }
  }
]

Location Record Function

export async function updateLocationRecord(array, id, locationId) {
//I'm using an array so I can later check whether there are images already there
  const updateRecord = {
    alt: null,
    title: null,
    custom_data: {},
    focal_point: null,
    upload_id: id,
  };
  array.push(updateRecord);

  return await client.items.update(locationId, {
    images: array,
  });
}

The whole submit function

  onDrop: async (acceptedFiles) => {
      setIsLoading(true);
      let imagesArray: any = [];
      for (let file of acceptedFiles) {
        const s3url = await getPermissionData(file.name);
        //works great!
        const uploadFile = await fetch(s3url.data.attributes.url, {
          body: file,
          method: "PUT",
          headers: {
            "Content-Type": file.type,
          },
        });
   
        const getFileData = await uploadFileData(s3url.data.id);

     
     //had to do this to get the job result
        setTimeout(async () => {
          const jobResult = await getJobResult(getFileData.data.id);
          console.log("jobResult: ", jobResult);
        }, 5000);

   //here I want to update the record where the image needs to be used
         await updateLocationRecord(
            imagesArray,
            getFileData.data.id,
            locationId,
         );
  
  //and then publish the record
        await publishRecord(locationId);

        setIsLoading(false);
      }
    },

Yeah, we went down this route because we needed a feature we requested here, but it wasn’t available. Maybe one day it will be :slight_smile:

Thanks!

Hi @roger.

I just figured it out. I was passing the wrong ID and had to wait for the job result to come back positive.
I’m still figuring out why I need to put it in a setTimeout.
Do you have any idea?

onDrop: async (acceptedFiles) => {
      setIsLoading(true);
      let imagesArray: any = [];
      for (let file of acceptedFiles) {
        const s3url = await getPermissionData(file.name);

        const uploadFile = await fetch(s3url.data.attributes.url, {
          body: file,
          method: "PUT",
          headers: {
            "Content-Type": file.type,
          },
        });

        const getFileData = await uploadFileData(s3url.data.id);


        setTimeout(async () => {
          const jobResult = await getJobResult(getFileData.data.id);
          console.log("jobResult: ", jobResult.data.attributes.payload.data.id);
          await updateLocationRecord(
            imagesArray,
            jobResult.data.attributes.payload.data.id,
            locationId,
          );
          await publishRecord(locationId);
          setIsLoading(false);
        }, 5000);
      }
    },

Hi @ivo,

Yay, almost there!

This is normal. Per the docs, POSTing the S3 path to our /upload endpoint (in Step 3) creates an async job. Our backend takes a few seconds (usually 2-3, in my experience) to add the S3 file to your Dato library. (I’m not sure exactly what it’s doing during those few seconds; it might have to do with our image CDN Imgix or some other processing. If it’s important, I can try to find out.)

But anyway, when you ask for the job status in Step 4, the async processing may or may not have finished. If you ask immediately (like a few milliseconds) after the previous request, it definitely won’t be ready yet. If you wait 5 seconds, that’s probably long enough for most requests.

But to be safe, instead of a single timeout, we’d recommend continuous polling instead, like:

  • Send the upload request in step 3
  • Wait at least 1 second, then ask for job status
  • If not ready yet, wait another 1 or 2 seconds, then ask again
  • If still not ready, wait and retry again X times up to some sane maximum
  • You can soft-fail if it takes more than about 10 seconds (“This is taking longer than usual, please hold on…”), and maybe do a hard timeout after some really long time (30 seconds, a minute?). It would partially depend on how big the files you expect are, although note that the bottleneck probably would have occurred in step 2 (the browser uploading to AWS), since fetch() doesn’t let you query upload progress… if that’s important, you can google workarounds using XMLHttpRequest

TL;DR: you just need to wait a few seconds. A simple implementation would be to query every 2 seconds until success, or you could overengineer it with an exponential-backoff lib (but that’s probably overkill).
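A minimal polling helper along those lines might look like this (getJobResult is the helper discussed in this thread; the names and defaults are just placeholders):

```javascript
// Fixed-delay polling with a sane maximum number of attempts
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollJobResult(getJobResult, jobId, { intervalMs = 2000, maxAttempts = 15 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await getJobResult(jobId);
      // DatoCMS returns the payload once the async job has finished
      if (result?.data?.attributes) return result;
    } catch (e) {
      // A 404 here just means "not done yet", so keep waiting
    }
    await sleep(intervalMs);
  }
  throw new Error(`Upload job ${jobId} not ready after ${maxAttempts} attempts`);
}
```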

I hope that makes sense? Please let me know if I can clarify anything.

Hi @roger

I ended up creating a recursive function. The soft fail is a good idea; I can add a soft kill.
What do you think of my solution?

If I want to create a “check if the image already exists,” I should check parameters like title and ID, right?

  const waitForJobResult = async () => {
    const jobResult = await getJobResult(getFileData.data.id);
    if (!jobResult.data.attributes) {
      await waitForJobResult();
    } else {
      await updateLocationRecord(
        imagesArray,
        jobResult.data.attributes.payload.data.id,
        locationId,
      );
      await publishRecord(locationId);
      setIsLoading(false);
    }
  };

Cheers!

Is there already a timeout in there somewhere (like in getJobResult())? If not, I think it’s going to send out a flood of requests all at once and you’ll get rate limited after a few seconds.

But otherwise, that looks good to me!

It depends on what you mean. If you just want to poll for the success of this current job (to check for upload completion and make sure the same file isn’t uploaded multiple times due to a network condition), the job-results endpoint should keep failing (as in 404ing) until the upload is actually done. Once it 200s, the response body’s data.attributes.payload.data.attributes.md5 (geeze lol) should give you the MD5 sum of that image, which you can save to a local variable to prevent accidental re-uploading of the same image.

Additionally, the “Upload” button should be disabled until your isLoading is false, to prevent accidental re-clicks.

On the other hand, if you want to make sure an image isn’t re-uploaded if it’s already in the media library from any previous requests, including by other browser sessions or other users, then you unfortunately have to:

  • MD5 the image in the browser as soon as it’s selected
  • Query your datocms instance media library with a filter on that md5 string to see if it’s already there. (Note: Because this also requires an API key, it would probably have to be a backend route too. The browser sends the MD5 to your backend handler, which performs the actual query with your API key env var, then returns either true/false to the browser.)
  • Allow/prohibit the upload based on that
  • Titles aren’t unique, and IDs are randomly generated I believe, so you can have multiple copies of the same image with the same title and different IDs.
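A sketch of that backend check (the function name is made up, and the filter shape mirrors what our JS client does for skipCreationIfAlreadyExists; `client` would be a CMA client built server-side with your API token):

```javascript
// Hypothetical backend helper: look up an existing upload by MD5
// before allowing a new one
async function findExistingUploadId(client, md5) {
  const matches = await client.uploads.list({
    filter: { fields: { md5: { eq: md5 } } },
  });
  // Return the existing upload's id so the browser can reuse it, or null
  return matches.length > 0 ? matches[0].id : null;
}
```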

That is also what we do in our own JS lib: js-rest-api-clients/packages/cma-client-node/src/resources/Upload.ts at main · datocms/js-rest-api-clients · GitHub

I’ve suggested internally to the devs that we should prevent duplicates on the server side, but so far that isn’t a planned feature AFAIK. So it takes a bit of clientside hashing to dedupe them (or rather, to prevent the upload of duplicates).

Hi @roger

Not really, but I’ll keep that in mind!

About the “check if image already exists”: I think I got it! The info you provided is great, and I think I can manage now.

Thank you so much. I couldn’t have done it without your help!
Cheers!

It’s probably fine if you just have a few users uploading at a time (since awaiting the network call itself adds a few milliseconds of delay). But if you have a lot of simultaneous uploads, the back-and-forth roundtrips of “is it ready yet? no, 404. is it ready yet? no, 404” in that recursive function would only be limited by your ping (50-100 ms, usually), and could add up pretty quickly across users (which, if they all go through the same backend handler / API key, might cause rate-limit issues). Adding a minimum pause of 1000-2000 ms would probably fix that.

Welcome, and thanks for patiently working through this with us! If you end up with a good solution and don’t mind sharing it, it could probably help other users in the future too :slight_smile:

Have a great weekend.

Hi @roger
Thank you! Hope you had a great weekend as well.

I’m almost done figuring out this demo, and I’ll be happy to share it here with everyone. I’m just facing one last problem, regarding the MD5. The MD5 I calculate in my code does not match the MD5 Dato generates after the upload, so I’m not able to compare the images. Do you know what it might be? I’m trying to verify whether the image already exists in the media folder.

hash calculation func:

import { createHash } from "crypto";

export default async function calculateMD5(file: File) {
  return new Promise<string>((resolve, reject) => {
    var reader = new FileReader();

    reader.onload = function (event: any) {
      var binary = event.target.result;
      const md5Hash = createHash("md5").update(binary).digest("hex");
      resolve(md5Hash);
    };

    reader.onerror = function (event: any) {
      reject(event.target.error);
    };

    reader.readAsBinaryString(file);
  });
}

It could be a file-encoding issue with hash.update() (for example, readAsBinaryString gives you a string, and hashing a string with the default UTF-8 encoding won’t match a hash of the raw bytes). It’s also not clear to me whether that’s a browser script (it uses FileReader) or a server-side Node.js script (it uses the Node crypto lib)… how are you mixing and matching the two?

I’ve updated the demo (or see source) to include browser-side MD5 using hash-wasm:


import {md5} from 'hash-wasm';

const handleFileInput = async () => {
    if (fileInput?.current?.files) {
        const reader = new FileReader()

        reader.onload = async (e) => {
            if (e.target?.result instanceof ArrayBuffer) {
                const arrayBuffer = new Uint8Array(e.target.result)
                if (arrayBuffer) {
                    const hash = await md5(arrayBuffer);
                    setMd5sum(hash) // or console.log() it or whatever
                }
            }
        };

        reader.readAsArrayBuffer(fileInput.current.files[0])

    }
}

And the input:

<input type="file"
       ref={fileInput}
       onInput={handleFileInput}
       accept="image/*"
       required={true}
/>

And that MD5 sum seems to match what I see in Dato. Could you give that lib a try, please, or else clarify how you’re using that script?

Specifically, knowing whether it’s server or browser side, plus how the input file is getting passed, would help.

@roger

Thanks for the demo! The md5 lib worked perfectly!
I’m almost done figuring this out.

I’m just missing some logic to wait for all the jobResults when images are uploaded, and that’s it.
I’ll post the code here when it’s done. Meanwhile, if you want to check the main logic, I’ll leave it here:

onDrop: async (acceptedFiles) => {
      setIsLoading(true);

      const recordImages: any = await getImagesOnRecord(locationId);
      //array to push all images to then upload to dato
      let allImages = [];
      allImages.push(...recordImages);


      for (const [index, file] of acceptedFiles.entries()) {
        const md5Hash = await calculateMD5(file);
        const isTheFileUploaded = await IsTheFileAlreadyUploaded(md5Hash);

        if (isTheFileUploaded === false) {
          //trigger logic to upload a file
          const s3url = await getPermissionData(file.name);

          const uploadFile = await fetch(s3url.data.attributes.url, {
            body: file,
            method: "PUT",
            headers: {
              "Content-Type": file.type,
            },
          });

          const getFileData = await uploadFileData(s3url.data.id);


          const waitForJobResult = async () => {
            let jobResultPerformed = false;
            const jobResult = await getJobResult(getFileData.data.id);
            if (!jobResult.data.attributes) {
              setTimeout(async () => {
                await waitForJobResult();
              }, 3000);
            } else {
              const updateRecord = {
                alt: null,
                title: null,
                custom_data: {},
                focal_point: null,
                upload_id: jobResult.data.attributes.payload.data.id,
              };
              allImages.push(updateRecord);

              jobResultPerformed = true;
            }
            return jobResultPerformed;
          };

//need to do a flag to trigger the upload when all jobs return true
          const jobResultPerformed = await waitForJobResult();
        } else {
           //trigger logic if the image is already in the dato media folder

          //check if it's already registered on the record
          const isIdOnLocation = allImages.find(
            (image: any) => image.upload_id === isTheFileUploaded,
          );

          //if it's not on the record
          if (isIdOnLocation === undefined) {
            const updateRecord = {
              alt: null,
              title: null,
              custom_data: {},
              focal_point: null,
              upload_id: isTheFileUploaded,
            };
            allImages.push(updateRecord);
          }
        }
      }

      //if everything is uploaded then update the record. Still working on this
      // await updateLocationRecord(allImages, locationId);
      // await publishRecord(locationId);
      // setIsLoading(false);

  
    },

It’s not super clear to me what’s supposed to happen here. Are you trying to take X images (however many the user selects), upload each one (if it’s not a duplicate), and then, once all of them are uploaded, do a single update to a record to add them all there?

If so, I think this is pretty close…

How robust does it have to be? i.e. does it have to handle errors like cases where one image fails to upload, or the user disconnects before they’re all done and want to resume it later, etc.?

If not, instead of trying to set an “all jobs done” flag (which might get complicated since you have an inner loop), maybe you can just count the number of completed iterations of for (const [index, file] of acceptedFiles.entries())? If I’m reading the code right, it executes once per file, so when the number of runs equals the number of entries, it’s done (assuming no errors in between).

Alternatively, this might be a situation where Promise.all() could help. You can wrap each inner job as a Promise, optionally accounting for errors, and only continue on to the record update once Promise.all() resolves for the entire array of subjobs.

Or a rate-limited implementation of that, like p-queue, if you want to run the MD5s and uploads in parallel. (That’d primarily be useful if the user is uploading a bunch of small files at once, like a bunch of SVGs or icons. If they’re primarily uploading bigger photos, it’s better to do one file at a time, so that if something fails, they can continue uploading just the most recent image instead of having 5% of each file uploaded).
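A rough sketch of the Promise.all() idea (uploadOneFile and updateRecord here are stand-ins for your per-file upload pipeline and updateLocationRecord, not real functions from this thread):

```javascript
// Run one promise per file, and only update the record once
// every upload has finished
async function uploadAllThenUpdate(files, uploadOneFile, updateRecord) {
  // Promise.all rejects if any upload fails, so the record is only
  // updated when every file succeeded
  const uploadedImages = await Promise.all(files.map((file) => uploadOneFile(file)));
  return updateRecord(uploadedImages);
}
```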

I hope I’m understanding the use case here correctly? Otherwise, it looks to me like you’re on the right path! If you want to provide the whole file/repo (either here, or via Github at arcataroger (Roger Tuan) · GitHub), I can take a closer look to better understand the real situation.

Hi @roger

Yes, exactly. I finally figured it out with your input. I ended up using Promises.
Furthermore, I created a repository here if you want to check the files.

The important ones are the /components/FileUpload.tsx and /lib/datoCMA.js

I’ll upload one last update to kill waitForJobResult() if it throws an error, but apart from that, it’s working fine.

Thanks so much for your support, and feel free to suggest modifications!

Cheers!
