Meet the SFTP integration of your dreams

by Hassan Syyid

Jun 16th 2022

TL;DR: SFTP integrations are hard to support and manage. hotglue has an awesome SFTP integration that handles a lot of the complexities out of the box and can be embedded directly into your product. Skip here for a demo.

Why would I need an SFTP integration?

If you’re in the enterprise software space, chances are you’ve had to deal with importing data from a user’s SFTP servers.

To those less familiar, it may seem strange that offering a robust SFTP integration is actually really important for capturing large enterprise customers. Here’s why:

  • Large enterprise customers tend to use legacy software that nobody wants to integrate with.

    • Large enterprises tend to use old software (typically self-hosted / on-premise) that either has an overly complex process for integration, or offers no path for integrating.
  • Large enterprise customers have security teams that are less willing to give you direct access to their data. This means two things:

    • Large enterprises love to self-host everything, because it makes compliance way easier. The more data they have on-premise, the better.
    • Security teams tend to strictly enforce the principle of least privilege (POLP): every service, including yours, is granted only the minimum access it needs. They will always prefer to give you just the data you need, nothing more.

For those reasons, SFTP is an essential integration for enterprise customers:

  • You don’t need to deal with their unique software setup. Most of these legacy systems offer some way to generate a flat file export (typically as a CSV), which is easy for your customers to deal with.
  • They can host their own SFTP server holding just the data you need.

Note the above doesn’t only apply to large enterprises – that’s just the most common case. Smaller customers have the same reasons for using SFTP, whether their data is particularly sensitive or they are using some niche system internally that doesn’t have a great way to integrate.

How does a typical SFTP integration work?

Okay, we’ve established all the reasons for customers to request SFTP integrations. But once your customer has set up an SFTP server and uploaded the files you need, how do you actually get that data into your product?

Collect the SFTP server details from the customer

Generally you will require the following information from the customer:

  • SFTP Server info

    • host: the hostname or IP address of the SFTP server
    • port: the port the SFTP server is running on (usually 22)
    • username: the username to access the SFTP server
    • password: the password to access the SFTP server
  • File info

    • the paths to each file you need to import (or the path to a directory that contains all of them)

Note that users may prefer using an SSH key instead of a username/password to authenticate with the SFTP server.
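To make the list above concrete, here is a minimal sketch of how those details could be stored and sanity-checked in Python. The `SftpSource` class, its field names, and the `problems` method are all hypothetical names chosen for illustration; they are not part of any library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SftpSource:
    """Connection and file details collected from one customer (hypothetical shape)."""
    host: str
    username: str
    port: int = 22
    password: Optional[str] = None
    private_key_path: Optional[str] = None  # used when the customer prefers an SSH key
    remote_paths: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the details look usable."""
        issues = []
        if not self.host.strip():
            issues.append("host is required")
        if not self.username.strip():
            issues.append("username is required")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        if not (self.password or self.private_key_path):
            issues.append("either a password or an SSH key is required")
        if not self.remote_paths:
            issues.append("at least one file or directory path is required")
        return issues
```

This only checks that the details are complete. Confirming that the credentials actually work still requires connecting to the server.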

Clone the data from the SFTP server

Once you have all this info, you will need to clone the data from the SFTP server. If you were building this yourself, most popular languages have libraries to help you do it.

Note that since your customers will be generating the files on the SFTP server, they are likely to be some sort of batch dump of data (either daily, weekly, or monthly). This means the payloads can be very large, so the architecture you put in place to clone the data will need to handle high volume.
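One way to cope with large files is to stream them in fixed-size chunks rather than reading each one into memory. The sketch below shows that idea in Python. The `stream_copy` and `download` helpers and the 1 MiB chunk size are assumptions made for illustration, and `download` uses the third-party Paramiko library with password authentication only.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for illustration


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst in fixed-size chunks and return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def download(host: str, port: int, username: str, password: str,
             remote_path: str, local_path: str) -> int:
    """Download one remote file to disk without holding it all in memory."""
    import paramiko  # third-party; imported lazily so stream_copy works without it

    with paramiko.SSHClient() as client:
        client.load_system_host_keys()  # unknown hosts are rejected by default
        client.connect(hostname=host, port=port, username=username, password=password)
        with client.open_sftp() as sftp:
            with sftp.open(remote_path, "rb") as remote, open(local_path, "wb") as local:
                return stream_copy(remote, local)


# In-memory demonstration of the chunked copy (no network needed).
source = io.BytesIO(b"id,amount\n" * 1000)
sink = io.BytesIO()
copied = stream_copy(source, sink, chunk_size=4096)
```

A production version would also need retries, SSH key support, and some way to run many downloads in parallel, which is where most of the architectural work goes.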

Process the data

Now that you’ve cloned the relevant data, you’ll likely need to run some transformation process to clean the data and transform it into a schema your product can use.

There are plenty of tools for building these types of ETL pipelines, and several different approaches you can take.

Since your customers will be providing the data payload, this processing piece will need to be generic to handle custom logic for each customer. This ensures that onboarding new customers will be a simple, repeatable process with just slightly different requirements.
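As a rough illustration of generic processing with per-customer logic, the sketch below keeps one shared function and moves the customer-specific part into a column mapping. The customer names, column headers, and target fields (`order_id`, `amount`) are all made up for the example.

```python
import csv
import io
from typing import Dict, Iterator

# Hypothetical per-customer mappings from their export headers to our field names.
CUSTOMER_MAPPINGS: Dict[str, Dict[str, str]] = {
    "acme": {"Order No": "order_id", "Total ($)": "amount"},
    "globex": {"order_ref": "order_id", "total_amount": "amount"},
}


def normalize(csv_text: str, mapping: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, renamed to the target schema and whitespace-stripped."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Columns not listed in the mapping are dropped; missing ones become "".
        yield {target: (row.get(source) or "").strip() for source, target in mapping.items()}


acme_export = "Order No,Total ($),Notes\n1001, 25.00 ,rush\n1002,13.50,\n"
rows = list(normalize(acme_export, CUSTOMER_MAPPINGS["acme"]))
```

With this shape, onboarding a new customer means adding a mapping entry rather than writing a new pipeline, although real exports usually need type conversion and validation on top.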

As you can tell, there are a lot of steps involved in building out a robust SFTP integration. Each of these pieces has its own unique challenges, and will likely have a very different architecture than an integration with a cloud-based SaaS platform like Salesforce or QuickBooks.

There’s a better way: hotglue

hotglue offers an integration platform that makes it 10x easier for developers to build integrations for their customers.

We have released an awesome SFTP integration that handles a lot of the complexities we outlined above out of the box! Check out the demo below:

Breaking it down

Let’s break down the hotglue SFTP integration into the same steps we did above.

Collect the SFTP server details from the customer

Just like we described above, you will need to ask your users to provide their SFTP server details. You can do this directly through hotglue using our embeddable widget, as pictured below.

The hotglue widget even has credential validation built in, so it can make sure that the credentials the user supplies are correct and have access to the SFTP server.

Learn more about how to embed the widget in the docs: https://docs.hotglue.com/docs/embed-hotglue

hotglue SFTP auth

Once we have collected valid credentials from the user, hotglue provides a file browser that lets the user select the files and/or directories they wish to import into your app.

This keeps everything super simple for your users, so they know exactly what access they’re granting you, and they can always change these settings on their own.

hotglue - SFTP File browser

Clone the data from the SFTP server

hotglue handles cloning data directly from integrations via sync jobs. Jobs can be scheduled using cron expressions or triggered ad-hoc via the hotglue API or hotglue widget.

Sync jobs are highly scalable, and are designed to handle large amounts of data without issues. The hotglue admin panel allows you to manage all your users’ sync jobs, with built-in logging and webhook functionality.

hotglue - SFTP sync jobs

Process the data

hotglue includes a preprocessing layer based on Python. You can set a transformation script that processes the data from SFTP, and even customize it on a per-user level. This keeps ingesting the data your customers provide via SFTP simple and repeatable!

Learn more about how transformation scripts work in hotglue in the docs: https://docs.hotglue.com/docs/transformations-overview

hotglue - SFTP preprocessing script

Conclusion

Offering a robust SFTP integration is essential to serve large enterprise customers (and can even be important for smaller customers depending on your industry). Building an SFTP integration in-house poses many unique challenges, including cloning data at scale, preprocessing the data into an ingestible format, and ensuring the process is repeatable across multiple customers.

hotglue’s SFTP integration is designed to make the entire process of building an SFTP integration and supporting customers at scale easier via the embeddable hotglue widget, sync jobs infrastructure, and preprocessing layer.

If you’re interested in learning more you can book a demo here: https://hotglue.com/demo