Notes From a Full-Stack Developer

Kevin Raffay's WordPress Blog

Archive for July 2016

How I Used SSIS to Save My Company a Ton of Money

leave a comment »

I recently implemented an SSIS solution to automate a complicated process that my company used to send out promotional emails.  The old process used to take a few days and involved several people – developers, DBAs, etc.  It was brutally tedious and complicated.  For those of you that don’t know, SSIS is Microsoft’s SQL-Server Integration Services, which used to be DTS (Data Transformation Services).

To summarize,each month  my company would send out a few million emails to former customers, trying to entice them to renew their memberships.  How these customers were selected was codified in a series of SQL scripts steeped in tribal knowledge and arcane business rules.  To top it off, each month these rules changed, based on the whims of a Vice President whose job it was to maintain the stream of recurring cash flow that kept the business running.  There was a constant effort to tweak the criteria and optimize the response rates of these emails, while not spamming or harassing former members who had no interest in renewing.

This process was ripe for automation, and SSIS beats ad-hoc SQL — most of the time.  Keep in mind that SSIS is just a tool, and as I always say: a fool with a tool is still a fool.  I would have to make sure that all this work would actually result in some ROI.  If my SSIS packages were brittle, unmaintainable, and cryptic, I would have replaced one faulty process with another, even more complicated one.

To top it off, once this data was collected, it would have to be approved by the VP, and if there was any doubt about the criteria, the whole process would be run again until the results were satisfactory to the powers that be.  For example, if last month’s run consisted of 2.2 million members, but this month’s run only had 2.19 million members, a junior developer would have to bird-dog this discrepancy and report a detailed explanation to the VP.  Did I mention that these issues had to be resolved in time to send the emails out on the first day of the month?  If the junior dev started the process at the end of the month and the numbers didn’t jive, guess what — she would be there late into the night of the last day of the month, running queries.

After seeing this junior developer data-diddling her career away (she dreamed of writing javascript and mobile apps, not running SQL scripts that took hours to complete), I decided to do something about this.  Over the course of a few weeks, I extracted all the arcane business rules embedded in thousands of lines of legacy SQL and codified them into a robust SSIS solution that Extracted the production data, Transformed it by applying the business rules used to identify valid data, and Loaded the data into CSV files — the classic ETL process.

My first challenge was to get away from running real-time queries against a production database to get the initial set of former members.  That process alone would take a developer hours to collect, validate, and eventually turn into CSV files that would be uploaded to our email provider.  I essentially designed a poor-man’s data warehouse, staging the production data into my sandbox using SSIS, and then having my SSIS packages call stored procedures to transfom it.  Oh, I know what you are saying — don’t you have a data warehouse or reporting team to do this?  Well, yes, but for reasons I can’t detail here, that team reported to a different manager and there were office politics involved.

Along the way, I learned a lot more about SSIS than I cared to (being a full-stack developer makes me hesitant to operate on one tier for too long) but it was a worthy effort.  What used to take days now takes hours — and that is just the time for the ETL process to run on the back-end.  It is not like some dev is sitting there, waiting for the results of a query to pop up in SSMS so that she can copy and paste them into Excel (yes, you heard correctly, she was taught to literally copy and paste the results into an Excel file and save that as a CSV).  I never understood why this was the process, and when I asked why, I was given the typical cargo cult response.  We always did it this way

For those that are interested, I summarized this project in a deck posted on Slideshare:

 

Advertisements

Written by Kevin Raffay

July 31, 2016 at 6:42 pm

Posted in Uncategorized