...

This complete guide has been broken down into the following sections:

Table of Contents
 

Part 1: The Code

We will be creating several types of code as we go, though the pieces we want to focus on here are:

  • the SQL for the stored procedure(s) that will move and/or transform the data from the source(s) to the destination
  • the code to execute the workbook refresh and the code to export the screenshot ready for forwarding

Part 1a: The Log Table

It all begins with a log. The log controls everything: it determines whether the process can initiate, and in some bigger ETLs I also use a log for stage processing; this way, if a stage fails to complete, or needs to be reprocessed, I can simply kick off the whole ETL process knowing only the failed stages will be processed. Furthermore, as emails are also included in the process, the tech team are only notified if part or all of a process has failed, and using this log, I can include the tables that failed.
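
The actual definition follows below, but conceptually a stage-processing log might look something like this (the table and column names here are invented purely for illustration; only "Process_Complete_Flag" comes from the note further down):

Paste code macro
languagesql
themeVs
-- Conceptual sketch only: table and column names are invented for illustration
Create Table dbo.ETL_Process_Log_Demo (
	Process_Log_ID Int Identity(1,1) Not Null
	,Process_Start_Date Datetime Null    -- set when the process initiates
	,Stage1_Complete_Date Datetime Null  -- updated as each stage completes
	,Stage2_Complete_Date Datetime Null
	,Failed_Tables Varchar(500) Null     -- the tables to name in the failure email
	,Process_Complete_Flag TinyInt Null  -- 0 = failed/incomplete, 1 = complete
);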

...

Info

All the fields need to be defined as nullable, as they shall be updated as each process stage completes.

Although I have written this for SQL Server, I have kept it as close to ANSI SQL (SQL:2011) as possible. The field "Process_Complete_Flag" should be a boolean field (bit in SQL Server, with a data size of 1 byte), but I have defined it as a TinyInt, which in SQL Server can hold a value between 0 and 255 and is also 1 byte in size, so it has the same data size as bit, which can only hold 0 or 1 (False or True).
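
Purely to illustrate that note, the two 1-byte options would sit side by side like this (the table name is just for the demo):

Paste code macro
languagesql
themeVs
-- Demo only: comparing the two 1-byte flag options
Create Table dbo.Flag_Size_Demo (
	Process_Complete_Flag TinyInt Null    -- holds 0 to 255; the choice used in this guide
	,Process_Complete_Flag_Bit Bit Null   -- SQL Server bit: holds only 0 or 1 (False or True)
);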

Part 1b: The Table

Let's create the table now. I have created this as a sproc so that I can include it as a failsafe in the main ETL code: if the table is not present, it is recreated, which prevents the main ETL from bugging out when it comes to load data into a table that does not exist. Also, you will notice that this script creates two identical tables: the actual table, and a staging table. This is so we can load the data into the staging table first and check that the data is present before loading it into the main table. This process allows for a failsafe, retaining the existing data in the event that no newer data is available at the time of the refresh.
We won't create the table straight away; we'll let the ETL script handle that later. This way, we know the failsafe is working:

Paste code macro
languagesql
themeVs
Create Procedure sp_Company_KPI_Data_Table As

-- Creates the main data table
Create Table dbo.Company_KPI_Data_Load (
	Company_KPI_DL_ID Int Identity(1,1) Not Null
	Constraint pk_Company_KPI_DL Primary Key Clustered(Company_KPI_DL_ID)
	,Order_ID Varchar(20) Null
	,Order_Date Date Null
	,Ship_Date Date Null
	,Ship_Mode Varchar(20) Null
	,Customer_ID Varchar(10) Null
	,Customer_Name Varchar(30) Null
	,Segment Varchar(20) Null
	,Country_Region Varchar(20) Null
	,City Varchar(20) Null
	,[State] Varchar(20) Null
	,Postal_Code Int Null
	,Region Varchar(10) Null
	,Product_ID Varchar(20) Null
	,Category Varchar(20) Null
	,Sub_Category Varchar(20) Null
	,Product_Name Varchar(130) Null
	,Sales Real Null
	,Quantity Smallint Null
	,Discount Real Null
	,Profit Real Null
);

-- Checks for the presence of the staging table, drops and recreates
If Object_ID('Company_KPI_Data_Load_stg') Is Not Null Drop Table Company_KPI_Data_Load_stg;

Create Table dbo.Company_KPI_Data_Load_stg (
	Company_KPI_DL_ID Int Identity(1,1) Not Null
	Constraint pk_Company_KPI_DL_stg Primary Key Clustered(Company_KPI_DL_ID)
	,Order_ID Varchar(20) Null
	,Order_Date Date Null
	,Ship_Date Date Null
	,Ship_Mode Varchar(20) Null
	,Customer_ID Varchar(10) Null
	,Customer_Name Varchar(30) Null
	,Segment Varchar(20) Null
	,Country_Region Varchar(20) Null
	,City Varchar(20) Null
	,[State] Varchar(20) Null
	,Postal_Code Int Null
	,Region Varchar(10) Null
	,Product_ID Varchar(20) Null
	,Category Varchar(20) Null
	,Sub_Category Varchar(20) Null
	,Product_Name Varchar(130) Null
	,Sales Real Null
	,Quantity Smallint Null
	,Discount Real Null
	,Profit Real Null
);

Part 1c: The ETL Script

So now we have the log table, let's get on with the code.
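
Before the full script, here is a rough sketch of the pattern at its core: call the table-building sproc from Part 1b as the failsafe, load to staging, and only refresh the main table when the staging load actually returned rows. This is an illustration only (the source-system load is left as a placeholder comment); the actual script that follows covers the full process, including updating the log as each stage completes:

Paste code macro
languagesql
themeVs
-- Illustrative sketch only: the core of the load, not the full ETL script

-- Failsafe from Part 1b: rebuild the tables if the main one is missing
If Object_ID('Company_KPI_Data_Load') Is Null Exec sp_Company_KPI_Data_Table;

-- Clear down the staging table, then load the latest source data into it
Truncate Table dbo.Company_KPI_Data_Load_stg;
-- Insert Into dbo.Company_KPI_Data_Load_stg (...) Select ... From <your source(s)>;

-- Only replace the main table's contents when the staging load returned rows,
-- so the existing data is retained if no newer data was available at refresh time
If Exists (Select 1 From dbo.Company_KPI_Data_Load_stg)
Begin
	Truncate Table dbo.Company_KPI_Data_Load;
	Insert Into dbo.Company_KPI_Data_Load
		(Order_ID, Order_Date, Ship_Date, Ship_Mode, Customer_ID, Customer_Name
		,Segment, Country_Region, City, [State], Postal_Code, Region, Product_ID
		,Category, Sub_Category, Product_Name, Sales, Quantity, Discount, Profit)
	Select Order_ID, Order_Date, Ship_Date, Ship_Mode, Customer_ID, Customer_Name
		,Segment, Country_Region, City, [State], Postal_Code, Region, Product_ID
		,Category, Sub_Category, Product_Name, Sales, Quantity, Discount, Profit
	From dbo.Company_KPI_Data_Load_stg;
End;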

...

So now we have the code to run the ETL process, and the log table that controls it; that leaves two final scripts to write: the Tableau ones.

Part 1d: The tabcmd scripts

The first, the refresh script, simply logs in to Tableau Server and initiates the workbook refresh, which will refresh all the extracts contained therein:
(I'm assuming that you have installed tabcmd and moved it to the root directory of the PC/server that shall be executing the ETL, and that you have created a scripts folder for your tabcmd scripts, as detailed here: Tableau Server Automation - On a Tableau Quest... - Confluence (atlassian.net))
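
As a rough sketch of what that boils down to (the server URL, user, password, and workbook name below are placeholders to swap for your own; the actual script follows), the batch file amounts to a login, a refresh, and a logout:

Paste code macro
:: Illustrative sketch only - replace the server URL, credentials and workbook
:: name with your own; the actual script is shown below.
tabcmd login -s https://your-tableau-server -u your_user -p your_password
tabcmd refreshextracts --workbook "Your KPI Workbook" --synchronous
tabcmd logout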

...

Now that we have finished the major scripting, we can get on with the rest of the automation.

Part 2: Package Build

Open SQL Server Integration Services in MS Visual Studio, create a new project, give it a name, and press OK to begin the build; this is covered in Step 2 here: Programmatic Workbook Refresh (10 Minute Read) - On a Tableau Quest... - Confluence (atlassian.net)

...