
It's a sad fact of life, but Tableau will probably not be around forever, certainly not as your company seeks out newer tools and its data continues to grow exponentially. Your company knows this, and has built, and continues to build, systems accordingly.

It is very rare, and in most cases improbable, that your Tableau reports and analyses will consume data from a dedicated, Tableau-optimised data warehouse. At best, you can hope that a warehouse for visualisation is made available; most of the time, you will be consuming the same data as everyone else. Unless you are a member of the BI team, or have DDL (Data Definition Language) access to the databases (required to create and optimise tables, views, ETL and so on), the chances are you will not get the opportunity to optimise the back-end.

Best practice for performance is key to everything we do as Tableau developers; if our reports slow down, or create bottlenecks further down the chain, you can be guaranteed the following (in this order):

  1. Users will tire of waiting, ultimately leading them to find alternatives for their answers
  2. Our reports may find themselves permanently disabled
  3. Who needs Tableau when we can write our own SQL and present in Excel? Sure, Tableau is prettier and makes things easier, but reports that slow everything down cannot be used

I actually experienced this myself several years ago: I was hired to fix a set of reports whose performance had degraded to the point where every action took more than 5 minutes. Things had become so bad that the team had ditched the dedicated Tableau reports and taken to Excel.
It took 2 months of investigation and rebuilding from scratch to fix the underlying performance problems and achieve rendering times of under 10 seconds on a billion-row, 200-column mixed nvarchar / bigint table. It paid off too: the report is still in use now (4 years later), which is great given that this is the main R&D platform.

I remember my Family Law lecturer, when I was studying for my degree, making the point that the subject should really be called anti-family law, as we spend more time finding the best approach to break the family apart amicably than to create one and/or keep it together.

Using Tableau is no different: to ensure we as developers are getting the best from Tableau, we really need to keep asking "is this method the best way to achieve this?", "is Tableau the best tool for the job?", "do we actually need this?" and finally "can we demonstrate this more simply?"

Top items to consider for best practice for all of your analyses and dashboards:


  1. Always test the performance using the performance recorder
  2. Consider the use-case and extract data where possible
  3. AVOID using published extracts for published workbooks
  4. Additive calculations belong in the source
  5. Tableau is a presentation layer not a calculation engine: it will not and should not be expected to work like Excel
  6. Try to keep your visuals as simple as possible
  7. Do not over-use filters, they are hugely costly
  8. Dashboards work best when there is less ink on the page
  9. Keep your dashboards to the point: avoid huge numbers of charts and a large scrollable page
  10. Keep calculation names short and descriptive, replacing spaces with underscores
  11. Keep your work-space tidy by using folders wherever possible

(1.) Only use calculations when you need them:

But calculations are Tableau's bread-and-butter, aren't they? What can I possibly do if I cannot write calculations?
 You are correct that for most operations...

... you will need to create calculations in order to provide your analyses. However, there are right ways and wrong ways to create calculations, with the wrong ways having a detrimental effect on the back-end; only through the use of the performance recorder will you know for sure the true impact your calculations are having.

Classically, consider this calculation:

Iif(Left(Category,4) = 'Furn','1','2')

This is the query and execution plan from SQL Server:

[Image: SQL Server query and execution plan]

And now in Tableau:

[Image: how Tableau has decided to interpret the calculation, and the execution plan of this new expression]

Notice the additional operation in the plan?

In this instance, this has resulted in a minimal change on a 9,994-row set, but imagine the impact this could have on a million-, billion- or even trillion-row set. Moreover, this example uses a table with no performance enhancements, which has resulted in a full table scan and hash-match in both queries. Crucially, if the table were indexed, the first query would likely be able to use the index to reduce the number of rows to be processed, whereas the query Tableau has produced would mean the index would be discarded and a full table scan would be required.
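To illustrate the index point with plain SQL (a minimal sketch; the table and index names are hypothetical), the difference comes down to whether the predicate leaves the column bare:

```sql
-- Hypothetical table and index
CREATE INDEX ix_orders_category ON dbo.Orders (Category);

-- Index-friendly (sargable): the engine can seek directly
-- to the rows whose Category starts with 'Furn'
SELECT * FROM dbo.Orders WHERE Category LIKE 'Furn%';

-- Not sargable: wrapping the column in a function, as a
-- re-written calculation can do, forces a full scan
SELECT * FROM dbo.Orders WHERE LEFT(Category, 4) = 'Furn';
```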

Moral of the story: yes, calculations are needed and you will be lost without them, but ensure the calculations you write perform at their best with minimal impact, e.g.:

  • Try to avoid huge nested case statements by getting to understand the relationships in the logic; this allows the data engine to evaluate the statement quickly, regardless of how Tableau re-writes your seemingly beautiful statement
  • Utilise the RAWSQL_ pass-through functions wherever possible; these are designed to prevent Tableau from re-writing your query
  • Create and test your calculations against the source to ensure they are optimally performant before moving them into Tableau
  • Where possible, utilise external Python and R for complex calculations or, when using a database, try to have complicated expressions built into the server as user-defined functions, to move some of the load off the main engine
  • Only create the calculations you need
  • And never be afraid to create multiple calculations that make slight alterations: Tableau only includes the calculations actually used in the view in its final query, so try to avoid "one-size-fits-all" calculations
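As an example of the pass-through approach (a sketch re-using the earlier Category calculation), a RAWSQL_ function sends your expression to the database as written, with %1 substituted for the referenced field:

```
// Passed through to the database verbatim; Tableau will not re-write it
RAWSQL_STR("CASE WHEN LEFT(%1, 4) = 'Furn' THEN '1' ELSE '2' END", [Category])
```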

(2.) Wherever possible ALWAYS use CustomSQL

Hold on a moment: the majority of articles available online, and other Tableau users generally, advise avoiding CustomSQL in favour of the Tableau data model, but you are telling me the exact reverse. Why?
 Because it is with CustomSQL...

... that you will benefit from the best experience.

See my article About: Custom SQL an in-depth explanation, but CustomSQL offers the best performance: as an SQL analyst, you have the tools to write much more performant code, to test that code and, where necessary, to get advice from others on how best to tune the query; whereas simply dragging and dropping the table objects and relying on Tableau to process the fields and connections means that no one will be able to optimise the code.

Also, most BI teams make data available through the use of optimised views rather than direct access to the tables, so when Tableau executes its connection, such as:

[Image: Tableau data model with vw_Property_Sales joined to vw_Postcode]

the query that Tableau sends to the database engine will be a variation of:

Tableau Data Model Query
Select
vw_Property_Sales.*
,vw_Postcode.*
From vw_Property_Sales
Join vw_Postcode On vw_Property_Sales.Postcode = vw_Postcode.Postcode

This view, for example:

[Image: a worksheet showing Number of Records by County]

will generate a query similar to:

Select
vw_Property_Sales.County
,Sum(1) As [Number of Records]
From vw_Property_Sales
Join vw_Postcode On vw_Property_Sales.Postcode = vw_Postcode.Postcode

Group By vw_Property_Sales.County

But where views are used, as is the case here, the actual query that needs to be processed by the database engine will look more like:

Tableau View Example
Select
tableau.County
,Sum(1) As [Number of Records]

From (
	Select
	vw_Property_Sales.fact_property_id
	,vw_Property_Sales.master_Property_ID
	,vw_Property_Sales.Transfer_Date
	,vw_Property_Sales.type_description As Property_Type
	,vw_Property_Sales.New_Build_Flag
	,vw_Property_Sales.Tenure_Type
	,vw_Property_Sales.Price_Paid
	,vw_Property_Sales.Postcode
	,vw_Property_Sales.Primary_House_Name_Number
	,vw_Property_Sales.Secondary_House_Name_Number
	,vw_Property_Sales.Street
	,vw_Property_Sales.Locality
	,vw_Property_Sales.Town
	,vw_Property_Sales.outcode
	,vw_Property_Sales.District
	,vw_Property_Sales.County
	,vw_Postcode.Postcode_ID
	,vw_Postcode.Postcode
	,vw_Postcode.Outcode
	,vw_Postcode.Street
	,vw_Postcode.Post_Town
	,vw_Postcode.District
	,vw_Postcode.County

	From (
		Select
		p.fact_property_id
		,p.master_Property_ID
		,p.Transfer_Date
		,pt.type_description As Property_Type
		,p.New_Build_Flag
		,p.Tenure_Type
		,p.Price_Paid
		,a.Postcode
		,a.Primary_House_Name_Number
		,a.Secondary_House_Name_Number
		,a.Street
		,a.Locality
		,a.Town
		,a.outcode
		,a.District
		,a.County

		From property.property.fact_property p
		Join property.property.Address_Master a On p.Master_Property_ID = a.Property_ID
		Left Join property.property.dim_property_type pt On p.property_type = pt.property_type
	)vw_Property_Sales
	Join (
		Select
		p.Postcode_ID
		,p.Postcode
		,o.Outcode
		,p.Street
		,o.Post_Town
		,o.District
		,o.County

		From postcode.Street_Postcode p
		Join postcode.Outcode_Master o On p.Outcode = o.Outcode
	)vw_Postcode On vw_Property_Sales.Postcode = vw_Postcode.Postcode
)tableau

Group By tableau.County

As you can see, this is functionally identical to the nested wrapping that is the oft-quoted reason for NOT using CustomSQL.

However, there are two major points not generally covered, which are the most important reasons for using CustomSQL:

  1. Notice that although Tableau is selective as to what data it is requesting, the view will unfortunately need to return all of its data and perform in-memory filtering to get to the required rows, as a view is simply the result-set of a query
    1. You, on the other hand, can alter the selection in your query to fetch only the data needed to satisfy your requirements, even if this means creating multiple bespoke connections; keep in mind that a connection is nothing more than a semantic layer, and will only be made when data is required from that connection.
  2. Parameterised queries: this neat little feature allows for user-selection-driven data, or rather, the ability to write dynamic queries that change based on parameter selections; you can create dynamic case statements, have a much more selective where clause, filter the source data before other filters are applied to the final result-set, and so on.

In short, CustomSQL allows you to harness the best from your database with the least impact, and gives you better control of your data.
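As a sketch of the parameterised-query point (the parameter name Selected County is hypothetical; Tableau replaces Parameters tokens with the current parameter value before the query is sent):

```sql
-- CustomSQL filtered by a Tableau parameter: the source is reduced
-- before any worksheet filters touch the result-set
SELECT
 s.Transfer_Date
,s.Price_Paid
,s.County
FROM dbo.vw_Property_Sales s
WHERE s.County = <Parameters.Selected County>
```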

(3.) Avoid nesting and materialise wherever possible

Ok, I think I understand this one: instead of writing nested queries, you suggest I should create separate calculations for what would be the nested calculations? Why? Surely this will lead to more processing
 Sure does, but only...

... a marginal rise; there are, though, two more important reasons for this:

1. As covered in point 1 above, Tableau has a tendency to convert even the simplest of queries into considerably more complicated ones, which can, and generally speaking does, impact data-server resources and, more frustratingly, cause the data server to disregard indexes and other performance optimisations that may have been applied to the table; writing individual calculations will cause each to be processed in a separate transaction.

2. Secondly, nesting calculations rather than pre-materialising the outputs can lead to calculation errors, as Tableau can only treat the data in the view as the partition, e.g.:

Assume you want the average across the complete set of 200M rows of data, but your sample contains only 20,000; this average could therefore be incorrect. This could easily be resolved with a correctly-scoped level-of-detail calculation, but that would make your calculation unnecessarily unique, and your documentation unnecessarily complicated too.
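To illustrate the scoping difference (a sketch; [Price Paid] stands in for any measure), compare a view-scoped average with a table-scoped level-of-detail expression:

```
// Averages only the rows in the current partition of the view
AVG([Price Paid])

// Table-scoped level-of-detail expression: averages the complete
// data set regardless of what the view is showing
{FIXED : AVG([Price Paid])}
```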

(4.) Use Tableau built-in functions (aliases, groups, sets, bins etc) wherever possible instead of calculations or make the changes in CustomSQL

But calculations are easier and aliases / groups / bins etc may be single-use

 I can't argue with either of those points however...

... consider this calculation to alter the alias of Tenure Type (running over a 30M-row optimised set):

The calculation:

Case [Tenure Type]
    When 'F' Then 'Freehold'
    When 'L' Then 'Leasehold'
    Else 'Unknown'
End

is interpreted by Tableau as:

SELECT (CASE [vw_Property_Sales].[Tenure_Type] WHEN 'F' THEN 'Freehold' WHEN 'L' THEN 'Leasehold' ELSE 'Unknown' END) AS [Calculation_1483936116037431300]
FROM [dbo].[vw_Property_Sales] [vw_Property_Sales]
WHERE ([vw_Property_Sales].[Tenure_Type] = 'F')
GROUP BY (CASE [vw_Property_Sales].[Tenure_Type] WHEN 'F' THEN 'Freehold' WHEN 'L' THEN 'Leasehold' ELSE 'Unknown' END)

resulting in this plan:

[Image: execution plan of the interpreted query]

Aliases, by contrast, being a Tableau object, incur no execution time to generate their output; the result is far faster performance and zero impact on the underlying source.
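Where an alias will not do, the same mapping can instead be pushed into CustomSQL so it is evaluated once at the source rather than re-written by Tableau (a sketch based on the view used above):

```sql
-- The alias mapping is materialised in the source query,
-- so Tableau simply groups by the ready-made column
SELECT
 s.Transfer_Date
,s.Price_Paid
,CASE s.Tenure_Type
     WHEN 'F' THEN 'Freehold'
     WHEN 'L' THEN 'Leasehold'
     ELSE 'Unknown'
 END AS Tenure_Type
FROM dbo.vw_Property_Sales s
```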

This object:

[Image: the calculated field]

is generally the cause of over 9 in 10 bottlenecks, slowdowns and server impacts, yet is also one of the easiest things you can control.

(5.) Always test the performance using the performance recorder

The performance recorder will tell you everything you need to know about what Tableau and thereby, the data-server is needing to do to build your visual. It can be accessed from Help > Settings and Performance > Start Performance Recording to start recording actions, and then returning to the same location and selecting Stop Performance Recording when you are ready to review.

I would also recommend using the Tableau Server performance recorder when publishing extracts, after you have tuned using the desktop version, to ensure your server and web pipeline are also optimal. This is reached by activating the recorder with a URL command:

Open the workbook you want to test on the first chart you want to record, e.g.:

https://tableau.com/#/divisionsite/financeanalytics/views/fiscal_performance/HighLevelSummary?:iid=1

and replace ":iid=1" with ":record_performance=yes"

Once active, a Performance option will appear in the view toolbar, from which you can stop the session and review the recording.



Data drives everything.

Including your reports. But data is not always optimised for the job in hand, and often cannot be, so you should see data and data performance as the main drivers behind your reports; don't try to over-wrangle Tableau.

This is where the following guide steps in; a little more information can be found on the following pages:






