It's a sad fact of life, but Tableau will probably not be around forever, certainly not as your company seeks out newer tools and data continues to grow exponentially. The thing is, your company knows this, and has built, and continues to build, systems in order to progress.
It is very rare, and in most cases improbable, that your Tableau reports and analyses will consume data from a dedicated, Tableau-optimised data warehouse. At best, you can hope that a warehouse for visualisation is made available; most of the time, you will be consuming the same data as everyone else. Unless you are a member of the BI team, or have DDL (Data Definition Language) access to the databases (required to create and optimise tables, views, ETL and so on), the chances are you will not get a chance to optimise the back-end.
Best practice for performance is key to everything we do as Tableau developers; if our reports slow down, or create bottlenecks further down the chain, you can be guaranteed of the following (in this order):
But calculations are Tableau's bread and butter, aren't they? What can I possibly do if I cannot write calculations?
... you will need to create calculations in order to provide your analyses. However, there are right ways and wrong ways to create calculations, with the wrong ways having a detrimental effect on the back-end; only through the use of the performance recorder will you know for sure the true impact your calculations are having. Classically, consider this calculation:
This is the query and execution plan from SQL Server:
And now in Tableau:
Notice the additional operation in the plan? In this instance, this has resulted in a minimal change on a 9,994-row set, but imagine the impact this could have on a million-, billion- or even trillion-row set. Moreover, this example uses a table with no performance enhancements, which has resulted in a full table scan and hash match in both queries. Crucially, if the table were indexed, the first query would likely be able to use the index to reduce the number of rows to be processed, whereas the query Tableau has produced would mean the index is discarded and a full table scan is required. Moral of the story: yes, calculations are needed, and you will be lost without them, but ensure the calculations you write perform at their best with minimal impact, e.g.:
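The index-versus-full-scan point above can be sketched in miniature. This is a hedged illustration only: the `orders` table, its column and its index are invented, and SQLite stands in for SQL Server, but the principle is the same — a predicate that wraps the column in a function cannot use the index, while an equivalent range predicate on the bare column can.

```python
import sqlite3

# Invented table and index purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
conn.execute("CREATE INDEX idx_order_date ON orders (order_date)")

# Non-sargable: the function call on the column forces a full table scan,
# much like a Tableau calculation wrapped around the field.
plan_bad = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders WHERE substr(order_date, 1, 4) = '2023'"
).fetchall()

# Sargable: the same filter expressed as a range on the bare column,
# which the planner can satisfy from the index.
plan_good = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders "
    "WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'"
).fetchall()

print(plan_bad[0][-1])   # a SCAN of the table
print(plan_good[0][-1])  # a SEARCH using idx_order_date
```

The same calculation, expressed two ways, produces two very different plans — which is exactly what the performance recorder will surface for you.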
Hold on a moment: the majority of the articles available online, and other Tableau users generally, advise avoiding CustomSQL and utilising the Tableau data model, but you are telling me the exact opposite - why?
... that you will benefit from the best experience. See my article About: Custom SQL for an in-depth explanation, but CustomSQL offers the best performance. As an SQL analyst, you have the tools to write much more performant code, to test that code and, where necessary, to seek advice from others about how best to tune the query, whereas simply dragging and dropping the table objects and relying on Tableau to process the fields and connections means that no one is able to optimise the code. Also, most BI teams make data available through optimised views rather than direct access to the tables, so when Tableau executes its connection, such as:
the query that Tableau sends to the database engine will be a variation of:
This view, for example:
will generate a query similar to:
But where views are used, as is the case here, the actual query that needs to be processed by the database engine will look more like:
This, as you can see, is functionally identical to the nested-query pattern oft quoted as the reason for NOT using CustomSQL. However, two major points are not covered, and these are generally the most important reasons for using CustomSQL:
In short, CustomSQL allows you to harness the best from your database with the least impact, and gives you better control over your data.
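The view behaviour described above can be demonstrated with a small, hedged sketch (the `sales` table, the view and all values are invented, and SQLite stands in for the real engine): a view dragged onto the canvas and the equivalent hand-written CustomSQL subquery are flattened by the optimiser into the same plan against the base table, so the subquery wrapper alone is not the penalty it is often made out to be.

```python
import sqlite3

# Invented base table and view purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute(
    "CREATE VIEW v_sales AS SELECT region, amount FROM sales WHERE amount > 0"
)

# Roughly what is sent when the view object is dragged onto the canvas:
plan_view = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT region, SUM(amount) FROM v_sales GROUP BY region"
).fetchall()

# The same logic supplied as CustomSQL, wrapped in a subquery the way
# Tableau wraps it:
plan_custom = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT region, SUM(amount) FROM "
    "(SELECT region, amount FROM sales WHERE amount > 0) t GROUP BY region"
).fetchall()

# The optimiser flattens both into the same plan over the base table.
print([row[-1] for row in plan_view])
print([row[-1] for row in plan_custom])
```

The difference, then, is not the wrapper but who wrote the inner query - and with CustomSQL, that is someone who can tune it.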
Ok, I think I understand this one: instead of writing nested queries, you suggest I should create separate calculations for what would be the nested calculations? Why? Surely this will lead to more processing.
... a marginal rise, although there are two more important reasons for this:
1. As covered in point 1 above, Tableau has a tendency to convert even the simplest of queries into considerably more complicated ones, which can, and generally speaking does, impact data-server resources and, more frustratingly, cause the data server to disregard indexes and other performance optimisations that may have been applied to the table. Writing individual calculations causes each to be processed in a separate transaction.
2. Nesting calculations rather than pre-materialising their outputs can lead to calculation errors, as Tableau is only able to consider the data in the view as being the partition. For example, assume you want the average across the complete set of 200M rows of data, but your sample contains only 20,000 rows; this average could therefore be incorrect. Now, this could be easily resolved with a correctly-scoped level-of-detail calculation, but that would make your calculation unnecessarily unique, and your documentation unnecessarily complicated too.
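The partition problem in point 2 is easy to see with toy numbers (all values here are invented for illustration): averaging over only the rows visible in the view gives a different answer from averaging over the complete set.

```python
# Invented data: the "complete" set vs the rows the view happens to show.
full_data = list(range(1, 1001))   # stands in for the full 200M-row set
sample = full_data[:20]            # stands in for the 20,000-row partition

# An average computed inside the view only sees the partition.
avg_full = sum(full_data) / len(full_data)
avg_sample = sum(sample) / len(sample)

print(avg_full)    # 500.5 - the true average
print(avg_sample)  # 10.5  - the partition's average, wildly different
```

Pre-materialising the inner calculation (or scoping it with a level-of-detail expression) pins it to the full set rather than whatever partition the view exposes.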
But calculations are easier, and aliases / groups / bins etc. may be single-use.
... consider this calculation to alter the alias of Tenure Type (running over a 30M-row optimised set):
Aliases, being a Tableau object, incur no execution time to generate the outputs; the result is far faster performance and zero impact on the underlying source. This object is generally the cause of over 9 in 10 bottlenecks, slowdowns and server impacts, yet is also one of the easiest for you to control.
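The trade-off can be sketched as follows (a hedged illustration: the `tenures` table, its codes and labels are invented, and SQLite stands in for the real source). A calculated CASE field pushes the re-labelling into the database on every query, whereas an alias-style remap is applied client-side after the raw values come back, costing the source nothing.

```python
import sqlite3

# Invented table and values purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenures (tenure_type TEXT)")
conn.executemany("INSERT INTO tenures VALUES (?)", [("FH",), ("LH",)])

# Calculated-field approach: the CASE runs row by row in the database,
# on every refresh of the view.
calc = conn.execute("""
    SELECT CASE tenure_type WHEN 'FH' THEN 'Freehold'
                            WHEN 'LH' THEN 'Leasehold'
                            ELSE tenure_type END
    FROM tenures
""").fetchall()

# Alias approach: the raw codes come back and are re-labelled locally,
# the way Tableau applies aliases at display time.
aliases = {"FH": "Freehold", "LH": "Leasehold"}
raw = conn.execute("SELECT tenure_type FROM tenures").fetchall()
labelled = [aliases.get(value, value) for (value,) in raw]

print([c for (c,) in calc])  # ['Freehold', 'Leasehold']
print(labelled)              # ['Freehold', 'Leasehold']
```

Same output either way - but only one of the two sends extra work to the database with every query.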
The recorder simply records all Tableau events to help you identify performance issues. The recorder is NOT a screen-capture device: it will not record screen activity, webcam or microphone, and all recorded activity is saved locally.
... will tell you everything you need to know about what Tableau, and thereby the data server, needs to do to build your visual. It can be accessed from Help > Settings and Performance > Start Performance Recording to start recording actions; return to the same location and select Stop Performance Recording when you are ready to review. I would also recommend you use the Tableau Server performance recorder when publishing extracts, after you have tuned using the desktop version, to ensure your server and web pipeline are also optimal. This is reached by first activating the toolbar with a URL command: open the workbook you want to test at the first chart you want, e.g. https://tableau.com/#/divisionsite/financeanalytics/views/fiscal_performance/HighLevelSummary?:iid=1, and replace ":iid=1" with ":record_performance=yes". Once active, you will be able to start and stop recording at will; recording is continuous, unlike the desktop recorder, where only actions that take place between explicit start and stop points are recorded, allowing for a better trace-back facility.
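The URL edit above is mechanical, so here is a tiny, hypothetical helper that performs the swap (the URL is the example from the text; the helper itself is not part of any Tableau API):

```python
import re

# Example workbook URL from the text, ending in the session parameter.
url = ("https://tableau.com/#/divisionsite/financeanalytics/views/"
       "fiscal_performance/HighLevelSummary?:iid=1")

# Swap the ":iid=<n>" session parameter for the recording flag.
record_url = re.sub(r":iid=\d+", ":record_performance=yes", url)

print(record_url)
```

Paste the resulting URL back into the browser to activate the server-side recorder toolbar.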
Usage frequency, data size and complexity are the three considerations you need to weigh when choosing the best connection method. Generally speaking, the Direct connection is often the best, as you can ensure:
However, this too has other considerations:
Hyper extracts, by comparison, offer a safe middle ground, offering mostly the reverse of the direct-connect method. However, keep in mind that extracts:
Finally, published extracts. Published extracts can exist either as extracted data sets, like Hyper extracts, or simply as a published data source holding the connectivity information for a direct-connect source. They function near-identically to their workbook-based counterparts; being server-based, they add accessibility, allowing all users to connect to and use the same source as you. But published extracts and sources are not without their issues either, and should still only be used following careful consideration.
... should exist in the source.
Including your reports. But data is not always optimised for the job in hand, and often cannot be, so you should really see data and data performance as the main drivers behind your reports; don't try to over-wrangle Tableau.
So this is where the following guide steps in; a little more information can be found on the following pages: