...


Sections 1 & 2 model answers:

Question 1:

Select
Customer_Name
,Count(1) As Order_Count

From dbo.Orders

Group By Customer_Name

Having Count(1) > 1


Question 2:

Rank() or Row_Number() will help you here

Select
a.Customer_Name
,a.Order_ID
,a.Order_Date
,a.Order_Value

From (
   Select
   Customer_Name
   ,Order_ID
   ,Order_Date
   ,Order_Value
   ,Row_Number() Over(Partition By Customer_Name Order By Order_Date Asc) As first_sq
   ,Row_Number() Over(Partition By Customer_Name Order By Order_Date Desc) As last_sq

   From dbo.Orders
)a
Where a.first_sq = 1 Or a.last_sq = 1


Question 3:

The better method will be:

Select
a.Customer_Name
,a.Order_Value As Latest_Value
,a.First_Order As Earliest_Value
,((Nullif(a.Order_Value,0) - NullIf(a.First_Order,0)) / NullIf(a.First_Order,0)) * 100 As Perc_Diff

From (
	Select
	Customer_Name
	,Order_Date
	,Order_Value
	,First_Value(Order_Value) Over(Partition By Customer_Name Order By Order_Date Asc) As First_Order -- earliest order value per customer
	,Row_Number() Over(Partition By Customer_Name Order By Order_Date Desc) As Customer_Sq

	From ResultSet
)a
Where a.Customer_sq = 1

Order By a.Customer_Name

The above method identifies the first and last orders rather than simply the highest and lowest values, which helps with understanding whether performance has increased or decreased between the two orders. For a simple max vs min comparison, however, you could also use:

Select
Customer_Name
,Max(Order_Value) As Max_Value
,Min(Order_Value) As Min_Value
,((Max(Order_Value) - Min(Order_Value)) / NullIf(Min(Order_Value),0)) * 100 As Perc_Diff

From Resultset
Group By
Customer_Name


Question 4:

Query 2 will be far more performant than Query 1, as the index on the Order_Date field can be used, whereas Query 1 prevents any index from being used.

Applying a function to a stored field (a table column) anywhere outside the projection forces the database engine to ‘touch’ every record in the table to evaluate the function before the filter can be tested. As such, no index can be used: if your table has 1 trillion records but the date filter would return just 500,000 of them, all 1 trillion records still have to be processed.

It is far better to load the records you need into a temporary source or derived table, applying only the filters that do not alter the stored data, and to perform the function in the projection or against that outer, derived result.
In doing so, the index on the date field can be used to build the temporary table, and the upper-case filter is applied only to the much smaller outer set, as sketched below.
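
A minimal sketch of that pattern, assuming the dbo.Orders table and the Order_Date / Customer_Name fields used elsewhere on this page (the original pair of queries is not reproduced here):

Select
a.Customer_Name
,a.Order_ID
,a.Order_Value

From (
	Select
	Customer_Name
	,Order_ID
	,Order_Value

	From dbo.Orders
	Where Order_Date >= '2024-01-01' -- sargable filter: the Order_Date index can be used here
)a
Where Upper(a.Customer_Name) = 'ACME LTD' -- the function now runs only against the pre-filtered rows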


Question 5a:

Execution Order

From

Join

Where

Group By

Having

Select

Order By
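
As a quick illustration, the same order annotated against a simple query (dbo.Customers is a hypothetical table here):

Select c.Customer_Name, Count(1) As Order_Count          -- 6. Select
From dbo.Orders o                                        -- 1. From
Join dbo.Customers c On c.Customer_ID = o.Customer_ID    -- 2. Join
Where o.Order_Date >= '2024-01-01'                       -- 3. Where
Group By c.Customer_Name                                 -- 4. Group By
Having Count(1) > 1                                      -- 5. Having
Order By c.Customer_Name                                 -- 7. Order By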

Question 5b:

Unions stack sets of data one atop the other to form a single unified output. The field names and data-types of the first result-set define the output table (whether real, or virtual in temporary memory).

Union: filters the sets as they are being stacked, ensuring each row in the output is unique.

Union All: does not filter the stacks for uniqueness, so when unioning the output of the same table against itself (assume it has 1,000 rows), the Union will consist of only 1,000 rows while the Union All will consist of 2,000 rows.
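
Illustrated with the hypothetical dbo.Orders table (assume 1,000 unique rows):

Select Order_ID From dbo.Orders
Union
Select Order_ID From dbo.Orders     -- duplicates removed: 1,000 rows

Select Order_ID From dbo.Orders
Union All
Select Order_ID From dbo.Orders     -- duplicates kept: 2,000 rows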


Join: Explanation

Inner: returns rows where the related field exists in both tables.

Left: returns all data from the primary table, and data from the secondary table only where the related field exists; the unmatched secondary-table fields are nulled.

Right: functionally the same as a left join, just reversed.

Full: equivalent to a left and right join combined; all data from both tables is returned, and where the related field exists in both tables the row will be complete.

Cross: every row of each table is joined to every row of the other, with no join clause. Often cited as great for creating test data and little else. However, with a little creativity this can generate null entries where data does not exist, rather than that data simply being filtered away, such as when no sales data exists for a customer (see the sketch below).
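
A sketch of that ‘creative’ cross-join use: build a scaffold of every customer/month combination, then left-join the sales so that missing months appear as nulls instead of disappearing. The table and field names here are assumptions.

Select
c.Customer_Name
,m.Sales_Month
,s.Sales_Value       -- null where no sales exist for that customer/month

From dbo.Customers c
Cross Join (Select Distinct Sales_Month From dbo.Sales) m
Left Join dbo.Sales s
	On s.Customer_Name = c.Customer_Name
	And s.Sales_Month = m.Sales_Month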


Question 6:

This is more abstract and has no real definitive answer given the lack of information. That said, I would probably look at the number of statuses each transaction generates, perform a count of transactions for each of the past few months, identify a possible multiplier for percentage growth or shrinkage, and apply this to forecast how many transactions we could expect this month. Multiplying that forecast by the number of statuses per transaction gives an estimate of the number of rows we would expect the log to hold.
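
A rough sketch of that estimate, assuming hypothetical dbo.Transactions and dbo.Transaction_Status_Log tables and an illustrative 5% growth multiplier:

Select
s.Avg_Statuses * m.Avg_Monthly_Txns * 1.05 As Estimated_Log_Rows_This_Month -- 1.05 = assumed growth multiplier

From (
	Select Avg(Cast(Status_Count As Float)) As Avg_Statuses
	From (
		Select Transaction_ID, Count(1) As Status_Count
		From dbo.Transaction_Status_Log
		Group By Transaction_ID
	)x
)s
Cross Join (
	Select Avg(Cast(Txn_Count As Float)) As Avg_Monthly_Txns
	From (
		Select Year(Created_Date) As Yr, Month(Created_Date) As Mth, Count(1) As Txn_Count
		From dbo.Transactions
		Where Created_Date >= DateAdd(Month, -3, GetDate())
		Group By Year(Created_Date), Month(Created_Date)
	)y
)m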


Question 7:

Problem: Possible resolution

Using calculations to adjust attribute names: use aliases.

Including a fixed title in a calculation: use text boxes.

Using calculations for grouping: use the group field.

Using nested calculations: create the calculations separately and use the results in forward calcs.

Quick filters: try to limit to fewer than six quick filters and/or make use of parameters.


Despite the above, arguably the best action a developer can take is to identify the source of the bottleneck, which can only really be achieved through analysis of the SQL that Tableau has generated and sent to the data server for processing.

There are two ways of getting to this: use the Performance Recorder, or access the Tableau logs. I tend to use both, the Performance Recorder to get the initial analysis underway, followed by the logs, using this pathway:

(users do not need to follow this, although this method will make their lives easier)

  1. Open the workbook
  2. Create a new blank worksheet
  3. Unhide ALL worksheets, ensuring you remain on or navigate back to the blank worksheet
  4. Save, and close.


We have done this so that when we re-open the workbook, we can get straight on with the analysis rather than waiting for the workbook to render.

I would recommend waiting around 30 mins at this point if possible, to allow the data server time to clear its query cache.

  1. Navigate to your Tableau logs: (default) Documents\My Tableau Repository\Logs and delete or move them.
  2. Now re-open the workbook, activate the Performance Recorder, and open the newly created log in a text editor such as Notepad++, Sublime or Visual Studio; be ready to keep refreshing it.
  3. Move through each sheet, allowing time to render, so the queries can be captured both to the recorder (which is currently generating its own temp log), and to the main log

At the completion of this exercise, the user can use both the log and the Performance Recorder to try to identify the causes of the bottlenecks.

It’s a good idea to test the generated SQL with the database’s show/explain plan so we can get an understanding of how the workbook is impacting the server; this is, more often than not, the usual cause of bottlenecks. From here (a short sketch follows the list below), users can:

  • Identify possible table and field optimisations to ensure the data can be retrieved quicker
  • Identify additional additive and semi-additive fields that could be added to the table, such as logic expressions, and numerators and denominators
  • Identify very complex queries resulting from equally complex calculations, ready to optimise and/or rebuild in the front-end
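
For example, once a query has been copied out of the log or the Performance Recorder, it can be prefixed with the database’s plan command. The exact syntax varies by engine (EXPLAIN in PostgreSQL/MySQL, SET SHOWPLAN_XML ON in SQL Server, EXPLAIN PLAN FOR in Oracle); the query below is only a placeholder for whatever Tableau generated:

Explain
Select Segment, Category, Sum(Sales) As Sales
From Superstore
Group By Segment, Category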


Question 8:

The Table Calculation is the more performant of the two, but the use case needs to be considered too:

Table Calculations are ultimately more performant than Level-of-Detail Expressions, although LODx are generally easier to use, as the scoping information is contained within the expression, and LODx offer much more flexibility than Table Calcs.

The TC performs a basic summation, with the scoping handled by Tableau, whereas the LODx must pull through the row-level detail and create one or more derived tables that handle the grouping, before joining the aggregated results back to the main row data.

As Tableau converts your expressions to SQL queries, knowing the expected queries can help us greatly reduce our performance bottlenecks.

These are the two queries generated by the calculations:

Note

Tableau writes its SQL from pre-defined templates, so the SQL it produces may not be as performant as SQL you would have written yourself; it may also perform additional unnecessary actions that we cannot directly adjust and that can greatly affect server performance.


Table Calculation

Tableau:

Window_Sum(Sum(Sales))

SQL:

Select
Segment
,Category
,Region
,Sum(Sales) As Sales

From Superstore
Where Segment = 'Consumer'
Group By
Segment, Category, Region 


Level-of-Detail Expression

Tableau:

Sum({Fixed Segment, Category : Sum(Sales)})

SQL:

Select
t1.Segment
,t1.Category
,Sum(t2.Sales) As Sales

From (
	Select
	Segment
	,Category
	,Region

	From Superstore
	Where Segment = 'Consumer'
	Group By Segment, Category, Region
)t1

Join (
	Select
	Segment
	,Category
	,Sum(Sales) As Sales

	From Superstore
	Group By Segment, Category
)t2

On t1.Segment = t2.Segment And t1.Category = t2.Category

Group By 
t1.Segment, t1.Category



Question 9:

The better method here will be Custom SQL (cSQL).

Tableau has a really useful feature called ‘join-culling’: when a large model like this is loaded, Tableau analyses the model to determine where the required data lives and generates a query that uses only the tables needed to retrieve it, effectively ignoring or culling the other, unnecessary tables.

This can produce really high-performing queries and is great when users want to analyse different parts of the model in the same workbook, however:

Note

Relational Integrity needs to be defined between each of the tables for Tableau to utilise join-culling

If RI is not defined, as is more often the case in highly denormalised reporting databases and warehouses (as explained in the question), then Tableau must process the entire model: every table and every join, returning every field from every table in the model.

Furthermore, users are unable to optimise the query the model sends out, whereas cSQL is functionally the same as a view: although Tableau needs to run the same query for every calculation that is used, you, as the SQL analyst, can optimise the query before adding it to the Tableau data model.
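
A sketch of the kind of hand-optimised Custom SQL this implies, selecting only the fields the workbook needs and joining only the necessary tables (the table and field names here are assumptions):

Select
o.Order_ID
,o.Order_Date
,o.Order_Value
,c.Customer_Name

From dbo.Orders o
Join dbo.Customers c
	On c.Customer_ID = o.Customer_ID
Where o.Order_Date >= '2024-01-01' -- pre-filter to the reporting period before Tableau sees the data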

Tip

In an ideal world, the best option would be to create a view, but as views are permanent objects, users usually require DDL (Data Definition Language) permissions to create them.

DDL permissions often extend to db-admin functions too; as such, DDL permissions are more often denied, preventing developers from being able to create views.
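
For reference, the view equivalent of the Custom SQL sketch above; creating it requires the CREATE VIEW (DDL) permission described in this tip, and the names remain assumptions:

Create View dbo.vw_Customer_Orders As
Select
o.Order_ID
,o.Order_Date
,o.Order_Value
,c.Customer_Name
From dbo.Orders o
Join dbo.Customers c On c.Customer_ID = o.Customer_ID;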


...