Computing Compound Aggregates in T-SQL for SSRS
Sometimes it is easier and more flexible to calculate Aggregate values in T-SQL prior to sending the data to SSRS for formatting and output. SQL Server is excellent at computing many types of aggregate values such as SUM. COUNT, and AVG.
T-SQL really shows its value when aggregates of aggregates are needed. Frankly I’m not even sure if it is possible to do this in SSRS. I suspect it is but I’m willing to wager this involves a lot of painful steps. It may be easier in SSRS 2008 but I’m basing my statement on my work and occasional frustration in SSRS on 2005.
This post is framed with the intention of making SSRS easier to work with but the techniques illustrated in the sample code would apply equally to the needs in T-SQL as well with the exception that the data may not be needed in 2NF as it is for use with SSRS.
The three code segments show several techniques for accomplishing compound aggregates including scenarios where filtering is required for some of the aggregate calculations which makes it even more complicated to compute in SSRS.
Example #1: Just setting up some sample data in a temp table. Not very interesting but feel free to manipulate to change the nature of the sample data to fit your needs.
Example #2: This is where we get to have some fun with my beloved CTEs. Using multiple CTEs allows for the calculation of different aggregate values and the re-composition of them at then final output.
This technique would also work in prior versions of SQL server using derived tables instead of CTEs. However, the third example may be a better choice if you have a need to reuse previously computed aggregates to avoid repeating the same derived table query multiple times. Repeating derived tables is difficult to maintain and has performance impact.
Example #3: Same functionality of the code in Example #2 but implemented using Temp Tables. There are as always pros and cons of using temp tables. In this case some of the reason to consider using temp tables are as follows:
- Easy to follow
- Can be indexed for large sets of data for higher performance
- Easy to modify if additional calculations are needed
The downside to the compound aggregate solution in example #3 in addition to any issues with temp tables is that it relies on multiple update statements. Updates are expensive so when calculating aggregates it is best to combine as many like grouped aggregates together into single update statements. Holding intermediate results of aggregates that get reused in temp tables can help optimize performance and eliminate mistakes due to duplicated code.
General Requirements
- Calculate aggregates of aggregates
- Include range based filtering
- Use parameters to impact calculation and filtering criteria
- Show various techniques for conditional aggregation
- Use calculations and derived data in addition to the aggregate functions
- Implement a non trivial example
{Notes:
Bonus points if you can point out the missing query optimizations
This post is based on a question in a Philadelphia SQL Server Users Group email distribution list. I have to admit that I did not understand all of the requirements of the question but hope that this post has enough content and useable code so that people can derive their own answers based on these examples.
These examples are just that examples. Multiple techniques were illustrated to show how something can be done. Feel free to point out errors in the code but keep in mind that some things here are not the optimal solution on purpose. Apply the techniques to your own work to meet your requirements.
}
Example #1
USE tempdb IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable ------------------------------------------------------------------------------- -- Create Sample Data ------------------------------------------------------------------------------- CREATE TABLE #MyTable ( ID_MyTable INT IDENTITY(1,1) NOT NULL , ID_Ref INT NOT NULL , SomeData VARCHAR(50) NOT NULL , CreateDate DATETIME NOT NULL , TouchDate DATETIME NOT NULL ) INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Mercury', '1/12/2009', '2/12/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Venus', '2/13/2009', '4/20/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Earth', '3/14/2009', '7/22/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'The Moon', '4/15/2009', '4/23/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Mars', '5/16/2009', '9/12/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Jupiter', '6/17/2009', '11/17/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'IO', '7/18/2009', '8/4/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Europa', '8/19/2009', '9/12/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Ganymede', '9/20/2009', '11/30/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Callisto', '10/21/2009', '11/15/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Saturn', '11/22/2009', '12/04/2009') INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Neptune', '12/23/2009', '12/23/2009') --------------------------------------- -- Even more sample data --------------------------------------- INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) SELECT MT.ID_Ref*2 AS ID_Ref , MT.SomeData + ' ' + CAST(MT.ID_Ref * 2 AS VARCHAR(10)) AS SomeData , DATEADD(dd,-17, MT.CreateDate) AS CreateDate , DATEADD(dd, 17, MT.TouchDate) AS TouchDate FROM #MyTable MT WHERE MT.ID_MyTable % 2 = 0 UNION ALL SELECT MT.ID_Ref*3 AS ID_Ref , MT.SomeData + ' ' + CAST(MT.ID_Ref * 3 AS VARCHAR(10)) AS SomeData , DATEADD(dd, -9, MT.CreateDate) AS CreateDate , DATEADD(dd, 22, MT.TouchDate) AS TouchDate FROM #MyTable MT WHERE MT.ID_MyTable % 2 = 1
Example #2
------------------------------------------------------------------------------- -- Set up variables ------------------------------------------------------------------------------- DECLARE @StartRange DATETIME DECLARE @EndRange DATETIME DECLARE @NumberOfDaysNotTouched INT DECLARE @NumberOfDaysBetweenCreateAndTouch INT SET @StartRange = '3/1/2009' SET @EndRange = '10/31/2009' SET @NumberOfDaysNotTouched = 45 SET @NumberOfDaysBetweenCreateAndTouch = 60 ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- -- Compound Aggregates Using CTE ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- --------------------------------------- -- Sample of DataToCareAbout --------------------------------------- SELECT MT.ID_MyTable , MT.ID_Ref , MT.SomeData , MT.CreateDate , MT.TouchDate , DATEDIFF(dd,MT.TouchDate,GETDATE()) AS NumberOfDaysNotTouched , DATEDIFF(dd,MT.CreateDate,MT.TouchDate) AS NumberOfDaysBetweenCreateAndTouch FROM #MyTable MT WHERE MT.CreateDate BETWEEN ISNULL(@StartRange, MT.CreateDate) AND ISNULL(@EndRange, MT.CreateDate) AND ( DATEADD(dd,-1*@NumberOfDaysNotTouched,GETDATE())>= MT.TouchDate OR DATEDIFF(dd,MT.CreateDate,MT.TouchDate)>= @NumberOfDaysBetweenCreateAndTouch ) --------------------------------------- -- Compound Aggregate query --------------------------------------- ; WITH DataToCareAbout AS ( SELECT MT.ID_MyTable , MT.ID_Ref , MT.SomeData , MT.CreateDate , MT.TouchDate -- Row based calculations to make it easier to consume downstream , DATEDIFF(dd,MT.TouchDate,GETDATE()) AS NumberOfDaysNotTouched , DATEDIFF(dd,MT.CreateDate,MT.TouchDate) AS NumberOfDaysBetweenCreateAndTouch FROM #MyTable MT WHERE MT.CreateDate BETWEEN ISNULL(@StartRange, MT.CreateDate) AND ISNULL(@EndRange, MT.CreateDate) AND ( DATEADD(dd,-1*@NumberOfDaysNotTouched,GETDATE())>= MT.TouchDate OR DATEDIFF(dd,MT.CreateDate,MT.TouchDate)>= @NumberOfDaysBetweenCreateAndTouch ) ) , AggregateData AS ( -- Basic Aggregate data -- lots of cast to Float to allow more precision in math downstream SELECT DTC.ID_Ref -- The next line is included for illustration & to make math easier downstream , (SELECT CAST(COUNT(*) AS FLOAT) FROM DataToCareAbout) TotalRecCount , CAST(COUNT(*) AS FLOAT) TotalRecsPerRef , CAST(SUM( CASE WHEN DTC.NumberOfDaysNotTouched >= @NumberOfDaysNotTouched THEN 1 ELSE 0 END) AS FLOAT) AS NumberOfRecordsOverDaysNotTouchedThreshold , CAST(SUM( CASE WHEN DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch THEN 1 ELSE 0 END) AS FLOAT)AS NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold FROM DataToCareAbout DTC GROUP BY DTC.ID_Ref ) , AggregateData_AVG_DaysNotTouched AS ( -- Basic Aggregate data but with filtering that affects calculated values SELECT DTC.ID_Ref , AVG(DTC.NumberOfDaysNotTouched) AS AVG_NumberOfDaysNotTouched_OverThreshold FROM DataToCareAbout DTC WHERE DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch GROUP BY DTC.ID_Ref ) , AggregateData_AVG_DaysBetweenCreateAndTouch AS ( -- Basic Aggregate data but with filtering that affects calculated values SELECT DTC.ID_Ref , AVG(DTC.NumberOfDaysBetweenCreateAndTouch) AS AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold FROM DataToCareAbout DTC WHERE DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch GROUP BY DTC.ID_Ref ) , AggregateData_AllRecords AS ( SELECT ROUND(AVG(AD.TotalRecsPerRef),2) AS AVG_RecsPerRef , ROUND(AVG(AD.NumberOfRecordsOverDaysNotTouchedThreshold),2) AS AVG_RecordsOverDaysNotTouchedThreshold , ROUND(AVG(AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold),2) AS AVG_RecordsOverDaysBetweenCreateAndTouchThreshold FROM AggregateData AD ) SELECT DTC.ID_MyTable , DTC.ID_Ref , DTC.SomeData , DTC.CreateDate , DTC.TouchDate , DTC.NumberOfDaysNotTouched , DTC.NumberOfDaysBetweenCreateAndTouch , ROUND((AD.TotalRecsPerRef / AD.TotalRecCount),2) AS PercentageOf_RecsAllRefs , ROUND((AD.NumberOfRecordsOverDaysNotTouchedThreshold / AD.TotalRecsPerRef),2) AS PercentageOf_RecordsOverDaysNotTouchedThreshold_PerRef , ROUND((AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold / AD.TotalRecsPerRef),2) AS PercentageOf_NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold_PerRef , A_DNT.AVG_NumberOfDaysNotTouched_OverThreshold , A_CT.AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold , AD.TotalRecCount , AD.TotalRecsPerRef , AD.NumberOfRecordsOverDaysNotTouchedThreshold , AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold , AD_All.AVG_RecsPerRef , AD_All.AVG_RecordsOverDaysNotTouchedThreshold , AD_All.AVG_RecordsOverDaysBetweenCreateAndTouchThreshold FROM DataToCareAbout DTC INNER JOIN AggregateData AD ON DTC.ID_Ref = AD.ID_Ref CROSS JOIN AggregateData_AllRecords AD_All LEFT OUTER JOIN -- LOJ Needed since filtering could have eliminated keys AggregateData_AVG_DaysNotTouched A_DNT ON DTC.ID_Ref = A_DNT.ID_Ref LEFT OUTER JOIN -- LOJ Needed since filtering could have eliminated keys AggregateData_AVG_DaysBetweenCreateAndTouch A_CT ON DTC.ID_Ref = A_CT.ID_Ref
Example #3
------------------------------------------------------------------------------- -- Set up variables ------------------------------------------------------------------------------- DECLARE @StartRange DATETIME DECLARE @EndRange DATETIME DECLARE @NumberOfDaysNotTouched INT DECLARE @NumberOfDaysBetweenCreateAndTouch INT SET @StartRange = '3/1/2009' SET @EndRange = '10/31/2009' SET @NumberOfDaysNotTouched = 45 SET @NumberOfDaysBetweenCreateAndTouch = 60 ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- -- Compound Aggregates Using Temp Tables ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- IF OBJECT_ID('tempdb..#DataToCareAbout') IS NOT NULL DROP TABLE #DataToCareAbout IF OBJECT_ID('tempdb..#MyOutputRecords') IS NOT NULL DROP TABLE #MyOutputRecords IF OBJECT_ID('tempdb..#AggregateData') IS NOT NULL DROP TABLE #AggregateData CREATE TABLE #DataToCareAbout( ID_MyTable INT NOT NULL , ID_Ref INT NOT NULL , SomeData VARCHAR(50) NOT NULL , CreateDate DATETIME NOT NULL , TouchDate DATETIME NOT NULL , NumberOfDaysNotTouched INT NOT NULL , NumberOfDaysBetweenCreateAndTouch INT NOT NULL ) CREATE TABLE #AggregateData ( ID_Ref INT NOT NULL , TotalRecCount FLOAT NOT NULL , TotalRecsPerRef FLOAT NOT NULL , NumberOfRecordsOverDaysNotTouchedThreshold FLOAT NOT NULL , NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold FLOAT NOT NULL ) CREATE TABLE #MyOutputRecords ( ID_MyTable INT NOT NULL , ID_Ref INT NOT NULL , SomeData VARCHAR(50) NOT NULL , CreateDate DATETIME NOT NULL , TouchDate DATETIME NOT NULL , NumberOfDaysNotTouched INT NOT NULL , NumberOfDaysBetweenCreateAndTouch INT NOT NULL , PercentageOf_RecsAllRefs NUMERIC(8,2) NOT NULL , PercentageOf_RecordsOverDaysNotTouchedThreshold_PerRef NUMERIC(8,2) NOT NULL , PercentageOf_NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold_PerRef NUMERIC(8,2) NOT NULL , AVG_NumberOfDaysNotTouched_OverThreshold NUMERIC(8,2) NULL , AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold NUMERIC(8,2) NULL , TotalRecCount INT NOT NULL , TotalRecsPerRef INT NOT NULL , NumberOfRecordsOverDaysNotTouchedThreshold INT NOT NULL , NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold INT NOT NULL , AVG_RecsPerRef NUMERIC(8,2) NULL , AVG_RecordsOverDaysNotTouchedThreshold NUMERIC(8,2) NULL , AVG_RecordsOverDaysBetweenCreateAndTouchThreshold NUMERIC(8,2) NULL ) ------------------------------------------------------------------------------- -- Get data to care about ------------------------------------------------------------------------------- INSERT INTO #DataToCareAbout( ID_MyTable , ID_Ref , SomeData , CreateDate , TouchDate , NumberOfDaysNotTouched , NumberOfDaysBetweenCreateAndTouch ) SELECT MT.ID_MyTable , MT.ID_Ref , MT.SomeData , MT.CreateDate , MT.TouchDate -- Row based calculations to make it easier to consume downstream , DATEDIFF(dd,MT.TouchDate,GETDATE()) AS NumberOfDaysNotTouched , DATEDIFF(dd,MT.CreateDate,MT.TouchDate) AS NumberOfDaysBetweenCreateAndTouch FROM #MyTable MT WHERE MT.CreateDate BETWEEN ISNULL(@StartRange, MT.CreateDate) AND ISNULL(@EndRange, MT.CreateDate) AND ( DATEADD(dd,-1*@NumberOfDaysNotTouched,GETDATE())>= MT.TouchDate OR DATEDIFF(dd,MT.CreateDate,MT.TouchDate)>= @NumberOfDaysBetweenCreateAndTouch ) ORDER BY MT.ID_Ref , MT.ID_MyTable ------------------------------------------------------------------------------- -- Calculate Aggregate Data ------------------------------------------------------------------------------- INSERT INTO #AggregateData ( ID_Ref , TotalRecCount , TotalRecsPerRef , NumberOfRecordsOverDaysNotTouchedThreshold , NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold ) SELECT DTC.ID_Ref -- The next line is included for illustration & to make math easier downstream , (SELECT CAST(COUNT(*) AS FLOAT) FROM #DataToCareAbout) TotalRecCount , CAST(COUNT(*) AS FLOAT) TotalRecsPerRef , CAST(SUM( CASE WHEN DTC.NumberOfDaysNotTouched >= @NumberOfDaysNotTouched THEN 1 ELSE 0 END) AS FLOAT) AS NumberOfRecordsOverDaysNotTouchedThreshold , CAST(SUM( CASE WHEN DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch THEN 1 ELSE 0 END) AS FLOAT)AS NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold FROM #DataToCareAbout DTC GROUP BY DTC.ID_Ref ORDER BY DTC.ID_Ref ------------------------------------------------------------------------------- -- build output table even though not all the data is available yet ------------------------------------------------------------------------------- INSERT INTO #MyOutputRecords ( ID_MyTable , ID_Ref , SomeData , CreateDate , TouchDate , NumberOfDaysNotTouched , NumberOfDaysBetweenCreateAndTouch , TotalRecCount , TotalRecsPerRef , NumberOfRecordsOverDaysNotTouchedThreshold , NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold , PercentageOf_RecsAllRefs , PercentageOf_RecordsOverDaysNotTouchedThreshold_PerRef , PercentageOf_NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold_PerRef ) SELECT dtca.ID_MyTable , dtca.ID_Ref , dtca.SomeData , dtca.CreateDate , dtca.TouchDate , dtca.NumberOfDaysNotTouched , dtca.NumberOfDaysBetweenCreateAndTouch , AD.TotalRecCount , AD.TotalRecsPerRef , AD.NumberOfRecordsOverDaysNotTouchedThreshold , AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold , ROUND((AD.TotalRecsPerRef / AD.TotalRecCount),2) AS PercentageOf_RecsAllRefs , ROUND((AD.NumberOfRecordsOverDaysNotTouchedThreshold / AD.TotalRecsPerRef),2) AS PercentageOf_RecordsOverDaysNotTouchedThreshold_PerRef , ROUND((AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold / AD.TotalRecsPerRef),2) AS PercentageOf_NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold_PerRef FROM #DataToCareAbout AS dtca INNER JOIN #AggregateData AS AD ON dtca.ID_Ref = AD.ID_Ref ORDER BY dtca.ID_Ref , dtca.ID_MyTable ------------------------------------------------------------------------------- -- Update Output with missing values ------------------------------------------------------------------------------- --------------------------------------- -- AVG_NumberOfDaysNotTouched_OverThreshold --------------------------------------- ; WITH AVG_Data AS ( -- Basic Aggregate data but with filtering that affects calculated values SELECT DTC.ID_Ref , AVG(DTC.NumberOfDaysNotTouched) AS AVG_NumberOfDaysNotTouched_OverThreshold FROM #DataToCareAbout DTC WHERE DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch GROUP BY DTC.ID_Ref ) UPDATE #MyOutputRecords SET AVG_NumberOfDaysNotTouched_OverThreshold = AVG_Data.AVG_NumberOfDaysNotTouched_OverThreshold FROM #MyOutputRecords MOR INNER JOIN AVG_Data ON MOR.ID_Ref = AVG_Data.ID_Ref --------------------------------------- -- AVG_NumberOfDaysNotTouched_OverThreshold --------------------------------------- ; WITH AVG_Data AS ( -- Basic Aggregate data but with filtering that affects calculated values SELECT DTC.ID_Ref , AVG(DTC.NumberOfDaysBetweenCreateAndTouch) AS AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold FROM #DataToCareAbout DTC WHERE DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch GROUP BY DTC.ID_Ref ) UPDATE #MyOutputRecords SET AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold = AVG_Data.AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold FROM #MyOutputRecords MOR INNER JOIN AVG_Data ON MOR.ID_Ref = AVG_Data.ID_Ref --------------------------------------- -- All Records Aggregate Data --------------------------------------- ; WITH AggregateData_AllRecords AS ( SELECT ROUND(AVG(AD.TotalRecsPerRef),2) AS AVG_RecsPerRef , ROUND(AVG(AD.NumberOfRecordsOverDaysNotTouchedThreshold),2) AS AVG_RecordsOverDaysNotTouchedThreshold , ROUND(AVG(AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold),2) AS AVG_RecordsOverDaysBetweenCreateAndTouchThreshold FROM #AggregateData AD ) UPDATE #MyOutputRecords SET AVG_RecsPerRef = AR.AVG_RecsPerRef , AVG_RecordsOverDaysNotTouchedThreshold = AR.AVG_RecordsOverDaysNotTouchedThreshold , AVG_RecordsOverDaysBetweenCreateAndTouchThreshold = AR.AVG_RecordsOverDaysBetweenCreateAndTouchThreshold FROM #MyOutputRecords MOR CROSS JOIN AggregateData_AllRecords AR ------------------------------------------------------------------------------- -- Output to SSRS ------------------------------------------------------------------------------- SELECT * FROM #MyOutputRecords
Reader Comments