Sometimes it is easier and more flexible to calculate Aggregate values in T-SQL prior to sending the data to SSRS for formatting and output. SQL Server is excellent at computing many types of aggregate values such as SUM. COUNT, and AVG.
T-SQL really shows its value when aggregates of aggregates are needed. Frankly I’m not even sure if it is possible to do this in SSRS. I suspect it is but I’m willing to wager this involves a lot of painful steps. It may be easier in SSRS 2008 but I’m basing my statement on my work and occasional frustration in SSRS on 2005.
This post is framed with the intention of making SSRS easier to work with but the techniques illustrated in the sample code would apply equally to the needs in T-SQL as well with the exception that the data may not be needed in 2NF as it is for use with SSRS.
The three code segments show several techniques for accomplishing compound aggregates including scenarios where filtering is required for some of the aggregate calculations which makes it even more complicated to compute in SSRS.
Example #1: Just setting up some sample data in a temp table. Not very interesting but feel free to manipulate to change the nature of the sample data to fit your needs.
Example #2: This is where we get to have some fun with my beloved CTEs. Using multiple CTEs allows for the calculation of different aggregate values and the re-composition of them at then final output.
This technique would also work in prior versions of SQL server using derived tables instead of CTEs. However, the third example may be a better choice if you have a need to reuse previously computed aggregates to avoid repeating the same derived table query multiple times. Repeating derived tables is difficult to maintain and has performance impact.
Example #3: Same functionality of the code in Example #2 but implemented using Temp Tables. There are as always pros and cons of using temp tables. In this case some of the reason to consider using temp tables are as follows:
- Easy to follow
- Can be indexed for large sets of data for higher performance
- Easy to modify if additional calculations are needed
The downside to the compound aggregate solution in example #3 in addition to any issues with temp tables is that it relies on multiple update statements. Updates are expensive so when calculating aggregates it is best to combine as many like grouped aggregates together into single update statements. Holding intermediate results of aggregates that get reused in temp tables can help optimize performance and eliminate mistakes due to duplicated code.
General Requirements
- Calculate aggregates of aggregates
- Include range based filtering
- Use parameters to impact calculation and filtering criteria
- Show various techniques for conditional aggregation
- Use calculations and derived data in addition to the aggregate functions
- Implement a non trivial example
{Notes:
Bonus points if you can point out the missing query optimizations
This post is based on a question in a Philadelphia SQL Server Users Group email distribution list. I have to admit that I did not understand all of the requirements of the question but hope that this post has enough content and useable code so that people can derive their own answers based on these examples.
These examples are just that examples. Multiple techniques were illustrated to show how something can be done. Feel free to point out errors in the code but keep in mind that some things here are not the optimal solution on purpose. Apply the techniques to your own work to meet your requirements.
}
Example #1
USE tempdb
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
-------------------------------------------------------------------------------
-- Create Sample Data
-------------------------------------------------------------------------------
CREATE TABLE #MyTable (
ID_MyTable INT IDENTITY(1,1) NOT NULL
, ID_Ref INT NOT NULL
, SomeData VARCHAR(50) NOT NULL
, CreateDate DATETIME NOT NULL
, TouchDate DATETIME NOT NULL
)
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Mercury', '1/12/2009', '2/12/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Venus', '2/13/2009', '4/20/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Earth', '3/14/2009', '7/22/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'The Moon', '4/15/2009', '4/23/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Mars', '5/16/2009', '9/12/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Jupiter', '6/17/2009', '11/17/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'IO', '7/18/2009', '8/4/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Europa', '8/19/2009', '9/12/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Ganymede', '9/20/2009', '11/30/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Callisto', '10/21/2009', '11/15/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Saturn', '11/22/2009', '12/04/2009')
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate) VALUES(22, 'Neptune', '12/23/2009', '12/23/2009')
---------------------------------------
-- Even more sample data
---------------------------------------
INSERT INTO #MyTable (ID_Ref, SomeData, CreateDate, TouchDate)
SELECT
MT.ID_Ref*2 AS ID_Ref
, MT.SomeData + ' ' + CAST(MT.ID_Ref * 2 AS VARCHAR(10)) AS SomeData
, DATEADD(dd,-17, MT.CreateDate) AS CreateDate
, DATEADD(dd, 17, MT.TouchDate) AS TouchDate
FROM
#MyTable MT
WHERE
MT.ID_MyTable % 2 = 0
UNION ALL
SELECT
MT.ID_Ref*3 AS ID_Ref
, MT.SomeData + ' ' + CAST(MT.ID_Ref * 3 AS VARCHAR(10)) AS SomeData
, DATEADD(dd, -9, MT.CreateDate) AS CreateDate
, DATEADD(dd, 22, MT.TouchDate) AS TouchDate
FROM
#MyTable MT
WHERE
MT.ID_MyTable % 2 = 1
Example #2
-------------------------------------------------------------------------------
-- Set up variables
-------------------------------------------------------------------------------
DECLARE @StartRange DATETIME
DECLARE @EndRange DATETIME
DECLARE @NumberOfDaysNotTouched INT
DECLARE @NumberOfDaysBetweenCreateAndTouch INT
SET @StartRange = '3/1/2009'
SET @EndRange = '10/31/2009'
SET @NumberOfDaysNotTouched = 45
SET @NumberOfDaysBetweenCreateAndTouch = 60
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-- Compound Aggregates Using CTE
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
---------------------------------------
-- Sample of DataToCareAbout
---------------------------------------
SELECT
MT.ID_MyTable
, MT.ID_Ref
, MT.SomeData
, MT.CreateDate
, MT.TouchDate
, DATEDIFF(dd,MT.TouchDate,GETDATE()) AS NumberOfDaysNotTouched
, DATEDIFF(dd,MT.CreateDate,MT.TouchDate) AS NumberOfDaysBetweenCreateAndTouch
FROM
#MyTable MT
WHERE
MT.CreateDate BETWEEN ISNULL(@StartRange, MT.CreateDate) AND ISNULL(@EndRange, MT.CreateDate)
AND
(
DATEADD(dd,-1*@NumberOfDaysNotTouched,GETDATE())>= MT.TouchDate
OR
DATEDIFF(dd,MT.CreateDate,MT.TouchDate)>= @NumberOfDaysBetweenCreateAndTouch
)
---------------------------------------
-- Compound Aggregate query
---------------------------------------
;
WITH DataToCareAbout AS (
SELECT
MT.ID_MyTable
, MT.ID_Ref
, MT.SomeData
, MT.CreateDate
, MT.TouchDate
-- Row based calculations to make it easier to consume downstream
, DATEDIFF(dd,MT.TouchDate,GETDATE()) AS NumberOfDaysNotTouched
, DATEDIFF(dd,MT.CreateDate,MT.TouchDate) AS NumberOfDaysBetweenCreateAndTouch
FROM
#MyTable MT
WHERE
MT.CreateDate BETWEEN ISNULL(@StartRange, MT.CreateDate) AND ISNULL(@EndRange, MT.CreateDate)
AND
(
DATEADD(dd,-1*@NumberOfDaysNotTouched,GETDATE())>= MT.TouchDate
OR
DATEDIFF(dd,MT.CreateDate,MT.TouchDate)>= @NumberOfDaysBetweenCreateAndTouch
)
)
,
AggregateData AS (
-- Basic Aggregate data
-- lots of cast to Float to allow more precision in math downstream
SELECT
DTC.ID_Ref
-- The next line is included for illustration & to make math easier downstream
, (SELECT CAST(COUNT(*) AS FLOAT) FROM DataToCareAbout) TotalRecCount
, CAST(COUNT(*) AS FLOAT) TotalRecsPerRef
, CAST(SUM(
CASE
WHEN DTC.NumberOfDaysNotTouched >= @NumberOfDaysNotTouched THEN 1
ELSE 0
END) AS FLOAT) AS NumberOfRecordsOverDaysNotTouchedThreshold
, CAST(SUM(
CASE
WHEN DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch THEN 1
ELSE 0
END) AS FLOAT)AS NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold
FROM
DataToCareAbout DTC
GROUP BY
DTC.ID_Ref
)
,
AggregateData_AVG_DaysNotTouched AS (
-- Basic Aggregate data but with filtering that affects calculated values
SELECT
DTC.ID_Ref
, AVG(DTC.NumberOfDaysNotTouched) AS AVG_NumberOfDaysNotTouched_OverThreshold
FROM
DataToCareAbout DTC
WHERE
DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch
GROUP BY
DTC.ID_Ref
)
,
AggregateData_AVG_DaysBetweenCreateAndTouch AS (
-- Basic Aggregate data but with filtering that affects calculated values
SELECT
DTC.ID_Ref
, AVG(DTC.NumberOfDaysBetweenCreateAndTouch) AS AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold
FROM
DataToCareAbout DTC
WHERE
DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch
GROUP BY
DTC.ID_Ref
)
,
AggregateData_AllRecords AS (
SELECT
ROUND(AVG(AD.TotalRecsPerRef),2) AS AVG_RecsPerRef
, ROUND(AVG(AD.NumberOfRecordsOverDaysNotTouchedThreshold),2) AS AVG_RecordsOverDaysNotTouchedThreshold
, ROUND(AVG(AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold),2) AS AVG_RecordsOverDaysBetweenCreateAndTouchThreshold
FROM
AggregateData AD
)
SELECT
DTC.ID_MyTable
, DTC.ID_Ref
, DTC.SomeData
, DTC.CreateDate
, DTC.TouchDate
, DTC.NumberOfDaysNotTouched
, DTC.NumberOfDaysBetweenCreateAndTouch
, ROUND((AD.TotalRecsPerRef / AD.TotalRecCount),2) AS PercentageOf_RecsAllRefs
, ROUND((AD.NumberOfRecordsOverDaysNotTouchedThreshold / AD.TotalRecsPerRef),2) AS PercentageOf_RecordsOverDaysNotTouchedThreshold_PerRef
, ROUND((AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold / AD.TotalRecsPerRef),2) AS PercentageOf_NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold_PerRef
, A_DNT.AVG_NumberOfDaysNotTouched_OverThreshold
, A_CT.AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold
, AD.TotalRecCount
, AD.TotalRecsPerRef
, AD.NumberOfRecordsOverDaysNotTouchedThreshold
, AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold
, AD_All.AVG_RecsPerRef
, AD_All.AVG_RecordsOverDaysNotTouchedThreshold
, AD_All.AVG_RecordsOverDaysBetweenCreateAndTouchThreshold
FROM
DataToCareAbout DTC
INNER JOIN
AggregateData AD
ON DTC.ID_Ref = AD.ID_Ref
CROSS JOIN
AggregateData_AllRecords AD_All
LEFT OUTER JOIN -- LOJ Needed since filtering could have eliminated keys
AggregateData_AVG_DaysNotTouched A_DNT
ON DTC.ID_Ref = A_DNT.ID_Ref
LEFT OUTER JOIN -- LOJ Needed since filtering could have eliminated keys
AggregateData_AVG_DaysBetweenCreateAndTouch A_CT
ON DTC.ID_Ref = A_CT.ID_Ref
Example #3
-------------------------------------------------------------------------------
-- Set up variables
-------------------------------------------------------------------------------
DECLARE @StartRange DATETIME
DECLARE @EndRange DATETIME
DECLARE @NumberOfDaysNotTouched INT
DECLARE @NumberOfDaysBetweenCreateAndTouch INT
SET @StartRange = '3/1/2009'
SET @EndRange = '10/31/2009'
SET @NumberOfDaysNotTouched = 45
SET @NumberOfDaysBetweenCreateAndTouch = 60
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-- Compound Aggregates Using Temp Tables
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
IF OBJECT_ID('tempdb..#DataToCareAbout') IS NOT NULL
DROP TABLE #DataToCareAbout
IF OBJECT_ID('tempdb..#MyOutputRecords') IS NOT NULL
DROP TABLE #MyOutputRecords
IF OBJECT_ID('tempdb..#AggregateData') IS NOT NULL
DROP TABLE #AggregateData
CREATE TABLE #DataToCareAbout(
ID_MyTable INT NOT NULL
, ID_Ref INT NOT NULL
, SomeData VARCHAR(50) NOT NULL
, CreateDate DATETIME NOT NULL
, TouchDate DATETIME NOT NULL
, NumberOfDaysNotTouched INT NOT NULL
, NumberOfDaysBetweenCreateAndTouch INT NOT NULL
)
CREATE TABLE #AggregateData (
ID_Ref INT NOT NULL
, TotalRecCount FLOAT NOT NULL
, TotalRecsPerRef FLOAT NOT NULL
, NumberOfRecordsOverDaysNotTouchedThreshold FLOAT NOT NULL
, NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold FLOAT NOT NULL
)
CREATE TABLE #MyOutputRecords (
ID_MyTable INT NOT NULL
, ID_Ref INT NOT NULL
, SomeData VARCHAR(50) NOT NULL
, CreateDate DATETIME NOT NULL
, TouchDate DATETIME NOT NULL
, NumberOfDaysNotTouched INT NOT NULL
, NumberOfDaysBetweenCreateAndTouch INT NOT NULL
, PercentageOf_RecsAllRefs NUMERIC(8,2) NOT NULL
, PercentageOf_RecordsOverDaysNotTouchedThreshold_PerRef NUMERIC(8,2) NOT NULL
, PercentageOf_NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold_PerRef NUMERIC(8,2) NOT NULL
, AVG_NumberOfDaysNotTouched_OverThreshold NUMERIC(8,2) NULL
, AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold NUMERIC(8,2) NULL
, TotalRecCount INT NOT NULL
, TotalRecsPerRef INT NOT NULL
, NumberOfRecordsOverDaysNotTouchedThreshold INT NOT NULL
, NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold INT NOT NULL
, AVG_RecsPerRef NUMERIC(8,2) NULL
, AVG_RecordsOverDaysNotTouchedThreshold NUMERIC(8,2) NULL
, AVG_RecordsOverDaysBetweenCreateAndTouchThreshold NUMERIC(8,2) NULL
)
-------------------------------------------------------------------------------
-- Get data to care about
-------------------------------------------------------------------------------
INSERT INTO #DataToCareAbout(
ID_MyTable
, ID_Ref
, SomeData
, CreateDate
, TouchDate
, NumberOfDaysNotTouched
, NumberOfDaysBetweenCreateAndTouch
)
SELECT
MT.ID_MyTable
, MT.ID_Ref
, MT.SomeData
, MT.CreateDate
, MT.TouchDate
-- Row based calculations to make it easier to consume downstream
, DATEDIFF(dd,MT.TouchDate,GETDATE()) AS NumberOfDaysNotTouched
, DATEDIFF(dd,MT.CreateDate,MT.TouchDate) AS NumberOfDaysBetweenCreateAndTouch
FROM
#MyTable MT
WHERE
MT.CreateDate BETWEEN ISNULL(@StartRange, MT.CreateDate) AND ISNULL(@EndRange, MT.CreateDate)
AND
(
DATEADD(dd,-1*@NumberOfDaysNotTouched,GETDATE())>= MT.TouchDate
OR
DATEDIFF(dd,MT.CreateDate,MT.TouchDate)>= @NumberOfDaysBetweenCreateAndTouch
)
ORDER BY
MT.ID_Ref
, MT.ID_MyTable
-------------------------------------------------------------------------------
-- Calculate Aggregate Data
-------------------------------------------------------------------------------
INSERT INTO #AggregateData (
ID_Ref
, TotalRecCount
, TotalRecsPerRef
, NumberOfRecordsOverDaysNotTouchedThreshold
, NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold
)
SELECT
DTC.ID_Ref
-- The next line is included for illustration & to make math easier downstream
, (SELECT CAST(COUNT(*) AS FLOAT) FROM #DataToCareAbout) TotalRecCount
, CAST(COUNT(*) AS FLOAT) TotalRecsPerRef
, CAST(SUM(
CASE
WHEN DTC.NumberOfDaysNotTouched >= @NumberOfDaysNotTouched THEN 1
ELSE 0
END) AS FLOAT) AS NumberOfRecordsOverDaysNotTouchedThreshold
, CAST(SUM(
CASE
WHEN DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch THEN 1
ELSE 0
END) AS FLOAT)AS NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold
FROM
#DataToCareAbout DTC
GROUP BY
DTC.ID_Ref
ORDER BY
DTC.ID_Ref
-------------------------------------------------------------------------------
-- build output table even though not all the data is available yet
-------------------------------------------------------------------------------
INSERT INTO #MyOutputRecords (
ID_MyTable
, ID_Ref
, SomeData
, CreateDate
, TouchDate
, NumberOfDaysNotTouched
, NumberOfDaysBetweenCreateAndTouch
, TotalRecCount
, TotalRecsPerRef
, NumberOfRecordsOverDaysNotTouchedThreshold
, NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold
, PercentageOf_RecsAllRefs
, PercentageOf_RecordsOverDaysNotTouchedThreshold_PerRef
, PercentageOf_NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold_PerRef
)
SELECT
dtca.ID_MyTable
, dtca.ID_Ref
, dtca.SomeData
, dtca.CreateDate
, dtca.TouchDate
, dtca.NumberOfDaysNotTouched
, dtca.NumberOfDaysBetweenCreateAndTouch
, AD.TotalRecCount
, AD.TotalRecsPerRef
, AD.NumberOfRecordsOverDaysNotTouchedThreshold
, AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold
, ROUND((AD.TotalRecsPerRef / AD.TotalRecCount),2) AS PercentageOf_RecsAllRefs
, ROUND((AD.NumberOfRecordsOverDaysNotTouchedThreshold / AD.TotalRecsPerRef),2) AS PercentageOf_RecordsOverDaysNotTouchedThreshold_PerRef
, ROUND((AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold / AD.TotalRecsPerRef),2) AS PercentageOf_NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold_PerRef
FROM
#DataToCareAbout AS dtca
INNER JOIN
#AggregateData AS AD
ON dtca.ID_Ref = AD.ID_Ref
ORDER BY
dtca.ID_Ref
, dtca.ID_MyTable
-------------------------------------------------------------------------------
-- Update Output with missing values
-------------------------------------------------------------------------------
---------------------------------------
-- AVG_NumberOfDaysNotTouched_OverThreshold
---------------------------------------
;
WITH AVG_Data AS (
-- Basic Aggregate data but with filtering that affects calculated values
SELECT
DTC.ID_Ref
, AVG(DTC.NumberOfDaysNotTouched) AS AVG_NumberOfDaysNotTouched_OverThreshold
FROM
#DataToCareAbout DTC
WHERE
DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch
GROUP BY
DTC.ID_Ref
)
UPDATE #MyOutputRecords
SET
AVG_NumberOfDaysNotTouched_OverThreshold = AVG_Data.AVG_NumberOfDaysNotTouched_OverThreshold
FROM
#MyOutputRecords MOR
INNER JOIN
AVG_Data
ON MOR.ID_Ref = AVG_Data.ID_Ref
---------------------------------------
-- AVG_NumberOfDaysNotTouched_OverThreshold
---------------------------------------
;
WITH AVG_Data AS (
-- Basic Aggregate data but with filtering that affects calculated values
SELECT
DTC.ID_Ref
, AVG(DTC.NumberOfDaysBetweenCreateAndTouch) AS AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold
FROM
#DataToCareAbout DTC
WHERE
DTC.NumberOfDaysBetweenCreateAndTouch >= @NumberOfDaysBetweenCreateAndTouch
GROUP BY
DTC.ID_Ref
)
UPDATE #MyOutputRecords
SET
AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold = AVG_Data.AVG_NumberOfDaysBetweenCreateAndTouch_OverThreshold
FROM
#MyOutputRecords MOR
INNER JOIN
AVG_Data
ON MOR.ID_Ref = AVG_Data.ID_Ref
---------------------------------------
-- All Records Aggregate Data
---------------------------------------
;
WITH AggregateData_AllRecords AS (
SELECT
ROUND(AVG(AD.TotalRecsPerRef),2) AS AVG_RecsPerRef
, ROUND(AVG(AD.NumberOfRecordsOverDaysNotTouchedThreshold),2) AS AVG_RecordsOverDaysNotTouchedThreshold
, ROUND(AVG(AD.NumberOfRecordsOverDaysBetweenCreateAndTouchThreshold),2) AS AVG_RecordsOverDaysBetweenCreateAndTouchThreshold
FROM
#AggregateData AD
)
UPDATE #MyOutputRecords
SET
AVG_RecsPerRef = AR.AVG_RecsPerRef
, AVG_RecordsOverDaysNotTouchedThreshold = AR.AVG_RecordsOverDaysNotTouchedThreshold
, AVG_RecordsOverDaysBetweenCreateAndTouchThreshold = AR.AVG_RecordsOverDaysBetweenCreateAndTouchThreshold
FROM
#MyOutputRecords MOR
CROSS JOIN
AggregateData_AllRecords AR
-------------------------------------------------------------------------------
-- Output to SSRS
-------------------------------------------------------------------------------
SELECT
*
FROM
#MyOutputRecords