Wednesday
Jun092010

Data Truncation Handling Techniques

Question:

we are trying to get this to error because the value is getting truncated. What do i need to change on our warning settings?
SELECT CONVERT(varchar(10), 'adededddasdfasdf')

 

On a regular basis I use CONVERT(VARCHAR(10), GETDATE(), 101) to convert

2010-06-09 08:03:07.313    to    2010-06-09

I'm counting on the truncation to cut out the data I don't want. I don't recall ever getting a truncation error in this case.  So there may not be a setting to cause this to occur.  Unfortunately, I do not know the direct answer to the question.

On the other hand truncation errors are common when doing ETL projects.  Here is an example of what does cause a truncation error to bubble up.

CREATE TABLE #MyTable ( MyField VARCHAR(10) NOT NULL ) 

INSERT INTO #MyTable ( MyField )VALUES ( 'abcdefghij') 
INSERT INTO #MyTable ( MyField )VALUES ( '12345678901112') 

SELECT * FROM #MyTable AS mt 

 

Executing this code will have the following in the message window:

(1 row(s) affected)
Msg 8152, Level 16, State 14, Line 10
String or binary data would be truncated.
The statement has been terminated.

(1 row(s) affected)

This code will cause a truncation error to abort the second insert statement.   The error is useful and important because it helps maintain data integrity.  On the other hand it would be nice to find a way to avoid the problem in the first place

Here are some techniques for preemptively dealing with truncation errors:

When dealing with unstable or unverified data sources a good approach is to validate the data prior to loading. Typically this involves loading the data in to varchar(max) fields and running various metadata inspections such as LEN(), ISNUMERIC(), ISDATE(), etc.

Yes it takes time, but in an automated process it is more important to run without error and be able to generate meaningful error reports on what data is flawed.

---------------------------------------
-- In line validation
---------------------------------------
;
WITH SourceData AS (    
    SELECT 'adededddasdfasdf' AS MyField
    UNION ALL SELECT 'abcdefghij'
    UNION ALL SELECT '12345678901112'
    UNION ALL SELECT '1234567890'
)
SELECT
      CONVERT(VARCHAR(10), MyField) AS MyField_Len10
    , CASE    
        WHEN LEN(MyField) <= 10 THEN 0
        ELSE 1
      END AS IsTruncated
FROM 
    SourceData


---------------------------------------
-- List of truncation errors - with option raised error if it's really needed
---------------------------------------
DECLARE @ErrorCount    INT
SET @ErrorCount = 0
;
WITH SourceData AS (    
    SELECT 'adededddasdfasdf' AS MyField
    UNION ALL SELECT 'abcdefghij'
    UNION ALL SELECT '12345678901112'
    UNION ALL SELECT '1234567890'
)
SELECT
      CONVERT(VARCHAR(10), MyField) AS MyField_Len10
FROM 
    SourceData
WHERE
    LEN(MyField) > 10    

SET @ErrorCount = @@ROWCOUNT
IF @ErrorCount != 0
    BEGIN
        RAISERROR('There were %d truncation errors', 16,1,@ErrorCount)
    END

A more useful approach to check for the error case.

GO
---------------------------------------
-- Alt solution without returning data
---------------------------------------
DECLARE @ErrorCount    INT
SET @ErrorCount = 0
;
WITH SourceData AS (    
    SELECT 'adededddasdfasdf' AS MyField
    UNION ALL SELECT 'abcdefghij'
    UNION ALL SELECT '12345678901112'
    UNION ALL SELECT '1234567890'
)
SELECT
     @ErrorCount = COUNT(*) 
FROM 
    SourceData
WHERE
    LEN(MyField) > 10    

--SET @ErrorCount = @@ROWCOUNT
IF @ErrorCount != 0
    BEGIN
        RAISERROR('There were %d truncation errors', 16,1,@ErrorCount)
    END

 

The RAISERROR code is there to illustrate how to cause an error to bubble up if that is the desired behavior.

Note: The IF EXISTS() syntax would also have worked well in this case.  However, If EXISTS()  doesn’t work with CTEs and the error count would not have been available for the error message.

Thursday
Jun032010

Presenting at PSSUG June 9, 2010 Meeting

PSSUGlogo_sm

I will be the keynote speaker at the June 9, 2010 monthly meeting of the Philadelphia SQL Server User Group at the Malvern, PA  office of Microsoft.  The presentation will be “Part II” of the “Fine Art of Stored Procedures and User Defined Functions.”

Part I was given recently at the Philly.Net Code Camp 2010.1 back in April.  That presentation went fine but I was disappointed that I didn’t have enough time to get to the really meaty code portion. 

Part II leaves off where Part I ended with only a touch of repetition.  If you caught the Code Camp Presentation this will be overwhelmingly new content.  If you missed it don’t worry, the key ideas from Part I will be covered but without the detail.  Instead the presentation will look at real issues and factors that are important to crafting beautiful sprocs and UDFs and show the technique with code.

PSSUG Combo Logo

Monday
May172010

The Dangers of Failing to Understand T-SQL’s DISTINCT and GROUP BY

Example of the Difference Between DISTINCT and GROUP BY

The distinct keyword in a T-SQL query to ensure an unique result set is often a quick and easy solution, which is often fraught with peril and performance issues.  The key difference between GROUP BY and DISTINCT is when the aggregation or elimination of duplicates occurs during the query processing step.  The distinct operation is completed after the non distinct result set has materialized.  Thus it can not take advantage of indexes or early reduction of unnecessary data such as the GROUP BY clause can achieve.

The code snippet below is a simple example of when and how the aggregation or elimination of duplicate data occurs.  The data isn’t all that important except to realize that it does contain some duplicates in the data of interest.

DECLARE @MyData TABLE(
    SomeValue    VARCHAR(3) NOT NULL
)

INSERT INTO @MyData ( SomeValue ) VALUES ( 'AAA')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'BBB')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'CCC')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'AAA')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'BBB')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'AAA')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'AAA')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'BBB')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'CCC')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'AAA')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'BBB')
INSERT INTO @MyData ( SomeValue ) VALUES ( 'AAA')


SELECT DISTINCT
      ROW_NUMBER() OVER(ORDER BY md.SomeValue) AS rn
    , SomeValue
FROM 
    @MyData AS md    


SELECT 
      ROW_NUMBER() OVER(ORDER BY md.SomeValue) AS rn
    , SomeValue
FROM 
    @MyData AS md    
GROUP BY
      SomeValue    

 

Result Set # 1  Result Set # 2 
rn SomeValue 1 AAA
1 AAA 2 BBB
2 AAA 3 CCC
3 AAA
4 AAA
5 AAA
6 AAA
7 BBB
8 BBB
9 BBB
10 BBB
11 CCC
12 CCC

 

Looking at the results it should be evident that the first query using the DISTINCT directive didn't apply the de-duplication phase until after the Row Numbers were uniquely generated.  Thus there were no records to eliminate due to duplication.

The second result set has the results that may have been expected in that only the unique non-duplicate set of data was returned with a simple row number.

The take-away from this is not that the GROUP BY clause is correct, rather that there are differences between the DISTINCT directive and the GROUP BY clause.  They are not always interchangeable from a data perspective.  The performance issue is a whole other discussion.

Tuesday
May042010

Debunking SQL Server Myths

Paul Randal of SQLSkills has completed a nice series on debunking misconceptions of SQL server.

http://www.sqlskills.com/BLOGS/PAUL/category/Misconceptions.aspx

Some of the things I knew, some I was completely wrong about and others I just didn't know about.  The bulk of the misconceptions I had stemmed from working with SQL Serve since the mid 90’s going back to version 6.5.  You do your best to learn and keep up with the new releases but can get caught in the trap of not knowing some detail you once knew has changed.

Focusing on just the new features is not enough.  We must also take the time to learn the details and some internals of things that are changed.  They can have real implications of who database should be designed and optimized.

Monday
Apr122010

Philly.Net Code Camp presentation – The Fine Art of Stored Procedures and User Defined Functions

The presentation of “The Fine Art of Stored Procedures and User Defined Functions” at Philly.Net's Code Camp 2010.1 on Saturday April 10, 2010 during the 10AM Session went reasonably well.  As this was the fist time giving this presentation there is much cleanup to do in focusing  the talking points and the demos.  There was more material not covered then we actually had time to go through.  So it seems that this could easily be a two part series.

Kevin Goff’s presentation had an Idea in it that I have thought about before but never took the time to implement.  His presentation had way more material then there was time for.  So he built a menu into his Power Point Presentation and allowed the attendees to pick the ones they wanted to cover.  I like this because it allows the presenter to develop all the material and have a flexible presentation based on the audience.  The “Fine Art of SQL” series is a good candidate for this Style.

I will submit the second half or more of this presentation to Code Camp or PSSUG for upcoming events.  The next one I will aim towards an intermediate level instead of the beginner level.  This way we can spend more time on code and technique then theory and book knowledge.

Don’t forget to complete the online evaluations.

Additionally, I will attempt to post various sections of this presentation as a continuing series on this blog.  I a bout of shameless self-promotion please subscribe to the RSS feed.

Here are some links that may be of use:

The presentation with code here

MS SQL Books Online:

Going Deeper On Microsoft  TechNet :