How to split a string in SQL Server the right way
I’ve seen many string splitting functions in SQL Server and many of them overwrite the string that is being split over and over which causes terrible performance by re-allocating memory every time. This is a terrible way to split a string in SQL Server.
Today I crafted a much more efficient SQL Server string splitting function. If you need varchar
data, you can change the nvarchar
to varchar
with no problem.
I compared this against a table value parameter and it actually outperformed the table value parameter! The reason I believe, is because I have a very large string (about 10,000 items once it’s split) and that’s a LOT of transact SQL for SQL Server to parse.
SET NOCOUNT ON
-- You will want to change nvarchar(MAX) to nvarchar(50), varchar(50) or whatever matches exactly with the string column you will be searching against
DECLARE @SplitStringTable TABLE (Value nvarchar(MAX) NOT NULL)
DECLARE @StringToSplit nvarchar(MAX) = 'your|string|to|split|here'
DECLARE @SplitEndPos int
DECLARE @SplitValue nvarchar(MAX)
DECLARE @SplitDelim nvarchar(1) = '|'
DECLARE @SplitStartPos int = 1
SET @SplitEndPos = CHARINDEX(@SplitDelim, @StringToSplit, @SplitStartPos)
WHILE @SplitEndPos > 0
BEGIN
SET @SplitValue = SUBSTRING(@StringToSplit, @SplitStartPos, (@SplitEndPos - @SplitStartPos))
INSERT @SplitStringTable (Value) VALUES (@SplitValue)
SET @SplitStartPos = @SplitEndPos + 1
SET @SplitEndPos = CHARINDEX(@SplitDelim, @StringToSplit, @SplitStartPos)
END
SET @SplitValue = SUBSTRING(@StringToSplit, @SplitStartPos, 2147483647)
INSERT @SplitStringTable (Value) VALUES(@SplitValue)
SET NOCOUNT OFF
-- You can select or join with the values in @SplitStringTable at this point.
[…] I cover a much better way to do a string split here: http://www.digitalruby.com/split-string-sql-server/ […]
Can you show the table-valued parameter, the code you used that did the same thing, and how exactly you measured the performance? This iterative approach has certainly not been anywhere near the top performers in my testing… so much so that I even left it out of my round-up two years ago.
It wasn’t scentific. I had a string with 10,000 guids in it, and had two query windows open in SQL management studio. Timing with the clock app on my iPhone, the string split way took about 6 seconds, the table value way took about 10. The table value code ended up being a bunch of INSERT INTO @p5 VALUES (‘GUID-GOES-HERE’).
Table-valued parameter or table variable? I’m still not clear I understand exactly what you were comparing to what.
For example, if you have 10,000 individual insert statements, then of course that will take longer (in fact I bet 99.9% of the time was spent processing all of those individual insert statements). The point of the TVP is that you have this set of 10,000 things already as a set somewhere (e.g. a DataTable in C#). Then you just pass that *set* in, and no looping, or 10,000 inserts, or string parsing are required.
It was a table value parameter. The generated SQL for it showed thousands of insert statements into a parameter. Is there a way to pass a DataTable directly as a parameter and not have it use all those insert statements?
Of course there is. How did you build the string of 10,000 GUIDs?
String parameter to sqlcommand.
So in C# you did something to build that list of 10,000 strings, right? Put them in a DataTable instead, and pass it as a Structured parameter type to a table-valued parameter. This should not produce “generated SQL” of 10,000 individual inserts.
Sorry I mis-informed you. For the table value parameter I used an array of SqlDataRecord I believe.
Well that’s where your looping of 10,000 inserts come from, not from the use of a table variable.
Example here:
http://sqlperformance.com/2012/08/t-sql-queries/splitting-strings-now-with-less-t-sql
I also tried the Moden way from your website, and after 10 seconds it had only got through about 2000 of my GUID’s, so I just killed the timer.
Yes, Jeff has admitted that his approach is not designed to handle strings longer than 8K well at all (and my testing proved that as well). And you would need that because you can’t fit 10,000 GUIDs in 8,000 characters.