How to split a string in SQL Server the right way

I’ve seen many string-splitting functions for SQL Server, and many of them rebuild the string being split over and over, which causes terrible performance because memory is re-allocated on every iteration. That is the wrong way to split a string in SQL Server.

Today I wrote a much more efficient SQL Server string-splitting routine. If you need varchar data, you can change nvarchar to varchar with no problem.

I compared this against a table-valued parameter and it actually outperformed the table-valued parameter! The reason, I believe, is that I have a very large string (about 10,000 items once it’s split), and that is a LOT of Transact-SQL for SQL Server to parse.
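For reference, the table-valued-parameter approach I compared against looks roughly like this. This is only a sketch: the type name, procedure name, and column name are made up for illustration, not taken from my actual test.

```sql
-- Hypothetical TVP setup (all names are illustrative)
CREATE TYPE dbo.StringList AS TABLE (Value nvarchar(MAX) NOT NULL);
GO

CREATE PROCEDURE dbo.UseStringList
    @Values dbo.StringList READONLY  -- TVPs must be declared READONLY
AS
BEGIN
    -- Join or select against @Values like any other table
    SELECT v.Value FROM @Values AS v;
END
```

The caller (for example, ADO.NET) passes the whole set in as one structured parameter, so SQL Server does not have to parse per-row INSERT statements.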

SET NOCOUNT ON

-- Change nvarchar(MAX) to nvarchar(50), varchar(50), or whatever exactly matches the string column you will be searching against
DECLARE @SplitStringTable TABLE (Value nvarchar(MAX) NOT NULL)
DECLARE @StringToSplit nvarchar(MAX) = 'your|string|to|split|here'
DECLARE @SplitEndPos int
DECLARE @SplitValue nvarchar(MAX)
DECLARE @SplitDelim nvarchar(1) = '|'
DECLARE @SplitStartPos int = 1

SET @SplitEndPos = CHARINDEX(@SplitDelim, @StringToSplit, @SplitStartPos)

WHILE @SplitEndPos > 0
BEGIN
	SET @SplitValue = SUBSTRING(@StringToSplit, @SplitStartPos, (@SplitEndPos - @SplitStartPos))
	INSERT @SplitStringTable (Value) VALUES (@SplitValue)
	SET @SplitStartPos = @SplitEndPos + 1
	SET @SplitEndPos = CHARINDEX(@SplitDelim, @StringToSplit, @SplitStartPos)
END

-- Grab everything after the last delimiter (2147483647 is the max int, i.e. read to the end of the string)
SET @SplitValue = SUBSTRING(@StringToSplit, @SplitStartPos, 2147483647)
INSERT @SplitStringTable (Value) VALUES (@SplitValue)

SET NOCOUNT OFF

-- You can select or join with the values in @SplitStringTable at this point.
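As a sketch of that last step, a join would look like the following. The dbo.Customers table and its ExternalKey column are hypothetical names used purely for illustration:

```sql
-- Hypothetical usage: dbo.Customers, CustomerId, Name, and ExternalKey are illustrative names
SELECT c.CustomerId, c.Name
FROM dbo.Customers AS c
INNER JOIN @SplitStringTable AS s
    ON s.Value = c.ExternalKey;
```

Note that the join must run in the same batch that declared @SplitStringTable, since table variables are scoped to the batch.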

 

13 Comments

Aaron Bertrand
6 years ago

Can you show the table-valued parameter, the code you used that did the same thing, and how exactly you measured the performance? This iterative approach has certainly not been anywhere near the top performers in my testing… so much so that I even left it out of my round-up two years ago.

Aaron Bertrand
6 years ago
Reply to  jjxtra

Table-valued parameter or table variable? I’m still not clear I understand exactly what you were comparing to what.

Aaron Bertrand
6 years ago
Reply to  jjxtra

For example, if you have 10,000 individual insert statements, then of course that will take longer (in fact I bet 99.9% of the time was spent processing all of those individual insert statements). The point of the TVP is that you have this set of 10,000 things already as a set somewhere (e.g. a DataTable in C#). Then you just pass that *set* in, and no looping, or 10,000 inserts, or string parsing are required.

Aaron Bertrand
6 years ago
Reply to  jjxtra

Of course there is. How did you build the string of 10,000 GUIDs?

Aaron Bertrand
6 years ago
Reply to  jjxtra

So in C# you did something to build that list of 10,000 strings, right? Put them in a DataTable instead, and pass it as a Structured parameter type to a table-valued parameter. This should not produce “generated SQL” of 10,000 individual inserts.

Aaron Bertrand
6 years ago
Reply to  jjxtra

Well, that’s where your looping of 10,000 inserts comes from, not from the use of a table variable.

Aaron Bertrand
6 years ago
Reply to  jjxtra

Yes, Jeff has admitted that his approach is not designed to handle strings longer than 8K well at all (and my testing proved that as well). And you would need that because you can’t fit 10,000 GUIDs in 8,000 characters.