(First published on What the # do I know)
In a previous post, entitled How to pre-populate a random strings pool, I’ve written about an inline table-valued user defined function for generating random strings in SQL Server.
Now, following a stackoverflow question entitled SQL SERVER generate data using Regex pattern, I want to present an improvement to this function.
This new and improved function gives it’s user a better control over the random strings it generates – it has minimum and maximum length parameters to enable generating variable-length random strings, and it has another parameter called @CharType
, which enables the user to control whether the function will generate lower-case chars, upper-case chars, digits, or any combination of the above.
It also returns one more column – which is basically a row-number, starting with 1. This enables the creation of multiple randomly generated columns using simple joins.
As before, it still uses a view to generate a guid, basing all randomness on that guid. So here’s the code:
CREATE VIEW dbo.GuidGenerator
AS
SELECT Newid() As NewGuid;
And the function:
CREATE FUNCTION dbo.RandomStringGenerator
(
-- the minimum length
@MinLength int,
-- the maximum length
@MaxLength int,
-- the maximum number of rows to return. Note: up to 1,000,000 rows
@Count int,
-- 1, 2 and 4 stands for lower-case, upper-case and digits.
-- a bitwise combination of these values can be used to generate all possible combinations:
-- 3: lower and upper, 5: lower and digis, 6: upper and digits, 7: lower, upper nad digits
@CharType tinyint
)
RETURNS TABLE
AS
RETURN
-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)), -- 10
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000
SELECT TOP(@Count)
n As Number,
(
SELECT TOP (Length)
-- choose what char combination to use for the random part
CASE @CharType
WHEN 1 THEN Lower
WHEN 2 THEN Upper
WHEN 3 THEN IIF(Rnd % 2 = 0, Lower, Upper)
WHEN 4 THEN Digit
WHEN 5 THEN IIF(Rnd % 2 = 0, Lower, Digit)
WHEN 6 THEN IIF(Rnd % 2 = 0, Upper, Digit)
WHEN 7 THEN
CASE Rnd % 3
WHEN 0 THEN Lower
WHEN 1 THEN Upper
ELSE Digit
END
END
FROM Tally As t0
-- create a random number from a guid using the GuidGenerator view
CROSS APPLY (SELECT Abs(Checksum(NewGuid)) As Rnd FROM GuidGenerator) As rand
CROSS APPLY
(
-- generate a random lower-case char, upper-case char and digit
SELECT CHAR(97 + Rnd % 26) As Lower, -- Random lower case letter
CHAR(65 + Rnd % 26) As Upper,-- Random upper case letter
CHAR(48 + Rnd % 10) As Digit -- Random digit
) As Chars
WHERE t0.n <> -t1.n -- Needed for the subquery to get re-evaluated for each row
FOR XML PATH('')
) As RandomString
FROM Tally As t1
CROSS APPLY
(
-- Select a random length between @MinLength and @MaxLength (inclusive)
SELECT TOP 1 n As Length
FROM Tally As t2
CROSS JOIN GuidGenerator
WHERE t2.n >= @MinLength
AND t2.n <= @MaxLength
AND t2.n <> t1.n
ORDER BY NewGuid
) As Lengths;
One usage of this function is to create tables with random values, as demonstrated here:
DECLARE @Count int = 10;
SELECT CAST(IntVal.RandomString As Int) As IntColumn,
UpVal.RandomString as UpperCaseValue,
LowVal.RandomString as LowerCaseValue,
MixVal.RandomString as MixedValue
FROM dbo.RandomStringGenerator(3, 7, @Count, 4) As IntVal
JOIN dbo.RandomStringGenerator(10, 10, @Count, 1) As LowVal
ON IntVal.Number = LowVal.Number
JOIN dbo.RandomStringGenerator(5, 10, @Count, 2) As UpVal
ON IntVal.Number = UpVal.Number
JOIN dbo.RandomStringGenerator(10, 20, @Count, 7) As MixVal
ON IntVal.Number = MixVal.Number
Which will result in a 4 columns table, each with the specified type of random value:
IntColumn UpperCaseValue LowerCaseValue MixedValue
674 CCNVSDI esjyyesesv O2FAC7bfwg2Be5a91Q0
30732 UJKSL jktisddbnq 7o8B91Sg1qrIZSvG3AcL
4669472 HDLJNBWPJ qgtfkjdyku xUoLAZ4pAnpn
26347 DNAKERR vlehbnampb NBv08yJdKb75ybhaFqED
6084965 LJPMZMEU ccigzyfwnf MPxQ2t8jjmv0IT45yVcR
6619851 FEHKGHTUW wswuefehsp 40n7Ttg7H5YtVPF
781 LRWKVDUV bywoxqizju UxIp2O4Jb82Ts
52237 XXNPBL beqxrgstdo Uf9j7tCB4W2
876150 ZDRABW fvvinypvqa uo8zfRx07s6d0EP
And since this is still an inline table-valued function, performance are pretty darn good. Tests I’ve made shown creation of a 5 columns table with 1,000 rows average time of just half a second.
Top comments (0)