На
главную страницу
String functions in MS SQL SERVER 2000
S. Moiseenko
Below is a full list of the string functions taken from BOL:
ASCII |
NCHAR
|
SOUNDEX
|
CHAR |
PATINDEX
|
SPACE
|
CHARINDEX |
REPLACE
|
STR
|
DIFFERENCE |
QUOTENAME
|
STUFF
|
LEFT |
REPLICATE
|
SUBSTRING
|
LEN |
REVERSE
|
UNICODE
|
LOWER |
RIGHT
|
UPPER
|
RTRIM |
RTRIM
|
|
Let's start from two reciprocally related functions - ASCII and
CHAR:
The ASCII function returns an ASCII code value of the leftmost character of a
character expression, being a function argument.
Here is an example of the way to determine how many different letters which the
names of the ships in the Ships table start from are:
SELECT COUNT(DISTINCT ASCII(name)) FROM Ships
Result is 11. To find out what these letters are, we can use CHAR function that
returns a character by a known ASCII code value (from 0 to 255):
SELECT DISTINCT CHAR(ASCII(name)) FROM Ships ORDER BY 1
We should note that the identical result can be got simpler using one more function
- LEFT that has the following syntax:
LEFT (<character_expression >, <integer_expression>
)
and returns the part of a character string starting at a specified number of
characters from the left. So,
SELECT DISTINCT LEFT(name, 1) FROM Ships ORDER BY 1
And this is the way to get the table of codes of all alphabetical characters:
SELECT CHAR(ASCII('a')+ num-1) letter, ASCII('a')+ num - 1 [code]
FROM
(
SELECT 5*5*(a-1)+5*(b-1) + c AS num FROM
(SELECT 1 a UNION ALL SELECT 2 UNION
ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) x
CROSS
JOIN
(SELECT 1 b UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) y
CROSS
JOIN
(SELECT 1 c UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) z
) x
WHERE ASCII('a')+ num -1 BETWEEN ASCII('a') AND ASCII('z')
For those who are unaware of generating of numerical sequence I refer to the
corresponding article.
It is known that code values of lowercase letters and capital letters differ. That's
why in order to get the whole set without rewriting above query; it will be
enough to write up the identical code to aforesaid:
UNION
SELECT CHAR(ASCII('A')+ num-1) letter, ASCII('A')+ num - 1 [code]
FROM
(
SELECT 5*5*(a-1)+5*(b-1) + c AS num FROM
(SELECT 1 a UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) x
CROSS
JOIN
(SELECT 1 b UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) y
CROSS
JOIN
(SELECT 1 c UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) z
) x
WHERE ASCII('A')+ num -1 BETWEEN ASCII('A') AND ASCII('Z')
Now let's consider the task of searching a substring in a string expression. The two
functions can be used for this: CHARINDEX and PATINDEX.
They both return a starting position (a position of the first character of a
substring) of a substring in a character string. The function CHARINDEX has the
following syntax:
CHARINDEX ( string_expression , target_expression [ ,
start_location ] )
Here an optional integer parameter start_location defines a position in a string_expression,
where searching a target_expression starts from. If this argument is
omitted, a searching starts from the beginning of a string_expression.
For example, the query
SELECT name FROM Ships WHERE CHARINDEX('sh', name) > 0
will output the ships that has the sequence of characters "sh" in its names. Here we
use the fact that if a substring to be found is not found, the function
CHARINDEX returns 0. The result of executing the query will contain the
following ships:
name |
Kirishima |
Musashi |
Washington |
We should note that if a target_expression or string_expression is
NULL, the result of the function is also NULL.
The next example determines the positions of the first and second occurrences of the
character "a" in the name of the ship "California"
SELECT CHARINDEX('a',name) first_a,
CHARINDEX('a', name, CHARINDEX('a', name)+1) second_a
FROM Ships WHERE name='California'
Pay attention that in determining of the second occurrence the starting position,
used in function, is the position of a character next to the first "a" -
CHARINDEX('a', name)+1. The propriety of the result - 2 and 10 - can be easily
checked.
The function PATINDEX has the following syntax:
PATINDEX ( '%pattern%' , string_expression )
The main distinction of this function from CHARINDEX is that a search string may
contain wildcard characters - % and _. Leading and ending characters "%" are
required. For, example, usage of this function in the first example will look
like
SELECT name FROM Ships WHERE PATINDEX('%sh%', name) > 0
And there's the way to find the names of the ships that contain the sequence of
three characters, first and last of which is "e":
SELECT name FROM Ships
WHERE PATINDEX('%e_e%', name) >0
The result of executing of this query looks like this:
name |
Revenge |
Royal Sovereign |
The function RIGHT that complements LEFT returns
a specified number of characters from the right of a character expression:
RIGHT(<character_expression>,<integer_expression>)
Here is, for example, the way to determine the names of the ships that start and end
with the same letter:
SELECT name FROM Ships
WHERE LEFT(name, 1) = RIGHT(name, 1)
The thing that we got an empty resulting set means that such ships are absent in our
database. Let's take a combination - a class and a name of a ship.
The combining of two string values into one is called concatenation, and in
the SQL Server sign "+" is used for this operation ("||" in standard).
So,
SELECT * FROM (
SELECT class +' '+ name AS cn FROM Ships
) x
WHERE LEFT(cn, 1)=RIGHT(cn, 1)
Here we separate by space the class and the name of a ship. Besides, in order not to
repeat the whole construction in the function argument, we use a subquery. The
result will look like this:
cn |
Iowa Missouri
|
North Carolina Washington |
And what if a string expression will contain only one character? The query will
output it. You can easily check it by writing
SELECT * FROM (
SELECT class +' '+ name AS cn FROM Ships
UNION ALL SELECT 'a' as nc
) x
WHERE LEFT(cn, 1)=RIGHT(cn, 1)
In order to exclude this case, one more useful function LEN(string_expression)
can be used. It returns the number of characters in a string. Let's bound by
the case, when the number of characters is greater than one:
SELECT * FROM (
SELECT class +' '+ name AS cn FROM Ships
UNION ALL SELECT 'a' as nc
) x
WHERE LEFT(cn, 1)=RIGHT(cn, 1) AND LEN(cn)>1
There are tasks on the website that require to normalize (find maximum etc.) in a
numerical order a values represented in a string format, for example, a seat
number in a plane ("2d") or a speed of CD ("24x"). The problem lies in a fact
that the text sorts like this (ascending)
11a
1a
2a
Indeed,
SELECT '1a' AS place
UNION ALL SELECT '2a'
UNION ALL SELECT '11a'
ORDER BY 1
If the sorting of places in order of rows increase is required, the order must be
like this:
1a
2a
11a
In order to get such an order, the sorting by a numeric values in text is needed. I
offer the following algorithm:
1. Extract a number from the string.
2. Convert it to a numeric format.
3. Perform a sorting by a converted value.
As we know that there's only one letter in the string, and to extract a number we
can use the following construction that doesn't depend on a number of numerals
in a seat number:
LEFT(place, LEN(place)-1)
If we just bound by this, we'll get
Converting to a numeric format may look like the following:
CAST (LEFT(place, LEN(place)-1) AS INT)
Now we should just perform a sorting
SELECT * FROM (
SELECT '1a' AS place
UNION ALL SELECT '2a'
UNION ALL SELECT '11a'
) x ORDER BY CAST(LEFT(place, LEN(place)-1) AS INT)
As required to proof.
Previously we used the LEFT function to extract a number from a string expression,
because we knew a priori how many characters should be removed from the right
(one). And what if we should extract a string from a substring not by the known
position of a character, but by a character itself? For example, extract all
the characters before the first letter "x" (a value of CD speed).
In this case we can also use the previously considered CHARINDEX function that
allows us to determine the unknown position of the character:
SELECT model, LEFT(cd, CHARINDEX('x', cd) -1) FROM PC
SUBSTRING function
SUBSTRING ( <expression> ,<start>,<length>
)
This function allows us to extract from an expression its part of a specified
length, starting from a specified initial position. Expression may be a
character or a binary string, and also can have a text or image type. For
example, if we need to get 3 characters in a ship name, starting from the 2nd
character, then it's not quite easy to do it without the SUBSTRING function.
And so we write:
SELECT name, SUBSTRING(name, 2, 3) FROM Ships
In case we need to extract all the characters, starting from the certain, we also
can use this function. For example,
SELECT name, SUBSTRING(name, 2, LEN(name)) FROM Ships
will give us all the characters in the names of the ships, starting from the second
letter in the name. Pay attention that in order to specify the number of
characters to extract I used the LEN(name) function that returns a number of
characters in the name. It's clearly that forasmuch as I need the characters,
starting from the second, its number will be less than the whole number of the
characters in the name. But it doesn't cause an error, because if a specified
number of characters exceed a permissible number, all the characters until the
end of a string will be extracted. That is why I take it with a reserve sparing
myself the calculations.
REVERSE function
This function reverses a string, just like reading it from the right to left. That
is the result of the query
SELECT REVERSE('abcdef')
will be 'fedcba'. In case of the absence of the RIGHT function in the language the
query
SELECT RIGHT('abcdef',3)
can be equally replaced with the query
SELECT REVERSE(LEFT(REVERSE('abcdef'),3))
I see the profit of this function in the following. Let we need to determine a
position not the first, but the last occurrence of any character (or a sequence
of characters) in a character string.
Let's remind an example, where we were determining the position of the first
character "a" in the name of the ship "California":
SELECT CHARINDEX('a', name) first_a
FROM Ships WHERE name='California'
Now let's determine the position of the last occurrence of the character "a" in this
name. The function
CHARINDEX('a', REVERSE(name))
will allow us to find this position, but from the right. In order to get the
position of this character, but from the left, it's enough to write
SELECT LEN(name) + 1 - CHARINDEX('a', REVERSE(name)) first_a
FROM Ships WHERE name='California'
REPLACE function
REPLACE ( <string_expression1> , <string_expression2>
, <string_expression3> )
This function replaces in the string_expression1 all the occurrences of the string_expression2
with the string_expression3. This function is absolutely helpful in UPDATE
operators, if we should change the content of the column. For example, let we
are needed to replace all the spaces with the hyphen in the names of the ships.
Then we can write
UPDATE Ships
SET name = REPLACE(name, ' ', '-')
(This example you can perform on the page with the
DML exercises, where the questions on updating the data are admitted)
However this function can also find its appliance in more non-trivial cases. Let's
determine how many times the letter "a" is used in a name of a ship. The idea
is quite easy: let's replace every searching letter with any two characters,
and after that we'll calculate the difference of lengths of the string we got
and the source string. So,
SELECT name, LEN(REPLACE(name, 'a', 'aa')) - LEN(name) FROM Ships
And what if we need to determine the number of occurrences of a random sequence of
characters that, say, we get as a parameter of a stored procedure? In this case
we should complement the used above algorithm with the division on a number of
a characters in a sequence:
DECLARE @str AS VARCHAR(100)
SET @str='ma'
SELECT name, (LEN(REPLACE(name, @str, @str+@str)) - LEN(name))/LEN(@str) FROM
Ships
Here in order to double the number of searching characters the concatenation
@str+@str was applied. However for this effect we can use one more function -
REPLICATE that repeats the first argument so many times as it
specified by the second argument.
SELECT name, (LEN(REPLACE(name, @str, REPLICATE(@str, 2))) -
LEN(name))/LEN(@str) FROM Ships
That is we repeat twice the substring stored in the variable @str.
If we need to replace in a string not a certain sequence of characters, but a
specified number of characters, starting from some position, it's simpler to
use the STUFF function:
STUFF ( <character_expression1> , <start>
, <length> , <character_expression2> )
This function replaces a substring with length of length that starts from the start
position in the character_expression1 with the character_expression2.
The example. Change the name of a ship: append "_" (underscore) and a launching year
after the first 5 characters. If there is less than 5 characters in the name,
complement it with the spaces.
This task can be solved with many different functions. We'll try to do it with the
STUFF function.
In the first approach we'll write (bounding with the select query):
SELECT name, STUFF(name, 6, LEN(name), '_'+launched) FROM Ships
I use LEN(name) as the third argument (number of characters to replace), because I should replace all
the characters until the end of the string, and that's why I take it with a reserve - the original number
of characters in the name. But this query will still return an error. And the question is not in the third
argument, but in the forth, where the concatenation of the string constant and the integer column is performing.
This is the type conversion error. In order to convert a number to its string representation we may use
one more built-in function - STR:
STR ( <float_expression> [ , <length>
[ , <decimal> ] ] )
Under this conversion the round-up is being worked, and a length specifies a length of the resulting string.
For example,
STR(3.3456, 5, 1) 3.3
STR(3.3456, 5, 2) 3.35
STR(3.3456, 5, 3) 3.346
STR(3.3456, 5, 4) 3.346
Take note, that if have got string view of the number is less than the specified length, the leading spaces are
added. If the result is greater than the specified length, the fractional part is being truncated
(with the round-up); in case there is an integer value we get the corresponding number of the asterisks "*".
STR(12345,4,0) ****
By the way, default length is 10 symbols. Keeping in mind that a year is represented by four numbers, we will write
SELECT name, STUFF(name, 6, LEN(name), '_'+STR(launched, 4)) FROM
Ships
Everything is almost right. The case is rested to consider, when the number of symbols in the name is less than 6,
as in this case the function STUFF gives NULL. Well, we will have to be on the rack to the end, which is
connected with using this function in this example, on the way applying one more string function.
We'll add the trailing blanks for the name`s length being knowingly more than 6. There is a special SPACE function for that
SPACE(<number_of_blanks>):
SELECT name, STUFF(name + SPACE(6), 6, LEN(name),
'_'+STR(launched,4)) FROM Ships
LTRIM and RTRIM functions
LTRIM (<character_expression>)
RTRIM (<character_expression>)
cut accordingly the leading and trailing blanks of the string expression, which is being converted by default
to the VARCHAR type.
Let us need to build such a string: passenger`s name_passenger`s identifier for every record from the Passenger table. If we write
SELECT name + '_' + CAST(id_psg AS VARCHAR) FROM Passenger,
we will get in the result something like:
A
_1
It is connected with the fact that the column name has the type CHAR(30).
The short string is added with the blanks for this type to the specified dimension (we have 30 symbols).
The function RTRIM will help us here:
SELECT RTRIM(name) + '_' + CAST(id_psg AS VARCHAR) FROM Passenger
LOWER and UPPER functions
LOWER(<character_expression>)
UPPER(<character_expression>)
convert all the characters of the argument accordingly to the lower and upper register.
These functions turn out to be useful in comparison with the registerdepended strings.
A pair of the interesting functions SOUNDEX and DIFFERENCE
SOUNDEX(<character_expression>)
DIFFERENCE (<character_expression_1>, <character_expression_2>)
allow us to determine the similar-sounding of words. SOUNDEX returns a four-character code to evaluate
the similarity, but DIFFERENCE just properly evaluates the similarity of two comparing character strings.
Being that these functions do not support Cyrillic, I refer interested to BOL for the
examples of its usage.
In conclusion we will adduce some functions and a few examples of the Unicode usage:
UNICODE function
UNICODE (<ncharacter_expression>)
returns the Unicode value for the first character of the input expression.
NCHAR function
NCHAR (<integer>)
returns a character with the given integer Unicode value. A few examples:
SELECT ASCII('а'), UNICODE('а')
Returns the ASCII-code value and the Unicode value of the Russian letter "а": 224 and 1072.
SELECT CHAR(ASCII('а')), CHAR(UNICODE('а'))
We try to recover a symbol by its code value. We get "а" and NULL. The NULL-value returns because the 1072
code value is absent in the usual code table.
SELECT CHAR(ASCII('а')), NCHAR(UNICODE('а'))
Now it's all right, "a" in both cases.
Наконец,
SELECT NCHAR(ASCII('а'))
will give "a", because the Unicode value 224 exactly corresponds to this letter.
» Given examples here can be done directly on the website by selecting the check
box “Without checking” on the page with
SELECT exercises.
На главную страницу