Books and articles about SQL   Сменить язык на: Русский 20 February 2020 02:09:41   Print version

На главную страницу

## String functions in MS SQL SERVER 2000

S. Moiseenko

Below is a full list of the string functions taken from BOL:

 ASCII NCHAR SOUNDEX CHAR PATINDEX SPACE CHARINDEX REPLACE STR DIFFERENCE QUOTENAME STUFF LEFT REPLICATE SUBSTRING LEN REVERSE UNICODE LOWER RIGHT UPPER RTRIM RTRIM

Let's start from two reciprocally related functions - ASCII and CHAR:

The ASCII function returns an ASCII code value of the leftmost character of a character expression, being a function argument.

Here is an example of the way to determine how many different letters which the names of the ships in the Ships table start from are:

SELECT COUNT(DISTINCT ASCII(name)) FROM Ships

Result is 11. To find out what these letters are, we can use CHAR function that returns a character by a known ASCII code value (from 0 to 255):

SELECT DISTINCT CHAR(ASCII(name)) FROM Ships ORDER BY 1

We should note that the identical result can be got simpler using one more function - LEFT that has the following syntax:

LEFT (<character_expression >, <integer_expression> )

and returns the part of a character string starting at a specified number of characters from the left. So,

SELECT DISTINCT LEFT(name, 1) FROM Ships ORDER BY 1

And this is the way to get the table of codes of all alphabetical characters:

SELECT CHAR(ASCII('a')+ num-1) letter, ASCII('a')+ num - 1 [code] FROM
(
SELECT 5*5*(a-1)+5*(b-1) + c AS num FROM
(SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) x
CROSS JOIN
(SELECT 1 b UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) y
CROSS JOIN
(SELECT 1 c UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) z
) x
WHERE ASCII('a')+ num -1 BETWEEN ASCII('a') AND ASCII('z')

For those who are unaware of generating of numerical sequence I refer to the corresponding article.

It is known that code values of lowercase letters and capital letters differ. That's why in order to get the whole set without rewriting above query; it will be enough to write up the identical code to aforesaid:

UNION
SELECT CHAR(ASCII('A')+ num-1) letter, ASCII('A')+ num - 1 [code]
FROM
(
SELECT 5*5*(a-1)+5*(b-1) + c AS num FROM
(SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) x
CROSS JOIN
(SELECT 1 b UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) y
CROSS JOIN
(SELECT 1 c UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) z
) x
WHERE ASCII('A')+ num -1 BETWEEN ASCII('A') AND ASCII('Z')

Now let's consider the task of searching a substring in a string expression. The two functions can be used for this: CHARINDEX and PATINDEX. They both return a starting position (a position of the first character of a substring) of a substring in a character string. The function CHARINDEX has the following syntax:

CHARINDEX ( string_expression , target_expression [ , start_location ] )

Here an optional integer parameter start_location defines a position in a string_expression, where searching a target_expression starts from. If this argument is omitted, a searching starts from the beginning of a string_expression. For example, the query

SELECT name FROM Ships WHERE CHARINDEX('sh', name) > 0

will output the ships that has the sequence of characters "sh" in its names. Here we use the fact that if a substring to be found is not found, the function CHARINDEX returns 0. The result of executing the query will contain the following ships:

 name Kirishima Musashi Washington

We should note that if a target_expression or string_expression is NULL, the result of the function is also NULL.

The next example determines the positions of the first and second occurrences of the character "a" in the name of the ship "California"

SELECT CHARINDEX('a',name) first_a,
CHARINDEX('a', name, CHARINDEX('a', name)+1) second_a
FROM Ships WHERE name='California'

Pay attention that in determining of the second occurrence the starting position, used in function, is the position of a character next to the first "a" - CHARINDEX('a', name)+1. The propriety of the result - 2 and 10 - can be easily checked.

The function PATINDEX has the following syntax:

PATINDEX ( '%pattern%' , string_expression )

The main distinction of this function from CHARINDEX is that a search string may contain wildcard characters - % and _. Leading and ending characters "%" are required. For, example, usage of this function in the first example will look like

SELECT name FROM Ships WHERE PATINDEX('%sh%', name) > 0

And there's the way to find the names of the ships that contain the sequence of three characters, first and last of which is "e":

SELECT name FROM Ships
WHERE PATINDEX('%e_e%', name) >0

The result of executing of this query looks like this:

 name Revenge Royal Sovereign

The function RIGHT that complements LEFT returns a specified number of characters from the right of a character expression:

RIGHT(<character_expression>,<integer_expression>)

Here is, for example, the way to determine the names of the ships that start and end with the same letter:

SELECT name FROM Ships
WHERE LEFT(name, 1) = RIGHT(name, 1)

The thing that we got an empty resulting set means that such ships are absent in our database. Let's take a combination - a class and a name of a ship.

The combining of two string values into one is called concatenation, and in the SQL Server sign "+" is used for this operation ("||" in standard). So,

SELECT * FROM (
SELECT class +' '+ name AS cn FROM Ships
) x
WHERE LEFT(cn, 1)=RIGHT(cn, 1)

Here we separate by space the class and the name of a ship. Besides, in order not to repeat the whole construction in the function argument, we use a subquery. The result will look like this:

 cn Iowa Missouri North Carolina Washington

And what if a string expression will contain only one character? The query will output it. You can easily check it by writing

SELECT * FROM (
SELECT class +' '+ name AS cn FROM Ships
UNION ALL SELECT 'a' as nc
) x
WHERE LEFT(cn, 1)=RIGHT(cn, 1)

In order to exclude this case, one more useful function LEN(string_expression) can be used. It returns the number of characters in a string. Let's bound by the case, when the number of characters is greater than one:

SELECT * FROM (
SELECT class +' '+ name AS cn FROM Ships
UNION ALL SELECT 'a' as nc
) x
WHERE LEFT(cn, 1)=RIGHT(cn, 1) AND LEN(cn)>1

There are tasks on the website that require to normalize (find maximum etc.) in a numerical order a values represented in a string format, for example, a seat number in a plane ("2d") or a speed of CD ("24x"). The problem lies in a fact that the text sorts like this (ascending)

11a
1a
2a

Indeed,

SELECT '1a' AS place
UNION ALL SELECT '2a'
UNION ALL SELECT '11a'
ORDER BY 1

If the sorting of places in order of rows increase is required, the order must be like this:

1a
2a
11a

In order to get such an order, the sorting by a numeric values in text is needed. I offer the following algorithm:
1. Extract a number from the string.
2. Convert it to a numeric format.
3. Perform a sorting by a converted value.

As we know that there's only one letter in the string, and to extract a number we can use the following construction that doesn't depend on a number of numerals in a seat number:

LEFT(place, LEN(place)-1)

If we just bound by this, we'll get

 place 1a 11a 2a

Converting to a numeric format may look like the following:

CAST (LEFT(place, LEN(place)-1) AS INT)

Now we should just perform a sorting

SELECT * FROM (
SELECT '1a' AS place
UNION ALL SELECT '2a'
UNION ALL SELECT '11a'
) x ORDER BY CAST(LEFT(place, LEN(place)-1) AS INT)

As required to proof.

Previously we used the LEFT function to extract a number from a string expression, because we knew a priori how many characters should be removed from the right (one). And what if we should extract a string from a substring not by the known position of a character, but by a character itself? For example, extract all the characters before the first letter "x" (a value of CD speed).

In this case we can also use the previously considered CHARINDEX function that allows us to determine the unknown position of the character:

SELECT model, LEFT(cd, CHARINDEX('x', cd) -1) FROM PC

SUBSTRING function

SUBSTRING ( <expression> ,<start>,<length> )

This function allows us to extract from an expression its part of a specified length, starting from a specified initial position. Expression may be a character or a binary string, and also can have a text or image type. For example, if we need to get 3 characters in a ship name, starting from the 2nd character, then it's not quite easy to do it without the SUBSTRING function. And so we write:

SELECT name, SUBSTRING(name, 2, 3) FROM Ships

In case we need to extract all the characters, starting from the certain, we also can use this function. For example,

SELECT name, SUBSTRING(name, 2, LEN(name)) FROM Ships

will give us all the characters in the names of the ships, starting from the second letter in the name. Pay attention that in order to specify the number of characters to extract I used the LEN(name) function that returns a number of characters in the name. It's clearly that forasmuch as I need the characters, starting from the second, its number will be less than the whole number of the characters in the name. But it doesn't cause an error, because if a specified number of characters exceed a permissible number, all the characters until the end of a string will be extracted. That is why I take it with a reserve sparing myself the calculations.

REVERSE function

This function reverses a string, just like reading it from the right to left. That is the result of the query

SELECT REVERSE('abcdef')

will be 'fedcba'. In case of the absence of the RIGHT function in the language the query

SELECT RIGHT('abcdef',3)

can be equally replaced with the query

SELECT REVERSE(LEFT(REVERSE('abcdef'),3))

I see the profit of this function in the following. Let we need to determine a position not the first, but the last occurrence of any character (or a sequence of characters) in a character string.

Let's remind an example, where we were determining the position of the first character "a" in the name of the ship "California":

SELECT CHARINDEX('a', name) first_a
FROM Ships WHERE name='California'

Now let's determine the position of the last occurrence of the character "a" in this name. The function

CHARINDEX('a', REVERSE(name))

will allow us to find this position, but from the right. In order to get the position of this character, but from the left, it's enough to write

SELECT LEN(name) + 1 - CHARINDEX('a', REVERSE(name)) first_a
FROM Ships WHERE name='California'

REPLACE function

REPLACE ( <string_expression1> , <string_expression2> , <string_expression3> )

This function replaces in the string_expression1 all the occurrences of the string_expression2 with the string_expression3. This function is absolutely helpful in UPDATE operators, if we should change the content of the column. For example, let we are needed to replace all the spaces with the hyphen in the names of the ships. Then we can write

UPDATE Ships
SET name = REPLACE(name, ' ', '-')

(This example you can perform on the page with the DML exercises, where the questions on updating the data are admitted)

However this function can also find its appliance in more non-trivial cases. Let's determine how many times the letter "a" is used in a name of a ship. The idea is quite easy: let's replace every searching letter with any two characters, and after that we'll calculate the difference of lengths of the string we got and the source string. So,

SELECT name, LEN(REPLACE(name, 'a', 'aa')) - LEN(name) FROM Ships

And what if we need to determine the number of occurrences of a random sequence of characters that, say, we get as a parameter of a stored procedure? In this case we should complement the used above algorithm with the division on a number of a characters in a sequence:

DECLARE @str AS VARCHAR(100)
SET @str='ma'
SELECT name, (LEN(REPLACE(name, @str, @str+@str)) - LEN(name))/LEN(@str) FROM Ships

Here in order to double the number of searching characters the concatenation @str+@str was applied. However for this effect we can use one more function - REPLICATE that repeats the first argument so many times as it specified by the second argument.

SELECT name, (LEN(REPLACE(name, @str, REPLICATE(@str, 2))) - LEN(name))/LEN(@str) FROM Ships

That is we repeat twice the substring stored in the variable @str.

If we need to replace in a string not a certain sequence of characters, but a specified number of characters, starting from some position, it's simpler to use the STUFF function:

STUFF ( <character_expression1> , <start> , <length> , <character_expression2> )

This function replaces a substring with length of length that starts from the start position in the character_expression1 with the character_expression2.

The example. Change the name of a ship: append "_" (underscore) and a launching year after the first 5 characters. If there is less than 5 characters in the name, complement it with the spaces.

This task can be solved with many different functions. We'll try to do it with the STUFF function.

In the first approach we'll write (bounding with the select query):

SELECT name, STUFF(name, 6, LEN(name), '_'+launched) FROM Ships

I use LEN(name) as the third argument (number of characters to replace), because I should replace all the characters until the end of the string, and that's why I take it with a reserve - the original number of characters in the name. But this query will still return an error. And the question is not in the third argument, but in the forth, where the concatenation of the string constant and the integer column is performing. This is the type conversion error. In order to convert a number to its string representation we may use one more built-in function - STR:

STR ( <float_expression> [ , <length> [ , <decimal> ] ] )

Under this conversion the round-up is being worked, and a length specifies a length of the resulting string. For example,

STR(3.3456, 5, 1)         3.3
STR(3.3456, 5, 2)       3.35
STR(3.3456, 5, 3)     3.346
STR(3.3456, 5, 4)     3.346

Take note, that if have got string view of the number is less than the specified length, the leading spaces are added. If the result is greater than the specified length, the fractional part is being truncated (with the round-up); in case there is an integer value we get the corresponding number of the asterisks "*".

STR(12345,4,0)     ****

By the way, default length is 10 symbols. Keeping in mind that a year is represented by four numbers, we will write

SELECT name, STUFF(name, 6, LEN(name), '_'+STR(launched, 4)) FROM Ships

Everything is almost right. The case is rested to consider, when the number of symbols in the name is less than 6, as in this case the function STUFF gives NULL. Well, we will have to be on the rack to the end, which is connected with using this function in this example, on the way applying one more string function.

We'll add the trailing blanks for the name`s length being knowingly more than 6. There is a special SPACE function for that

SPACE(<number_of_blanks>):

SELECT name, STUFF(name + SPACE(6), 6, LEN(name), '_'+STR(launched,4)) FROM Ships

LTRIM and RTRIM functions

LTRIM (<character_expression>)

RTRIM (<character_expression>)

cut accordingly the leading and trailing blanks of the string expression, which is being converted by default to the VARCHAR type.

Let us need to build such a string: passenger`s name_passenger`s identifier for every record from the Passenger table. If we write

SELECT name + '_' + CAST(id_psg AS VARCHAR) FROM Passenger,

we will get in the result something like:

A               _1

It is connected with the fact that the column name has the type CHAR(30). The short string is added with the blanks for this type to the specified dimension (we have 30 symbols). The function RTRIM will help us here:

SELECT RTRIM(name) + '_' + CAST(id_psg AS VARCHAR) FROM Passenger

LOWER and UPPER functions

LOWER(<character_expression>)

UPPER(<character_expression>)

convert all the characters of the argument accordingly to the lower and upper register. These functions turn out to be useful in comparison with the registerdepended strings.

A pair of the interesting functions SOUNDEX and DIFFERENCE

SOUNDEX(<character_expression>)

DIFFERENCE (<character_expression_1>, <character_expression_2>)

allow us to determine the similar-sounding of words. SOUNDEX returns a four-character code to evaluate the similarity, but DIFFERENCE just properly evaluates the similarity of two comparing character strings. Being that these functions do not support Cyrillic, I refer interested to BOL for the examples of its usage.

In conclusion we will adduce some functions and a few examples of the Unicode usage:

UNICODE function

UNICODE (<ncharacter_expression>)

returns the Unicode value for the first character of the input expression.

NCHAR function

NCHAR (<integer>)

returns a character with the given integer Unicode value. A few examples:

SELECT ASCII('а'), UNICODE('а')

Returns the ASCII-code value and the Unicode value of the Russian letter "а": 224 and 1072.

SELECT CHAR(ASCII('а')), CHAR(UNICODE('а'))

We try to recover a symbol by its code value. We get "а" and NULL. The NULL-value returns because the 1072 code value is absent in the usual code table.

SELECT CHAR(ASCII('а')), NCHAR(UNICODE('а'))

Now it's all right, "a" in both cases.

Наконец,

SELECT NCHAR(ASCII('а'))

will give "a", because the Unicode value 224 exactly corresponds to this letter.

» Given examples here can be done directly on the website by selecting the check box “Without checking” on the page with SELECT exercises.

На главную страницу Print version


 Usage of any materials of this site is possibleonly under condition of mandatory allocation of the direct link to a sitehttp://www.sqlbooks.ruon each page where used materials are placed. Main Articles Books     Copyright c 2002-2006. All rights reserved. contact