Main page
Apology of NULL
S. Moiseenko
The meaning of the value NULL – is an absence of information or inapplicability
of the current attribute in the current tuple.
You can ask:”For what to have an attribute if its value is inapplicable?” The
answer to this question lies in a field of modeling of the enterprise. Let's
consider, for example, the database schema “Computers”. It represents the
relational model of the association “type-supertype”. The models of computer
production are the entries of the enterprise here; under this every type of
production (PC, laptop or printer) represents an individual table with
relations “many-to-one” with the Product table.
Such a model provides a high degree of normalization (3NF). However this is not
the only way. We would have stored all information in one table, which could
contain both mutual attributes for all models (for example, price) and
attributes, which have sense only for models of certain types (for example,
color – for printer defining). The value NULL is fully justified for this
schema just in the meaning of an inapplicability of characteristic, that is
NULL in the column color tells that this characteristic doesn't refer,
let us say, to PC models.
Let's get back to the second aspect of null values – absence of information. If
we decide to refuse the using of null values, we will have to offer an
alternative. The natural way is using of the default value that will be
substituted in the appropriate column under the absence of information. We
should note that such default values should be at least so many as different
data types that are supported by DBMS (integer, strings, datetime ,… ).
Let's consider, for example, the Laptop table and the price column. Let the
enterprise to be so that at the moment of information input about the models of
laptops their price is sometimes unknown. In choosing the default value we
should be bounded by permissible values for the price column. Data type for the
column (money) makes us to be bounded by numerical values, compatible with
current type and domain constraints (constraint of the CHECK type), imposed on
permissible values for this column. Any positive value as a default value will
cause a confusion, because it is impossible to differ “true” value of price
from the substitution of the absent price. That is why we should choose zero or
any negative value. And now let's speak about disadvantages of such a
substitute.
For example, let's consider the information about the models 1298 in the Laptop
table. Let's carry out the query to learn the data:
SELECT * FROM laptop WHERE model=1298
Here are the results of carrying out of this query:
code
|
model
|
speed
|
ram
|
hd
|
price
|
screen
|
1
|
1298
|
350
|
32
|
4
|
700.0
|
11
|
4
|
1298
|
600
|
64
|
10
|
1050.0
|
15
|
6
|
1298
|
450
|
64
|
10
|
950.0
|
12
|
Let's consider the task of getting the average price of the model 1298. While
all prices are known the solution of this task doesn't cause any doubts:
SELECT model, AVG( price) avg_price
FROM laptop
WHERE model=1298
GROUP BY model
model
|
avg_price
|
1298
|
900.0
|
Let now the price of the model with code 4 is unknown. If, as it was decided
earlier, we substitute an unknown value, say, with zero (UPDATE laptop SET
price=0 WHERE code=4), then we will get knowingly the wrong average price value
– 550.0
If we use the value NULL, the result will be quite right – 825.0, because null
values will be ignored under grouping, in the result of which the average value
will be calculated only by the models with the known price (that is the average
value by two models).
So, as I tried to show, the value NULL is the inherent peculiarity of the
relational model, and instead of critics I offer to learn correctly working
with such values.
Given examples here can be done directly on the website by selecting the check
box “Without checking” on the page with
SELECT exercises.
Main page