DEV Community

Cover image for Numeric encoding
Đặng Đình Sáng
Đặng Đình Sáng

Posted on

Numeric encoding

How are numbers actually represented at the lowest levels inside a computer? Let's break down integer and floating point encoding.

Previous post:

Integer encoding

The representation of a number in a computer is collectively called the number of machines. Data processing and computation in a computer are both binary.
Generally, the number of machines is represented in 8 bits. Practical Data has positive and negative numbers, because the computer can only represent 0 and 1 states, and the positive or negative values of the data are "+", in a computer, it is distinguished by a binary value of 0 or 1, which is usually placed in the highest bit and becomes the symbol bit.
After the number of symbols is converted into numeric values, the computer has designed a variety of symbol bit encoding methods to conveniently perform arithmetic operations on the number of machines and improve the computing speed, the most common methods for expressing the number of machines are original code, reverse code, complement code, and transfer code.

  • The original code representation is a simple representation of the number of machines. Its sign bit uses 0 to represent the positive and 1 to represent the negative numbers. The value is generally expressed in binary format.
[+ 0] original = 0000000, [-0] original = 1000000
Enter fullscreen mode Exit fullscreen mode

+0=[+ 000 0000]=0101 0110+5=[ 000 0101]=1100 10100=[ 000 0000]=1000 000028=[ 001 1100]=1001 1100 +0 = [+ \ 000 \ 0000] = 0101 \ 0110 \newline +5 = [- \ 000 \ 0101] = 1100 \ 1010 \newline -0 = [- \ 000 \ 0000] = 1000 \ 0000 \newline -28 = [- \ 001 \ 1100] = 1001 \ 1100 \newline

The original code of negative numbers cannot be directly used in operations

1+(2)0000 0001+1000 0010=1000 00113 \begin{align*} 1 + (-&2) \end{align*} \newline \to 0000 \ 0001 + 1000 \ 0010 \newline = 1000 \ 0011 \newline \to -3
  • The reverse code of the number of machines can be obtained from the original code. If the number of machines is positive, the inverse code of the number of machines is the same as the original code. If it is negative, the inverse code of the number of machines is the original code of the machine (except the symbol bit) that you get.
[+ 0] = 0000000, [-0] = 11111111
Enter fullscreen mode Exit fullscreen mode
X1=+ 1010110X2=1 1001010[X1] Reverse =[X1] original = 01010110[X2] original=11001010[X2] reversed=10110101 X1 = + \ 1010110 \newline X2 = 1 \ 1001010 \newline [X1] \ Reverse \ = [X1] \ original \ = \ 01010110 \newline [X2] \ original = 11001010 \newline [X2] \ reversed = 10110101 \newline
  • The Complement Representation can be obtained by the original code. If the number of machines is positive, the complement code of the number of machines is the same as the original code; if it is negative, then, the complement code of the number of machines is obtained based on the reverse code of the original code (except the symbol bit) and the value of 1 in the undigit.
1+(2)0000 0001(original)+1000 0010(original)=0000 0001(reversed)+1111 1101(reversed)=1111 1110(reversed)=1000 0001(original)1 \begin{align*} 1 + (-&2) \end{align*} \newline \to 0000 \ 0001 (original) + 1000 \ 0010 (original) \newline = 0000 \ 0001 (reversed) + 1111 \ 1101 (reversed) \newline = 1111 \ 1110 (reversed) \newline = 1000 \ 0001 (original) \newline \to -1

Like the original code, the complement code also has the problem of positive and negative zero ambiguity, so the computer further introduced 2's complement. Let’s first observe the conversion process of the original code, inverse code, and complement code of negative zero:

[+ 0] fill = 0000000, [- 0] fill = 0000000
[-0 ] = 1000 0000 (original) = 1111 1111 (reversed) = 1 0000 0000 (repaired) 
Enter fullscreen mode Exit fullscreen mode

Add 1 to the complement of negative zero. A carry will occur, but the length of a byte is only 8 bits, so it overflows to the 9th bit. 1 will be discarded. That is, the complement of the negative zero is 0000 0000, the same as the positive zero's complement. This means that there is only one zero in the two's complement representation and the positive and negative zero ambiguity is thus resolved.

byte the value range of the type is [-128, 127], an extra negative number. How did you get it? We note that the interval [-127, 127]. All integers have corresponding original code, complemented code, and reversed code, and the original code and complemented code can be converted to each other.
However, the complement 1000 0000 is an exception, it does not have a corresponding original code.

According to the conversion method, we get the original code of the complement code as 0000 0000. This is a contradiction because the original code represents the number 0, its complement should be itself. The computer specifies this special complement 1000 0000 represents -128. Actually, -127 + -1 The calculation result under the two's complement code is -128.

(-127) + (-1)
-> 1111 1111 (original) + 1000 0001 (original)
= 1000 0000 (reversed) + 1111 1110 (reversed)
= 1000 0001 (repair) + 1111 1111 (repair)
= 1000 0000
-> -128
Enter fullscreen mode Exit fullscreen mode

The hardware circuits inside the computer are mainly designed based on addition operations. This is because compared to other operations (such as multiplication, division, and subtraction), addition operations are simpler to implement in hardware, easier to parallelize, and faster.

Note that this does not mean that computers can only do addition. By combining addition with some basic logical operations, computers can perform a variety of other mathematical operations. For example, calculating subtraction a - b Can be converted to computational addition a + (-b); Calculating multiplications and divisions can be converted into calculating multiple additions or subtractions.

Floating point encoding

If you are careful, you may find that: the length of int and float is the same as 4 bytes, but why is float the value range of is much larger int?

This is because floating point numbers are represented differently. Remember a 32-bit length binary number as:

b31b30b29...b2b1b0 b_{31}b_{30}b_{29}...b_2b_1b_0

The 32-bit length float consists of the following three parts:

  • Sign bit S: Occupies 1 bit, corresponding to
    b31b_{31}
  • Exponent bit E: Occupies 8 bits, corresponding to
    b30b29...b23b_{30}b_{29}...b_{23}
  • Fractional digit N: Occupies 23 bits, corresponding to
    b22b21...b0b_{22}b_{21}...b_0

The calculation formula converted to decimal is:

val=(1)S×2E127×(1+N) val = (-1)^S \times 2^{E-127} \times (1 + N)

The value range of each item is:

S{0,1},E{1,2,...,254}(1+N)=(1+i=123b23i2i)[1,2223] S \in \lbrace0,1\rbrace, E \in \lbrace1,2,...,254\rbrace \newline (1+N) = (1 + \displaystyle\sum_{i=1}^{23}b_{23 - i}2^{-i}) \subset \lbrack1,2 - 2^{-23} \rbrack

The representation of float contains an exponent bit, causing its range of values ​​to be much larger than int.

Although floating point numbers extend the range of values, a side effect is that precision is sacrificed. The integer type uses all 32 bits to represent numbers, and the numbers are evenly distributed; due to the existence of the exponent bit, the larger the value of the floating point number, the greater the difference between two adjacent numbers will tend to be.

Double precision also uses a representation method similar to.

Conclusion

By explaining the representation forms rather than just rote specifications, we gain deeper insight into optimizing numeric computation at a fundamental level. The low-level choices reflect beauty in efficiently solving real-world engineering challenges.

Top comments (0)