How are numbers actually represented at the lowest levels inside a computer? Let's break down integer and floating point encoding.
Integer encoding
The binary representation of a number inside a computer is called its machine number. All data storage and computation in a computer is done in binary.

A machine number is commonly illustrated with 8 bits. Real-world data includes both positive and negative values, but a computer can only store the states 0 and 1, so the sign of a number ("+" or "-") is encoded as a single binary digit, 0 or 1, which is usually placed in the highest bit and called the sign bit.

Once numbers carry a sign bit, the question becomes how to perform arithmetic on them conveniently and quickly. To that end, computers use several encodings of machine numbers; the most common are sign-magnitude, one's complement, two's complement, and offset (excess) code.
- Sign-magnitude is the most straightforward encoding. The sign bit uses 0 for positive and 1 for negative, and the remaining bits hold the absolute value in binary.

[+0] sign-magnitude = 0000 0000, [-0] sign-magnitude = 1000 0000

The sign-magnitude form of negative numbers cannot be used directly in arithmetic.
- One's complement is derived from sign-magnitude. For a positive number, the one's complement is identical to the sign-magnitude form. For a negative number, it is obtained by inverting every bit of the sign-magnitude form except the sign bit.

[+0] one's complement = 0000 0000, [-0] one's complement = 1111 1111
- Two's complement is derived in turn. For a positive number, the two's complement is identical to the sign-magnitude form. For a negative number, it is obtained by taking the one's complement and adding 1 to the lowest bit.
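As a quick sketch (assuming an 8-bit word; the helper name `encodings` is mine, not from the text), the three encodings can be produced in Python:

```python
def encodings(x, bits=8):
    """Return the sign-magnitude, one's-complement, and two's-complement
    bit strings of integer x, for a given word length."""
    assert -(2 ** (bits - 1)) < x < 2 ** (bits - 1)
    if x >= 0:
        s = format(x, f"0{bits}b")
        return s, s, s  # all three encodings agree for non-negative numbers
    mag = format(-x, f"0{bits - 1}b")                      # magnitude bits
    sign_mag = "1" + mag                                   # sign bit 1 + magnitude
    ones = "1" + mag.translate(str.maketrans("01", "10"))  # invert non-sign bits
    twos = format((1 << bits) + x, f"0{bits}b")            # wrap-around form
    return sign_mag, ones, twos

print(encodings(-5))  # ('10000101', '11111010', '11111011')
```

Note the two's-complement string is indeed the one's-complement string plus 1.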
Both sign-magnitude and one's complement suffer from a positive/negative zero ambiguity, and two's complement is what resolves it. Let's trace the conversion of negative zero through sign-magnitude, one's complement, and two's complement:
[+0] = 0000 0000 (sign-magnitude) = 0000 0000 (two's complement)
[-0] = 1000 0000 (sign-magnitude) = 1111 1111 (one's complement) = 1 0000 0000 (two's complement)
Adding 1 to the one's complement of negative zero produces a carry, but a byte is only 8 bits long, so the carry into the 9th bit overflows and is discarded. The two's complement of negative zero is therefore 0000 0000, the same as that of positive zero. This means two's complement has only one representation of zero, and the positive/negative zero ambiguity is resolved.
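The overflow-and-discard step can be mimicked in Python by masking with 0xFF (a sketch of the idea, not tied to any particular hardware):

```python
neg_zero_ones = 0b1111_1111                  # one's complement of -0
neg_zero_twos = (neg_zero_ones + 1) & 0xFF   # add 1; the mask discards the 9th bit
print(format(neg_zero_twos, "08b"))          # 00000000, identical to +0
```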
One more puzzle: the value range of the byte type is [-128, 127], which contains one extra negative number. Where does -128 come from? Every integer in the interval [-127, 127] has corresponding sign-magnitude, one's-complement, and two's-complement forms, and the forms can be converted into one another. But the two's-complement pattern 1000 0000 is an exception: it has no corresponding sign-magnitude form. Applying the conversion rules mechanically yields the sign-magnitude 0000 0000, which is a contradiction, since that pattern represents 0, whose two's complement should be itself. The computer therefore assigns the special pattern 1000 0000 the value -128. In fact, computing (-127) + (-1) in two's complement gives exactly -128:
(-127) + (-1)
-> 1111 1111 (sign-magnitude) + 1000 0001 (sign-magnitude)
= 1000 0000 (one's complement) + 1111 1110 (one's complement)
= 1000 0001 (two's complement) + 1111 1111 (two's complement)
= 1000 0000 (two's complement)
-> -128
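The same computation can be checked in Python, again using a 0xFF mask to model the discarded carry (the helper names below are mine):

```python
def to_twos8(x):
    """8-bit two's-complement bit pattern of x, as an unsigned value."""
    return x & 0xFF

def from_twos8(b):
    """Interpret an 8-bit pattern as a signed two's-complement value."""
    return b - 256 if b >= 128 else b

raw = (to_twos8(-127) + to_twos8(-1)) & 0xFF  # carry out of bit 8 is discarded
print(format(raw, "08b"), from_twos8(raw))    # 10000000 -128
```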
The hardware circuits inside the computer are mainly designed based on addition operations. This is because compared to other operations (such as multiplication, division, and subtraction), addition operations are simpler to implement in hardware, easier to parallelize, and faster.
Note that this does not mean computers can only add. By combining addition with basic logical operations, a computer can carry out many other mathematical operations. For example, the subtraction a - b can be converted into the addition a + (-b), and multiplication and division can be reduced to repeated additions or subtractions.
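For instance, here is a sketch of 8-bit subtraction built only from addition and bit inversion (the function name `sub8` is hypothetical):

```python
def sub8(a, b):
    """Compute a - b in 8-bit two's complement without a subtraction operator."""
    neg_b = (~b + 1) & 0xFF   # two's-complement negation of b: invert, add 1
    raw = (a + neg_b) & 0xFF  # plain addition; the overflow bit is discarded
    return raw - 256 if raw >= 128 else raw

print(sub8(100, 30))  # 70
print(sub8(30, 100))  # -70
```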
Floating point encoding
A careful reader may notice that int and float are both 4 bytes long, yet the value range of float is much larger than that of int. Why?

This is because floating point numbers use a different representation. Denote a 32-bit binary number as:

b31 b30 b29 ... b2 b1 b0
In this notation, a 32-bit float consists of the following three parts:

- Sign bit S: occupies 1 bit, corresponding to b31
- Exponent bits E: occupy 8 bits, corresponding to b30 ... b23
- Fraction bits N: occupy 23 bits, corresponding to b22 ... b0

The formula for converting to a decimal value is:

val = (-1)^S × 2^(E - 127) × (1 + N)

where the fraction N = b22 × 2^-1 + b21 × 2^-2 + ... + b0 × 2^-23. The value range of each part is: S ∈ {0, 1}; for normal numbers, E ∈ {1, 2, ..., 254} (E = 0 and E = 255 are reserved for special values such as subnormals, zero, infinity, and NaN); and 1 + N ∈ [1, 2 - 2^-23].
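To see these fields concretely, one can pull apart a float's bit pattern with Python's struct module (the helper name `float_parts` is my own):

```python
import struct

def float_parts(x):
    """Split a 32-bit float into its sign S, biased exponent E, and fraction field."""
    bits = int.from_bytes(struct.pack(">f", x), "big")  # raw 32-bit pattern
    S = bits >> 31
    E = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF        # the 23 fraction bits as an integer
    return S, E, frac

S, E, frac = float_parts(-6.5)
N = frac / (1 << 23)              # fraction as a value in [0, 1)
value = (-1) ** S * 2 ** (E - 127) * (1 + N)
print(S, E - 127, value)          # 1 2 -6.5
```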
The representation of float contains exponent bits, which is why its value range is much larger than that of int.
Although floating point numbers extend the value range, a side effect is reduced precision. The integer type uses all 32 bits to represent numbers, which are evenly distributed; because of the exponent bits, the larger a floating point number's value, the larger the gap between two adjacent representable numbers tends to be.
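Python's math.ulp reports the gap from a value to the next representable float; even for the 64-bit doubles that Python uses, the gap grows with magnitude, which illustrates the trade-off:

```python
import math

# The spacing between adjacent representable floats grows with magnitude.
for x in (1.0, 1e6, 1e12):
    print(x, math.ulp(x))
```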
Double precision (double) uses a representation similar to float, with 1 sign bit, 11 exponent bits, and 52 fraction bits.
Conclusion
By explaining how these representations actually work, rather than presenting them as rote specifications, we gain deeper insight into numeric computation at a fundamental level. These low-level design choices show how elegantly real-world engineering challenges can be solved.