Why do we need binary encoding? Computers can only process numbers, so if we want a computer to process our text, the text must first be converted to numbers. It is a bit like how our brains handle human language but cannot understand the language of animals. We therefore need a specific binary encoding to translate between human language and what the computer works with. When a computer handles information, there are two steps, encoding and decoding, and both must use the same binary encoding. Encoding converts text or images into numbers the computer can store; decoding converts that stored digital information back into text or images.
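As a minimal sketch of the encoding and decoding steps described above, the Python snippet below turns a short string into numbers and back again. The sample string and the variable names are illustrative assumptions, not something the original text specifies.

```python
# Illustrative round trip: text -> numbers -> text.
text = "Hi"  # sample string chosen for this sketch

# Encoding: each character becomes a number stored in one byte.
encoded = text.encode("ascii")
numbers = list(encoded)
bits = [format(n, "08b") for n in numbers]

# Decoding: the same encoding turns the bytes back into text.
decoded = encoded.decode("ascii")

print(numbers)  # [72, 105]
print(bits)     # ['01001000', '01101001']
print(decoded)  # Hi
```

Decoding only gives back the original text because the same encoding name is used in both directions.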
Because different countries use different languages, many different encoding methods have been produced. First, let's introduce ASCII. Since early computers were developed largely in the United States, only 128 characters were encoded at first: uppercase and lowercase English letters, digits, some symbols, and a set of control characters, numbered 0 to 127. This code table is called ASCII. For example, the code for the capital letter A is 65. Inside the computer, all information is ultimately represented as a binary string. Each bit has two states, 0 and 1, so eight bits can be combined into 256 states, and a group of eight bits is called a byte. That is, one byte can represent 256 different states, from 00000000 to 11111111, each of which can correspond to a symbol. ASCII uses only the first 128 of them, so the leading bit of every ASCII byte is 0. ASCII was developed in the United States in the 1960s as a uniform set of rules relating English characters to binary bits, and it has been used ever since.
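The numbers in this paragraph can be checked with a few lines of Python. This is only a sketch using the built-in `ord`, `chr`, and `format` functions; the sample sentence and the variable names are assumptions made for illustration.

```python
# ASCII maps each character to a number from 0 to 127.
code = ord("A")
as_bits = format(code, "08b")  # the same number written as eight bits
back = chr(code)

# One byte is eight bits, so it has 2**8 possible states.
states_per_byte = 2 ** 8

# Every ASCII code fits in seven bits, leaving the top bit of the byte 0.
sample = "Hello, world!".encode("ascii")
all_ascii_fit = all(n < 2 ** 7 for n in sample)

print(code, as_bits, back)  # 65 01000001 A
print(states_per_byte)      # 256
print(all_ascii_fit)        # True
```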
One byte is obviously not enough to handle Chinese; at least two bytes are needed, and the encoding must not conflict with ASCII. China therefore developed the GB2312 encoding for Chinese. There are hundreds of languages in the world: Japan encoded Japanese in Shift_JIS, and South Korea encoded Korean in EUC-KR. Each country has its own national standard, so conflicts are inevitable, and text that mixes several languages ends up showing garbled characters. Unicode was created to solve this. As its name suggests, it is a single code for all symbols: it brings together the symbols of the world's writing systems and gives each one a unique number, so the garbling problem disappears. Unicode itself only assigns the numbers; the Unicode Transformation Formats (UTF) define how those numbers are stored as bytes. There are three of them, UTF-8, UTF-16, and UTF-32, which use 8-bit, 16-bit, and 32-bit units. The wider units span more than one byte, so byte order matters and a byte order mark may be needed. UTF-8 avoids that issue, uses one to four bytes per character, and encodes ASCII characters exactly as ASCII does, so it has become the dominant character encoding.
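To make the garbling problem and the UTF sizes concrete, here is a small Python sketch. It assumes the standard `gb2312`, `shift_jis`, `utf-8`, `utf-16-le`, and `utf-32-le` codecs that ship with Python; the sample text and variable names are chosen only for illustration.

```python
text = "中文"  # sample text chosen for this sketch

# Encode with the Chinese national standard, then decode with the Japanese one.
gb_bytes = text.encode("gb2312")
garbled = gb_bytes.decode("shift_jis", errors="replace")

# Decoding with the encoding that was used to write the bytes works.
restored = gb_bytes.decode("gb2312")

# UTF-8 is variable length and keeps ASCII characters as single ASCII bytes.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in "A中"}
same_as_ascii = "A".encode("utf-8") == "A".encode("ascii")

# UTF-16 and UTF-32 use wider units; "-le" fixes the byte order explicitly,
# so no byte order mark is written.
utf16_len = len("中".encode("utf-16-le"))
utf32_len = len("中".encode("utf-32-le"))

print(garbled != text, restored == text)    # True True
print(utf8_lengths)                         # {'A': 1, '中': 3}
print(same_as_ascii, utf16_len, utf32_len)  # True 2 4
```

The bytes never change in this example; only the table used to interpret them does, which is why agreeing on one encoding such as UTF-8 removes the problem.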
If you want to know more about ASCII and Unicode, you can click on this link: https://www.youtube.com/watch?v=61Bs7-ycL64