Difference between revisions of "ASCII"

Revision as of 14:07, 3 July 2020

Definition

The term ASCII stands for "American Standard Code for Information Interchange" and describes a character set standard for text data and information exchange. Character encodings (also called code pages) define how letters and other text characters, as well as control codes for data transfer, are to be encoded in bits and bytes.

The ASCII-code is one of the most important character set standards along with Unicode, ISO-8859-1 and Windows 1252. The code was the dominant standard for web pages for a long time until it was overtaken by the Unicode encoding UTF-8 in 2007. Nevertheless, it is still relevant today, as there are still areas where only characters contained in the ASCII code are allowed.

History

ASCII code has a long history. This history began with telegraphy and Morse code as well as the 5-bit Murray code developed by the New Zealand inventor Donald Murray between 1901 and 1932. The first version of the ASCII code was released in 1963 by ASA, the American Standards Association. ASA was a precursor of the American National Standards Institute (ANSI). In 1968, the version of the character set that is still valid today was published.

The 7-bit ASCII character set

Because of its history, the original ASCII code uses only seven bits of a common 8-bit byte and can encode a maximum of 128 different characters. The code contains upper and lower case letters of the English alphabet, the most important punctuation marks, mathematical symbols, and 33 control codes for data transfer and text formatting.

The following character groups are included:

0-32 and 127: control codes for data transfer as well as spaces, tabs, and line breaks
48-57: digits
65-90: capital letters
97-122: lower case letters
33- 7, 58-64, 91-96 and 123-126: punctuation marks, mathematical symbols, brackets, and other characters

Country-specific special characters such as umlauts and accents are not included in ASCII code.

Although the arrangement may seem chaotic and arbitrary, it results from thorough planning and consideration. The letters are positioned in such a way that upper and lower case versions of a letter only differ by a single bit. Numbers, spaces, and some other symbols are deliberately placed in front of the letters to simplify sorting. In addition, many non-alphanumeric symbols are located on positions similar to typewriter arrangements.

Extended character sets: ISO-8859

Since the actual ASCII code only contains the English alphabet, many region-specific extensions have been developed. In this context, the character sets Windows-1252 and ISO-8859-1 have gained particular importance. Both are 8-bit extensions of the original standard and contain many special regional characters. Due to historical developments, both standards are often referred to as ANSI character sets. Strictly speaking, however, this is not correct because ANSI has never officially standardized these character sets.

However, even with 8 bits, only 256 characters are available and therefore not all languages can be covered. Therefore, ISO-8859 has been developed as a collection of different character sets for different languages and regions. For example, ISO-8859-7 contains Latin and Greek alphabets, while ISO-8859-4 covers the special characters of Scandinavian and Baltic languages. ISO-8859-1 contains Western European alphabets and is almost identical to Windows 1252.

For both Windows 1252 and ISO 8859-1, the first 128 characters are identical to ASCII code. From position 128, code-specific special characters follow, whereby the numbers 128 to 159 are undefined in the ISO-8859 standard. Starting with number 160, the special characters of the different languages and regions are contained.

ASCII, Unicode, and UTF-8

Although the ISO-8859 standards cover many languages, not all languages are included. In addition, the different character sets caused a considerable amount of confusion because they are not compatible with each other. As early as 1988, the first plans for a uniform Unicode character set were drawn up, the first version of which was released in 1991.

Unicode enables the display of over a million characters and gradually replaces all other character sets. The Unicode encoding UTF-8, which now is the predominant text format on the World Wide Web, is particularly important. UTF-8 has the big advantage that it is ASCII compatible since the first 128 characters are identical.

Structure of ASCII and ISO tables

Usually, lists or tables are used to display the character sets in order to make the characters and their numerical values easy to find. These lists specify the characters and their decimal, hexadecimal, octal, and/or binary values.

Many tables are hexadecimal and separate the codes into the first and second half bytes. For example, the large H in the ASCII table is found in the 4th row of the 8th column, resulting in the hexadecimal notation 0x48. The carriage return CR has the code 0x0D because it is in line 0 and column D. 0x is a common prefix to refer to the hexadecimal notation.

ASCII Table

In the following, you can see the ASCII table with codes in decimal, hexadecimal and octal notation:

char.	decimal	hexadec.	octal
NUL	0	0x00	000
SOH	1	0x01	001
STX	2	0x02	002
ETX	3	0x03	003
EOT	4	0x04	004
ENQ	5	0x05	005
ACK	6	0x06	006
BEL	7	0x07	007
BS	8	0x08	010
HT	9	0x09	011
LF	10	0x0A	012
VT	11	0x0B	013
FF	12	0x0C	014
CR	13	0x0D	015
SO	14	0x0E	016
SI	15	0x0F	017
DLE	16	0x10	020
DC1	17	0x11	021
DC2	18	0x12	022
DC3	19	0x13	023
DC4	20	0x14	024
NAK	21	0x15	025
SYN	22	0x16	026
ETB	23	0x17	027
CAN	24	0x18	030
EM	25	0x19	031
SUB	26	0x1A	032
ESC	27	0x1B	033
FS	28	0x1C	034
GS	29	0x1D	035
RS	30	0x1E	036
US	31	0x1F	037

char.	decimal	hexadec.	octal
SP	32	0x20	040
!	33	0x21	041
"	34	0x22	042
#	35	0x23	043
$	36	0x24	044
%	37	0x25	045
&	38	0x26	046
'	39	0x27	047
(	40	0x28	050
)	41	0x29	051
*	42	0x2A	052
+	43	0x2B	053
,	44	0x2C	054
-	45	0x2D	055
.	46	0x2E	056
/	47	0x2F	057
0	48	0x30	060
1	49	0x31	061
2	50	0x32	062
3	51	0x33	063
4	52	0x34	064
5	53	0x35	065
6	54	0x36	066
7	55	0x37	067
8	56	0x38	070
9	57	0x39	071
:	58	0x3A	072
;	59	0x3B	073
<	60	0x3C	074
=	61	0x3D	075
>	62	0x3E	076
?	63	0x3F	077

char.	decimal	hexadec.	octal
@	64	0x40	100
A	65	0x41	101
B	66	0x42	102
C	67	0x43	103
D	68	0x44	104
E	69	0x45	105
F	70	0x46	106
G	71	0x47	107
H	72	0x48	110
I	73	0x49	111
J	74	0x4A	112
K	75	0x4B	113
L	76	0x4C	114
M	77	0x4D	115
N	78	0x4E	116
O	79	0x4F	117
P	80	0x50	120
Q	81	0x51	121
R	82	0x52	122
S	83	0x53	123
T	84	0x54	124
U	85	0x55	125
V	86	0x56	126
W	87	0x57	127
X	88	0x58	130
Y	89	0x59	131
Z	90	0x5A	132
[	91	0x5B	133
/	92	0x5C	134
]	93	0x5D	135
^	94	0x5E	136
_	95	0x5F	137

char.	decimal	hexadec.	octal
`	96	0x60	140
a	97	0x61	141
b	98	0x62	142
c	99	0x63	143
d	100	0x64	144
e	101	0x65	145
f	102	0x66	146
g	103	0x67	147
h	104	0x68	150
i	105	0x69	151
j	106	0x6A	152
k	107	0x6B	153
l	108	0x6C	154
m	109	0x6D	155
n	110	0x6E	156
o	111	0x6F	157
p	112	0x70	160
q	113	0x71	161
r	114	0x72	162
s	115	0x73	163
t	116	0x74	164
u	117	0x75	165
v	118	0x76	166
w	119	0x77	167
x	120	0x78	170
y	121	0x79	171
z	122	0x7A	172
{	123	0x7B	173
	124	0x7C	174
}	125	0x7D	175
~	126	0x7E	176
DEL	127	0x7F	177

ASCII Code, Unicode, and SEO

Although ASCII and ISO-8859 were the predominant text character standards for a long time, they are considered obsolete on the web today. The official standardization organization W3C (World Wide Web Consortium) recommends the exclusive use of UTF-8 as character encoding for all websites.

In addition to the actual texts on a web page, Unicode can also be used in meta descriptions. Unicode characters such as hooks, hearts, stars, envelopes or currency symbols can trigger unconscious impulses in the reader. For example, checkmarks and hearts generate approval, while envelopes and telephone symbols encourage contact. While this has no direct impact on search engine rankings, it increases click-through rates and leads to more visitors and customers.

In SEO-relevant keywords and keyword phrases, however, some restraint is advisable. Country-specific letters like umlauts and accents are no problem. However, unusual special characters, separating symbols, emoticons and pictograms can make keyword recognition impossible.

@@ Line 5: / Line 5: @@
 The term ASCII stands for "American Standard Code for Information Interchange" and describes a character set standard for text data and information exchange. [[Character Encoding|Character encodings]] (also called code pages) define how letters and other text characters, as well as control codes for data transfer, are to be encoded in bits and bytes.
-The ASCII-code is one of the most important character set standards along with Unicode, ISO-8859-1 and Windows 1252. The code was the dominant standard for web pages for a long time until it was overtaken by the Unicode encoding UTF-8 in 2007. Nevertheless, it is still relevant today, as there are still areas where only characters contained in the ASCII code are allowed.
+The ASCII-code is one of the most important character set standards along with [[Unicode]], ISO-8859-1 and Windows 1252. The code was the dominant standard for web pages for a long time until it was overtaken by the Unicode encoding UTF-8 in 2007. Nevertheless, it is still relevant today, as there are still areas where only characters contained in the ASCII code are allowed.
 == History ==