Unicode normalization vulnerabilities

What is Unicode? Unicode or formally Unicode Standard is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems. Representation For example, “A” is mapped to U+0041, and “a” is mapped to U+0061. Unicode characters exist from U+000000 to U+10FFFF (there are more than a million symbols). Unicode divides all these possible symbols into “planes”, the best known is the BMP (Basic Multilingual Plane) that goes from U+0000 to U+FFFF (it is the Unicode plane number 1, there are 16 more, called “astral planes”)....

September 30, 2021 · 4 min · Lazar