1.Previous
3.Next

characters effectively have more than one corresponding byte sequence: the valid one and some invalid ones. This could lead to security issues similar to the one described in the previous section. Therefore, if one wants to apply some filter (e.g., a regular expression written in UTF-8) to UTF-8 strings that will later be passed to a decoder that allows invalid byte sequences, one should canonicalize the strings before passing them to the filter. In this context, canonicalization is the process of translating every string character to its single

1.Previous
3.Next