Suppose you have a page with a <FORM> accepting user input. How can your results page script ensure that it has received the correct characters for all languages?
Text information typed into an input field in a <FORM> will be returned to the server using the page's encoding character set (charset). The same characters will be encoded in different ways for different charsets.
For example, the text "£10" for an input field called "Text" will be encoded as follows in charset "ISO-8859-1":
Text=%A310
In the "UTF-8" charset, the following is sent:
Text=%C2%A310
where %XX is the hexadecimal value of a single byte. The pound sign takes one byte in ISO-8859-1 and two bytes in UTF-8.
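The difference between the two charsets can be reproduced outside a browser. The following Python sketch uses the standard library's urllib.parse.urlencode as a stand-in for the browser's form serialization; the field name "Text" comes from the example above, and the variable names are only illustrative.

```python
from urllib.parse import urlencode

# The same form field, serialized under two different page charsets.
fields = {"Text": "£10"}

latin1_body = urlencode(fields, encoding="iso-8859-1")
utf8_body = urlencode(fields, encoding="utf-8")

print(latin1_body)  # Text=%A310     (one byte for the pound sign)
print(utf8_body)    # Text=%C2%A310  (two bytes for the pound sign)
```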
If you enter the non-Western characters 中文 into this field using the "ISO-8859-1" charset, then the following would be sent for these two Chinese characters:
Text=%26%2320013%3B%26%2325991%3B
Decoding the hex characters, you can see that the browser is using HTML character escape sequences, so this is equivalent to:
Text=&#20013;&#25991;
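As a rough illustration of this escaping, the Python sketch below imitates the browser's behaviour with the "xmlcharrefreplace" error handler, which replaces each character that ISO-8859-1 cannot represent with a decimal character reference before percent-encoding. This is an approximation of what browsers do, not a description of any particular browser's code.

```python
from urllib.parse import urlencode, unquote_plus

fields = {"Text": "中文"}

# errors="xmlcharrefreplace" swaps each unencodable character for &#NNNNN;
escaped_body = urlencode(fields, encoding="iso-8859-1", errors="xmlcharrefreplace")
decoded_body = unquote_plus(escaped_body)

print(escaped_body)  # Text=%26%2320013%3B%26%2325991%3B
print(decoded_body)  # Text=&#20013;&#25991;
```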
However, this escape encoding is not used with every charset, so some characters may not be sent to the server at all. A safe technique that ensures that all characters are sent is to use the "UTF-8" charset. (Note that your entire page needs to use this charset.)
The browser does not provide any information about how it has encoded the input. The best you can do is to assume that the received data has been returned using the charset of the originating page. Some browsers allow a user to change the page encoding; if a user does so, this assumption fails and your code will break.
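To see why the assumed charset matters, the following Python sketch decodes the same received bytes under two assumptions. The raw body is the UTF-8 form of "£10"; the variable names are hypothetical.

```python
from urllib.parse import parse_qs

raw_body = "Text=%C2%A310"  # what a UTF-8 page sends for "£10"

# Decode the same bytes under two different assumptions about the charset.
as_utf8 = parse_qs(raw_body, encoding="utf-8")["Text"][0]
as_latin1 = parse_qs(raw_body, encoding="iso-8859-1")["Text"][0]

print(as_utf8)    # £10
print(as_latin1)  # Â£10
```

The second result shows the stray character that appears when the server's assumption does not match the charset the browser actually used.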
ASP running with a Western codepage correctly decodes ISO-8859-1 encoded data, although non-Western characters sent as HTML character escape sequences are not decoded.
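If such escape sequences do arrive undecoded, they can be converted in a separate step. The Python sketch below is one possible approach, using a hypothetical helper named decode_numeric_refs that handles decimal references only and leaves everything else untouched.

```python
import re

_NUMERIC_REF = re.compile(r"&#(\d+);")

def decode_numeric_refs(text: str) -> str:
    """Replace decimal references such as &#20013; with the character they name."""
    def _replace(match: re.Match) -> str:
        code_point = int(match.group(1))
        # Leave out-of-range values untouched rather than raising.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)
    return _NUMERIC_REF.sub(_replace, text)

print(decode_numeric_refs("&#20013;&#25991;"))  # 中文
```

Note that the server cannot tell a browser-generated escape from text a user typed literally as "&#20013;", so this step converts both.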
If the page with your <FORM> uses a different charset, then you must set the ASP codepage and the page charset to match. For UTF-8, the codepage must be set to 65001, e.g.:
<%@ CODEPAGE = 65001%>
For a list of codepages for each charset, see the MSDN library reference.
See also Response.CodePage and Session.CodePage.
Be careful when displaying user input: echoing it back into a page unescaped is a security risk, because it can allow markup or script injection.
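A common precaution is to HTML-escape the text before writing it into a page. The Python sketch below shows the idea with the standard library's html.escape; in classic ASP, Server.HTMLEncode serves the same purpose. The sample input is made up for illustration.

```python
from html import escape

# Hypothetical hostile input received from a form field.
user_text = '<script>alert("hi")</script>'

# Escape markup characters so the browser displays the text instead of running it.
safe_text = escape(user_text)

print(safe_text)  # &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```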
Copyright © 2000-2022 phdcc