Incorrect HTML encoding (e.g. ø) of Czech characters


kabaldan

EPG data are displayed by the Recording Service (v1.5.0.31) web interface with incorrect HTML encoding:

 

Latin Small Letter R with caron (U+0159) is encoded as ø, while the correct HTML encoding would be & #345; (without the space). [ø is a completely different character: Latin Small Letter O with stroke (U+00F8).]

Latin Small Letter E with caron (U+011B) is encoded as ì while the correct HTML encoding would be & #283;.

 

This list is incomplete and could include most Czech characters with carons and rings (see http://www.thesauruslex.com/typo/eng/enghtml.htm#cz).
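
A possible explanation for the pattern above, offered only as a guess (nothing here is taken from the Recording Service code): the two wrong characters are exactly what you get when Windows-1250 bytes are read as ISO-8859-1 before being HTML-encoded. The short Python sketch below reproduces that mapping and prints the numeric entity that would have been correct. The function names and the extra sample letters are my own, chosen for illustration.

```python
# Sketch only: assumes the EPG text starts out as Windows-1250 bytes and is
# misread as ISO-8859-1 (Latin-1) somewhere before HTML encoding.

def misread_as_latin1(ch: str) -> str:
    """Encode one character as Windows-1250, then decode the byte as Latin-1."""
    return ch.encode("cp1250").decode("latin-1")

def numeric_entity(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return "&#%d;" % ord(ch)

# U+0159 (r caron) and U+011B (e caron) are the two cases reported above;
# U+010D (c caron) and U+016F (u ring) are added as further samples.
SAMPLES = "\u0159\u011b\u010d\u016f"

for ch in SAMPLES:
    wrong = misread_as_latin1(ch)
    # Print code points only, so the output is plain ASCII on any console.
    print("U+%04X misread as U+%04X, correct entity %s"
          % (ord(ch), ord(wrong), numeric_entity(ch)))
```

For U+0159 this gives U+00F8 (ø) and for U+011B it gives U+00EC (ì), which matches what the web interface shows.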

 

Interestingly, the Latin Small Letter S with caron (U+0161) and the Latin Small Letter Z with caron (U+017E) are displayed without HTML encoding. When lngCharset is changed to Windows-1250, the default Czech system charset, those two characters are displayed correctly (unlike the ones that are HTML-encoded).

 

I would therefore suggest not using HTML encoding for foreign characters: if they are converted and encoded incorrectly, their display cannot be corrected by changing the charset in the web browser.
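
To illustrate why the charset switch helps in one case and not the other, here is a rough Python sketch. It assumes the wrong output is sent as the named entity for ø (the page only shows the rendered character, so that is a guess), and it uses UTF-8 as a stand-in for whatever charset the page declares by default. The `render` helper is hypothetical and only mimics what a browser does: decode the bytes with the chosen charset, then resolve entities.

```python
import html

def render(page_bytes: bytes, charset: str) -> str:
    """Decode bytes with the given charset, then resolve HTML entities."""
    return html.unescape(page_bytes.decode(charset, errors="replace"))

# s caron (U+0161) sent as a raw Windows-1250 byte, with no HTML encoding.
raw_s_caron = "\u0161".encode("cp1250")

# r caron sent as the wrong entity (assumed form of the bug).
wrong_entity = b"&oslash;"

for charset in ("utf-8", "cp1250"):
    # ascii() keeps the output printable on any console.
    print(charset,
          ascii(render(raw_s_caron, charset)),
          ascii(render(wrong_entity, charset)))
```

The raw byte comes out as U+0161 only under cp1250, so picking the right charset repairs it. The entity is plain ASCII and resolves to U+00F8 under every charset, so no charset setting can turn it back into U+0159.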

 

Of course, the best long-term solution would be to use UTF-8 encoding throughout the Recording Service code.
