allanlee Posted February 27, 2016

I'm using DVBViewer Pro 5.3.1.0 with a TBS5881 DVB-C tuner to watch Singapore DVB-C FTA channels. Some channels provide both English and Chinese EPG, while others have English EPG only. All English EPG displays perfectly, but Chinese EPG is displayed as unreadable characters.

I've tried toggling the following Tweaker options on and off: (1) Convert EPG data to local character set; (2) Convert EPG data to ISO6937; (3) Use UTF16 instead of Big5. Unfortunately, no combination works. The STB provided by the carrier displays the EPG in Simplified Chinese.

I have attached the exported EPG HTML files and support.zip. P.S. TransEdit has the same problem.

I would very much appreciate any help. Let me know if more info is required to fix this problem. Thanks a lot!
Tjod Posted February 27, 2016

If you have any problems, you should first try the current DVBViewer version (updates are available in the Members Section).

The Tweaker only affects newly received EPG data, so you should delete epg.dat in the configuration folder after each change.

If there is a problem, TransEdit is the best tool to investigate it, but you should use the version 4.x beta from the Members Section. Try "Assume Unicode character coding in case of Big5" in the settings there. Does it help in TransEdit? http://www.DVBViewer.tv/forum/topic/2745-transedit/page-4#entry396436
allanlee (Author) Posted February 27, 2016

Thanks! I updated to 5.5.2.0 and the problem is the same. I repeated the tweaking, deleting epg.dat every time, with no luck. I also tried TE 4.1.0 beta with Unicode on/off and ISO6937 on/off; every combination showed unreadable characters (same as in DVBViewer).

Is there anything else I can do to solve the issue, or to help fix the problem? Many thanks again!
Tjod Posted February 27, 2016

In TransEdit, open the Analyzer window on a transponder carrying channels that have EPG problems. Right-click in the PID section and choose "Select Main SI PIDs"; this selects several PIDs (including EIT). Then use "Start Recording" to record them for 30 seconds and post the .ts file. Maybe that helps to determine what is going on.
allanlee (Author) Posted February 27, 2016

Here it is.
https://drive.google.com/file/d/0By_acVg9jls5VHdDbWpxNG5DaGs/view?usp=sharing
https://mega.nz/#!jBsnDQDB!vTswneszTf0iaN26NVRlilbkHaakwSPuPy29IFkwG9o

Hope it helps. Let me know if more samples are needed. Many thanks!
Tjod Posted February 27, 2016

I'm not able to determine which character encoding (UTF-8, UTF-16, Big5 etc.) the channels are using. A developer has to look into it.
allanlee (Author) Posted February 28, 2016

Is it possible to manually set the decoding character set in DVBViewer / TransEdit? Then I could use trial and error to find the right one. It's unlikely to be Big5, because Big5 is for Traditional Chinese while the STB displays Simplified Chinese.
Griga Posted February 28, 2016

I've examined the sample. The broadcaster specifies no character set whatsoever in the EPG data. So, according to the DVB specifications, the default Latin character coding (ISO 6937) has to be assumed. That's what causes the issue.

The broadcaster must flag its content correctly; there is no really good way around it. Every character string that is coded as Simplified Chinese must begin with a #19 control character (hex 0x13), see the DVB specifications, ETSI EN 300 468, Annex A.2.

DVBViewer/TransEdit could try some guessing, e.g. assume Simplified Chinese if the ISO language code is specified as "chi". But that may have unwanted side effects in other situations.

allanlee wrote: "Is it possible to manually set decoding character set in DVBViewer / TransEdit?"

Sorry, no. It can be considered for future DVBViewer versions, but not as a short-term solution.
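To illustrate the rule described above, here is a minimal Python sketch of how a receiver might choose a decoder from the leading selector byte. It is not DVBViewer's code. The function name `decode_dvb_text` and the `SELECTORS` table are hypothetical, the table covers only a few selectors from EN 300 468 Annex A, and `latin-1` stands in for ISO 6937 because Python ships no ISO 6937 codec.

```python
# Sketch only: pick a text decoder from the leading selector byte of a DVB SI string.
# The table is deliberately incomplete; names are hypothetical.
SELECTORS = {
    0x11: "utf-16-be",  # ISO/IEC 10646 Basic Multilingual Plane
    0x13: "gb2312",     # Simplified Chinese
    0x14: "big5",       # Big5 subset; some broadcasters send UTF-16 here instead
    0x15: "utf-8",      # ISO/IEC 10646 encoded as UTF-8
}


def decode_dvb_text(raw: bytes) -> str:
    """Decode a DVB SI text field according to its selector byte."""
    if not raw:
        return ""
    first = raw[0]
    if first >= 0x20:
        # No selector byte: the default Latin table applies.
        # Assumption: latin-1 is used here as a stand-in for ISO 6937.
        return raw.decode("latin-1")
    codec = SELECTORS.get(first)
    if codec is None:
        raise ValueError(f"selector 0x{first:02X} is not handled in this sketch")
    # Strip the selector byte before decoding the payload.
    return raw[1:].decode(codec, errors="replace")
```

A string without a selector byte falls into the default Latin branch, which is why unflagged Chinese text turns into unreadable characters.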
allanlee (Author) Posted February 28, 2016 (edited)

Thanks for your reply, Griga.

Some Linux-based receivers are able to decode the Chinese EPG correctly (e.g. DM800Se SR4). So is VLC: when I stream the channel via LAN, the embedded EPG info is decoded.

Some further study and checking showed that the characters seem to be UTF-8. Could you (or Tjod) help to confirm?

If this is the case, could there be a temporary solution to display the EPG info correctly, before a future version allows the character set to be specified manually? (E.g. another option in the Tweaker?)
Griga Posted February 28, 2016

allanlee wrote: "Some further study & check showed that the character seems to be UTF-8. Could you (or Tjod) help to confirm?"

Not UTF-8, as far as I can see. Does this look correct? It's from Channel 8 HD:
allanlee (Author) Posted February 28, 2016

No, that's unreadable... GB2312 does not seem to fit. According to Annex A, maybe the strings should start with 0x15? Thank you very much!
allanlee (Author) Posted February 28, 2016

Yes!!! How did you make it work in TE?
Griga Posted February 28, 2016

You are right, it is UTF-8. I tried an automatic UTF-8 detection, but got it wrong on the first attempt. What I've added now is

    IF no_character_set_specified AND (language = chi) AND utf-8_auto_detected
    THEN assume_utf-8

However, the automatic UTF-8 detection is a kind of educated guessing, which means it's not 100% fail-safe. Another character coding may be mistaken for UTF-8, causing trouble in other cases, though that is not very likely. We can try to use it in the next DVBViewer release (which will come soon), but if half of China complains about garbled characters, we have done the wrong thing.
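The rule above could be sketched in Python roughly as follows. This is an illustration, not the actual DVBViewer implementation: `looks_like_utf8` and `guess_codec` are hypothetical names, and requiring at least one multi-byte sequence is an assumption added here so that pure ASCII text does not count as a detection.

```python
from typing import Optional


def looks_like_utf8(raw: bytes) -> bool:
    """Heuristic: strictly valid UTF-8 containing at least one non-ASCII byte."""
    try:
        raw.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    # Assumption: pure ASCII carries no evidence either way, so it is not a hit.
    return any(b >= 0x80 for b in raw)


def guess_codec(raw: bytes, language: str, charset_specified: bool) -> Optional[str]:
    """Return "utf-8" when the guessing rule fires, else None (keep the default)."""
    if not charset_specified and language == "chi" and looks_like_utf8(raw):
        return "utf-8"
    return None
```

Text in a legacy double-byte encoding such as GB2312 usually fails the strict validity check, which is why a false positive is unlikely but, as noted above, not impossible.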
allanlee (Author) Posted February 28, 2016 (edited)

Thank you soooo much! That seems to be a good solution so far. Will this guessing also apply to the Recording Service (which I'm also using)?

As far as I know, most broadcasters in mainland China use GB2312, tagged as "zho", and the current version of DVBViewer works fine with it. If necessary, I can get you some samples from my friends in various Chinese cities.
Griga Posted February 29, 2016

allanlee wrote: "Will this guessing also apply to recording service?"

Yes, in the next RS release.

allanlee wrote: "As far as I know, most broadcaster in mainland China use GB2312, tag as "zho""

In my samples they use "chi" plus a character set designation (e.g. Shen Zhen Cable, UTF-16). The same applies to Taiwan DVB-T.
allanlee (Author) Posted March 10, 2016

Tested with v5.6.0. The detection algorithm partially works. The text in the red boxes is garbled; it seems it was not detected as UTF-8.

Would it be a good idea to assume UTF-8 for all EPG data in the same channel (or even for the whole broadcaster) if, let's say, more than 80% of the EPG data fields are detected as UTF-8?

Sample ts here and here in case you need the raw EIT stream. Thanks for your efforts again!
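The per-channel idea could look something like the following Python sketch. It only illustrates the proposal and is not something DVBViewer is known to do. The names `is_valid_utf8`, `channel_is_utf8` and `decode_channel_fields` are hypothetical, the 0.8 threshold is the figure suggested above, `latin-1` stands in for the default Latin table, and ignoring pure-ASCII fields in the vote is an extra assumption.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def channel_is_utf8(fields, threshold=0.8):
    """Vote over all unflagged text fields of one channel."""
    # Assumption: pure-ASCII fields are skipped because they prove nothing.
    candidates = [f for f in fields if any(b >= 0x80 for b in f)]
    if not candidates:
        return False
    hits = sum(1 for f in candidates if is_valid_utf8(f))
    return hits / len(candidates) > threshold


def decode_channel_fields(fields, threshold=0.8):
    """Decode every field as UTF-8 if the channel vote passes, else as Latin."""
    if channel_is_utf8(fields, threshold):
        # Damaged fields still decode; bad bytes become U+FFFD.
        return [f.decode("utf-8", errors="replace") for f in fields]
    return [f.decode("latin-1") for f in fields]
```

With this approach a few damaged fields no longer fall back to the Latin default, at the cost of a wrong guess affecting the whole channel.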
Griga Posted March 10, 2016

There are strings that are not valid UTF-8, e.g. the following one, picked from the debugger and handled as UTF-8 by Notepad++:

《中国新闻》是以海外华人、港澳台同胞、留学生、驻外使领馆及中资机构人员为目标的新闻节目。节目由国内外要闻、内地经济和社会新闻、对国内外重要新闻xE4xBA

with invalid codes displayed as hexadecimal numbers. This makes the UTF-8 detection fail, because it checks for valid UTF-8.
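In the string quoted above, the two trailing bytes 0xE4 0xBA form the start of a three-byte UTF-8 sequence whose last byte is missing. One possible explanation, which is only a guess, is that the text was cut at a byte limit. A detector could tolerate exactly this case, as in the Python sketch below. The function name `decode_utf8_prefix` is hypothetical; the sketch relies on the standard `codecs` incremental decoder, which buffers an incomplete trailing sequence instead of raising when `final=False`.

```python
import codecs
from typing import Optional


def decode_utf8_prefix(raw: bytes) -> Optional[str]:
    """Decode raw as UTF-8, tolerating one incomplete sequence at the very end.

    Returns the decoded text without the cut-off character, or None if the
    data contains an invalid sequence anywhere else.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        # final=False: an incomplete trailing sequence is held back, not an error.
        return decoder.decode(raw, final=False)
    except UnicodeDecodeError:
        return None
```

Invalid bytes in the middle of a string still cause a rejection, so the check stays strict everywhere except at the end of the field.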