Chinese EPG not correctly displayed


allanlee

Posted

I'm using DVBViewer Pro 5.3.1.0 with a TBS5881 DVB-C tuner to watch Singapore DVB-C FTA channels.

Some of the channels provide both English and Chinese EPG data, while others only have English EPG.

The problem is that all English EPG data displays perfectly, but Chinese is displayed as unreadable characters.

I've tried using the Tweaker to toggle (1) "Convert EPG data to local character set", (2) "Convert EPG data to ISO6937" and (3) "Use UTF16 instead of Big5" on and off. Unfortunately, no combination works.

On the STB provided by the carrier, the EPG is displayed in Simplified Chinese.

I've attached the exported EPG HTML files and support.zip.

P.S. TransEdit has the same problem.

[Screenshot: post-152470-0-66453600-1456588686_thumb.png]

I'd very much appreciate it if someone could help. Let me know if more information is needed to fix this problem.

Thanks a lot!

support.zip

Posted

If you have any problems, you should try the current DVBViewer version (updates are available in the Members Section).

The Tweaker settings only affect newly received EPG data, so you should delete epg.dat in the configuration folder after each change.

If there is a problem, TransEdit is the best tool to investigate it, but you should use the 4.x beta version from the Members Section.

Try the "Assume Unicode character coding in case of Big5" setting there. Does it help in TransEdit?

http://www.DVBViewer.tv/forum/topic/2745-transedit/page-4#entry396436

Posted

Thanks!

I updated to 5.5.2.0: same problem.

I did the tweaking again, deleting epg.dat every time, with no luck.

I tried TE 4.1.0 beta with Unicode on/off and ISO6937 on/off; all combinations showed unreadable characters (same as in DVBViewer).

Is there anything else I can do to solve the issue, or to help fix the problem?

Many thanks again!

Posted

In TransEdit, open the Analyzer window on a transponder with channels that have EPG problems. Right-click in the PID section and select "Select Main SI PIDs"; this selects several PIDs (including EIT).

Then use "Start Recording" to record them for 30 seconds and post the .ts file. That may help determine what is going on.
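To look at such a recording outside TransEdit, here is a minimal Python sketch (not a TransEdit feature) that counts transport-stream packets per PID and flags PID 0x12, the PID that ETSI EN 300 468 assigns to the EIT. It assumes a plain .ts file of 188-byte packets aligned to the 0x47 sync byte from offset 0; `count_pids` is a hypothetical helper name.

```python
from collections import Counter

TS_PACKET_SIZE = 188  # fixed MPEG-TS packet length
SYNC_BYTE = 0x47      # first byte of every TS packet
EIT_PID = 0x0012      # PID carrying the EIT (event information) per ETSI EN 300 468


def count_pids(data: bytes) -> Counter:
    """Count TS packets per PID, assuming packets are aligned at offset 0."""
    counts = Counter()
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            # Lost sync; a real tool would resynchronise, this sketch just skips.
            continue
        # The 13-bit PID is the low 5 bits of byte 1 plus all 8 bits of byte 2.
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        counts[pid] += 1
    return counts
```

For example, `count_pids(data)[EIT_PID]` on the bytes of a recording gives the number of EIT packets it contains; a non-zero count confirms the EIT was actually captured before the file is posted.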

Posted

I'm not able to determine which character encoding (UTF-8, UTF-16, Big5 etc.) the channels are using.

A developer has to look into it.

Posted

Is it possible to manually set the decoding character set in DVBViewer / TransEdit? Then I could use trial and error to find the right one.

It's not likely to be Big5, because Big5 is for Traditional Chinese while the STB displays Simplified Chinese.
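Until such an option exists, the trial-and-error idea can be tried offline on the raw bytes of one EPG text field (for example, extracted from a recording). A minimal Python sketch, assuming you already have those bytes; the candidate list and the name `try_codecs` are illustrative, and a codec that decodes without error is only a candidate, not proof.

```python
CANDIDATES = ["utf-8", "gb2312", "big5", "utf-16-be"]


def try_codecs(raw: bytes, candidates=CANDIDATES) -> dict:
    """Return {codec: decoded_text} for every codec that decodes raw without error."""
    results = {}
    for codec in candidates:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            pass  # the bytes are invalid in this codec, so rule it out
    return results
```

UTF-16 in particular accepts almost any even-length input, so several codecs "succeeding" is normal; the list only narrows the choice, and the decoded text still has to be checked by eye.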

Posted

I've examined the sample. The broadcaster specifies no character set whatsoever in the EPG data, so according to the DVB specifications the default Latin character coding (ISO 6937) has to be assumed. That's what causes the issue.

The broadcaster must flag its content correctly; there is no really good way around it. Every character string that is coded as Simplified Chinese must begin with the control character #19 (hex 0x13); see the DVB specification ETSI EN 300 468, Annex A.2.

DVBViewer/TransEdit could try some guessing, e.g. assume Simplified Chinese if the ISO language code is specified as "chi", but that may have unwanted side effects in other situations.

Is it possible to manually set decoding character set in DVBViewer / TransEdit?

Sorry, no. It can be considered for future DVBViewer versions, but not as a short-term solution.
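For illustration, here is a minimal Python sketch of the selection rule from ETSI EN 300 468, Annex A: the first byte of a text field selects the character table, and only a first byte of 0x20 or above means "default table". It covers a subset of selectors (0x01-0x0B for ISO/IEC 8859-5 to -15, 0x10 followed by a two-byte ISO/IEC 8859 part number, 0x11 for the Unicode BMP as two-byte big-endian, 0x13 for GB-2312, 0x15 for UTF-8). It is not DVBViewer's implementation: `decode_dvb_text` is a hypothetical name, 0x14 (Big5) is left out because, as the Tweaker's "Use UTF16 instead of Big5" option suggests, it can be read in more than one way, and since Python has no ISO 6937 codec the default table is approximated with Latin-1, which is only exact for plain ASCII.

```python
# Selector bytes from ETSI EN 300 468, Annex A (subset only).
SIMPLE_TABLES = {
    0x11: "utf-16-be",  # ISO/IEC 10646 BMP, two bytes per character
    0x13: "gb2312",     # GB-2312-1980, Simplified Chinese
    0x15: "utf-8",      # UTF-8 encoding of ISO/IEC 10646
}


def decode_dvb_text(raw: bytes) -> str:
    """Decode a DVB SI text field according to its leading selector byte."""
    if not raw:
        return ""
    first = raw[0]
    if first >= 0x20:
        # No selector: the default table (ISO 6937) applies. Latin-1 is only
        # a stand-in here and is exact for plain ASCII text only.
        return raw.decode("latin-1")
    if 0x01 <= first <= 0x0B and first != 0x08:
        # 0x01..0x0B select ISO/IEC 8859-5 .. 8859-15 (0x08 is reserved).
        return raw[1:].decode(f"iso8859_{first + 4}")
    if first == 0x10 and len(raw) >= 3:
        # 0x10 is followed by a 16-bit ISO/IEC 8859 part number.
        part = (raw[1] << 8) | raw[2]
        return raw[3:].decode(f"iso8859_{part}")
    if first in SIMPLE_TABLES:
        return raw[1:].decode(SIMPLE_TABLES[first])
    raise ValueError(f"character table 0x{first:02X} not handled in this sketch")
```

Calling `decode_dvb_text("中国".encode("utf-8"))` reproduces the situation in this thread: UTF-8 bytes without a selector fall into the default branch and come out garbled, while the same bytes prefixed with 0x15 decode correctly.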

Posted (edited)

Thanks for your reply, Griga.

Some Linux-based receivers (e.g. DM800Se SR4) are able to decode the Chinese EPG correctly, and so is VLC: when I stream the channel over the LAN, the embedded EPG info is decoded correctly.

Further study and checking suggest that the character coding is UTF-8. Could you (or Tjod) help confirm this? If so, could there be a temporary solution to display the EPG info correctly until future versions allow the character set to be specified manually (e.g. another option in the Tweaker)?

Edited by allanlee
Posted
Some further study & check showed that the character seems to be UTF-8. Could you (or Tjod) help to confirm?

Not UTF-8, as far as I can see. Does this look correct? It's from Channel 8 HD:

[Screenshot: Zwischenablage01.png]

Posted

No, that's unreadable... GB2312 does not seem to fit.

According to Annex A, maybe the strings should start with a 0x15 byte (the selector for UTF-8)?

Thank you very much!

Posted

You are right, it is UTF-8.

I had tried automatic UTF-8 detection, but got it wrong on the first attempt. What I've added now is:

IF no_character_set_specified AND (language = chi) AND utf-8_auto_detected THEN assume_utf-8

However, automatic UTF-8 detection is a kind of educated guessing, which means it's not 100% fail-safe. Other character codings may be mistaken for UTF-8 and cause trouble in other cases, though that is not very likely. We can try it in the next DVBViewer release (which will come soon), but if half of China complains about garbled characters, we have done the wrong thing ;)
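The rule above, together with the caveat about false positives, can be sketched in Python. This is not DVBViewer's actual code: `looks_like_utf8` is a hypothetical stand-in for the auto-detection (strict UTF-8 decoding succeeds and at least one non-ASCII byte is present, because plain ASCII is valid in almost any table and proves nothing), and a first byte of 0x20 or above is taken to mean that no character table is selected, as in EN 300 468 Annex A. Latin-1 stands in for a single-byte Latin table to show a false positive, since Python has no ISO 6937 codec.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Hypothetical detector: strict UTF-8 with at least one multi-byte sequence."""
    if all(b < 0x80 for b in raw):
        return False  # plain ASCII is valid in almost every table: no evidence
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def assume_utf8(raw: bytes, language: str) -> bool:
    """IF no_character_set_specified AND language = chi AND utf-8_auto_detected."""
    # A first byte >= 0x20 means there is no character table selector byte.
    no_character_set_specified = len(raw) > 0 and raw[0] >= 0x20
    return no_character_set_specified and language == "chi" and looks_like_utf8(raw)


# UTF-8 Chinese text without a selector, tagged "chi": UTF-8 is assumed.
print(assume_utf8("中国新闻".encode("utf-8"), "chi"))   # True
# GB2312 bytes are not valid UTF-8, so they are left alone.
print(assume_utf8("中国新闻".encode("gb2312"), "chi"))  # False
# False positive: Latin-1 "Ã©" consists of the same bytes as UTF-8 "é".
print(assume_utf8("Ã©".encode("latin-1"), "chi"))       # True
```

The third case only triggers because the language tag says "chi"; that extra condition is what keeps such misdetections rare, in line with the "not very likely" caveat.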

Posted (edited)

Thank you soooo much! That seems to be a good solution so far.

Will this guessing also apply to the Recording Service (which I'm also using)?

As far as I know, most broadcasters in mainland China use GB2312 and tag the language as "zho", and the current version of DVBViewer works fine with them.

If necessary, I can get you some samples from my friends in various Chinese cities.

Edited by allanlee
Posted
Will this guessing also apply to recording service?

Yes, in the next RS release.

As far as I know, most broadcaster in mainland China use GB2312, tag as "zho"

In my samples they use "chi" plus a character set designation (e.g. Shen Zhen Cable, UTF-16). The same applies to Taiwan DVB-T.

  • 2 weeks later...
Posted

Tested with v5.6.0.

The detection algorithm partially works. The text in the red boxes is garbled; it seems not to be detected as UTF-8.

Would it be a good idea to assume UTF-8 for all EPG data on the same channel (or even from the whole broadcaster) if, say, more than 80% of the EPG text fields are detected as UTF-8?

Sample .ts files are here and here, in case you need the raw EIT stream.

Thanks again for your efforts!

[Screenshot: iJUusZq.png]

[Screenshot: mBjjo81.png]
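The per-channel idea could look roughly like the following Python sketch. It only illustrates the proposal and is not something DVBViewer implements: it counts, per channel, how many non-ASCII text fields are valid UTF-8 and flags the whole channel once the share exceeds a threshold (0.8, from the ">80%" suggestion). The class and method names, and the choice to ignore pure-ASCII fields, are assumptions.

```python
from collections import defaultdict

THRESHOLD = 0.8  # the ">80%" from the proposal; an assumption, not a tuned value


def is_valid_utf8(raw: bytes) -> bool:
    """True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


class ChannelVote:
    """Collect per-channel evidence on whether EPG text fields are UTF-8."""

    def __init__(self) -> None:
        self.valid = defaultdict(int)  # channel -> fields that passed the check
        self.total = defaultdict(int)  # channel -> non-ASCII fields seen

    def add_field(self, channel: str, raw: bytes) -> None:
        if all(b < 0x80 for b in raw):
            return  # pure ASCII fits every table, so it carries no evidence
        self.total[channel] += 1
        if is_valid_utf8(raw):
            self.valid[channel] += 1

    def channel_is_utf8(self, channel: str) -> bool:
        total = self.total.get(channel, 0)
        return total > 0 and self.valid[channel] / total > THRESHOLD
```

A flagged channel would then decode all of its fields as UTF-8, for example with `raw.decode("utf-8", errors="replace")`, so isolated invalid bytes show up as replacement characters instead of turning the whole string into Latin garbage.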

Posted

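There are strings that are not valid UTF-8, e.g. the following one, taken from the debugger and decoded as UTF-8 by Notepad++, with the invalid bytes displayed as hexadecimal numbers:

《中国新闻》是以海外华人、港澳台同胞、留学生、驻外使领馆及中资机构人员为目标的新闻节目。节目由国内外要闻、内地经济和社会新闻、对国内外重要新闻[xE4][xBA]

(In English: "China News" is a news program aimed at overseas Chinese, compatriots from Hong Kong, Macao and Taiwan, overseas students, and staff of Chinese diplomatic missions and Chinese-funded organizations abroad. The program consists of major domestic and international news, mainland economic and social news, and, regarding important domestic and international news, [the string breaks off here].)

This makes the UTF-8 detection fail, because it checks for valid UTF-8.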
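The trailing bytes 0xE4 0xBA are the first two bytes of a three-byte UTF-8 sequence whose third byte is missing, so the string looks cut off in the middle of a character. Assuming this is the typical failure (for example, text truncated by the broadcaster), a detector could tolerate an incomplete sequence at the very end while still rejecting invalid bytes elsewhere. A minimal Python sketch using the standard library's incremental decoder; `utf8_with_truncated_tail` is a hypothetical name.

```python
import codecs


def utf8_with_truncated_tail(raw: bytes):
    """Return the decodable text if raw is valid UTF-8 apart from an
    incomplete multi-byte sequence at the end; return None otherwise."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: an incomplete sequence at the end is buffered, not an error.
        return decoder.decode(raw, final=False)
    except UnicodeDecodeError:
        return None  # invalid bytes somewhere before the end


sample = "对国内外重要新闻".encode("utf-8") + b"\xe4\xba"
print(utf8_with_truncated_tail(sample))                   # 对国内外重要新闻
print(utf8_with_truncated_tail("中国".encode("gb2312")))  # None
```

Strict decoding of `sample` raises an error at the last two bytes, whereas this variant accepts the string and simply drops the unfinished character; GB2312 text is still rejected because its invalid bytes occur before the end.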
