
diacritics problem on epg guide


bogdan1980


I see wrong diacritics in the EPG menu in DVBViewer.

Windows 10, locale set to Romanian.

Untitled.jpg

Link to comment

It depends on the language and character table indicated in the EPG data. Both are displayed by the TransEdit Analyzer:

 

Zwischenablage01.png

 

Please check it for the channels in question. Use "Find" for searching the tree data if you want to find an EPG entry. Does TransEdit display the EPG correctly?

Link to comment

TransEdit shows the same.

On the Cinemax channel we also have Hungarian audio and subtitles. Years ago we sometimes got a Hungarian EPG instead of the Romanian one (and Hungarian audio too).

But that problem is fixed now. This one is a diacritics problem.

 

Maybe it's because we have both rum and ron in the EPG settings. I need to test them both.

 

 

EDIT: The same happens in ProgDVB.

 

 

Capture.JPG

Capture2.JPG

Capture3.JPG

Edited by bogdan1980
Link to comment

There is no character table indication, which means that the ISO 6937 diacritics character set applies by default according to the DVB specifications. However, DVBViewer can't apply this rule across the board because several Western European broadcasters don't comply with it (shame on them), in contrast to most Eastern European broadcasters. That's why DVBViewer checks whether the language is an Eastern European one.

 

The following languages are assumed to use ISO 6937 if no other character table is specified: hun, hrv, cze, slo, ces, slv, plk, rom, ron

 

rum is missing here. Do you think it should be added?
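The rule described above can be sketched roughly as follows. This is only an illustration, not DVBViewer's actual code: the function names are hypothetical, the language set is copied from the list in this post, and the "first byte below 0x20 selects a character table" check is how the DVB SI text format signals an explicit table.

```python
# Language codes quoted in the post above; "rum" is absent, as noted.
ISO6937_DEFAULT_LANGS = {"hun", "hrv", "cze", "slo", "ces", "slv", "plk", "rom", "ron"}


def has_table_indication(text: bytes) -> bool:
    """In DVB SI text, a first byte below 0x20 selects a character table."""
    return len(text) > 0 and text[0] < 0x20


def assume_iso6937(lang: str, text: bytes) -> bool:
    """Hypothetical helper: True if ISO 6937 would be assumed for this text."""
    return not has_table_indication(text) and lang.lower() in ISO6937_DEFAULT_LANGS


print(assume_iso6937("ron", b"Pe campuri"))  # language is in the list
print(assume_iso6937("rum", b"Pe campuri"))  # language is not in the list
```

With this list, EPG text tagged "rum" would not get the ISO 6937 treatment, while "ron" and "rom" would.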

Link to comment

P.S. DVBViewer provides a "Convert EPG data to the ISO 6937 character set" tweak (-> launch Tweaker.exe) that you may want to try. It lets DVBViewer convert all EPG data without character set information to the ISO 6937 character set. However, it will only take effect on new EPG data, so it may be necessary to close DVBViewer and delete the already received EPG data in the file epg.dat (see configuration folder).

Link to comment

I enabled "Convert EPG data to the ISO 6937 character set". I need more time to see whether it works.

 

Link to comment

In your screenshot above you have marked an item "Pe campuri de cap?uni". The question mark replaces a character that can't be handled. Which character should appear there?

 

Unfortunately the screenshot doesn't show the corresponding language and character table indication because you didn't expand the item. What was it? That's what I need to know.

Link to comment

ş instead of ?

 

And here, instead of "?coala de rock", it should be "şcoala de rock".

Our language is old Latin with some Slavic in it. We have ş ţ ă î.

 

What is curious is that on our channels some report rum subtitles, while others report ron or rom.

Or even md (Moldavian), which is also the Romanian language, same people, etc.

Maybe rum, ron, rom and md are what cause these problems?!

 

I see RON in language.

Capture.JPG

Edited by bogdan1980
Link to comment

Hmm, "ron" lets DVBViewer apply the ISO 6937 character table. Maybe the issue is caused by

 

Quote

Also, some diacritics used with the Latin alphabet like the Romanian comma are not included, using cedilla instead as no distinction between cedilla and comma below was made at the time.

 

https://en.wikipedia.org/wiki/ISO/IEC_6937

 

Is it always ş that appears as ? or are other characters also affected?

 

DVBViewer is able to handle Ş and ş (S with cedilla), which means it translates the ISO 6937 two byte character codes 0xCB 0x53 and 0xCB 0x73 to the corresponding Unicode character codes 0x015E and 0x015F (see here).
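To illustrate the two-byte translation, here is a minimal sketch that decodes a small subset of ISO 6937. It is not DVBViewer's implementation: the function name is hypothetical, only four diacritic prefix bytes are covered, and the composition is done by appending a Unicode combining mark and normalizing to NFC.

```python
import unicodedata

# ISO 6937 non-spacing diacritic bytes -> Unicode combining marks (subset only).
COMBINING = {
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC6: "\u0306",  # breve
    0xCB: "\u0327",  # cedilla
}


def decode_iso6937_subset(data: bytes) -> str:
    """Decode ASCII plus a few two-byte diacritic sequences (illustrative only)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in COMBINING and i + 1 < len(data) and data[i + 1] < 0x80:
            # The diacritic byte comes first, then the base letter.
            out.append(chr(data[i + 1]) + COMBINING[b])
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            out.append("\ufffd")  # not covered by this sketch
            i += 1
    # NFC turns "s" + combining cedilla into the single code point U+015F.
    return unicodedata.normalize("NFC", "".join(out))


print(hex(ord(decode_iso6937_subset(b"\xcb\x73"))))
print(decode_iso6937_subset(b"M\xc6arturii \xcbsi evoc\xc6ari"))
```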

 

It would be interesting to know which hex codes the broadcaster is using. This requires right-clicking the EIT stream on the right side of the Analyzer Window -> Hex View -> Set Packets to a high value like 5000 -> Restart -> wait a bit -> use the Find function -> Text String to find the string (an ASCII part of it or a sequence of ASCII characters nearby) in order to examine the corresponding bytes as hex codes.

Link to comment
4 hours ago, bogdan1980 said:

ş is BA.

ţ is FE.

 

No, it doesn't work this way. The Hex Viewer knows nothing about DVB character coding. On the right side it just uses the local Windows character set for displaying bytes as characters, so you can see if there is some text in the TS packets. What you see there as ş or ţ is meaningless.

 

That's why I have written:

 

On 4.1.2018 at 3:58 PM, Griga said:

an ASCII part of it or a sequence of ASCII characters nearby

 

because that's all you can find with the Hex Viewer. What you need to do is

  1.  Find a word or sentence with a ? in the EPG, e.g. "Pe campuri de cap?uni"
  2. Search in the Hex Viewer for a part nearby that only contains ASCII characters (no special characters!), e.g. "campuri de cap"
  3. Check how the part where the ? came in has been coded by the broadcaster as hexadecimal bytes. The two (!) byte hex code "CB 73" for ş would be correct ISO 6937.
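The same three steps can be sketched in code for a captured chunk of EIT data. This is a hypothetical helper, not a TransEdit feature, and the sample bytes are made up for the example: it looks for the ASCII part and prints the bytes that follow it as hex.

```python
def bytes_after(data: bytes, ascii_anchor: str, count: int = 2) -> str:
    """Return the `count` bytes following the ASCII anchor as a hex string."""
    pos = data.find(ascii_anchor.encode("ascii"))
    if pos < 0:
        return ""  # anchor not found in this chunk
    start = pos + len(ascii_anchor)
    return data[start:start + count].hex(" ").upper()


# Made-up samples: correct ISO 6937 coding versus a literal question mark.
good = b"\x4d\x01Pe campuri de cap\xcb\x73uni"
bad = b"\x4d\x01Pe campuri de cap?uni"

print(bytes_after(good, "campuri de cap"))
print(bytes_after(bad, "campuri de cap", count=1))
```

If the second case is what the stream contains, the question mark is already in the broadcast data.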
Link to comment

Well, secret disclosed. :D The ? (ASCII code 3F or decimal 63) is already in the broadcast EPG data. The broadcaster is in trouble with character translation, not DVBViewer.

 

It should be "Mărturii şi evocări", but ş is replaced by ?. This usually happens if Unicode characters are translated to another character table. If the Unicode character is not present in this table, a question mark is inserted. At least Windows does it this way.
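The effect is easy to reproduce. The sketch below uses ISO 8859-2 as a stand-in target table, because Python has no built-in ISO 6937 codec. It also assumes that the broadcaster's source text uses the Romanian comma-below letter ș (U+0219) rather than the cedilla form ş (U+015F); that is only one possible cause, suggested by the Wikipedia quote earlier in the thread, and has not been confirmed.

```python
# Same title, once with s-comma-below (U+0219), once with s-cedilla (U+015F).
comma_below = "M\u0103rturii \u0219i evoc\u0103ri"
cedilla = "M\u0103rturii \u015fi evoc\u0103ri"

# errors="replace" inserts "?" for characters missing from the target table.
lossy = comma_below.encode("iso8859_2", errors="replace")
intact = cedilla.encode("iso8859_2", errors="replace")

print(lossy)   # the comma-below letter has no slot in ISO 8859-2
print(intact)  # the cedilla letter does
```

Only the character without an entry in the target table turns into 3F. The ă survives in both cases, which matches what the EPG shows.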

 

So all you can do is to write a nice letter to the broadcaster and ask him to fix his Unicode -> ISO 6937 character translation.

Link to comment
