Teletext - Portuguese accented characters not shown


nars

Hi,

 

I have noticed that some accented characters are not shown in the teletext of Portuguese (DVB-T) TV channels in DVBViewer. I also noticed that right-clicking the teletext window brings up a popup menu with options to change the language, and that makes some difference: with the Spanish setting I get some of the accented characters, but still not all of them. Can I do something to help you add a Portuguese setting that shows all our accented characters correctly?

 

Thanks for your very good work o:)

 

Regards,

 

nars

Edited by nars
Link to comment
  • 4 weeks later...

If this can help, here is a small ts stream (including teletext info) recorded from dvb-t:

http://nars.aloj.net/temp/dvbt-teletext.ts

 

and some screenshots showing the missing/correct characters:

http://nars.aloj.net/temp/dvbt-teletext1.png

http://nars.aloj.net/temp/dvbt-teletext2.png

http://nars.aloj.net/temp/dvbt-teletext3.png

http://nars.aloj.net/temp/dvbt-teletext4.png

 

Not sure if this helps, and these may not be all of the missing Portuguese accented characters... but if you are willing to add support for these as a Portuguese setting in a future beta, I will download it and do a more "intensive" check to find any others that are missing.

 

Please let me know if you need anything more.

 

Thanks.

Edited by nars
Link to comment

Teletext page 105 is the most interesting in your ts sample. It's about the character set:

 

- The white characters are from the Latin G0 Primary Set.

 

- The red characters are from the Latin National Option Subset for Portugal/Spain

 

So far it is covered by DVBViewer and the European teletext standard (as published by ETSI.org)

 

- The green characters - don't know.

 

Please tell me what is written on this page. Unfortunately I can't find anything about a Portuguese teletext character set in the ETSI specifications. And without specifications it comes down to blind guessing and gets very time-consuming...

Link to comment

Unfortunately page 105 is not as interesting as you think. It should be... but it is not, because it doesn't include all accented characters. In fact it doesn't include any of the characters missing in DVBViewer. I checked that on my standalone TV (where I can see all characters correctly on all pages): DVBViewer with the Spanish setting shows page 105 exactly as I see it on the standalone TV (yes, including the 3 blank ones in columns 8 and 9). So unfortunately this page is completely useless, and I guess the only way would be to create a Portuguese setting based on the Spanish one and then search other pages for the few missing characters. I'm willing to help with that task.

 

Also note that there are no wrong characters at all with the Spanish setting (unlike with the others). Just a few are missing, but the ones that are shown are all correct, with no swapped ones at all. That's why I say we could just use the Spanish ones as a base and add the missing ones to create the Portuguese setting.

 

From what I saw there are not many more accented chars missing other than the ones I pointed out in the screenshots. I may be missing one or two rarely used ones, but not much more... Most accented lowercase letters are already OK with the Spanish setting, except for the ones with ~ and ^. For capitals it seems some more are missing, including ones with the ´ accent, as seen in the ones I pointed out.

 

It's strange that there is no specification for these Portuguese accented characters, as my standalone Thomson TV includes a Portuguese setting that allows it to show all characters correctly. Also note that this problem is not specific to one TV channel but affects all Portuguese channels.

 

Maybe I would not bother too much if this only affected teletext itself, but it also affects subtitles, and it's very annoying to read them in real time with missing characters o:)

 

Thanks.

Edited by nars
Link to comment
Maybe something about it is at p. 109 and 111 in ETSI EN 300 706...

No. That's what I've already checked.

 

I've also checked the coding of teletext pages with missing Portuguese characters by saving them as ttx in DVBViewer (which means the original page as it arrives in DVBViewer before it is rendered) and inspecting them with a hex editor. Guess what I've found at these character positions: character code 32, or 0x20 in hex. In other words, they are in fact transmitting blanks! The page doesn't contain information specifying what is supposed to be shown there: no control code, no character code intended for replacement by National Option Subset characters. Just blanks. See screenshot below.
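
The check described above can be sketched in a few lines of Python. This is only an illustration: the layout of a .ttx file is not described in this thread, so the sketch assumes one decoded row is already available as raw bytes, and the sample row and the name `blank_columns` are made up for the example.

```python
def blank_columns(row: bytes) -> list[int]:
    """Return the column indices of a decoded teletext row that hold 0x20."""
    return [col for col, code in enumerate(row) if code == 0x20]


# Hypothetical row: the word "INFORMAÇÕES" as it would arrive if the two
# accented letters were transmitted as blanks (illustrative data only).
row = b"INFORMA\x20\x20ES"
print(blank_columns(row))
```

A blank that stands in for a missing accented letter is byte-for-byte identical to an ordinary space between words, which is why nothing in the visible rows alone can tell the renderer what to draw there.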

 

That means it's not a question of translating some character codes correctly, like the ones of the national subsets. There are no codes that can be translated. I guess there's something going on like DRCS (Dynamically Re-definable Character Set, where pixel patterns or graphics for certain character positions are transmitted in a different way), and I must say, sorry, most likely this is beyond the DVBViewer scope...

Zwischenablage01.png

Link to comment

Hmm, I understand. In fact one of the first things I did, even before posting, was to save a teletext page, but as txt and html (as I didn't know what exactly the .ttx format was), and look at it in a hex editor. I also found the 0x20s, but thought DVBViewer could be filtering "strange" characters and replacing them with 0x20. Now I guess that is not the case, and the .ttx format is probably raw teletext data, right?

 

Thanks for all your attention Griga :bounce:

Link to comment
but thought there could be some filtering going in DVBViewer filtering "strange" characters

No. ttx is the output of the teletext decoder. It's up to the teletext renderer to decide how characters are displayed.

 

There isn't much left for filtering anyway, because teletext is based on 7-bit character sets. The eighth bit of each byte is used for parity checking (teletext text bytes use odd parity) and is stripped by the decoder. Since teletext is a child of the analogue age, there is a lot of error protection, most of which is unnecessary in digital transmissions.
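
As a rough sketch of the parity step mentioned above, here is how a decoder might check and strip the parity bit. It assumes odd parity with the parity bit in the most significant position of each byte as held in memory; the function names are hypothetical and this is not DVBViewer's code.

```python
from typing import Optional


def has_odd_parity(byte: int) -> bool:
    """True if the 8-bit value contains an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def strip_parity(byte: int) -> int:
    """Drop the parity bit (bit 7) and keep the 7-bit character code."""
    return byte & 0x7F


def decode_byte(byte: int) -> Optional[int]:
    """Return the 7-bit code, or None if the parity check fails."""
    return strip_parity(byte) if has_odd_parity(byte) else None


# 'A' is 0x41, which has two 1 bits, so it is sent as 0xC1 (parity bit set).
print(hex(decode_byte(0xC1)))
# A blank, 0x20, already has a single 1 bit and is sent unchanged.
print(hex(decode_byte(0x20)))
```

After this step only values 0x00..0x7F remain, so there is simply no room in a displayed row for an extra "accented" code.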

 

Characters #0..#31 are control codes used for switching between different colors or states (e.g. block graphics mode on/off), #32...#127 are character codes. In text mode 13 of them may vary according to the language.
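
The split described above can be written down as a small classifier. The 13 positions listed here are recalled from the National Option Subset table in ETSI EN 300 706 and should be checked against the specification before relying on them; the names are hypothetical.

```python
# Assumed list of the 13 code positions that a national option subset may
# replace (to be verified against ETSI EN 300 706).
NATIONAL_OPTION_POSITIONS = frozenset(
    [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x5F, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]
)


def classify(code: int) -> str:
    """Classify a 7-bit teletext code as it appears in a displayed row."""
    if not 0 <= code <= 0x7F:
        raise ValueError("teletext codes are 7-bit values")
    if code < 0x20:
        return "control"          # colours, graphics mode on/off, ...
    if code in NATIONAL_OPTION_POSITIONS:
        return "national option"  # glyph depends on the selected language
    return "character"            # same glyph in every Latin language


for code in (0x07, 0x41, 0x5B):
    print(hex(code), classify(code))
```

With only 13 language-dependent positions, a language like Portuguese that needs more accented letters than that cannot fit them all into the national subset.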

 

And that brings me back to page 105 of your sample. The table contains character codes in the range of 0x80..0x9F (the green characters). They can not be transmitted as part of a normal teletext page. To me it looks like a list of replacement characters for Ã, Í etc. Replacing them by A, I etc. would not be correct, but more readable than just blanks. That is something DVBViewer could do. But where are these codes?
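
The fallback idea above (show A instead of Ã rather than a blank) can be illustrated with Unicode text. This is only a sketch of the idea using Python's standard `unicodedata` module, not what DVBViewer does internally, and `strip_accents` is a made-up name.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented letters by their unaccented base letters."""
    # NFD splits e.g. 'Ã' into 'A' followed by a combining tilde.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("INFORMAÇÕES"))
```

The result is not correct Portuguese, but it is readable, which is the trade-off described in the post.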

Link to comment
And that brings me back to page 105 of your sample. The table contains character codes in the range of 0x80..0x9F (the green characters). They can not be transmitted as part of a normal teletext page. To me it looks like a list of replacement characters for Ã, Í etc. Replacing them by A, I etc. would not be correct, but more readable than just blanks. That is something DVBViewer could do. But where are these codes?

 

Hmm, I understand. Since it is 7-bit, 7F would be the last possible char and not 9F. Maybe they use some extra bit for that in some non-standard way or something? Anyway, my TV renders these green chars on page 105 exactly like DVBViewer does, see:

http://nars.aloj.net/temp/tt_dvbv_105.png

http://nars.aloj.net/temp/tt_photo_105.jpg

 

but the same tv is able to render accented chars (really accented and not 'õ' as 'o' for example) that are missing on DVBViewer on all other pages, see:

http://nars.aloj.net/temp/tt_dvbv_100.png

http://nars.aloj.net/temp/tt_photo_100.jpg

 

So how could my TV be using the "somehow special green chars" to show the "really accented" chars if it doesn't even show them "really accented" on page 105? (I hope you understand what I mean; my English is not perfect, as you have probably already noticed :bounce: )

 

Also check this, it's teletext available in the channel web site:

http://www.rtp.pt/wportal/teletexto/index.php

(on the input field below type 105 and click OK button)

Curiously, you will indeed see the green chars "really accented" here, but again, if my TV doesn't show them like that on page 105, then I guess it may be using some other way to render them as accented on the other pages.

Edited by nars
Link to comment

Maybe it's a good idea to call the telephone number on page 104 and ask :bounce: Tell them that you need the specifications for the transmission of accented characters, because you want to write teletext software. The German public broadcasters are quite cooperative in this respect, so that's what I would do here.

Link to comment

I'm quite sure that I've found the hidden location of the "missing characters" data now. It's data packet (i.e. invisible row) number 26; only rows 0..24 are displayed. From the specifications:

 

X/26/0 - 15: Accessing G0 characters with diacritical marks from the G2 set. A few characters from G2 and G3 sets, depending on local language requirements and Codes of Practice, may also be accessed.
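
To illustrate what "G0 characters with diacritical marks" amounts to, here is a minimal sketch that combines a base letter with a mark. It assumes the mark index follows the order of the diacritics in column 4 of the Latin G2 set (1 = grave, 2 = acute, 3 = circumflex, 4 = tilde, 8 = diaeresis, 11 = cedilla), which should be verified against ETSI EN 300 706. Parsing and error-decoding of the packet 26 data itself is left out, and all names are hypothetical.

```python
import unicodedata

# Assumed mark index -> Unicode combining character (subset only).
COMBINING_MARKS = {
    1: "\u0300",   # grave
    2: "\u0301",   # acute
    3: "\u0302",   # circumflex
    4: "\u0303",   # tilde
    8: "\u0308",   # diaeresis
    11: "\u0327",  # cedilla
}


def compose(base: str, mark: int) -> str:
    """Combine a G0 base letter with a diacritical mark index."""
    combining = COMBINING_MARKS.get(mark)
    if combining is None:
        return base  # unknown or no mark: fall back to the plain letter
    # NFC merges base + combining mark into one precomposed character.
    return unicodedata.normalize("NFC", base + combining)


print(compose("a", 4), compose("O", 3), compose("C", 11))
```

In a real decoder the composed character would overwrite the blank that the displayed row carries at the addressed column.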

 

The debugger tells me that these data packets are contained in the teletext. I'll investigate it...

Link to comment
  • 3 weeks later...

humm, looking good indeed ;)

 

Will we have a new beta with these changes soon? Or can you just send me a compiled executable (or any other files needed) with those changes for testing? (And yes, I'm a paying user of DVBViewer, obviously.)

 

Many thanks for your work once again :blush:

Edited by nars
Link to comment

It's just an experimental implementation in DVBViewer GE, and there is more to be done, because the G2 character sets are different for different languages, which means, additional translation tables (for conversion to Windows character sets) are necessary. Up to now my implementation regards all languages as Western European, which may create confusion e.g. in Russian teletext.

 

Nevertheless I think that DVBViewer Pro/GE should be able to handle it, because it's teletext level 1.5 (quite basic), and my tests revealed that it also fixes missing/wrong characters on Spanish, French and Czech teletext pages (see screenshot, on the left the old version, on the right the new version). So when I'm through with it I'll recommend to Lars to implement it in DVBViewer Pro. However, it will take some time... I just wanted to let you know that there is some progress.

Zwischenablage01.png

Link to comment

I understand it may not be a final, stable implementation. Anyway, if you could send me that compiled DVBViewer GE binary exactly "as-is" with those changes, I would really, really appreciate it.

 

Btw, I confess I was not aware of what the GE version was. I'm relatively new to DVBViewer and had never tried the GE version before or even bothered to read about it, but after seeing your post I downloaded and tried it. I found it interesting too: small, simple and also working perfectly with my DVB-T card ;)

Link to comment

Yes, I'm still "tuned" and a bit impatient to test it and to be able to see subtitles with no missing chars (it's really annoying to be guessing missing chars in subtitles in "real time"...). It doesn't matter to me whether it is a final "super polished" version or a beta or alpha version :) But I don't want to annoy you any more... take your time, and if possible please post something here when you upload it to the members area.

Edited by nars
Link to comment

Ok, DVBViewer GE 2.9.0.1 is available in the members area, beta section. It requires a DVBViewer GE 2.9.0 installation. Just replace DVBViewer.exe there.

 

The whole teletext code has undergone major changes and has to be tested, including .htm/.txt export (now as UTF-8 unicode), copying teletext pages to the clipboard and the search function. Please note: Your browser will only be able to display teletext pages exported as HTML completely if a teletext graphics font is installed (see members area, plugins section, teletext fonts).

 

Testers from other countries (particularly from Eastern European countries and Greece) are welcome. Please let me know if the teletext characters are displayed correctly. It may be necessary to right-click the teletext window and select the appropriate language, if auto-detection fails.

 

@nars: I can receive the Portuguese teletext via satellite by tuning to RTPI. Stupid me, it didn't come to my mind... DVBViewer GE is able to auto-detect the language here. Does it also apply to DVB-T?

Link to comment

Thanks!! I just downloaded it and did some quick tests. Apparently it's all working OK: all accented characters are shown correctly on all PT channels on DVB-T. I will continue checking it, and if I find any problem I will let you know. Thanks once again :)

Link to comment

I did some tests exporting pages to htm and txt, and the exported pages seem OK: all accented characters show correctly in the exported files as well. I used UltraEdit32 to open the exported txt; it detects that it is a UTF-8 file and shows everything correctly.

 

I also tested the copy to clipboard and search functions (including searching for accented words) and found no problems with either.

 

Btw, a question: why those PC speaker beeps every time we save a teletext page? :) I had already noticed it before in the Pro version as well... I think it's a bit weird for a program to do that nowadays, for a Save operation. Also, I have one of those piezoelectric mini PC speakers on my motherboard that plays high-frequency beeps really loud. Believe me, I cannot save many pages at night or I would probably wake up my neighbours... ;) A Windows "ding" would be a better option IMO...

Link to comment
why those PC Speaker beeps every time we save a teletext page?

It has always been that way. Historical DVBViewer feature ;) E.g. if you save all pages (100..899) as HTML it will take quite a while, and somebody thought the user should be notified when it's finished.

 

Nevertheless I think it's better to remove that noise. A progress bar would be better, but I don't know if it's worth the effort...

Link to comment
