Problems with DVBViewer and RS unicode encoding


majstang

Hi guys!

 

I have lately been playing around quite extensively with RS EPG information files (.txt) and RS log files (.log), and got the idea to create some scripts that can convert files from UTF-8 to ANSI and back again. Doing so, I ran into serious trouble converting UTF-8 encoded files to ANSI. The conversion scripts handle every UTF-8 encoded file created by other software on my computer very well, but UTF-8 encoded files created by DVBViewer and Recording Service fail immediately when I try to convert them from UTF-8 to ANSI. To get around this I was forced to write another script, using a function that cleans up badly encoded strings and then converts them to ANSI. That script solved my dilemma, but it shouldn't be needed if everything were working OK in DVBV & RS. The .log file, by contrast, is ANSI encoded when RS creates it, and I can convert it back and forth between ANSI and UTF-8 as many times as I please without needing the "Convert badly encoded UTF-8 to ANSI" script.

 

The question now is: are DVBViewer and RS really using standard UTF-8 encoding when creating the EPG information files (and every other UTF-8 encoded file)? It could of course be an issue with AutoHotkey, but I hardly think so, since everything is fine when I run UTF-8 encoded Notepad, Word and Excel files through the scripts.

 

To make it easy to test for yourselves, I'll attach the scripts here. Use them with AutoHotkey_L (Unicode) on 32-bit Win7, and drag and drop the files you want to convert onto the script:

 

UTF-8 to ANSI

Loop %0%  ; For each parameter (or file dropped onto a script):
{
   GivenPath := %A_Index%  ; Fetch the contents of the variable whose name is contained in A_Index.
   Loop %GivenPath%, 1
   FileRead, UTF8, %GivenPath%
   ;msgbox % UTF8
   MyANSI := UTF8
   FileDelete, %GivenPath% 
   FileAppend, %MyANSI%, %GivenPath%, CP0
}

 

ANSI to UTF-8

Loop %0%  ; For each parameter (or file dropped onto a script):
{
   GivenPath := %A_Index%  ; Fetch the contents of the variable whose name is contained in A_Index.
   Loop %GivenPath%, 1
   FileRead, UTF8, %GivenPath%
   ;msgbox % UTF8
   MyANSI := UTF8
   FileDelete, %GivenPath% 
   FileAppend, %MyANSI%, %GivenPath%, UTF-8
}

 

Convert Badly encoded UTF-8 to ANSI

Loop %0%  ; For each parameter (or file dropped onto a script):
{
   GivenPath := %A_Index%  ; Fetch the contents of the variable whose name is contained in A_Index.
   Loop %GivenPath%, 1
   FileRead, UTF8, %GivenPath%
   ;msgbox % UTF8
   ConvertBadlyEncodedFormerlyUtf8StringFromAnsiToUtf8(UTF8)
   MyANSI := UTF8
   FileDelete, %GivenPath% 
   FileAppend, %MyANSI%, %GivenPath%, CP0
}

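ConvertBadlyEncodedFormerlyUtf8StringFromAnsiToUtf8(ByRef string) {
   ; Write the mis-decoded string back out as ANSI bytes, which recovers the original UTF-8 byte sequence...
   VarSetCapacity(ansi, StrPut(string, "cp0")), StrPut(string, &ansi, "cp0")
   ; ...then decode those bytes as the UTF-8 they were all along.
   string := StrGet(&ansi, "UTF-8")
}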

 

//majstang

:bye:

Link to comment
The question now is: are DVBViewer and RS really using standard UTF-8 encoding when creating the EPG information files (and every other UTF-8 encoded file)?

Yes, they really, really are standard UTF-8 encoded. :) Maybe your software relies on a BOM (byte order mark) at the beginning of the file, which we don't use...

Link to comment

Aha, yes, you are most certainly correct! It could be that AutoHotkey requires the BOM to work OK. Well, many thanks for your thoughts; I'll have to read up on what to do with the scripts :)
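
For anyone hitting the same thing: a missing BOM would explain the failure. When a file has no BOM, FileRead falls back to the default file encoding, which is the system ANSI code page unless FileEncoding has been set, so the UTF-8 bytes get decoded as ANSI and come out garbled. Below is a minimal sketch of a "UTF-8 to ANSI" script that does not depend on a BOM, assuming AutoHotkey_L v1.1 (Unicode build). The variable name Text is only illustrative; *P65001 is the FileRead option that selects code page 65001 (UTF-8).

```autohotkey
; Sketch: convert BOM-less UTF-8 files to ANSI (assumes AutoHotkey_L v1.1, Unicode build).
Loop %0%  ; For each file dropped onto the script
{
   GivenPath := %A_Index%  ; Path of the current dropped file
   FileRead, Text, *P65001 %GivenPath%  ; *P65001 = decode as UTF-8 even when no BOM is present
   if ErrorLevel  ; Skip files that could not be read
      continue
   FileDelete, %GivenPath%
   FileAppend, %Text%, %GivenPath%, CP0  ; CP0 = system default ANSI code page
}
```

With the decoding made explicit, the clean-up function from the third script should no longer be needed. Keep in mind the conversion is lossy: characters that have no equivalent in the ANSI code page cannot survive it, so test on copies first.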

Link to comment

Hmm... this byte order mark (BOM) problem keeps on biting me in the a**! Recording Service does not put a BOM in the EPG information file (.txt), and when you edit the .txt with Notepad, all hell breaks loose. Notepad adds a BOM to the .txt when saving the file (in UTF-8 encoding). What seems to be going on is that when you refresh the recordings database and RS picks up the .txt, now with a BOM, RS goes haywire, in the sense that the database entry gets strange characters all over.

 

Conclusion: I strongly advise against using Notepad to edit RS UTF-8 encoded files. There is no way to see the BOM in order to remove it manually either, because Notepad hides it efficiently. Same deal with Microsoft Word (it also adds a BOM when saving the file). Now I have to find a text editor that handles this problem better. Any recommendations for a (free) text editor that does not add BOMs would be greatly appreciated.

:bye:

Link to comment

Fantastic! All problems gone. I had to convert pretty much my whole database (all EPG information files); luckily there are not so many entries yet. I had edited the EPG information files because many of the entries didn't have the series label. Doing it with Notepad was a disaster... well, one can't know it all ;) Bad choice; I should have done it through the web interface edit function instead.

Anyway, it can be done with Notepad++, which is a really good text editor and easy to use. It has an "Encoding" menu where you select "Encode in UTF-8 without BOM", hit save, and RS shouldn't give you any problems.
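
If many files have already picked up a BOM, the same repair can probably be scripted instead of re-saving each file by hand. This is only a sketch, assuming AutoHotkey_L v1.1 (Unicode build); Text is an illustrative variable name. It relies on the UTF-8-RAW encoding name for FileAppend, which writes UTF-8 without a BOM, whereas plain UTF-8 adds one to a newly created file.

```autohotkey
; Sketch: rewrite dropped files as UTF-8 without a BOM (assumes AutoHotkey_L v1.1, Unicode build).
Loop %0%  ; For each file dropped onto the script
{
   GivenPath := %A_Index%  ; Path of the current dropped file
   FileRead, Text, *P65001 %GivenPath%  ; Decode the file as UTF-8
   if ErrorLevel  ; Skip files that could not be read
      continue
   if (Asc(Text) = 0xFEFF)  ; Defensive: drop a BOM character if one survived decoding
      Text := SubStr(Text, 2)
   FileDelete, %GivenPath%
   FileAppend, %Text%, %GivenPath%, UTF-8-RAW  ; UTF-8-RAW writes no BOM
}
```

Note that the "ANSI to UTF-8" script from the first post passes UTF-8 to FileAppend, so the files it produces get a BOM as well; switching that parameter to UTF-8-RAW should keep them readable for RS. Back up the files before running anything that deletes and rewrites them.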

 

Thanks for your excellent help!

majstang

:bye:

Edited by majstang