Sort order of Search presets

jirim100

Is it possible to sort the search presets according to the locale of the current Windows PC?

 

The following screenshot shows incorrectly sorted search presets on a PC running Czech Windows (note the letters with diacritics).

image.thumb.png.36802870fdf8dbf96b0af9d847d56efd.png

jirim100

I have many search presets, and when they aren't sorted alphabetically it is difficult for me to find the one I need to adjust.

HaraldL
Posted (edited)

They are always sorted alphabetically here. But I don't have special non-Western-European characters like yours, which are inserted between K and L in your list.

 

You could try renaming the presets with leading numbers, e.g. "01 ...", "02 ...", and saving them again.

 

Btw, if you move your mouse over the leading green/gray bubble in front of a timer in the timer list without clicking, a tooltip appears telling you which search preset created this timer. It would then show "01 ..." too.

Edited by HaraldL

jirim100
Posted (edited)
56 minutes ago, HaraldL said:

They are always sorted alphabetically here. But I don't have special non-Western-European characters like yours, which are inserted between K and L in your list.

 

You could try renaming the presets with leading numbers, e.g. "01 ...", "02 ...", and saving them again.

 

Btw, if you move your mouse over the leading green/gray bubble in front of a timer in the timer list without clicking, a tooltip appears telling you which search preset created this timer. It would then show "01 ..." too.

Yes, alphabetically, but only for the English-speaking world, not for Czech, Slovak, Polish, etc.

 

Numbering search presets with a leading number is a bad idea. In most cases it is better to give a search preset the same name as the searched text. All professional applications (and non-professional ones too) normally support sorting by locale.
I don't need to see which search preset created a particular timer, but I very often need to edit the parameters of (or delete) a particular search preset, and for this sorting by name is very useful.

 

I have about 400 search presets.

Edited by jirim100

Griga

I've looked it up in the code, and I think it must be called a bug. Comparing and sorting the preset names is done by a routine designed for the local ANSI character set, but in fact the names are UTF-8.

 

@jirim100 Please check your PMs...

Griga

I've replaced the wrong ANSI string comparison by a case-insensitive comparison that is suitable for UTF-8. It is a library function that is used in the Media Server throughout for this purpose.

 

However, as reported by @jirim100, it is not sufficient for handling all Czech characters correctly. The reason for it is that this function is optimized for speed. It uses a translation table for converting letters with diacritics to uppercase ASCII letters when comparing strings. This table is limited to the first 256 Unicode characters (up to U+00FF, being identical with the ISO/IEC 8859-1 code page). Strings containing Unicode characters outside this range are not sorted correctly.
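The effect of such a limited table can be reproduced with a small Python sketch. It is only an illustration, not the Media Server's library function: `build_latin1_fold_table` and `fold_key` are hypothetical names, and the table is derived from Unicode decomposition data rather than written by hand. Letters up to U+00FF are folded to uppercase ASCII, everything above passes through unchanged.

```python
import unicodedata

def build_latin1_fold_table():
    """Map code points up to U+00FF to a matching uppercase ASCII letter."""
    table = {}
    for cp in range(0x100):
        # Decompose e.g. "É" into "E" + combining accent and keep the base letter.
        base = unicodedata.normalize("NFD", chr(cp))[0]
        folded = base.upper()
        if len(folded) == 1 and folded.isascii() and folded.isalpha():
            table[cp] = folded
    return table

FOLD = build_latin1_fold_table()

def fold_key(name):
    # Characters without a table entry (e.g. U+010C "Č") stay as they are.
    return name.translate(FOLD)

names = ["Zprávy", "Čarodějky", "Ábel", "Dokument"]
result = sorted(names, key=fold_key)
print(result)
```

With these sample names, "Ábel" is folded and sorts first, while "Čarodějky" keeps its Č (U+010C) and therefore ends up after "Zprávy".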

 

The only complete solution I can see at the moment is to convert UTF-8 strings to UTF-16 and use a Windows API function for comparing them. It covers a much wider Unicode range and considers linguistic rules. However, this approach will presumably make the UTF-8 string comparison and everything using it much slower. I'm not sure what impact it will have if the DMS has to handle large data collections, e.g. if users have thousands of media files.
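Where a whole list is sorted in one go, as with the search presets, one way to limit the conversion cost might be to convert each string once into a sort key instead of converting on every comparison. The Python sketch below only counts conversions. `to_wide` and the `casefold()` comparison are hypothetical stand-ins for the UTF-8 to UTF-16 conversion and the Windows API comparison, not the Media Server's code.

```python
from functools import cmp_to_key

conversions = 0  # counts how often a UTF-8 string is converted

def to_wide(raw: bytes) -> str:
    """Stand-in for the UTF-8 -> UTF-16 conversion (here: decode to str)."""
    global conversions
    conversions += 1
    return raw.decode("utf-8")

def compare_converting(a: bytes, b: bytes) -> int:
    """Convert both operands on every call, then compare (stand-in comparison)."""
    ka, kb = to_wide(a).casefold(), to_wide(b).casefold()
    return (ka > kb) - (ka < kb)

names = [s.encode("utf-8") for s in
         ["Zprávy", "Čarodějky", "Ábel", "Dokument", "Film", "Sport", "Šachy", "Hudba"]]

# Variant 1: conversion happens inside the comparison function.
conversions = 0
by_compare = sorted(names, key=cmp_to_key(compare_converting))
per_comparison = conversions

# Variant 2: each string is converted once and sorted by the converted key.
conversions = 0
by_key = sorted(names, key=lambda raw: to_wide(raw).casefold())
per_string = conversions

print(per_comparison, per_string)
```

Both variants produce the same order, but the second one converts each name exactly once. This does not help where a callback receives two raw strings per call, such as a database collation function.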

 

jirim100
Posted (edited)
3 hours ago, Griga said:

The only complete solution I can see at the moment is to convert UTF-8 strings to UTF-16 and use a Windows API function for comparing them. It covers a much wider Unicode range and considers linguistic rules. However, this approach will presumably make the UTF-8 string comparison and everything using it much slower. I'm not sure what impact it will have if the DMS has to handle large data collections, e.g. if users have thousands of media files.

Is CompareStringEx available on Windows 7? Maybe you should use only CompareStringW (I have Windows 7). Or both 😀 functions - one for Win 7 and one for Win Vista and later.

 

In any case, the current sorting is not acceptable. For example, the letter Č can't be at the end of the list. Even if some operations become slower, that is more acceptable to me than the current state.

And - one big argument - your Media Server is used in many countries all over the world. You have to 😀 change this behavior. Your Media Server will then be more valuable. I think database servers have to use these functions too, and they are fast enough.

 

I can run a speed test for you (I have Win 7 64-bit). My database contains about 19000 recordings (from the last 6 years).

Edited by jirim100

Griga

Another (fast) possibility is extending the translation table that is used for conversion to ASCII letters.

 

For the following see here: Currently Basic Latin and Latin Supplement are covered. Latin Extended A and B could be added. The result would contain all European letters with diacritics, as far as I can see. Or is there something missing, regarding the Czech character set?

 

However, creating the extended table takes some work. Country-specific sorting rules are not included. It just means replacing letters with diacritics by matching uppercase ASCII letters and using the result for sorting. And it won't work in countries that use scripts other than Latin.
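As a rough illustration of what folding beyond U+00FF could look like, here is a Python sketch. It swaps in a different technique from the hand-built table described above: it uses Unicode canonical decomposition (NFD) and drops the combining marks. The function name `fold_to_ascii_upper` is made up for this example.

```python
import unicodedata

def fold_to_ascii_upper(name):
    """Decompose, drop combining marks, then uppercase what is left."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()

names = ["Zprávy", "Čarodějky", "Ábel", "Dokument", "Łódź"]
result = sorted(names, key=fold_to_ascii_upper)
print(result)
```

With this key, "Čarodějky" sorts as "CARODEJKY", so Č is treated like C rather than as a separate letter after C, which is the kind of country-specific rule that is not covered. The Polish Ł has no canonical decomposition, so it stays unfolded here and "Łódź" still sorts last; a hand-built table would need an explicit entry for such letters.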

 

On the other hand, it will be sufficient for the majority of users. Why should they be held back by a slow method they don't need? Maybe it would be good to make the method configurable.

 

jirim100
Posted (edited)
1 hour ago, Griga said:

For the following see here: Currently Basic Latin and Latin Supplement are covered. Latin Extended A and B could be added. The result would contain all European letters with diacritics, as far as I can see. Or is there something missing, regarding the Czech character set?

Latin Supplement + Latin Extended A cover the Czech character set (and probably Slovak, Polish, ...).

 

 

I haven't done any speed tests of CompareString/CompareStringEx, but why should these functions be too slow? Since one parameter of these functions is a locale identifier, I suppose they first convert the compared strings through an internal table for the current locale (which is very fast) and then simply compare by value. The locale identifier can be obtained from the operating system or from EPG settings -> Preferred language.

 

Edited by jirim100

Griga
14 minutes ago, jirim100 said:

I haven't done any speed tests of CompareString/CompareStringEx, but why should these functions be too slow?

 

The additional UTF-8 -> UTF-16 conversion per string per comparison will slow it down. It is unavoidable to do it again and again ;) UTF-16 is the native Windows string format and UTF-8 the preferred web application string format... all DMS databases contain UTF-8 strings. The DMS has to provide an external COLLATE SYSTEMNOCASE UTF-8 string comparison function for SQLite, because SQLite by itself only supports case-insensitive comparisons of ASCII characters (see here).
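How an application-supplied collation plugs into SQLite can be sketched with Python's `sqlite3` module. Only the collation name SYSTEMNOCASE comes from this thread. The table `presets`, the sample names and the folding logic in `fold` are assumptions made for illustration, not the DMS implementation.

```python
import sqlite3
import unicodedata

def fold(text):
    """Illustrative folding: drop combining marks, then uppercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()

def systemnocase(a, b):
    # SQLite calls this once per comparison with both strings,
    # so any conversion done here is repeated on every call.
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("SYSTEMNOCASE", systemnocase)
conn.execute("CREATE TABLE presets (name TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO presets VALUES (?)",
    [("Zprávy",), ("čarodějky",), ("Ábel",), ("Dokument",)],
)

builtin = [r[0] for r in conn.execute(
    "SELECT name FROM presets ORDER BY name COLLATE NOCASE")]
custom = [r[0] for r in conn.execute(
    "SELECT name FROM presets ORDER BY name COLLATE SYSTEMNOCASE")]
print(builtin)
print(custom)
```

The built-in NOCASE collation only folds ASCII letters, so the names starting with Á and č land after "Zprávy"; with the custom collation they sort next to A and C.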

 

Griga
17 hours ago, jirim100 said:

Is CompareStringEx available on Windows 7?

 

Quote
Minimum supported client Windows Vista [desktop apps | UWP apps]

 

17 hours ago, jirim100 said:

Maybe you should use only CompareStringW

 

For Windows XP compatibility. There are still some people using it. I've linked to CompareStringEx because the documentation contains a description of the possible values for dwCmpFlags. There is an interesting difference between LINGUISTIC_IGNORECASE and NORM_IGNORECASE:

 

Quote

LINGUISTIC_IGNORECASE Ignore case, as linguistically appropriate.

NORM_IGNORECASE Ignore case. For many scripts (notably Latin scripts), NORM_IGNORECASE coincides with LINGUISTIC_IGNORECASE.

 

Remarks

Both CompareString and CompareStringEx are optimized to run at the highest speed when dwCmpFlags is set to 0 or NORM_IGNORECASE (...) and the locale does not support any linguistic compressions, as when traditional Spanish sorting treats "ch" as a single character.

 

NORM_IGNORECASE seems to be similar to the DMS translation of letters with diacritics to matching uppercase ASCII letters.

 

There is another PM...

