Bliss / Cas-Stunts track editor

Started by Cas, March 08, 2015, 01:16:12 AM

Cas

Thank you, Afullo!  You've got a great eye for typos!  I've already fixed the two things. In the next release, they won't be there.
Earth is my country. Science is my religion.

Cas

It's been a long time since I last made changes to Bliss. It's pretty much complete, but recently, I've been thinking that there were a few things that had been in demand for a long time and that I feel capable of implementing now. One of them is UTF-8 support. This is a work in progress, but I wanted to tell you I'm doing it and to give you an idea of the difficulties.

So far, Bliss has only worked with plain ASCII. That's only 128 characters, and the control ones are not valid for metadata anyway. Now I would like all text that Bliss saves or reads to be UTF-8 by default. I can do it, but I am encountering the following problems:

- I'm having to deal with four encoding types at the very minimum. Why?  Because FreeBasic's graphics by default use an 8-bit font based on CP437, but I want to read and save UTF-8, which has variable character width, so it's difficult to handle internally, which is why I'll use UTF-32 inside the program. To make things worse, the window title in X or GTK, I don't know which, seems to be ISO-8859-15 for some reason, which is yet another 8-bit encoding, so if I want the track title to appear on the window title, I will have to use that as well.
- This ISO-8859-15 is specific to certain languages, so I expect that it'll change depending on the language you choose when installing your distro, so I can't even rely on it. Besides, I don't know what encoding Windows will use. So I'll probably have to treat this encoding as ASCII and show all non-ASCII characters as question marks or something.
- UTF-8 has some tricky cheats having to do with bits on and off that are very smart and make it super efficient and highly compatible with old code pages. Yet, these things also make it very easy to make a mistake when programming the conversion routines. I am trying to be as careful as possible.
- To display what the strings contain, I'm still having to use that CP437, but the languages we use in the community have characters that are not present in that code page, like most uppercase letters with diacritics for Spanish and Portuguese and like the double-accented "o" and "u" of Hungarian. I will have to start by replacing the rendering of these characters with something similar (i.e.: the non-diacritical version of them), but later, I will have to make my own font-renderer so that these characters can be seen.
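
To illustrate the "bits on and off" tricks mentioned in the list above, here is a minimal Python sketch of a hand-written UTF-8 decoder and encoder that works on plain code point integers (the role UTF-32 plays inside the program). It is not Bliss's actual FreeBasic code; the function names `utf8_decode` and `utf8_encode` are made up for this example, and the encoder assumes it is given valid code points.

```python
def utf8_decode(data):
    """Decode UTF-8 bytes into a list of code point integers."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # The lead byte's high bits say how many continuation bytes follow.
        if b < 0x80:                 # 0xxxxxxx: plain ASCII
            cp, extra, lowest = b, 0, 0
        elif b >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
            cp, extra, lowest = b & 0x1F, 1, 0x80
        elif b >> 4 == 0b1110:       # 1110xxxx: 3-byte sequence
            cp, extra, lowest = b & 0x0F, 2, 0x800
        elif b >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
            cp, extra, lowest = b & 0x07, 3, 0x10000
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for c in data[i + 1:i + 1 + extra]:
            if c & 0xC0 != 0x80:     # continuation bytes are 10xxxxxx
                raise ValueError("bad continuation byte after offset %d" % i)
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong forms, surrogates and out-of-range values.
        if cp < lowest or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError("invalid code point at offset %d" % i)
        out.append(cp)
        i += extra + 1
    return out


def utf8_encode(code_points):
    """Encode a list of (assumed valid) code point integers as UTF-8."""
    out = bytearray()
    for cp in code_points:
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)
```

The easy mistakes are the ones the checks guard against: continuation bytes appearing where a lead byte is expected, sequences cut short at the end of the buffer, and overlong encodings of small values.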

I don't expect all of you will be interested in as much detail, but maybe you are... and it's good to document this here in case I want to read it later on or it could help understanding the source code of Bliss in the future. Anyway, opinions are welcome!

afullo

For Western languages, Windows may also use CP1252. By the way, there do exist test sentences in Hungarian using all the special vowels, like these ones.

dreadnaut

I would say "stay away from UTF-8", unless you have a library that deals with it or a lot of spare time and nothing else to do :)

Cas

Quote from: afullo on May 30, 2020, 01:26:27 AM
By the way, there do exist test sentences in Hungarian using all the special vowels, like these ones.

In English, there is a well-known sentence that contains all the letters of the alphabet:
The quick brown fox jumps over the lazy dog

In Spanish, we don't have such, or not that I know of, but we do have a more or less known palindrome sentence that goes like this:
Dábale arroz a la zorra, el abad
It means more or less: Giving rice to the (female) fox, was the monk

It'd be interesting to get to know more things like those in other languages.

Quote from: dreadnaut on May 30, 2020, 10:52:51 PM
I would say "stay away from UTF-8", unless you have a library that deals with it or a lot of spare time and nothing else to do :)

Oh, I've already delved into UTF-8 when I was making Fodix, the data analyser. I got it to work well, but in this case, I have to do a few more things that are trickier. That's fine, I'm a low-level programmer. Not a great one, but I'm one. I don't find it easy, but I enjoy dealing directly with the actual thing going on in the background. That's what originally got me into programming :)  Don't worry, it'll work :P

Ah, and another thing I'm adding to the next version of Bliss (and, as a matter of fact, that part is already complete) stems from a request you made long ago, Dreadnaut. I added a text meta-data format... and it can both read and write!  I had to do text binary parsing, which I'm not a fan of, but the result is good. So, when you save a track in one-file mode (with overlaid meta-data), that's still binary, but if you split it, you can choose the meta-data file to be text or binary. I think that, with this available, it doesn't make much sense to use binary external meta-data anymore, but I kept it available for compatibility and so that you can do things like splitting a file with another program and still have it be useful, or just appending metadata directly. In one-file format, there wouldn't be a reason to make it text, because the first part would be binary anyway (the track itself).

The feature works well. The only catch is that it will fail if you include a BOM at the beginning of the file, and it supports UTF-8 exclusively: there is no support for UTF-16 or other encodings. The files produced always use CRLF line endings, but it can read any line ending. It can't tell whether there are empty lines in the file, but that's not necessary for track metadata. I'll soon put the new version online for you guys to test if you want.
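
The line-ending behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Bliss's code: the names `read_text_lines` and `write_text_lines` are hypothetical, the actual layout of the metadata lines is not shown, and, unlike the reader described above, this sketch tolerates a leading UTF-8 BOM by skipping it.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"


def read_text_lines(data):
    """Split UTF-8 text on CRLF, CR or LF and drop empty lines."""
    if data.startswith(UTF8_BOM):
        # Assumption for this sketch: silently skip a BOM if present.
        data = data[len(UTF8_BOM):]
    text = data.decode("utf-8")
    # Treating every CR, LF or CRLF as a separator and discarding empty
    # pieces means blank lines are invisible to the reader.
    return [line for line in re.split(r"\r\n|\r|\n", text) if line]


def write_text_lines(lines):
    """Join lines with CRLF endings and encode them as UTF-8."""
    return "".join(line + "\r\n" for line in lines).encode("utf-8")
```

Dropping empty pieces is one simple way to accept any line ending, and it has the same side effect mentioned in the post: the reader cannot tell whether the file contained blank lines.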

dreadnaut

Quote from: Cas on May 31, 2020, 09:19:26 PM
I added a text meta-data format... and it can both read and write!  I had to do text binary parsing, which I'm not a fan of, but the result is good. So, when you save a track in one-file mode (with overlaid meta-data), that's still binary, but if you split it, you can choose the meta-data file to be text or binary.

Oooh, I like that! Now, I see why you want to support UTF-8, nice ;)

afullo

Quote from: Cas on May 31, 2020, 09:19:26 PM
Quote from: afullo on May 30, 2020, 01:26:27 AM
By the way, there do exist test sentences in Hungarian using all the special vowels, like these ones.

In English, there is a well-known sentence that contains all the letters of the alphabet:
The quick brown fox jumps over the lazy dog

In Spanish, we don't have such, or not that I know of, but we do have a more or less known palindrome sentence that goes like this:
Dábale arroz a la zorra, el abad
It means more or less: Giving rice to the (female) fox, was the monk

It'd be interesting to get to know more things like those in other languages.

In Italian:

I topi non avevano nipoti (The mice had no nephews/nieces [or grandchildren, the word is used for both])
Ai lati d'Italia (At sides of Italy)

Cas

Quote from: dreadnaut
Oooh, I like that! Now, I see why you want to support UTF-8, nice ;)

Oh, I actually had thought of the two features separately. That is, even without adding support for UTF-8, the text format does naturally allow that encoding... only that inside Bliss, it'd look like gibberish if you enter extended characters, but now that you say that, yes... it's another good reason for implementing it!  I actually remember that at some point, somebody asked me about special characters in Bliss. The text format already works very well.


Quote from: afullo
I topi non avevano nipoti (The mice had no nephews/nieces [or grandchildren, the word is used for both])
Ai lati d'Italia (At sides of Italy)

Ah!  Those are very good ones :D

Cas

Giving you guys an update. I've successfully managed to get UTF-8 working. In the Bliss environment, it renders to CP437 and on the window title, it renders to Latin-1 (a.k.a. ISO-8859-1). I was mistaken when I identified it as ISO-8859-15, which is nearly identical to Latin-1: it differs in only eight positions, most notably by including the Euro sign in place of the currency sign. Of course, only a bunch of characters are rendered in the Bliss environment. I've concentrated on diacritical marks. Because the code page is limited, some are rendered as their non-diacritical counterparts, but at least, they are recognised and shown. When files are saved, the actual code point is stored, encoded as UTF-8.

I still expect some discrepancies to appear given the fact that different regions configure the system with different code pages (we're still not 100% in the Unicode era, it seems), but since most characters we use are ASCII, that'll be fine. For Hungarian, I made sure to recognise the O and U with double-acute accent, which are not included in Latin-1. Other Hungarian letters can be represented with Latin-1 indirectly. Of course, CP437 does not contain these letters, so they will appear as Ö and Ü respectively, but when stored, the correct UTF-8 code point will be used. I plan on retouching the font later on so these characters are all properly displayed.
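
A rough Python sketch of this kind of "closest match" rendering is shown below. It is an assumption-laden illustration, not the routine Bliss uses: the name `to_cp437`, the small double-acute table and the use of Unicode decomposition to find the base letter are all choices made for this example.

```python
import unicodedata

FILLED_BOX = 0xFE  # CP437 position 254, used here for unrenderable characters

# Hungarian double-acute letters shown as their diaeresis look-alikes.
DOUBLE_ACUTE = {"\u0150": "\u00d6", "\u0151": "\u00f6",   # Ő -> Ö, ő -> ö
                "\u0170": "\u00dc", "\u0171": "\u00fc"}   # Ű -> Ü, ű -> ü


def _encode_or_none(ch):
    """Return the CP437 byte for ch, or None if CP437 lacks it."""
    try:
        return ch.encode("cp437")
    except UnicodeEncodeError:
        return None


def to_cp437(text):
    """Map text to CP437 bytes, approximating what cannot be shown."""
    out = bytearray()
    for ch in text:
        candidates = [ch]                          # 1. exact match
        if ch in DOUBLE_ACUTE:
            candidates.append(DOUBLE_ACUTE[ch])    # 2. Hungarian look-alike
        # 3. base letter with the diacritics stripped
        candidates.append(unicodedata.normalize("NFD", ch)[0])
        for candidate in candidates:
            encoded = _encode_or_none(candidate)
            if encoded is not None:
                out += encoded
                break
        else:
            out.append(FILLED_BOX)                 # nothing suitable found
    return bytes(out)
```

The important point is that this mapping is only applied for display. The string kept in memory still holds the original code points, so saving writes the correct UTF-8 regardless of what appeared on screen.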

I will soon be posting the new version. I have another smaller feature to add.

Cas

Bliss 2.5.5 is out... or almost!

I've uploaded a ZIP file that includes the executables of Bliss 2.5.5 for DOS, GNU/Linux (64-bit only) and Windows 32-bit. Data files are not included as I've made no important change to them, so you can test the new version by just dropping the executables on top of the previous version. If this works well for you guys, I'll make a full package. The ZIP is attached to the first post in this topic.

I suggest that you guys keep the old executable just in case, as I've only been testing the GNU/Linux version thoroughly. But of course, version 2.5.4 will continue to be available on the website until 2.5.5 is confirmed to be stable.

In the new version, you will encounter the following new features:
- Trackshots can be taken of a selected region of a track and not just on the whole track. This is very useful when you're making a tutorial or an analysis about a track or race and you need to include images.
- A new file format (split text) has been added and can be set as the default format in the Settings menu if you wish. If you ask me, I still recommend the one-file format for almost all applications, but if you do use split, I recommend this new split text instead of the split binary, which is kept for compatibility (and because it was easy to keep :P)
- UTF-8 support. You will notice this when you edit the meta-data or change the default track author name. In GNU/Linux and Windows, you should be able to enter non-ASCII characters now. In all file formats, these characters will be saved as UTF-8. If you use split-text mode, you'll be able to edit the meta-data file with any editor that supports UTF-8. Same way, Bliss will read UTF-8 from the meta-data file, although text will be truncated to the maximum length allowed by the meta-data editor within Bliss. If truncation occurs in the middle of a UTF-8 multi-byte codepoint, this may result in an unwanted character at the end of the text, but nothing serious. You'll also see that the window title will take the changes as you modify the track title. If your system uses a codepage other than Latin-1, this may look weird. I'd like to know about that in detail if it happens. Some characters will be displayed without diacritics, but still be saved with the proper UTF-8 codepoint. Please verify this externally.
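
On the truncation point in the list above: if text is cut at a fixed number of bytes, the stray character at the end can be avoided by backing up to the start of the sequence being split. The following Python sketch shows the idea; it assumes truncation is done by byte count and that the input is valid UTF-8, and the name `truncate_utf8` is hypothetical.

```python
def truncate_utf8(data, max_bytes):
    """Cut UTF-8 bytes to at most max_bytes without splitting a sequence."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut falls inside a multi-byte
    # sequence, so move back until the cut sits just before a lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```

For example, cutting the UTF-8 bytes of "né" to 2 bytes would keep only "n" instead of leaving half of the "é" behind.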

Any bug reports or suggestions (especially about UTF-8, which is pretty complex and surely has more to be added) will be highly welcome. I expect bugs mostly on the Windows platform and for users in countries that don't use Latin-1. It might also happen that Windows is using UTF-8 for the window title... So if anything looks wrong, please tell me which country you have configured your system for, which OS and OS version you have and which wrong characters you see (a screenshot would be good).

Thank you guys so much

afullo

On Ubuntu 16.04 64-bit, I just tried to give a track a title containing all the Italian accents, i.e. vowels with diacritical marks (à, è, é, ì, ò, ù). It seems all of them are displayed correctly, as the attached screenshot shows.

Cas

Thank you, pal. I found a bug now that I really didn't expect and I think it had to do with some flickering issues when typing special characters, so the final version will not do that. I also added extended character support for DOS, because in the version I passed, DOS could only type ASCII (although it could see Unicode). I also solved a little problem with the Hungarian letters ő and ű. They were not being rendered on the window title because they are not part of Latin-1, but I realised that they do have code points in Latin-2 that I can simply "de-injectivise", so now these letters should be visible as such on the window title for those using a Hungarian configuration. The only downside is that those using Latin-1 will, instead of a question mark, see an O with tilde or a U with circumflex respectively. Not a big deal. That actually looks to me like an improvement even for our configuration.
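
The Latin-2 trick for the window title can be sketched in Python like this. It is an illustration only: `to_title_bytes` is a made-up name, and the assumption is that the title is handed to the system as one byte per character. The four byte values are the ISO-8859-2 positions of the double-acute letters, which in ISO-8859-1 are occupied by O with tilde and U with circumflex.

```python
# ISO-8859-2 (Latin-2) byte values for the Hungarian double-acute letters.
LATIN2_BORROWED = {
    "\u0150": 0xD5,  # Ő (shows as Õ under Latin-1)
    "\u0151": 0xF5,  # ő (shows as õ under Latin-1)
    "\u0170": 0xDB,  # Ű (shows as Û under Latin-1)
    "\u0171": 0xFB,  # ű (shows as û under Latin-1)
}


def to_title_bytes(text):
    """Encode a title as Latin-1, borrowing Latin-2 bytes for ő and ű."""
    out = bytearray()
    for ch in text:
        if ch in LATIN2_BORROWED:
            out.append(LATIN2_BORROWED[ch])
        else:
            # Anything else outside Latin-1 becomes a question mark.
            out += ch.encode("latin-1", errors="replace")
    return bytes(out)
```

Because the same byte now stands for two different characters depending on the system's code page, the mapping is no longer one-to-one, which is what "de-injectivise" refers to.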

You use GNU/Linux, like I do, and Latin-1 is also good for Italian, so our systems are not so different. Let's see what happens with Windows computers and with other countries' configurations (especially Hungarian), but I am pretty certain it'll work well. I just need confirmation from the guys.

Could anybody test Bliss UTF-8 support on a Windows machine?  And on one configured in Hungarian?  Please. Thanks, guys!

afullo

Quote from: Cas on June 10, 2020, 02:54:51 AM
Thank you, pal.

You are welcome!  ;)

Quote from: Cas on June 10, 2020, 02:54:51 AM
Could anybody test Bliss UTF-8 support on a Windows machine?

Bad news: I just tried with Windows 10 Pro 64-bit, but I am unable to insert accents at all via dedicated keys (those between P, L and Enter).  :(

Some insertions with Alt do work, albeit without any diacritical mark (e.g., Alt+0200 should return È, instead it returns E), but other ones do not (Alt+0201 should return É, but it returns nothing).
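
For reference, Alt codes typed with a leading zero are normally looked up in the Windows ANSI code page (CP1252 on Western European systems), while codes without the leading zero use the OEM code page. The small Python sketch below shows what byte values 200 and 201 stand for in each; `alt_code_char` is a hypothetical helper, and what an application actually receives depends on how it reads keyboard input.

```python
def alt_code_char(code, codepage):
    """Return the character that a byte value denotes in a code page."""
    return bytes([code]).decode(codepage)


# With a leading zero (Alt+0200, Alt+0201): ANSI code page, here CP1252.
ansi_200 = alt_code_char(200, "cp1252")   # E with grave accent
ansi_201 = alt_code_char(201, "cp1252")   # E with acute accent

# Without the leading zero, the OEM code page applies, e.g. CP437,
# where the same values are box-drawing characters.
oem_200 = alt_code_char(200, "cp437")
```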

Cas

Uhm.... some things that you describe, I expected, but others, I didn't. Let's see...

This is what Bliss does:
- When you press a key, Bliss is supposed to receive the UTF-8 string for it, which GNU/Linux and Windows deliver in different ways. For DOS, it will instead receive the (extended) ASCII value. For GNU/Linux and Windows, it will then transform this into UTF-32, which it uses internally for easier editing.
- When displaying inside the window, such as inside the Track Info menu, since Bliss relies on a FreeBasic function to display the characters, it's forced to only display CP437-compatible characters. Thus, letters like È, which are not present in CP437, are expected to show up as approximations (such as E). Characters for which Bliss can't find a good approximation or that are not recognised should display as a filled box (extended ASCII 254).
- When saving the data, Bliss will transform the string back to UTF-8 and store it. Even characters that were not recognised will be stored with the values they came in, so you can verify this by storing in text metadata format and then taking a look at the output with a text editor.
- DOS extended characters are supported now, but not in the version I passed. I tested them with DOSBox and they work.
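
To check externally that a saved text metadata file holds the intended characters even when the screen showed an approximation, the bytes can be listed as code points. Below is a small Python sketch; `describe_utf8` is a name invented for this example, and it takes the raw bytes of whatever text you want to inspect.

```python
import unicodedata


def describe_utf8(data):
    """List each character in UTF-8 bytes as 'U+XXXX NAME'."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "<no name>"))
            for ch in data.decode("utf-8")]
```

If a title typed as È was stored correctly, its bytes `C3 88` will be reported as U+00C8 LATIN CAPITAL LETTER E WITH GRAVE, whereas a plain E would show up as U+0045.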

So about these two things:
- For characters not showing exactly the same, but as an approximation, that should be OK. It's expected. I will later on add something to fix even that. But for now, you can see files should be stored with the right values and the window title will probably support more characters than the program text.
- For keys not responding at all and not even causing a 254 filled box to appear, I'm concerned. It might make sense if these are dead keys (keys that require you to type a second key to compose a character, such as the grave accent key), but otherwise, all character keys should be supported. If you find that a particular key that normally issues a character does not respond, please tell me which so I can analyse the situation.

afullo

Quote from: Cas on June 11, 2020, 09:55:38 PM
- For characters not showing exactly the same, but as an approximation, that should be OK. It's expected. I will later on add something to fix even that. But for now, you can see files should be stored with the right values and the window title will probably support more characters than the program text.

Ok, on Ubuntu I was able to insert À, È, É, Ì, Ò, Ù and display them correctly on the window title, although not on the program text. But, as you said, that is expected. Furthermore, in Italian you can use È to start a sentence with the third person present of the verb "to be", but there aren't many chances to use another accented vowel as the first letter unless you have to specify accentuation for diction purposes: excluding that case, only the last letter of a word can carry an accent (È is a particular case, being a single-letter word). So the other uppercase accented vowels are of some use only when writing in all caps; see the attached screenshot.

Quote from: Cas on June 11, 2020, 09:55:38 PM
- For keys not responding at all and not even causing a 254 filled box to appear, I'm concerned. It might make sense if these are dead keys (keys that require you to type a second key to compose a character, such as the grave accent key), but otherwise, all character keys should be supported. If you find that a particular key that normally issues a character does not respond, please tell me which so I can analyse the situation.

On Windows, all the keys that are meant to issue respectively à, è, é, ì, ò, ù do not respond; they are not dead keys, since the character is composed by pressing a single key.