#12 Accents from the Windows's cli

Closed

opened 5 years ago by Zykino · 19 comments

They are badly interpreted. I think they are "Latin-1" and interpreted as if they were "UTF-8".

Windows-7 in English

See #9 for more info.
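
To make the suspected mismatch concrete, here is a minimal sketch. It assumes Python 3 syntax (the project itself ran on Python 2 at the time) and borrows the accented sample from the test command given later in this thread; the helper name read_as_utf8 is hypothetical.

```python
# Hypothetical illustration: bytes produced in Latin-1 but read as UTF-8.
text = "éèàç"

latin1_bytes = text.encode("latin-1")   # one byte per accented letter
utf8_bytes = text.encode("utf-8")       # two bytes per accented letter


def read_as_utf8(raw):
    """Decode raw bytes as UTF-8, marking undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


garbled = read_as_utf8(latin1_bytes)    # the accents are lost
intact = read_as_utf8(utf8_bytes)       # the accents survive
```

The Latin-1 bytes are not valid UTF-8 sequences, so the accents come back as replacement characters, which is one way the "badly interpreted" text could arise.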

Zykino changed title from ~~Accents from the Windiw's cli~~ to Accents from the Windows's cli 5 years ago

We should see whether it is possible to get the encoding and adapt the code to deal with it, or to convert it to UTF-8.

I guess Python should have libraries to do that, as this is a pretty common problem!
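
As a rough sketch of the "get the encoding, then convert to UTF-8" idea, the standard library's locale module can report the preferred encoding. The function name to_utf8 is hypothetical, and Python 3 syntax is an assumption; this is not the project's code.

```python
import locale


def to_utf8(raw, encoding=None):
    """Re-encode raw bytes from `encoding` to UTF-8.

    When no encoding is given, fall back to the locale's preferred one.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding()
    return raw.decode(encoding).encode("utf-8")


# Usage: bytes as a Latin-1 console might deliver them.
latin1_name = "Test avec accents éèàç".encode("latin-1")
utf8_name = to_utf8(latin1_name, "latin-1")
```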

LecygneNoir added the bug label 5 years ago

LecygneNoir added the Todo label 5 years ago

Hi, I just tested the encoding problem and created https://git.lecygnenoir.info/Zykino/prismedia/src/branch/feature/encoding

I think this makes the encoding work: when I print to the console I see the accents, whereas without the changes I don't.

The problem I have on this branch is that I cannot validate that the decoded strings are... strings.

Each of the following tests prints True on the console, but Schema does not validate the options (tested with unicode and the more general basestring):
isinstance(options["--name"], unicode)
type(options["--name"]) is unicode
options["--name"].isdigit()

(If you could test it on Linux and/or help with the validation, I would take any help :))

Hello, thanks for the work!

Could you please give me an example of a title/description/anything that triggers the problem, so I can be sure I test with good (or, it seems, bad 😂) chars?

Thanks!

Sure! The command line I'm using contains only common French accents:
prismedia_upload.py --file="XXX" --name="5 - Test avec accents éèàç" --description="Test avec accents éèàç"

I'm not sure about the --tag option. I may need to test it too.

👍 1

Okay, I confirm I also have an error on Linux at the moment:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

I'll dig more to see what we can do on it!
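
For context on that message: 0xc3 is the first byte of "é" once encoded as UTF-8, and the ASCII codec rejects any byte above 0x7f. In Python 2, mixing byte strings and unicode objects triggers such an ASCII decode implicitly. The sketch below assumes Python 3 syntax and only reproduces the failing decode explicitly; it is not the project's code.

```python
encoded = "é".encode("utf-8")   # two bytes; the first one is 0xc3

try:
    encoded.decode("ascii")     # ASCII only accepts bytes below 0x80
    error = None
except UnicodeDecodeError as exc:
    error = exc                 # keep the exception for inspection
```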

I don't have this message...

2018-11-11 12:53:01,186 No suitable NFO found, skipping.
The video description should be a string

(Also I re-pushed a new commit with the error message fixed)

Yes, for me it crashes directly in the decodeArgumentStrings function, even before the schema validation.

Even though the locale is UTF-8, it seems the decode lib tries to use ascii :thinking:

Okay, this is another problem: Python cannot encode utf8 to utf8: https://stackoverflow.com/questions/24475393/unicodedecodeerror-ascii-codec-cant-decode-byte-0xc3-in-position-23-ordinal

I'll need to deal with detecting whether this is already utf8, then choose to encode (or not) ;-)
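
One possible shape for that detection, sketched under assumptions: Python 3 syntax, where undecoded input is bytes and decoded text is str (in Python 2 the pair would be str and unicode). Both function names are hypothetical stand-ins, not the real utils.decodeArgumentStrings.

```python
def decode_if_needed(value, encoding):
    """Return text: decode bytes, pass already-decoded text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def decode_argument_strings(options, encoding):
    """Decode every byte-string option in place; leave None and text alone."""
    for key, value in options.items():
        options[key] = decode_if_needed(value, encoding)
    return options
```

Checking the type of each value, rather than the name of the locale, means an already-decoded string is never decoded twice.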

😆 1

Since I'm using locale.getpreferredencoding() I think you can check if it is utf-8.

Can you send me the output of locale.getpreferredencoding() on Linux? (I think it's "utf_8" but I'm not sure about caps and the underscore :s)

@Zykino yep, this is exactly what I have done: a check for UTF-8 before trying to decode :D

if locale.getpreferredencoding().lower() != "utf-8":
        utils.decodeArgumentStrings(options, locale.getpreferredencoding())

On Linux, the value is UTF-8, but I think a lower() to avoid special cases does not hurt.
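
If the spelling of the name is a concern (caps, underscore, missing hyphen), one alternative to lower() is to let the standard codecs module normalize the alias. This is only a sketch, assuming Python 3, where the canonical name is "utf-8"; the helper is_utf8 is hypothetical.

```python
import codecs
import locale


def is_utf8(encoding_name):
    """True when the name is any registered alias of UTF-8."""
    return codecs.lookup(encoding_name).name == "utf-8"


# Decide once whether the command-line arguments need decoding at all.
needs_decoding = not is_utf8(locale.getpreferredencoding())
```

With this helper, "UTF-8", "utf_8" and "utf8" are all treated the same way.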

With this patch, I am able to upload correctly (which is expected: as the decode function is not called, the original string stays as it is...)

So at the moment I do not know how to reproduce your bug and help you patch it :'(

The first thing I would try, however, if you have not yet, is to force a string when setting the option in the decodeArgumentStrings function, something like:

if options["--name"] is not None:
        options["--name"] = str(options["--name"].decode(encoding))

Could Python handle that on Windows?

It gives me:

Traceback (most recent call last):
  File "C:\Users\Julien\Documents\0DocsPerso\Code\Prismedia\prismedia_upload.py", line 230, in <module>
    utils.decodeArgumentStrings(options, locale.getpreferredencoding())
  File "C:\Users\Julien\Documents\0DocsPerso\Code\Prismedia/lib\utils.py", line 214, in decodeArgumentStrings
    options["--name"] = str(options["--name"].decode(encoding))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 22-25: ordinal not in range(128)

Adding the str gives that? 😱
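
A likely explanation: in Python 2, calling str() on a unicode object encodes it with the default ASCII codec, which cannot represent accented letters. The sketch below assumes Python 3 syntax and performs that ASCII encode explicitly on the --name value from the test command above.

```python
name = "5 - Test avec accents éèàç"

try:
    name.encode("ascii")    # the encode that Python 2's str() does implicitly
    error = None
except UnicodeEncodeError as exc:
    error = exc             # keep the exception for inspection
```

The four accented letters sit at indexes 22 to 25 of that string, which lines up with "position 22-25" in the traceback.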

I think I found it!
If I replace str with unicode (or basestring) it looks like it's working.
But I can't find proper documentation for the And() function of Schema, so I do not know what is necessary to use it.

I don't know why we need the lambda testing that the string is not a digit. Is it necessary for Schema? Aren't fully numerical strings valid titles?

Does the last commit work for you?

Hey!

I have tested an upload and it works, great job!

However, in my freshly pulled version of the encoding branch, I don't see any unicode or basestring in the code, so I am unsure I have tested the right part :-/

If it helps, the Or and And system in schema works inside the parentheses.

Eg:

Or(None, And(str, lambda x: not x.isdigit()))

means:

None OR (str AND lambda x: not x.isdigit())

So the parameter is optional (the None part) and, if provided, it should be a string (provided by " " in the command line).
Honestly, I am not sure the lambda part is really needed, as the str should do the job (check the " " in the parameters), but the schema documentation used that to check strings, so I use it as it is to avoid further crashes ^^"
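
To spell out that reading, here is a plain-Python sketch of the same rule, written without the schema library. The function name is_valid_option is hypothetical and Python 3 syntax is assumed.

```python
def is_valid_option(value):
    """Mimic Or(None, And(str, lambda x: not x.isdigit()))."""
    if value is None:
        return True                      # the Or(None, ...) branch
    # the And(...) branch: must be a str AND must not be all digits
    return isinstance(value, str) and not value.isdigit()
```

Under Python 2, a decoded value is a unicode object, which is not an instance of str, so a check of this shape would reject it. That may be why validation failed until str was replaced with unicode or basestring.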

If you can confirm the current encoding branch is working for you on Windows, I think we can merge it, as it also works on Linux :-)

For info, I tested at commit 384fb82541, with the message "prevent decoding unicode strings since python prefer to crash than doing nothing".

Is this ok?

Thanks!

Yes, but... since you have a UTF-8 command line, you only verified that my check excludes you from the modification I made (the bug where you cannot decode a UTF-8 string).

I just wanted to test the tags with accents a bit more before letting you merge #21.

And apparently I did not push everything (I had a bug when the text came from the NFO file instead of the command line).

Okay! I am (at least) able to test the code on Linux, and it works perfectly for me.

I have tested with and without accents, and with a variety of special characters; everything works and the video is correctly set after upload.

If it's OK for you and all your modifications are pushed, I can merge as soon as possible. Many thanks for the work!

Zykino closed this issue 5 years ago

LecygneNoir referenced this issue from a commit 4 years ago

Merge branch 'feature/encoding' of Zykino/prismedia into develop fix #12, thanks Zykino!
