#12 Accents from the Windows's cli

Closed
opened 5 years ago by Zykino · 19 comments
Zykino commented 5 years ago

Accents are badly interpreted. I think they are "Latin-1" but are interpreted as if they were "UTF-8".

Environment: Windows 7 in English.
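
A minimal sketch of the suspected mix-up, written for Python 3 (the thread's code targets Python 2). The sample text and the cp1252/Latin-1 code pages are assumptions chosen for illustration, not values taken from the affected machine:

```python
# Illustrative only: the sample text and the code pages are assumptions.
text = "éèàç"

# Bytes as a Western European Windows environment might deliver them.
raw_cp1252 = text.encode("cp1252")

# Decoding with the matching code page restores the text.
restored = raw_cp1252.decode("cp1252")

# Decoding the same bytes as UTF-8 fails, because 0xE9 followed by 0xE8
# is not a valid UTF-8 sequence.
try:
    raw_cp1252.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The reverse mix-up does not fail; it silently yields mojibake.
mojibake = text.encode("utf-8").decode("latin-1")

print(restored == text, utf8_ok, ascii(mojibake))
```

Either direction of the mismatch damages the accents, which is why the encoding has to be known before decoding.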

Zykino commented 5 years ago
Poster

See #9 for more info.

Zykino changed title from Accents from the Windiw's cli to Accents from the Windows's cli 5 years ago
Poster
Owner

We should see whether it is possible to get the encoding and adapt the code to deal with it, or to convert the input to UTF-8.

I guess Python has libraries for that, as this is a pretty common problem!
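
As a rough sketch of that idea in Python 3: the standard `locale` module can report the platform's preferred encoding, and bytes can then be re-encoded to UTF-8. The helper name `transcode_to_utf8` is hypothetical, and cp1252 is only an assumed example of a Windows code page:

```python
import locale


def transcode_to_utf8(raw, source_encoding=None):
    """Re-encode bytes from source_encoding (default: the locale's) to UTF-8."""
    if source_encoding is None:
        # Ask the platform which encoding it prefers for text data.
        source_encoding = locale.getpreferredencoding(False)
    return raw.decode(source_encoding).encode("utf-8")


# Assumed example: bytes produced by a cp1252 environment.
console_bytes = "accents éèàç".encode("cp1252")
utf8_bytes = transcode_to_utf8(console_bytes, "cp1252")

print(utf8_bytes.decode("utf-8") == "accents éèàç")
```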

LecygneNoir added the bug label 5 years ago
LecygneNoir added the Todo label 5 years ago
Zykino commented 5 years ago
Poster

Hi, I just tested the encoding problem and created https://git.lecygnenoir.info/Zykino/prismedia/src/branch/feature/encoding

I think this would make the encoding work: when I print to the console I see the accents, while without the changes I don't.

The problem I have on this branch is that I cannot validate that the decoded strings are... strings.

If I test any of the following, it writes True on the console, but Schema does not validate the options (tested with unicode and the more general basestring):
isinstance(options["--name"], unicode)
type(options["--name"]) is unicode
options["--name"].isdigit()

Zykino commented 5 years ago
Poster

(If you could test it on Linux and/or help with the validation, I would take any help :))

Poster
Owner

Hello, thanks for the work!

Could you please give me an example of a title/description/anything that triggers the problem, so I can be sure I test with good (or rather, bad 😂) characters?

Thanks!

Zykino commented 5 years ago
Poster

Sure! The command line I'm using contains only common French accents:
prismedia_upload.py --file="XXX" --name="5 - Test avec accents éèàç" --description="Test avec accents éèàç"

I'm not sure about the --tag option. I may need to test it too.

Poster
Owner

Okay, I confirm I also have an error on Linux at the moment:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

I'll dig more to see what we can do about it!

Zykino commented 5 years ago
Poster

I don't get this message... Here is what I see instead:

2018-11-11 12:53:01,186 No suitable NFO found, skipping.
The video description should be a string

(Also, I pushed a new commit with the error message fixed.)

Poster
Owner

Yes, for me it crashes directly in the decodeArgumentStrings function, even before the schema validation.

Although the locale is UTF-8, it seems the decode library tries to use ASCII 🤔

Poster
Owner

Okay, this is another problem: Python cannot encode UTF-8 to UTF-8: https://stackoverflow.com/questions/24475393/unicodedecodeerror-ascii-codec-cant-decode-byte-0xc3-in-position-23-ordinal

I'll need to detect whether the string is already UTF-8, then choose whether to encode it (or not) ;-)
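
One plausible reading of that error, sketched in Python 3: in Python 2, calling `.encode()` on a byte string first decodes it implicitly with the ASCII codec, and that hidden step is what fails on byte 0xc3. The helper `py2_style_encode` is a hypothetical stand-in that spells the implicit step out. It is not code from the branch, and it is an assumption that this is the exact call that crashed:

```python
# UTF-8 bytes, as a Linux terminal with a UTF-8 locale would pass them.
utf8_bytes = "Test avec accents é".encode("utf-8")


def py2_style_encode(byte_string, target):
    """Mimic Python 2's str.encode(): implicit ASCII decode, then encode."""
    return byte_string.decode("ascii").encode(target)


try:
    py2_style_encode(utf8_bytes, "utf-8")
    error = None
except UnicodeDecodeError as exc:
    # The failure comes from the hidden ASCII decode, not from UTF-8 itself.
    error = exc

print(error)
```

So the data is fine; the crash comes from converting a value that needed no conversion.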

Zykino commented 5 years ago
Poster

Since I'm using locale.getpreferredencoding(), I think you can check whether it is UTF-8.

Can you send me the output of locale.getpreferredencoding() on Linux? (I think it's "utf_8", but I'm not sure about the caps and the underscore :s)

Poster
Owner

@Zykino yep, this is exactly what I have done: a check for UTF-8 before trying to decode :D

if locale.getpreferredencoding().lower() != "utf-8":
    utils.decodeArgumentStrings(options, locale.getpreferredencoding())

On Linux, the reported encoding is UTF-8, but I think a lower() to avoid special cases does not hurt.
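
A slightly more tolerant variant of that check, as a sketch: `codecs.lookup()` from the standard library resolves aliases such as `UTF-8`, `utf8` and `utf_8` to one canonical codec name, so spelling and case stop mattering. The helper name `is_utf8` is hypothetical:

```python
import codecs
import locale


def is_utf8(encoding_name):
    """True when encoding_name is an alias of the UTF-8 codec known to Python."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


print(is_utf8("UTF-8"), is_utf8("utf_8"), is_utf8("utf8"), is_utf8("cp1252"))
print(is_utf8(locale.getpreferredencoding(False)))
```

The second line depends on the machine it runs on, so its result is not fixed.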

With this patch, I am able to upload correctly (as expected, since the decode function is not called and the original string stays as it is...)

Thus I do not know at the moment how to reproduce your bug and help you patch it :'(

The first thing I would try, however, if you have not done so yet, is to force a string when setting the option in the decodeArgumentStrings function, something like:

if options["--name"] is not None:
    options["--name"] = str(options["--name"].decode(encoding))

Can Python handle it on Windows?

Zykino commented 5 years ago
Poster

It gives me:

Traceback (most recent call last):
  File "C:\Users\Julien\Documents\0DocsPerso\Code\Prismedia\prismedia_upload.py", line 230, in <module>
    utils.decodeArgumentStrings(options, locale.getpreferredencoding())
  File "C:\Users\Julien\Documents\0DocsPerso\Code\Prismedia/lib\utils.py", line 214, in decodeArgumentStrings
    options["--name"] = str(options["--name"].decode(encoding))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 22-25: ordinal not in range(128)
Poster
Owner

Adding the str gives that? 😱
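
A likely explanation, sketched in Python 3: in Python 2, `str()` applied to a `unicode` value encodes it implicitly with the ASCII codec, which cannot represent accented characters. The helper `py2_style_str` is a hypothetical stand-in for that implicit step, and cp1252 is an assumed console encoding:

```python
# The title from the thread, decoded from assumed cp1252 console bytes.
decoded_name = b"5 - Test avec accents \xe9\xe8\xe0\xe7".decode("cp1252")


def py2_style_str(text):
    """Mimic Python 2's str(unicode_value): an implicit ASCII encode."""
    return text.encode("ascii")


try:
    py2_style_str(decoded_name)
    error = None
except UnicodeEncodeError as exc:
    # The four accented characters sit at indexes 22 to 25 of the title.
    error = exc

print(error)
```

Wrapping the decoded value in `str()` therefore undoes the decoding instead of securing it.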

Zykino commented 5 years ago
Poster

I think I found it!
If I replace str with unicode (or basestring), it looks like it's working.
But I can't find proper documentation for Schema's And() function, so I do not know what is needed to use it.

I don't know why we need the lambda testing that the string is not a digit. Is it necessary for Schema? Aren't fully numerical strings valid titles?

Does the last commit work for you?

Poster
Owner

Hey!

I have tested an upload and it works, great job!

However, in my freshly pulled version of the encoding branch, I don't see any unicode or basestring in the code, so I am unsure I have tested the right part :-/

If it helps, the Or and And system in schema works inside the parentheses.

E.g.:

Or(None, And(str, lambda x: not x.isdigit()))

means:

None OR (str AND lambda x: not x.isdigit())

So the parameter is optional (the None part) and, if provided, it should be a string (provided by " " in the command line).
I am honestly not sure the lambda part is really needed, as the str should do the job (checking the " " in the parameters), but the schema documentation used that to check strings, so I use it as it is to avoid further crashes ^^"
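
The rule above can be read as plain Python, without the schema library. This is only a sketch of the logic, written for Python 3, and `is_valid_title` is a hypothetical name:

```python
def is_valid_title(value):
    """Plain-Python reading of Or(None, And(str, lambda x: not x.isdigit()))."""
    if value is None:
        # The `None` branch of Or: the option was not given at all.
        return True
    # The And branch: the right type, and not made only of digits.
    return isinstance(value, str) and not value.isdigit()


print(is_valid_title(None))
print(is_valid_title("Test avec accents éèàç"))
print(is_valid_title("12345"))
print(is_valid_title(b"raw bytes"))
```

The last call is rejected purely because of its type. That may be what happened on the branch: under Python 2 a decoded value is `unicode`, not `str`, so a `str` check rejects it even though it prints fine.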

If you can confirm the current encoding branch is working for you on Windows, I think we can merge it, as it also works on Linux :-)

For info, I have tested at commit 384fb8254113b87159ea8fe8808329df7ee04e84 with the message "prevent decoding unicode strings since python prefer to crash than doing nothing"

Is this ok?

Thanks!

Zykino commented 5 years ago
Poster

Yes, but... since you have a UTF-8 command line, you only verified that my check excludes you from the modification I made (the bug where you cannot decode a UTF-8 string).

I just want to test the tags with accents a bit more before letting you merge #21.

Zykino commented 5 years ago
Poster

And apparently I did not push everything (I had a bug when the text came from the NFO file instead of the command line).
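
A sketch of the kind of guard this calls for, in Python 3: decode only values that are still raw bytes and leave `None` or already-decoded text alone, whatever their origin. The function name `decode_options`, the key list and the sample values are assumptions for illustration, not the code from the branch:

```python
def decode_options(options, encoding, keys=("--name", "--description")):
    """Return a copy of options with byte-string values decoded to text."""
    decoded = dict(options)
    for key in keys:
        value = decoded.get(key)
        # Only raw bytes need decoding; None and text pass through unchanged.
        if isinstance(value, bytes):
            decoded[key] = value.decode(encoding)
    return decoded


options = {
    "--name": "Test avec accents éèàç".encode("cp1252"),  # raw console bytes
    "--description": "already decoded é",  # e.g. text read from an NFO file
    "--tags": None,
}
result = decode_options(options, "cp1252")

print(result["--name"] == "Test avec accents éèàç")
```

Checking the type of each value, rather than only the locale, covers both sources of text with one rule.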

Poster
Owner

Okay! I am (at least) able to test the code on Linux, and it works perfectly for me.

I have tested with and without accents, with a variety of special characters; everything works and the video is correctly set after upload.

If it's OK for you and all your modifications are pushed, I can merge as soon as possible. Many thanks for the work!

Zykino closed this issue 5 years ago