We should see whether it is possible to detect the encoding and adapt the code to deal with it, or to convert it to UTF-8.
I guess Python has libraries to do that, as this is a pretty common problem!
Hi, I just tested the encoding problem and created https://git.lecygnenoir.info/Zykino/prismedia/src/branch/feature/encoding
I think this makes the encoding work: when I print to the console I see the accents, while without the changes I don't.
The problem I have on this branch is that I cannot validate that the decoded strings are... strings.
If I test any of the following, it prints `True` on the console, but `Schema` does not validate the options (tested with `unicode` and the more general `basestring`):
`isinstance(options["--name"], unicode)`
`type(options["--name"]) is unicode`
`options["--name"].isdigit()`
(If you could test it on Linux and/or help with the validation, I would take any help :))
Hello, thanks for the work!
Could you please give me an example of a title/description/anything that triggers the problem, so I can be sure I test with good (or, it seems, bad :joy:) chars?
Thanks!
Sure! The command line I'm using contains only common French accents:
`prismedia_upload.py --file="XXX" --name="5 - Test avec accents éèàç" --description="Test avec accents éèàç"`
I'm not sure about the `--tag` option. I may need to test it too.
Okay, I confirm I also have an error on Linux at the moment:
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)
```
I'll dig deeper to see what we can do about it!
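For readers who want to see this class of error in isolation, here is a minimal sketch. It is written for Python 3 (the thread's own code is Python 2), and it assumes the shell hands the title over as UTF-8 bytes; the variable names are only illustrative. Decoding those bytes with the ASCII codec fails on the first accented character, whose first byte is `0xc3`.

```python
# Python 3 sketch: UTF-8 bytes decoded with the ASCII codec fail on the first accent.
name = "5 - Test avec accents éèàç"  # title taken from the command line quoted in this thread
raw = name.encode("utf-8")           # assumption: the terminal delivers UTF-8 bytes

try:
    raw.decode("ascii")              # same codec as in the error message
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc               # exc.start is the offset of the offending byte

# Decoding with the codec the bytes were actually written in round-trips cleanly.
decoded = raw.decode("utf-8")
```

The exact position reported depends on the string being decoded, so it will not necessarily match the offset in the message above.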
I don't have this message...
```
2018-11-11 12:53:01,186 No suitable NFO found, skipping.
The video description should be a string
```
(Also I re-pushed a new commit with the error message fixed)
Yes, for me it crashes directly in the `decodeArgumentStrings` function, even before the schema validation.
Although the locale is `UTF-8`, it seems the decode lib tries to use ASCII :thinking:
Okay, this is a different problem: Python cannot encode UTF-8 to UTF-8: https://stackoverflow.com/questions/24475393/unicodedecodeerror-ascii-codec-cant-decode-byte-0xc3-in-position-23-ordinal
I'll need to detect whether the string is already UTF-8, then choose whether to encode it (or not) ;-)
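A rough sketch of that "decode only when needed" idea, written for Python 3 rather than the Python 2 used in this project. The helper name `to_text` and the `cp1252` code page are assumptions made for illustration; they are not part of prismedia.

```python
def to_text(value, encoding="utf-8"):
    """Return text, decoding only when `value` is still raw bytes."""
    if value is None:
        return None                    # option not provided
    if isinstance(value, bytes):
        return value.decode(encoding)  # raw bytes: decode exactly once
    return value                       # already text: leave it untouched


# Hypothetical options: one still in bytes, one already text, one missing.
options = {
    "--name": "Test avec accents éèàç".encode("cp1252"),  # assumed Windows code page
    "--description": "déjà décodé",
    "--tag": None,
}
decoded_options = {key: to_text(val, "cp1252") for key, val in options.items()}
```

Checking the type of the value, rather than only the locale, avoids decoding something that is already text.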
Since I'm using `locale.getpreferredencoding()`, I think you can check whether it is UTF-8.
Can you send me the output of `locale.getpreferredencoding()` on Linux? (I think it's "utf_8" but I'm not sure about the caps and the underscore :s)
@Zykino yep, that is exactly what I did: a check for UTF-8 before trying to decode :D
```
# Only decode when the console encoding is not already UTF-8
if locale.getpreferredencoding().lower() != "utf-8":
    utils.decodeArgumentStrings(options, locale.getpreferredencoding())
```
On Linux the value is `UTF-8`, but I think a `lower()` to avoid special cases does not hurt.
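On the question of caps and underscores: one possible way to avoid comparing spellings by hand is to let the standard `codecs` module resolve the alias, since `codecs.lookup()` returns a canonical codec name. The helper `is_utf8` below is a hypothetical name for this sketch (Python 3), not something from the branch.

```python
import codecs
import locale


def is_utf8(encoding_name):
    """True when `encoding_name` is any spelling of UTF-8 that Python knows."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False  # unknown encoding name: treat it as "not UTF-8"


preferred = locale.getpreferredencoding()
needs_decoding = not is_utf8(preferred)  # mirrors the check in the patch above
```

With this, `"UTF-8"`, `"utf8"` and `"utf_8"` are all treated the same way.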
With this patch, I am able to upload correctly (which is expected: the decode function is not called, so the original string stays as it is...)
Thus I do not know at the moment how to reproduce your bug and help you patch it :'(
The first thing I would try, however, if you have not yet, is to force a string when setting the option in the `decodeArgumentStrings` function, something like:
```
if options["--name"] is not None:
    options["--name"] = str(options["--name"].decode(encoding))
```
Can Python handle it on Windows?
It gives me:
```
Traceback (most recent call last):
  File "C:\Users\Julien\Documents\0DocsPerso\Code\Prismedia\prismedia_upload.py", line 230, in <module>
    utils.decodeArgumentStrings(options, locale.getpreferredencoding())
  File "C:\Users\Julien\Documents\0DocsPerso\Code\Prismedia/lib\utils.py", line 214, in decodeArgumentStrings
    options["--name"] = str(options["--name"].decode(encoding))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 22-25: ordinal not in range(128)
```
Adding the str gives that? 😱
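Some background on the traceback above: in Python 2, calling `str()` on a `unicode` object implicitly encodes it with the ASCII codec, which is why a freshly decoded title fails again at the `str(...)` call. The sketch below makes that implicit step explicit in Python 3, using the title from this thread; it is an illustration, not the project's code.

```python
title = "5 - Test avec accents éèàç"  # already-decoded text

try:
    title.encode("ascii")             # what Python 2's str(<unicode>) does implicitly
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# The slice that could not be encoded: the four accented characters.
failing_chars = title[encode_error.start:encode_error.end]
```

The accents sit at character offsets 22 to 25 of that title, which matches the positions in the error message.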
I think I found it!
If I replace `str` with `unicode` (or `basestring`), it looks like it's working.
But I can't find proper documentation for the `And()` function of Schema, so I do not know what is necessary to use it.
I don't know why we need the lambda testing that the string is not a digit. Is it necessary for Schema? Aren't fully numeric strings valid titles?
Does the last commit work for you?
Hey!
I have tested the upload and it works, great job!
However, in my freshly pulled version of the encoding branch, I don't see any `unicode` or `basestring` in the code, so I am unsure I tested the right part :-/
If it helps, the `Or` and `And` system in schema works inside the parentheses.
E.g.:
```
Or(None, And(str, lambda x: not x.isdigit()))
```
means:
```
None OR (str AND lambda x: not x.isdigit())
```
So the parameter is optional (the `None` part) and, if provided, it should be a string (provided by " " in the command line).
I am honestly not sure the lambda part is really needed, as the `str` should do the job (check the " " in the parameters), but the schema documentation used that to check strings, so I use it as it is to avoid further crashes ^^"
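To make the reading of that rule concrete, here is a plain-Python equivalent that does not use the schema library at all. The function name `is_valid_text_option` is made up for this sketch, and it targets Python 3, where decoded text is `str`. Under Python 2, decoded text is `unicode`, and `isinstance(u"...", str)` is false there, which would be consistent with the `str` check rejecting decoded options.

```python
def is_valid_text_option(value):
    """Plain-Python reading of Or(None, And(str, lambda x: not x.isdigit()))."""
    if value is None:
        return True  # the `None` branch of Or: the option was omitted
    # the And branch: must be a string AND must not consist of digits only
    return isinstance(value, str) and not value.isdigit()


checks = {
    "omitted": is_valid_text_option(None),
    "accented": is_valid_text_option("Test avec accents éèàç"),
    "digits_only": is_valid_text_option("12345"),
    "not_a_string": is_valid_text_option(12345),
}
```

Note that the `isdigit()` part does reject titles made only of digits, so it is a design decision rather than a requirement of the type check.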
If you can confirm the current encoding branch is working for you on Windows, I think we can merge it, as it also works on Linux :-)
For info, I tested at commit 384fb8254113b87159ea8fe8808329df7ee04e84 with the message `prevent decoding unicode strings since python prefer to crash than doing nothing`
Is this ok?
Thanks!
Yes, but... since you have a UTF-8 command line, you only verified that my check excludes you from the modification I made (the bug where you cannot decode a UTF-8 string).
I just wanted to test the tags with accents a bit more before letting you merge #21
And apparently I did not push everything (I had a bug when the text came from the NFO file instead of the command line)
Okay! I am (at least) able to test the code on Linux, and it works perfectly for me.
I have tested with and without accents, with a variety of special characters; everything works and the video is correctly set after upload.
If it's OK for you and all your modifications are pushed, I can merge as soon as possible. Many thanks for the work!
Accents from the Windows's cli
They are badly interpreted. I think they are "Latin-1" but are interpreted as if they were "UTF-8".
Windows 7 in English
See #9 for more info
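As an illustration of the Latin-1 versus UTF-8 mix-up suspected in this issue, the Python 3 sketch below shows what happens when bytes written in one encoding are read in the other. It only demonstrates the general mechanism; whether the Windows console really delivers Latin-1 here is the reporter's guess, not something this snippet verifies.

```python
accents = "éèàç"

latin1_bytes = accents.encode("latin-1")  # one byte per character
utf8_bytes = accents.encode("utf-8")      # two bytes per character for these accents

# Latin-1 bytes are not valid UTF-8, so a strict decode raises...
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ...and a lenient decode yields replacement characters instead of the accents.
garbled = latin1_bytes.decode("utf-8", errors="replace")

# The reverse mix-up (UTF-8 bytes read as Latin-1) gives the familiar "Ã©" look.
mojibake = utf8_bytes.decode("latin-1")
```

Either direction loses the accents, so the bytes have to be decoded with the encoding they were actually produced in.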