At a recent SOB meeting we decided to take a look at adapting the WebUtils class to use Json instead of its own slightly idiosyncratic json encode/decode code. It seems daft to have multiple bits of code doing the same thing.
The basics are trivial - change the WebUtils class>>#jsonDecode: & WebUtils class>>#jsonEncode: methods. However, we then have a few unit tests failing, almost all because of what looks like very minor disagreements about how the output of encoding something should be formatted.
For example, does it in any way matter if #(#() #()) gets output as '[[], []]' or '[[],[]]'? I suspect not. Similarly, is it crucial whether decoding '{[}' raises Error or a Json specific Error subclass?
A slightly more serious issue is whether decoding {} (that's json dictionaries) should create a Dictionary or a JsonObject. JsonObjects are quite useful because they handle being a dictionary as well as allowing use of keys as access messages.
More complex is the question of the WebClientServerTest>>#testNilTrueFalse & #testNumbers tests. Should 'falsef' be an error or return 'false' etc? And is '-0123' valid json or not? As far as I can tell from trying to look up json syntax details, leading zeros are supposed to be an error and 'falsef' should likewise be one.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Moves his lips to pretend he's reading.
Hi Tim,
thanks for bringing this up. Below are some opinionated and not necessarily thought-through comments on your questions ...
On 2023-10-23T13:38:52-07:00, tim@rowledge.org wrote:
At a recent SOB meeting we decided to take a look at adapting the WebUtils class to use Json instead of its own slightly idiosyncratic json encode/decode code. It seems daft to have multiple bits of code doing the same thing.
The basics are trivial - change the WebUtils class>>#jsonDecode: & WebUtils class>>#jsonEncode: methods.
What will happen to #jsonArrayFrom: etc.? They are not marked as private, so I wonder whether it is okay to remove them or whether we should at least deprecate them and forward them to #parseAsJson or so.
However, we then have a few unit tests failing, almost all because of what looks like very minor disagreements about how the output of encoding something should be formatted.
For example, does it in any way matter if #(#() #()) gets output as '[[], []]' or '[[],[]]'? I suspect not.
I would prefer the latter as it is more compact for transmission of potentially large data via IO/network. However, an option for pretty-printing would also be nice to have.
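For what it's worth, python3's json package (one of the implementations compared later in this thread) supports both conventions; a small sketch:

```python
import json

data = [[], []]

# Default formatting inserts a space after each separator.
print(json.dumps(data))                         # [[], []]

# Compact separators: no whitespace, best for transmission.
print(json.dumps(data, separators=(",", ":")))  # [[],[]]

# Opt-in pretty-printing for human readers.
print(json.dumps(data, indent=2))
```

So compactness by default plus an explicit pretty-printing option seems to be a well-trodden design.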
Similarly, is it crucial whether decoding '{[}' raises Error or a Json specific Error subclass?
Domain-specific exceptions are always preferable IMO.
A slightly more serious issue is whether decoding {} (that's json dictionaries) should create a Dictionary or a JsonObject. JsonObjects are quite useful because they handle being a dictionary as well as allowing use of keys as access messages.
Good question! I heavily rely on the nice message passing trick of JsonObject in several projects when I use the Json class or String>>#parseAsJson. On the other hand, I can also understand if some other projects do not want to use this feature as it for instance might conflict with possibly absent collection selectors, e.g.:
[myDictionary maxKey] onDNU: #maxKey do: [myDictionary keys max]
Even though I admit the questionable quality of the above example. Maybe this behavior could be made an option as well? But I'm probably already overthinking this too much ... Probably JsonObject would also be fine always.
More complex is the question of the WebClientServerTest>>#testNilTrueFalse & #testNumbers tests. Should 'falsef' be an error or return 'false' etc?
Is there actually a difference between the two implementations in this regard? All JSON implementations I checked (v8's JSON API, jq, python3's json package) agree in raising a syntax error for 'nullll' etc. In addition, I have never been a fan of silent data corruption.
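To make that comparison concrete, here is what python3's json package does with those inputs (a quick sketch):

```python
import json

# Each of these should be rejected: a recognized keyword followed by
# trailing garbage is a syntax error, not a truncated success.
for bad in ("nullll", "falsef", "truefalse"):
    try:
        json.loads(bad)
        print(bad, "accepted")  # does not happen
    except json.JSONDecodeError as e:
        print(bad, "rejected:", e.msg)
```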
And is '-0123' valid json or not?
Again, all implementations I checked treat this as an error, so I believe we should, too.
As far as I can tell from trying to look up json syntax details, leading zeros are supposed to be an error and 'falsef' should likewise be one.
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Moves his lips to pretend he's reading.
Best, Christoph
--- Sent from Squeak Inbox Talk
On 2023-10-25, at 10:44 AM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
What will happen to #jsonArrayFrom: etc.? They are not marked as private, so I wonder whether it is okay to remove them or whether we should at least deprecate them and forward them to #parseAsJson or so.
Always a good question and one of the reasons I wish we had private methods. Initially I'd go with deprecating and making it an error, just for brutal simplicity.
However, we then have a few unit tests failing, almost all because of what looks like very minor disagreements about how the output of encoding something should be formatted.
For example, does it in any way matter if #(#() #()) gets output as '[[], []]' or '[[],[]]'? I suspect not.
I would prefer the latter as it is more compact for transmission of potentially large data via IO/network. However, an option for pretty-printing would also be nice to have.
Feel free to extend Shout to pretty-print json :-)
My key issue is what the proper response in the tests should be. Right now they are set to expect what the original WebUtils code produces, as opposed to something based on a json standard. I'm assuming there is in fact a json standard to follow, that makes sense and is consistent etc. Experience tells me that this may not be something that exists...
Similarly, is it crucial whether decoding '{[}' raises Error or a Json specific Error subclass?
Domain-specific exceptions are always preferable IMO.
Yes, I'd agree in general. The more information can be encapsulated to help us when trying to peer into the depths of an error, the better.
A slightly more serious issue is whether decoding {} (that's json dictionaries) should create a Dictionary or a JsonObject. JsonObjects are quite useful because they handle being a dictionary as well as allowing use of keys as access messages.
Good question! I heavily rely on the nice message passing trick of JsonObject in several projects when I use the Json class or String>>#parseAsJson. On the other hand, I can also understand if some other projects do not want to use this feature as it for instance might conflict with possibly absent collection selectors, e.g.:
[myDictionary maxKey] onDNU: #maxKey do: [myDictionary keys max]
Even though I admit the questionable quality of the above example. Maybe this behavior could be made an option as well? But I'm probably already overthinking this too much ... Probably JsonObject would also be fine always.
I too like JsonObject in quite a few cases. I guess the question is whether we ought to be pedantic and produce a Dictionary from some json that is declared as representing a dictionary. JsonObject is a subclass of Dictionary, so tomato-tomato (/təˈmeɪtoʊ təˈmɑːtəʊ/).
More complex is the question of the WebClientServerTest>>#testNilTrueFalse & #testNumbers tests. Should 'falsef' be an error or return 'false' etc?
Is there actually a difference between the two implementations in this regard? All JSON implementations I checked (v8's JSON API, jq, python3's json package) agree in raising a syntax error for 'nullll' etc. In addition, I have never been a fan of silent data corruption.
If you check the WebClientServerTest #testNilTrueFalse method you'll see what it is checking for. The Json class reads 'nulll' as nil ('nulll' parseAsJson), 'falsef' as false, and 'truefalse' as true, which I'm fairly sure are strictly errors.
And is '-0123' valid json or not?
Again, all implementations I checked treat this as an error, so I believe we should, too.
As far as I can tell from trying to look up json syntax details, leading zeros are supposed to be an error and 'falsef' should likewise be one.
This is a case where the WebUtils code appears to be incorrect.
So I posit that making the change is a good idea, a couple of minor fixes are needed, and the test class needs a few changes.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: KFP: Kindle Fire in Printer
Hi Tim,
On 2023-10-25T12:30:03-07:00, tim@rowledge.org wrote:
On 2023-10-25, at 10:44 AM, <christoph.thiede@student.hpi.uni-potsdam.de> wrote:
What will happen to #jsonArrayFrom: etc.? They are not marked as private, so I wonder whether it is okay to remove them or whether we should at least deprecate them and forward them to #parseAsJson or so.
Always a good question and one of the reasons I wish we had private methods.
But we do have private categories and pvt selectors. :-)
Initially I'd go with deprecating and making it an error, just for brutal simplicity.
How would this be better than just deleting them?
However, we then have a few unit tests failing, almost all because of what looks like very minor disagreements about how the output of encoding something should be formatted.
For example, does it in any way matter if #(#() #()) gets output as '[[], []]' or '[[],[]]'? I suspect not.
I would prefer the latter as it is more compact for transmission of potentially large data via IO/network. However, an option for pretty-printing would also be nice to have.
Feel free to extend Shout to pretty-print json :-)
My key issue is what the proper response in the tests should be. Right now they are set to expect what the original WebUtils code produces, as opposed to something based on a json standard. I'm assuming there is in fact a json standard to follow, that makes sense and is consistent etc. Experience tells me that this may not be something that exists...
You mean a standardized coding style? Probably not. Python leaves some spaces, JavaScript avoids any spaces, jq uses multiple lines. Again, I would prioritize brevity by default for the sake of transmissions. We have got much nicer tools for exploring the actual structure of the underlying data within Squeak.
Similarly, is it crucial whether decoding '{[}' raises Error or a Json specific Error subclass?
Domain-specific exceptions are always preferable IMO.
Yes, I'd agree in general. The more information can be encapsulated to help us when trying to peer into the depths of an error, the better.
A slightly more serious issue is whether decoding {} (that's json dictionaries) should create a Dictionary or a JsonObject. JsonObjects are quite useful because they handle being a dictionary as well as allowing use of keys as access messages.
Good question! I heavily rely on the nice message passing trick of JsonObject in several projects when I use the Json class or String>>#parseAsJson. On the other hand, I can also understand if some other projects do not want to use this feature as it for instance might conflict with possibly absent collection selectors, e.g.:
[myDictionary maxKey] onDNU: #maxKey do: [myDictionary keys max]
Even though I admit the questionable quality of the above example. Maybe this behavior could be made an option as well? But I'm probably already overthinking this too much ... Probably JsonObject would also be fine always.
I too like JsonObject in quite a few cases. I guess the question is whether we ought to be pedantic and produce a Dictionary from some json that is declared as representing a dictionary. JsonObject is a subclass of Dictionary, so tomato-tomato (/təˈmeɪtoʊ təˈmɑːtəʊ/).
A JsonObject is a dictionary, or what do you mean here? :-)
More complex is the question of the WebClientServerTest>>#testNilTrueFalse & #testNumbers tests. Should 'falsef' be an error or return 'false' etc?
Is there actually a difference between the two implementations in this regard? All JSON implementations I checked (v8's JSON API, jq, python3's json package) agree in raising a syntax error for 'nullll' etc. In addition, I have never been a fan of silent data corruption.
If you check the WebClientServerTest #testNilTrueFalse method you'll see what it is checking for. The Json class reads 'nulll' as nil ('nulll' parseAsJson), 'falsef' as false, and 'truefalse' as true, which I'm fairly sure are strictly errors.
Oh, I forgot that you can pass non-dictionaries to the decoding protocols. Still, I would opt for errors in this case, just like python, jq, and JS do. So, I don't like that 'falsef' parseAsJson answers false. I know '42a' asNumber works as well, but I also don't like that ... Maybe there should be both #readNonExhaustiveFromStream: and #readExhaustiveFromString: to support composition of parsers for rarer scenarios, where the exhaustive version would assert aStream atEnd at the end (with better names of these messages, of course) ...
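Python's json module happens to offer exactly this split already, which might serve as a model (a sketch; the Smalltalk selector names above are only placeholders):

```python
import json

decoder = json.JSONDecoder()

# Non-exhaustive: read one value, report where it stopped, so parsers
# can be composed over the rest of the input.
value, end = decoder.raw_decode("falsef")
print(value, end)          # False 5 -- one value read, 'f' left over

# Exhaustive: the whole string must be a single JSON document.
try:
    json.loads("falsef")
except json.JSONDecodeError as e:
    print("rejected:", e.msg)
```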
And is '-0123' valid json or not?
Again, all implementations I checked treat this as an error, so I believe we should, too.
As far as I can tell from trying to look up json syntax details, leading zeros are supposed to be an error and 'falsef' should likewise be one.
This is a case where the WebUtils code appears to be incorrect.
So I posit that making the change is a good idea, a couple of minor fixes are needed, and the test class needs a few changes.
+1
BTW: Has anyone compared the performance of both implementations?
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: KFP: Kindle Fire in Printer
Best, Christoph
--- Sent from Squeak Inbox Talk
On 2023-10-26, at 6:11 AM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
Always a good question and one of the reasons I wish we had private methods.
But we do have private categories and pvt selectors. :-)
Smiley highly appropriate. They're about as private as a toilet cubicle with no door. Or walls.
Initially I'd go with deprecating and making it an error, just for brutal simplicity.
How would this be better than just deleting them?
Well, it would allow people to see what went before? At least until That Glorious Day when we do a release and completely remove them.
My key issue is what the proper response in the tests should be. Right now they are set to expect what the original WebUtils code produces, as opposed to something based on a json standard. I'm assuming there is in fact a json standard to follow, that makes sense and is consistent etc. Experience tells me that this may not be something that exists...
You mean a standardized coding style? Probably not. Python leaves some spaces, JavaScript avoids any spaces, jq uses multiple lines. Again, I would prioritize brevity by default for the sake of transmissions. We have got much nicer tools for exploring the actual structure of the underlying data within Squeak.
No, not a style, but whatever any json standard asks for. Remember the unit tests are rather literally minded; write it to expect a smiley and you'd better not return a frowney.
The space-or-not can make a test that really passes, appear to fail. We need to be more careful.
A slightly more serious issue is whether decoding {} (that's json dictionaries) should create a Dictionary or a JsonObject. JsonObjects are quite useful because they handle being a dictionary as well as allowing use of keys as access messages.
Good question! I heavily rely on the nice message passing trick of JsonObject in several projects when I use the Json class or String>>#parseAsJson. On the other hand, I can also understand if some other projects do not want to use this feature as it for instance might conflict with possibly absent collection selectors, e.g.:
[myDictionary maxKey] onDNU: #maxKey do: [myDictionary keys max]
Even though I admit the questionable quality of the above example. Maybe this behavior could be made an option as well? But I'm probably already overthinking this too much ... Probably JsonObject would also be fine always.
I too like JsonObject in quite a few cases. I guess the question is whether we ought to be pedantic and produce a Dictionary from some json that is declared as representing a dictionary. JsonObject is a subclass of Dictionary, so tomato-tomato (/təˈmeɪtoʊ təˈmɑːtəʊ/).
A JsonObject is a dictionary, or what do you mean here? :-)
A JsonObject is a subclass of Dictionary, but the test code explicitly requires a Dictionary. If we like the JsonObject, it requires changing the test code - which isn't a problem unless someone has a good reason it really, really, should be testing for Dictionary.
More complex is the question of the WebClientServerTest>>#testNilTrueFalse & #testNumbers tests. Should 'falsef' be an error or return 'false' etc?
Is there actually a difference between the two implementations in this regard? All JSON implementations I checked (v8's JSON API, jq, python3's json package) agree in raising a syntax error for 'nullll' etc. In addition, I have never been a fan of silent data corruption.
If you check the WebClientServerTest #testNilTrueFalse method you'll see what it is checking for. The Json class reads 'nulll' as nil ('nulll' parseAsJson), 'falsef' as false, and 'truefalse' as true, which I'm fairly sure are strictly errors.
Simpler to fix the issues by relying on what the standard expects; but that test code is fussy...
So I posit that making the change is a good idea, a couple of minor fixes are needed, and the test class needs a few changes.
+1
BTW: Has anyone compared the performance of both implementations?
I'm fairly sure in the past I have read assurances that the Json classes are faster and more resilient etc. I'm mostly concerned about a little tidy-up to reduce wastage and potential confusion.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Fractured Idiom:- POSH MORTEM - Death styles of the rich and famous
Happily it turns out there is an actual json.org and the site has actual, clear, EBNF diagrams covering this. Yay!
Thus we can see that
'nul', 'nulll', 'truefalse', and 'falsef' should all be errors. The problem seems to be in the #consume: method; checking for not-alphanumeric/nil at the end looks like a plausible solution.
The EBNF for numbers is a bit... convoluted... but I'm pretty sure it means you're not allowed a leading 0 unless it is immediately followed by a $. as part of a floating point number less than 1 / greater than -1.
Also, octal/hex is not allowed for some reason, and right now the JsonNumberParser is not catching that. We would need to enhance the parsing somewhat; the WebUtils code appears to catch the issue almost by accident. I *think* we would want to catch non-exponent letter alphanumerics in a #readExponent method.
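The number grammar from json.org can be transcribed fairly directly; a sketch of it as a regular expression (the name JSON_NUMBER is mine):

```python
import re

# The integer part is a single 0 or a nonzero digit followed by digits,
# so a leading zero such as -0123 cannot match; the fractional and
# exponent parts are optional. This also rejects hex like 0x123.
JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")

for s in ("-0123", "0x123", "1.", "0.5", "-0.0e0", "10"):
    print(s, "valid" if JSON_NUMBER.fullmatch(s) else "invalid")
```

Having the whole rule in one place like this makes the leading-zero and hex cases fall out of the grammar rather than needing ad-hoc checks.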
I can't see any handling of characters with values > 127 anywhere. String writing deals with some escape chars like backspace but not with any wide character. I propose moving the current implementation of #jsonWriteOn: from String down to ByteString (because it has clearly had some care for performance applied), and providing a more general equivalent in String that would pass the buck to Character. That does, at least, pass the current tests.
At the end of my current attempts we have just two items needing improvement a) the 'xyz' dictionary test fails because we can't rely on the ordering of items in a dictionary and a better pass/fail check is needed b) the above mentioned 0x number stuff
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim After a number of decimal places, nobody gives a damn.
You may also want to look at the RFC 8259 for a specification of JSON. I didn't check whether it has the answers that you need(ed) though. https://datatracker.ietf.org/doc/html/rfc8259
On Sun, Oct 29, 2023 at 2:47 AM, Tim Rowledge tim@rowledge.org wrote:
Happily it turns out there is an actual json.org and the site has actual, clear, EBNF diagrams covering this. Yay!
Thus we can see that
'nul', 'nulll', 'truefalse', and 'falsef' should all be errors. The problem seems to be in the #consume: method; checking for not-alphanumeric/nil at the end looks like a plausible solution.
The EBNF for numbers is a bit... convoluted... but I'm pretty sure it means you're not allowed a leading 0 unless it is immediately followed by a $. as part of a floating point number less than 1 / greater than -1.
Also, octal/hex is not allowed for some reason, and right now the JsonNumberParser is not catching that. We would need to enhance the parsing somewhat; the WebUtils code appears to catch the issue almost by accident. I *think* we would want to catch non-exponent letter alphanumerics in a #readExponent method.
I can't see any handling of characters with values > 127 anywhere. String writing deals with some escape chars like backspace but not with any wide character. I propose moving the current implementation of #jsonWriteOn: from String down to ByteString (because it has clearly had some care for performance applied), and providing a more general equivalent in String that would pass the buck to Character. That does, at least, pass the current tests.
At the end of my current attempts we have just two items needing improvement a) the 'xyz' dictionary test fails because we can't rely on the ordering of items in a dictionary and a better pass/fail check is needed b) the above mentioned 0x number stuff
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim After a number of decimal places, nobody gives a damn.
Hi Tim,
On 2023. 10. 29. 2:47, Tim Rowledge wrote:
Happily it turns out there is an actual json.org and the site has actual, clear, EBNF diagrams covering this. Yay!
Thus we can see that
'nul', 'nulll', 'truefalse', and 'falsef' should all be errors. The problem seems to be in the #consume: method; checking for not-alphanumeric/nil at the end looks like a plausible solution.
Actually, the parser, the Json class, does the correct thing. It is designed to process a stream, and #readFrom: reads a single JSON object from the input stream. If there are leftover bytes, that's out of the scope of the parser.
What can be improved here is to fix the convenience methods in String (#parseAsJson, #parseAsOrderedJson) to check for the end of the stream once data has been parsed. With the exception of 'nul', that change would cover all the above cases, assuming that you're using #parseAsJson. Parsing 'nul' already raises a JsonIncompleteError.
The EBNF for numbers is a bit... convoluted... but I'm pretty sure it means you're not allowed a leading 0 unless it is immediately followed by a $. as part of a floating point number less than 1 / greater than -1.
Also, octal/hex is not allowed for some reason, and right now the JsonNumberParser is not catching that. We would need to enhance the parsing somewhat; the WebUtils code appears to catch the issue almost by accident. I *think* we would want to catch non-exponent letter alphanumerics in a #readExponent method.
Well, I didn't want to implement Float parsing, because it's hard to get right, so I left the heavy lifting to Nicolas's NumberParser. In #testNumbers the second test for '-0123' is incorrect, leading zeroes are not allowed, as the error message says. '-0.0e0' is not parsed as negative zero, but zero. And any other exponent would break that assumption, but it's clearly a marginal issue, as no sane person would write negative zero with an exponent. '1.', '1.e+2', '0x123' and '-.e' are all invalid and raise an error (provided that you check for the end of the stream in case of 0x123).
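As a point of comparison (not a correctness claim either way), python3's json package rejects the leading zero too, but it does preserve the sign of negative zero, exponent or not, which underlines how implementation-specific this corner is:

```python
import json
import math

# Leading zeros are a syntax error here as well.
try:
    json.loads("-0123")
except json.JSONDecodeError as e:
    print("rejected:", e.msg)

# Negative zero survives because the matched number text is handed to
# the platform float parser verbatim.
z = json.loads("-0.0e0")
print(z, math.copysign(1.0, z))   # -0.0 -1.0
```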
I can't see any handling of characters with values > 127 anywhere. String writing deals with some escape chars like backspace but not with any wide character. I propose moving the current implementation of #jsonWriteOn: from String down to ByteString (because it has clearly had some care for performance applied), and providing a more general equivalent in String that would pass the buck to Character. That does, at least, pass the current tests.
There's no need to handle non-ascii characters in any special way. The only requirement here is that when you transfer JSON from one system to another, it must be done as an UTF-8 encoded string. #asJsonString does not change the character encoding of the strings (it leaves them in Squeak's native encoding), because it does not know whether you'll transfer it to somewhere else or not. Escaping non-ascii characters is cumbersome, unnecessary, and slow. People who implement that only do that to avoid the utf-8 encoding, but it's not worth the effort.
It's also way faster to encode the final string once than to encode every string in a json object before serializing, or to encode every non-ascii character upon creation of a json string.
So, the expectation of WebClientServerTest >> #testStrings that non-ascii characters have to be encoded as escape sequences is implementation specific.
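The same trade-off is visible in python3's json package; a sketch (the sample string is mine):

```python
import json

data = {"name": "Szabó"}

# Escaping every non-ascii character keeps the output pure ASCII,
# but is larger and slower to produce.
print(json.dumps(data))                      # {"name": "Szab\u00f3"}

# Keep native characters, then UTF-8 encode the final string once,
# only at the point where it actually goes on the wire.
wire = json.dumps(data, ensure_ascii=False).encode("utf-8")
print(wire)
```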
At the end of my current attempts we have just two items needing improvement a) the 'xyz' dictionary test fails because we can't rely on the ordering of items in a dictionary and a better pass/fail check is needed b) the above mentioned 0x number stuff
There's also a test suite that I had a look at when I made changes to the code: https://github.com/nst/JSONTestSuite
Best, Levente
P.S.: Also, whitespaces in the encoded json are optional, and it's better to not have them there. As Christoph mentioned, it would be nice to have a pretty printer method.
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim After a number of decimal places, nobody gives a damn.
On 2023-10-29, at 11:24 AM, leves <leves@caesar.elte.hu> wrote:
Hi Tim,
On 2023. 10. 29. 2:47, Tim Rowledge wrote:
Happily it turns out there is an actual json.org and the site has actual, clear, EBNF diagrams covering this. Yay!
Actually, that EBNF diagram has its confusing points - note the 'value' diagram: it says whitespace + the value (i.e. null) + whitespace. Nice and simple. Except the diagram for whitespace has that line straight across, which is easy to miss. So evidently whitespace can be nothing at all, which certainly makes for extra parsing pleasure. I guess it is to allow [true] as opposed to requiring [ true ] and so forth.
Thus we can see that 'nul', 'nulll', 'truefalse', and 'falsef' should all be errors. The problem seems to be in the #consume: method; checking for not-alphanumeric/nil at the end looks like a plausible solution.
Actually, the parser, the Json class, does the correct thing. It is designed to process a stream, and #readFrom: reads a single JSON object from the input stream. If there are leftover bytes, that's out of the scope of the parser.
Surely truefalse, for example, is the very essence of a syntax error? In a json object/dictionary we are required to follow the value with either a $, and the next pair, or a $} to end the object. In an array we are required to follow a value with either a $, or $]. At least, that's what the diagrams say to me but I am in no way a parsing expert; am I wrong?
What can be improved here is to fix the convenience methods in String (#parseAsJson, #parseAsOrderedJson) to check for the end of the stream once data has been parsed. With the exception of 'nul', that change would cover all the above cases, assuming that you're using #parseAsJson.
Don't we need to check that for every (for want of a better word) atom in the input?
Perhaps this is only a problem with the degenerate case of a single atom not in an array or dictionary; certainly trying '[truefalse]' parseAsJson fails correctly, so yeah, sure, altering parseAsJson might solve it for many cases. It won't help for anyone getting a stream and directly using Json class>>#readFrom: though.
[snip]
Well, I didn't want to implement Float parsing, because it's hard to get right, so I left the heavy lifting to Nicolas's NumberParser. In #testNumbers the second test for '-0123' is incorrect, leading zeroes are not allowed, as the error message says. '-0.0e0' is not parsed as negative zero, but zero. And any other exponent would break that assumption, but it's clearly a marginal issue, as no sane person would write negative zero with an exponent. '1.', '1.e+2', '0x123' and '-.e' are all invalid and raise an error (provided that you check for the end of the stream in case of 0x123).
Very sensible ;-) I'm certainly not interested in doing anything except use that code either.
I can't see any handling of characters with values > 127 anywhere. String writing deals with some escape chars like backspace but not with any wide character. I propose moving the current implementation of #jsonWriteOn: from String down to ByteString (because it has clearly had some care for performance applied), and providing a more general equivalent in String that would pass the buck to Character. That does, at least, pass the current tests.
There's no need to handle non-ascii characters in any special way. The only requirement here is that when you transfer JSON from one system to another, it must be done as an UTF-8 encoded string. #asJsonString does not change the character encoding of the strings (it leaves them in Squeak's native encoding), because it does not know whether you'll transfer it to somewhere else or not. Escaping non-ascii characters is cumbersome, unnecessary, and slow. People who implement that only do that to avoid the utf-8 encoding, but it's not worth the effort.
It's also way faster to encode the final string once than to encode every string in a json object before serializing, or to encode every non-ascii character upon creation of a json string.
So, the expectation of WebClientServerTest >> #testStrings that non-ascii characters have to be encoded as escape sequences is implementation specific.
Fair point. The standard has got nasty weasel-words about utf-8 if you are exchanging data with systems not part of your closed ecosystem. Another place where somebody demanded special options and it got included in a naff way. "oh, we use JSON(*) *except it isn't any sort of json you'd ever see anywhere else"
Sigh.
So we should take care to make the json string and then utf-8 convert it when sending to outside customers, right?
At the end of my current attempts we have just two items needing improvement a) the 'xyz' dictionary test fails because we can't rely on the ordering of items in a dictionary and a better pass/fail check is needed b) the above mentioned 0x number stuff
There's also a test suite that I had a look at when I made changes to the code: https://github.com/nst/JSONTestSuite
That is an impressive list of tests somebody has built up.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: CPP: Crush Plotter Pen
I've done a pass on adapting WebUtils per JSON-ul.57 and it looks like a decent approach. I'm still not convinced that we are doing the right thing with wide chars.
WebClient-Core-tpr.135 & WebClient-Tests-tpr.64
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Logic: The art of being wrong with confidence...
Having seen no objections, I am about to move the three new versions ( JSON-ul.57, WebClient-Core-tpr.135 & WebClient-Tests-tpr.64) to trunk
On 2023-10-29, at 6:03 PM, Tim Rowledge tim@rowledge.org wrote:
I've done a pass on adapting WebUtils per JSON-ul.57 and it looks like a decent approach. I'm still not convinced that we are doing the right thing with wide chars.
WebClient-Core-tpr.135 & WebClient-Tests-tpr.64
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Logic: The art of being wrong with confidence...
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: REP: Randomly Execute Programmers
Thanks! :-) On 08.11.2023 00:42:28, Tim Rowledge via Squeak-dev squeak-dev@lists.squeakfoundation.org wrote: Having seen no objections, I am about to move the three new versions (JSON-ul.57, WebClient-Core-tpr.135 & WebClient-Tests-tpr.64) to trunk
Hi Tim,
sorry for the late reply, and thank you again for your commitment. Just a few comments ...
Thus we can see that 'nul', 'nulll', 'truefalse', and 'falsef' should all be errors. The problem seems to be in the #consume: method; checking for not-alphanumeric/nil at the end looks like a plausible solution.
Actually, the parser, the Json class, does the correct thing. It is designed to process a stream, and #readFrom: reads a single JSON object from the input stream. If there are leftover bytes, that's out of the scope of the parser.
Surely truefalse, for example, is the very essence of a syntax error? In a json object/dictionary we are required to follow the value with either a $, and the next pair, or a $} to end the object. In an array we are required to follow a value with either a $, or $]. At least, that's what the diagrams say to me but I am in no way a parsing expert; am I wrong?
Speaking from my personal impression: the job of #readFrom: in Squeak is to take one object, as Levente said, from the stream, but not to exhaust the stream. The idea behind this is that parsers can contain other nested parsers. A nice example of this can be found in the TextAttribute hierarchy:
TextIndent >> scanFrom: aStream
^ self amount: (Integer readFrom: aStream)
If Integer class>>readFrom: exhausted the stream and threw an error, #nextChunkText could not parse an entire series of attributes for a whole Text. On the other hand, #readFromString: IMO should test for the stream being atEnd.
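For comparison, Python's standard json module draws exactly this line: json.loads plays the role of #readFromString: (it insists on exhausting the input), while JSONDecoder raw_decode plays the role of #readFrom: (it reads one value and reports where it stopped). A small sketch:

```python
import json

# json.loads ~ #readFromString:: the whole input must be one JSON document.
try:
    json.loads('true false')
except json.JSONDecodeError as e:
    print(e)  # complains about extra data after the first value

# JSONDecoder.raw_decode ~ #readFrom:: read one value, report how far we got,
# and leave the rest of the input to the caller.
value, end = json.JSONDecoder().raw_decode('true false')
assert value is True and end == 4
```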
So if you don't have any other reason, could we just forward WebUtils class>>#jsonDecode: to Json class>>readFrom:?
Wide-characters/UTF-8: I agree with Levente here. Why would you want to make Json more heavyweight by also giving it a responsibility for encoding?
Tests: The json tests in the WebClient are still failing, right? Also, is there any reason not to move them into Json package now?
But we do have private categories and pvt selectors. :-)
Smiley highly appropriate. They're about as private as a toilet cubicle with no door. Or walls.
But for pvt selectors, I guess they have at least a door without a lock ...
I hope we have done the right thing by removing those parsing selectors without replacement. I still wonder whether we should keep them in the deprecated package and forward them to Json class>>readFrom: plus maybe add a type check for the result ... Otherwise, this would be a good item for Object releaseNotes, I think. :-)
Here is another question: Does anybody know the story behind #asSerializedJson? JSON-klub.50 introduces it with the following explanation, but I cannot work that out:
- Added SerializedJson and String >> #asSerializedJson which can be used to build and serialize json objects with parts that have already been serialized.
Best, Christoph
--- Sent from Squeak Inbox Talk
On 2023-11-10, at 11:55 AM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
Speaking from my personal impression: the job of #readFrom: in Squeak is to take one object, as Levente said, from the stream, but not to exhaust the stream. The idea behind this is that parsers can contain other nested parsers. A nice example of this can be found in the TextAttribute hierarchy:
I'm very happy to leave issues of "what a parser should do" in the hands of people with CS degrees.
The small argument I can summon here is that the EBNF diagram appears to me to require that 'true' is followed by defined whitespace (which as I mentioned seems to include nothing, which seems very odd to me) or a separator or a terminator such as end of dictionary } and so forth. To my mind 'truefalse' breaks that requirement. One can certainly make arguments about how the next loop around will raise an error because there be no appropriate value in the next char to be read. Is that the 'proper' thing? Dunno. Mostly what I care about is does it work in a way that makes some sense and produces useful results.
So if you don't have any other reason, could we just forward WebUtils class>>#jsonDecode: to Json class>>readFrom:?
I *think* that we want the check at the end of the parsing to detect some error cases.
Wide-characters/UTF-8: I agree with Levente here. Why would you want to make Json more heavyweight by also giving it a responsibility for encoding?
That was dropped, and FWIW I agree. I was initially simply trying to pass the old tests to match the outgoing code.
Tests: The json tests in the WebClient are still failing, right? Also, is there any reason not to move them into Json package now?
The ones left are there as a reminder to whomsoever can summon up more enthusiasm and fix/solve them. I had to decide on a stop point and move on to some other issues.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim No one is listening until you make a mistake
One can certainly make arguments about how the next loop around will raise an error because there be no appropriate value in the next char to be read. Is that the 'proper' thing?
I think so ... 'truefalse' parseAsJson fails because end of input is expected, '[truefalse]' fails because $] or $, or whitespace is expected, etc. Handling all of this within the parser (or, in this example, within the parser part for booleans) sounds less coherent to me.
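As a cross-check against another mainstream implementation (Python 3's standard json module; this only shows how it behaves, not what the Squeak Json package must do): all four strings from earlier in the thread are rejected, and the reported errors match the "end of input / delimiter expected" framing rather than a bad-token one:

```python
import json

for bad in ('nul', 'nulll', 'truefalse', 'falsef'):
    try:
        json.loads(bad)
        raise AssertionError('unexpectedly parsed: ' + bad)
    except json.JSONDecodeError as e:
        print(bad, '->', e)

# 'nul' fails with "Expecting value"; the other three parse a complete
# literal first and then fail with "Extra data", i.e. the end-of-input
# check at the top level, not a check inside the literal parser.
```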
So if you don't have any other reason, could we just forward WebUtils class>>#jsonDecode: to Json class>>readFrom:?
I *think* that we want the check at the end of the parsing to detect some error cases.
More importantly, this is also a matter of compatibility: For example, in the past, parsing JSONL (https://jsonlines.org/) was as easy as
jsonl := '{"name": "tim"} {"name": "christoph"}' readStream.
jsons := Array streamContents: [:out | [jsonl skipSeparators; atEnd] whileFalse: [out nextPut: (WebUtils jsonDecode: jsonl)]].
This doesn't work any longer because now #jsonDecode: complains that the stream is not fully consumed. So, my position remains. :-)
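The same incremental style has a direct analogue in Python, for what it's worth: json.loads rejects trailing data just like the new #jsonDecode:, but JSONDecoder raw_decode supports the one-object-at-a-time loop. A sketch transliterating the Smalltalk above (iter_jsonl is a made-up helper name):

```python
import json

def iter_jsonl(text):
    """Yield one JSON document at a time, skipping whitespace between them."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1                      # skipSeparators
        if pos == len(text):
            return                        # atEnd
        value, pos = decoder.raw_decode(text, pos)
        yield value

docs = list(iter_jsonl('{"name": "tim"} {"name": "christoph"}'))
assert docs == [{"name": "tim"}, {"name": "christoph"}]
```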
Regarding -0.0e0: Do these classAndValueEquals assertions even make sense? Where did you find this requirement? Python and Ruby answer a float in this case as well. JavaScript doesn't but in JavaScript 0.0 and 0 have the same class. If not required, I would opt to remove this assertion.
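A quick check with Python's stdlib json supports that: anything with a fraction or exponent part decodes as a float, a bare integer as an int, and '-0123' is rejected because the number scanner stops after '-0' and treats the rest as trailing data:

```python
import json

# Cross-implementation check (Python 3 stdlib json), not a statement
# about what the Squeak package must do.
assert isinstance(json.loads('-0.0e0'), float)   # fraction/exponent => float
assert isinstance(json.loads('0'), int)          # bare integer => int
assert json.loads('-0.0e0') == 0.0

try:
    json.loads('-0123')          # leading zero: scan stops after '-0',
except json.JSONDecodeError as e:  # '123' is reported as extra data
    print(e)
```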
Regarding the Unicode test (WebClientServerTest>>#testStrings), should WebUtils class>>jsonEncode: really escape non-ascii/latin1/? characters? ... For the Json package itself I think like Levente that it should not, but is there any reason nowadays not to send arbitrary characters over the network? The main issue is that there is no sender of #jsonEncode: or #jsonDecode: in the image, so we don't know whether senders will handle the text encoding themselves. Maybe we should deprecate both selectors now completely to overcome these questions? After all, JSON is no longer a responsibility of WebClient.
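Both conventions are legal per the spec, which is presumably why Python's json module makes the escaping a switch (ensure_ascii); either way the text round-trips, so this really is just a policy question for the senders:

```python
import json

s = 'Grüße, Tim'
print(json.dumps(s, ensure_ascii=True))   # escapes the non-ascii characters
print(json.dumps(s, ensure_ascii=False))  # "Grüße, Tim"

# Both forms round-trip; escaping trades payload size for 7-bit safety.
assert json.loads(json.dumps(s, ensure_ascii=True)) == s
assert json.loads(json.dumps(s, ensure_ascii=False)) == s
```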
Best, Christoph
--- Sent from Squeak Inbox Talk
On 2023-11-11, at 1:51 PM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
One can certainly make arguments about how the next loop around will raise an error because there be no appropriate value in the next char to be read. Is that the 'proper' thing?
I think so ... 'truefalse' parseAsJson fails because end of input is expected, '[truefalse]' fails because $] or $, or whitespace is expected, etc. Yes. Handling all of this within the parser/parser part for boolean, in this example, sounds less coherent to me.
I suggest that it is a matter of expectations; for me it seems more appropriate that you get the error reported in the context of 'truefalse is a bad token' rather than 'whoops, no proper character found, look here it's $f instead of whitespace' - but as I said, ask someone with a CS degree!
So if you don't have any other reason, could we just forward WebUtils class>>#jsonDecode: to Json class>>readFrom:?
I *think* that we want the check at the end of the parsing to detect some error cases.
More importantly, this is also a matter of compatibility: For example, in the past, parsing JSONL (https://jsonlines.org/) was as easy as
jsonl := '{"name": "tim"} {"name": "christoph"}' readStream.
jsons := Array streamContents: [:out | [jsonl skipSeparators; atEnd] whileFalse: [out nextPut: (WebUtils jsonDecode: jsonl)]].
This doesn't work any longer because now #jsonDecode: complains that the stream is not fully consumed. So, my position remains. :-)
Never heard of jsonl before. Quite why anyone would not just wrap things in [] to make it a json array I don't know.
The reason it used to work is that the code in WebUtils was fairly aggressively making sure that 'true' worked and 'truef' didn't. You definitely need to talk with Levente about this aspect of the code!
Regarding -0.0e0: Do these classAndValueEquals assertions even make sense? Where did you find this requirement? Python and Ruby answer a float in this case as well. JavaScript doesn't but in JavaScript 0.0 and 0 have the same class. If not required, I would opt to remove this assertion.
They were all like that previously.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Hid behind the door when they passed out brains.
On 2023-11-11T17:03:22-08:00, tim@rowledge.org wrote:
Never heard of jsonl before. Quite why anyone would not just wrap things in [] to make it a json array I don't know.
I guess this is related to the general addiction to primitive obsession in mainstream languages and especially the Linux world. "Oh no, another object, can't we have something that is compatible with our good old coreutils like head and tail?"