Hi All,
one thing this lock-up suggests is that interrupting should interrupt all processes running at user priority, not just the uiProcess. Does that make sense? It does to me, but would be something I'd control by a preference for testing its effects.
On Mon, Feb 29, 2016 at 7:23 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Levente, Hi All,
I'm trying to investigate the socket issues in aio.c but have found a much more basic issue. With my recent changes to Network that check more carefully for errors, the SocketTest>>testSocketReuse test appears to lock up. In fact, the VM is fine, happily doing what it's being told by Socket>>#sendData:count:
Socket>>sendData: buffer count: n
	"Send the amount of data from the given buffer"
	| sent |
	sent := 0.
	[sent < n] whileTrue:[
		sent := sent + (self sendSomeData: buffer startIndex: sent+1 count: (n-sent))].
The VM keeps trying to send data on a socket that is being reused, gets an error from sendto, and answers 0 as the number of bytes sent, as required, but Socket>>#sendData:count: pays no heed and spins hard. Here are the traces:
The test is SocketTest>>testSocketReuse which spawns two processes, one to send and one to receive data. Here are the processes:
Process 0x48641f8 priority 40
0xbfec0498 M Socket>sendSomeData:startIndex:count:for: 0x4864d18: a(n) Socket
0xbfec04c0 M Socket>sendSomeData:startIndex:count: 0x4864d18: a(n) Socket
0xbfec04ec M Socket>sendData:count: 0x4864d18: a(n) Socket
0xbfec0520 I [] in SocketTest>testSocketReuse 0x4864dd0: a(n) SocketTest
0xbfec0540 I [] in BlockClosure>newProcess 0x4864df0: a(n) BlockClosure

Process 0x6543178 priority 40
0xbfec22c8 I [] in Delay>wait 0x4864ea0: a(n) Delay
0xbfec22f0 I BlockClosure>ifCurtailed: 0x4864eb8: a(n) BlockClosure
0xbfec2314 I Delay>wait 0x4864ea0: a(n) Delay
0xbfec2340 I [] in SocketTest>testSocketReuse 0x4864dd0: a(n) SocketTest
0xbfec2360 M BlockClosure>ensure: 0x4864fa8: a(n) BlockClosure
0xbfec2390 I SocketTest>testSocketReuse 0x4864dd0: a(n) SocketTest

Process 0x4864168 priority 40
0xbfec3438 I [] in DelayWaitTimeout>wait 0x48652f8: a(n) DelayWaitTimeout
0xbfec3458 M BlockClosure>ensure: 0x4865378: a(n) BlockClosure
0xbfec347c I DelayWaitTimeout>wait 0x48652f8: a(n) DelayWaitTimeout
0xbfec34a0 I Semaphore>waitTimeoutMSecs: 0x48652e0: a(n) Semaphore
0xbfec34c4 I Socket>waitForDataIfClosed: 0x4865408: a(n) Socket
0xbfec34f0 I Socket>receiveDataInto:startingAt: 0x4865408: a(n) Socket
0xbfec3520 I [] in SocketTest>testSocketReuse 0x4864dd0: a(n) SocketTest
0xbfec3540 I [] in BlockClosure>newProcess 0x48654c0: a(n) BlockClosure
And here's the VM spinning:
15726 0 sqUnixSocket.c:1128 UDP sendData(11, 16)
15726 0 sqUnixSocket.c:1134 UDP send failed 56 Socket is already connected
15726 0 sqUnixSocket.c:1128 UDP sendData(11, 16)
15726 0 sqUnixSocket.c:1134 UDP send failed 56 Socket is already connected
15726 0 sqUnixSocket.c:1128 UDP sendData(11, 16)
15726 0 sqUnixSocket.c:1134 UDP send failed 56 Socket is already connected
...etc...
Ah!! Of course. Because I have changed the default scheduling semantics in Squeak 5 to make preemption not a yield point, Socket>>#sendData:count: never yields to the other processes. Previously when the Delay process woke up this would implicitly yield the process spinning in Socket>>#sendData:count:.
So Socket>>#sendData:count: needs to do a yield if no data is sent. However, shouldn't it also check for errors if no data is sent and do something like return an error if it discovers, via Socket>>primSocketError:, that the socket is not happy?
_,,,^..^,,,_
best, Eliot
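
The fix proposed in the message above (yield when nothing was sent, and fail if the socket reports an error) might look roughly like the following workspace sketch. It is only an illustration under stated assumptions: sendSome and socketError are hypothetical blocks standing in for Socket>>sendSomeData:startIndex:count: and Socket>>primSocketError:, so that the loop can be tried without a real socket, and signalling a plain Error is an assumed choice, not what the Network package necessarily does.

```smalltalk
"Workspace sketch. sendSome and socketError are hypothetical stand-ins for
 Socket>>sendSomeData:startIndex:count: and Socket>>primSocketError:."
| n sent count attempts sendSome socketError |
n := 16.
sent := 0.
attempts := 0.
"Stand-in sender: accepts nothing on the first two calls, then up to 8 bytes per call."
sendSome := [:start :max |
	attempts := attempts + 1.
	attempts <= 2 ifTrue: [0] ifFalse: [max min: 8]].
"Stand-in error query: 0 means no error is pending."
socketError := [0].
[sent < n] whileTrue: [
	count := sendSome value: sent + 1 value: n - sent.
	count = 0
		ifTrue: [
			"Nothing was sent: fail on a reported error, otherwise let other processes run."
			socketError value = 0 ifFalse: [Error signal: 'send failed'].
			Processor yield]
		ifFalse: [sent := sent + count]].
sent
```

With these stand-ins the loop yields twice and then finishes. If socketError answered a non-zero value instead, the loop would signal an error on the first empty send rather than spin.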
On 01.03.2016, at 21:14, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi All,
one thing this lock-up suggests is that interrupting should interrupt all processes running at user priority, not just the uiProcess. Does that make sense? It does to me, but would be something I'd control by a preference for testing its effects.
I thought it interrupted the active process? Wouldn’t that make most sense?
- Bert -
On Tue, Mar 1, 2016 at 12:32 PM, Bert Freudenberg bert@freudenbergs.de wrote:
I thought it interrupted the active process? Wouldn’t that make most sense?
Not necessarily. For example, in the test I referred to, testSocketReuse, the ui process (the process running the test) spawns two other processes that spin hard, one trying to write to a socket and one trying to read from a socket. If the socket code doesn't detect errors properly then these processes continue to spin hard. If one interrupts then /nothing/ appears to happen. The ui process is indeed interrupted, but because the other two processes continue to spin hard they shut out the notifier, which doesn't appear. And even if the notifier did appear, those processes would still be spinning hard, making it difficult for the user to interact with the notifier. So in this case it makes sense to interrupt all processes running at user priority. Arguably it makes sense to interrupt any and all processes running at or above user priority and below user interrupt priority. Usually there's only the ui process in this range, but occasionally there are more, and errors can cause them to make an interrupt ineffective if it only interrupts the ui process.
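
As a rough illustration of the priority range described above, the following workspace sketch collects the processes that such an interrupt would target. It is a sketch only, not the interrupt handler itself: it assumes the candidates can be found by enumerating Process instances, and it merely lists them on the Transcript instead of suspending or debugging them.

```smalltalk
"Workspace sketch: list live processes at or above user priority
 and below user interrupt priority."
| lo hi candidates |
lo := Processor userSchedulingPriority.
hi := Processor userInterruptPriority.
candidates := Process allSubInstances select: [:p |
	p isTerminated not
		and: [p priority >= lo and: [p priority < hi]]].
candidates do: [:p | Transcript show: p printString; cr].
candidates
```

Evaluated from the ui process, the list would normally contain just that process. In the testSocketReuse case it would presumably also contain the two spawned processes at priority 40.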
- Bert -
squeak-dev@lists.squeakfoundation.org