A bunch of the Trait tests fail, despite no one having worked on the Traits code in quite a while. The symptom is that the tests time out [1]. I've tried debugging some of them, only to find that they pass. For instance, <timeout: 60> sometimes "fixes" the tests.
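For anyone who hasn't used the pragma: it goes at the top of the test method, and SUnit should fail the test (that's the TestFailure the harness reports) once the limit passes. A minimal sketch, with an invented method name and body:

testSomethingSlow
	"hypothetical test; only the pragma matters here"
	<timeout: 60>	"give the test up to 60 seconds before SUnit times it out"
	self assert: true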
But the tests shouldn't be failing. Any takers?
frank
[1] http://build.squeak.org/job/SqueakTrunk/756/testReport/TraitsTests.Kernel/Tr...
On 11-01-2014, at 6:29 AM, Frank Shearar frank.shearar@gmail.com wrote:
I noticed similar issues when running tests on a Pi; being a relatively slow machine, it takes longer to do some things and needed a larger timeout value.
If tests that used to run within the time limit are now taking longer but still pass when given enough time, that implies to me that either a) the tests are being run on a slower machine due to some system configuration change (are other jobs also running and taking time?) or b) something has changed in the execution path to slow down the tested code. Which, obviously, ought not to be the case.
tim
--
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim
Useful random insult:- Can't program his way out of a for-loop.
On 11 January 2014 19:19, tim Rowledge tim@rowledge.org wrote:
They time out on my beefy work laptop: a 2.7 GHz machine. That's probably a higher-spec machine than the build server.
If I were a betting man, I'd be inclined to put my money on (b).
frank
On 11 January 2014 22:55, Frank Shearar frank.shearar@gmail.com wrote:
Attached is the output of MessageTally spyOn: [[TraitCompositionTest debug: #testAliasCompositions] on: TestFailure do: [#fail]], after adding <timeout: 60> to the test.
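The same expression, formatted out for legibility (nothing new here, just line breaks):

MessageTally spyOn:
	[[TraitCompositionTest debug: #testAliasCompositions]
		on: TestFailure
		do: [#fail]]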
The leaves are very interesting:
**Leaves**
43.1% {28650ms} CompiledMethod>>originalTraitMethod:
36.9% {24505ms} Text>>size
9.7% {6425ms} Delay>>wait
3.4% {2239ms} SecurityManager class>>default
1.1% {749ms} CompiledMethod>>checkOKToAdd:at:
_24 seconds_ in Text >> #size?! Also, CompiledMethod>>originalTraitMethod: hasn't changed in 4 years. In fact, the only things in the code path that have changed, as far as I can see, are Chris Muller's fix to TestCase >> #debug and Nicolas Cellier's cleanup of the Compiler API. Nothing looks even remotely capable of slowing stuff down this dramatically.
frank
2014/1/13 Frank Shearar frank.shearar@gmail.com
I don't get the same figures here:
1st run:
**Leaves**
50.5% {3145ms} CompiledMethod>>originalTraitMethod:
18.1% {1125ms} Delay>>wait
11.8% {731ms} MultiByteFileStream>>nextPut:
5.1% {316ms} Trait(ClassDescription)>>organization
3.3% {206ms} MultiByteFileStream(StandardFileStream)>>open:forWrite:
1.7% {108ms} CompiledMethodWithNode>>selector
1.1% {67ms} ByteSymbol(Symbol)>>=
2nd run:
**Leaves**
50.5% {1991ms} CompiledMethod>>originalTraitMethod:
15.7% {617ms} MultiByteFileStream>>nextPut:
8.8% {346ms} Delay>>wait
4.6% {180ms} MultiByteFileStream(StandardFileStream)>>open:forWrite:
3.9% {153ms} Trait(ClassDescription)>>organization
2.3% {92ms} ByteSymbol(Symbol)>>=
1.5% {59ms} WeakArray class>>finalizationProcess
1.4% {54ms} CompiledMethodWithNode>>selector
1.1% {42ms} Array(SequenceableCollection)>>indexOf:startingAt:ifAbsent:
Hmm, great variation between the two runs? I also note the presence of AnObsoleteC1/2 as large consumers... Not that healthy.
With an AndreasSystemProfiler, here are the leaves:
1st run:
**Leaves**
57.1 (2,518) Array elementsForwardIdentityTo:
16.4 (722) StandardFileStream primWrite:from:startingAt:count:
6.5 (288) Symbol flushCache
5.7 (252) StandardFileStream primOpen:writable:
2nd run:
**Leaves**
64.8 (4,436) Array elementsForwardIdentityTo:
17.3 (1,183) StandardFileStream primWrite:from:startingAt:count:
4.6 (318) Symbol flushCache
4.1 (282) StandardFileStream primOpen:writable:
Though I failed to find the direct call to elementsForwardIdentityTo: in the send chain (probably through becomeForward:)...
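That would fit: if I remember Object>>becomeForward: correctly, it is just a thin wrapper over the array primitive, so the profiler attributes the time to the leaf rather than to the sender. Roughly, from memory:

becomeForward: otherObject
	"All pointers to the receiver are forwarded to point at otherObject."
	(Array with: self)
		elementsForwardIdentityTo: (Array with: otherObject)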
On Mon, Jan 13, 2014 at 10:15 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
I get a prim failed error on a MultiByteFileStream on Windows 8 when running the Traits tests.
Cheers, Karl
TraitsFileOutTest is failing
Correct log attached this time
Cheers, Karl
2014/1/14 karl ramberg karlramberg@gmail.com
id: #[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
means the file was closed...
I've observed an increasing failure rate related to a randomly closed change log over the last few months (particularly when loading/merging with MC); it would be good to identify the root cause...
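A quick workspace probe for that state — assuming StandardFileStream>>closed still answers what I think it does:

(SourceFiles at: 2) closed
	"true once the changes file's handle has been dropped; the all-zero id above is the same symptom"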
On Tue, Jan 14, 2014 at 12:28:45AM +0100, Nicolas Cellier wrote:
That sounds like exactly the problem that Frank was describing in another thread. He was ending up with a corrupt changes file IIRC, and the problems he described seemed to be related to primSize checks on a changes file that had been closed (this would presumably be a process trying to append to the changes file when some other process had closed the file stream).
Dave
On 13 January 2014 23:42, David T. Lewis lewis@mail.msen.com wrote:
I'm relieved it's not just me, but sad that it's not just me experiencing this. So my understanding is that changes are stored by accessing SourceFiles at: 2. If you want a read-only copy of sources or changes, you use CurrentReadOnlySourceFiles at: 2. I suppose the first step is verifying that nothing referencing these _directly_ does things like closing files (I'd be tempted to look at CROSF first).
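In workspace terms, the two access paths look like this (index 1 is sources, index 2 is changes):

| changes readOnlyChanges |
changes := SourceFiles at: 2.	"the one writable changes stream the image appends to"
readOnlyChanges := CurrentReadOnlySourceFiles at: 2.	"a shared read-only copy for readers"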
frank
On Tue, Jan 14, 2014 at 8:20 AM, Frank Shearar frank.shearar@gmail.com wrote:
I'm also using Windows, so it looks like the problem could be related to that somehow.
Karl
Hi Frank,
On Mon, Jan 13, 2014 at 11:20 PM, Frank Shearar frank.shearar@gmail.com wrote:
Forgive me for not replying earlier, but I needed help to track this down. We suffered a very similar symptom a while ago at Cadence. It turns out that in our case the issue was the system running out of file handles because it left lots of files open. The cause lay in Monticello package loading, and the fix was to use CurrentReadOnlySourceFiles more. I'm attaching a changeset in the hope that this is the same issue and that our changeset can point you towards a fix.
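Schematically, the shape of the fix — the changeset has the real details, this is just the idea:

| source |
"before: every caller opened a fresh read-only handle, which could leak"
source := (SourceFiles at: 2) readOnlyCopy.
"after: reuse the cached read-only stream instead of opening a new one"
source := CurrentReadOnlySourceFiles at: 2.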
HTH
On 14 January 2014 19:10, Eliot Miranda eliot.miranda@gmail.com wrote:
Thanks, Eliot. From what I can make out, the MCPackage >> #snapshot in this filein differs from trunk's only in the "reject: [:ea | [ea methodClass language isNewspeakLanguage3] ifError: [false]]" guard. So the significant part of the changeset is PackageInfo >> #changeRecordForOverriddenMethod:, which I've submitted to the Inbox as an mcz so we can all have a look.
Thanks!
frank
2014/1/15 Frank Shearar frank.shearar@gmail.com
Yeah, that's the problem I most often encounter with changes files, and have for a long time: 'no such file or directory' when opening the changes file fails. I agree, it's most probably a simple problem of limited resources (too many readOnlyCopy files opened but not yet closed/reclaimed). Eliot, isn't it your fault? The faster the VM, the sooner we hit the resource limits...
But I also managed to get a corrupt change log recently and could not trace it... I suspect that closing the original changes file is a bit different from starving for new readOnlyCopy streams, but who knows...
On Tue, Jan 14, 2014 at 8:10 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
I don't think the issue is file handles. I tested a fresh trunk image and got the prim failed error. I also tested another image I have, with literally hundreds (578) of subinstances of streams, and got the error.
It's just the Traits tests that show the error.
Cheers, Karl
I think I found a clue: I do not get a prim failed error when I revert the method Environment>>forgetClass:logged: to a version earlier than 12/31/2013.
Cheers, Karl
I found the source of the MultiByteFileStream(Object)>>primitiveFailed: problem on my system: an antivirus program was checking the file while Squeak was using it.
I googled a little and found this clue: http://marc.info/?l=squeak-dev&m=118046065627943&w=2
All Traits tests pass now, and much quicker than before :-)
Cheers, Karl