Hi Folks.
I have a 21 GB bzip2 file that I would rather not decompress, as disk space is at a semi-premium.
The bzcat command lets me "extract" the contents to stdout:
    bzcat humungousfile.bz2 | less
gives me the output I want to process.
For running XMLSax on a file, I have some existing code to use...
    | ios |
    ios := FileStream readOnlyFileNamed: '/your/path/to/the/big/xml/file.xml'.
    [(DocDemoSaxHandler on: ios)
        pingevery: 100000;
        optimizeForLargeDocuments;
        parseDocument] timeProfile.
What I would like to do is have, from within Squeak, access to that bzcat output via some sort of ReadStream.
Doable?
thx in advance.
tty
Hi,
yes, it should be doable with David Lewis’s OSProcess package.
_,,,^..^,,,_ (phone)
On Jul 10, 2023, at 12:03 PM, gettimothy via Squeak-dev squeak-dev@lists.squeakfoundation.org wrote:
> What I would like to do is have, from within Squeak access to that bzcat output via some sort of ReadStream.
Thank you.
I will have a look tomorrow.
I had some problems with OSProcess and the CommandLine(?) stuff on the latest image cut.
cordially,
t
---- On Mon, 10 Jul 2023 18:00:53 -0400 Eliot Miranda eliot.miranda@gmail.com wrote ---
> yes, it should be doable with David Lewis’s OSProcess package.
Here is a failed naive attempt. The Unix Proc wants a standard filestream, not a squeaky ReadWriteStream...
    | filename in ios path proc |
    path := '/home/wm/usr/src/smalltalk/XML/'.
    filename := 'bookstore.xml'.
    in := OSProcess readOnlyFileNamed: path, filename.
    ios := ReadWriteStream on: ''.
    proc := UnixProcess
        forkJob: '/bin/bzcat'
        arguments: nil
        environment: nil
        descriptors: (Array with: in with: ios with: nil). "<-- this does not work"
    ios printOn: Transcript.
    in close.
Tomorrow, I will try a FileStream on the designated Stdout ...but that seems hokey.
I really want to get the output from bzcat directly into Squeak.
ideas appreciated.
cordially,
t
Also, that OSProcess code looks really nice.
Kudos and thanks.
tty
---- On Tue, 11 Jul 2023 16:19:13 -0400 gettimothy via Squeak-dev squeak-dev@lists.squeakfoundation.org wrote ---
> Here is a failed naive attempt. The Unix Proc wants a standard filestream, not a squeaky ReadWriteStream...
Trying to reach into the bloody guts of a Unix process did not work, but the OSPipe seems to know how to do it.
I hijacked the catAFile example along with an OSPipe example/test and I got something called an AttachableFileStream.
    bzcatAFileToPipe
        "Pipe bzcat output to some AttachableFileStream...whatever the heck that is...."
        "UnixProcess bzcatAFileToPipe"
        | filename in pipe2 output dest child path |
        path := '/home/wm/usr/src/smalltalk/XML/'.
        filename := 'bookstore.xml.bz2'.
        "The compressed file becomes bzcat's stdin."
        in := OSProcess readOnlyFileNamed: path, filename.
        "The pipe's write end becomes bzcat's stdout; the read end is the answer."
        pipe2 := OSPipe nonBlockingPipe.
        output := pipe2 writer.
        dest := pipe2 reader.
        child := UnixProcess
            forkJob: '/bin/bzcat'
            arguments: nil
            environment: nil
            descriptors: (Array with: in with: output with: nil).
        in close.
        "Give bzcat one second to write, then stop it."
        (Delay forSeconds: 1) wait.
        child sigterm.
        ^ dest "be sure to close it on inspection"
Inspecting the returned stream and evaluating
    | result |
    result := self next: 100.
    Transcript show: result.
shows the decompressed XML in the Transcript:
    <bookstore>
    <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
    </book
thanks all for the pointers.
I can get the data, thanks for the pointer.
    | ios path filename |
    Transcript clear.
    path := '/home/wm/usr/src/smalltalk/XML/'.
    filename := 'bookstore.xml.bz2'.
    ios := UnixProcess bzcatAFileToPipe: path filename: filename.
    (DocDemoSaxHandler on: ios)
        debug: true;
        pingevery: 1000;
        optimizeForLargeDocuments;
        parseDocument.
where the stream that the SaxHandler depends on comes from David's work:
    bzcatAFileToPipe: pathString filename: filestring
        "Pipe bzcat output to some AttachableFileStream...whatever the heck that is...."
        "UnixProcess bzcatAFileToPipe: '/home/wm/usr/src/smalltalk/XML/' filename: 'bookstore.xml.bz2'"
        | filename in pipe2 output dest child path |
        path := pathString.
        filename := filestring.
        "The compressed file becomes bzcat's stdin."
        in := OSProcess readOnlyFileNamed: path, filename.
        "The pipe's write end becomes bzcat's stdout; the read end is the answer."
        pipe2 := OSPipe nonBlockingPipe.
        output := pipe2 writer.
        dest := pipe2 reader.
        child := UnixProcess
            forkJob: '/bin/bzcat'
            arguments: nil
            environment: nil
            descriptors: (Array with: in with: output with: nil).
        in close.
        "Give bzcat one second to write, then stop it."
        (Delay forSeconds: 1) wait.
        child sigterm.
        ^ dest "be sure to close it on inspection"
It's a great tool. We do not have to write a tar file reader or a bzip2 reader... we can work directly with some great existing tools.
thanks again.
If I inspect ios (an AttachableFileStream?) and evaluate Transcript show: (self next: 10000000), where 10000000 is much bigger than what the stream holds, the Squeak system freezes.
Similarly, when I run
    (DocDemoSaxHandler on: ios)
        debug: true;
        pingevery: 1000;
        optimizeForLargeDocuments;
        parseDocument.
the document data prints out nicely, but Squeak goes into a tight loop/freeze.
thanks to all again, very, very helpful.
tty
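One possible explanation for the freeze, offered as an assumption and not a confirmed diagnosis: the method above terminates bzcat after one second and never closes the image's own copy of the pipe's write end, so a reader that asks for more than has arrived may never see end-of-file. The sketch below is a hypothetical class-side variant (the selector bzcatToReader: is invented for illustration and is not part of OSProcess). It uses only the calls already shown in this thread, plus close on the writer stream. It has not been run against a real image, and with a non-blocking pipe a read can still come back empty while bzcat is busy, so treat it as a starting point only.

```smalltalk
bzcatToReader: fullPathString
	"Hypothetical variant of bzcatAFileToPipe:filename:, not part of OSProcess.
	Answer the read end of a pipe fed by bzcat. Unlike the original, the child
	is left running until it finishes, and the image's copy of the write end
	is closed."
	| in pipe child |
	in := OSProcess readOnlyFileNamed: fullPathString.
	pipe := OSPipe nonBlockingPipe.
	child := UnixProcess
		forkJob: '/bin/bzcat'
		arguments: nil
		environment: nil
		descriptors: (Array with: in with: pipe writer with: nil).
	"Assumption: after the fork the child holds its own copies of both
	handles, so the image can release its copies."
	in close.
	pipe writer close.
	^ pipe reader "the caller is responsible for closing it"
```

It would be called the same way as the original, for example ios := UnixProcess bzcatToReader: path, filename. Dropping the one-second sigterm matters for the 21 GB case, since bzcat needs far longer than a second to emit the whole document.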