Apologies, accidentally posted to squeak-dev instead of vm-dev.
- A.
-------- Original Message -------- Subject: Image file loading Date: Tue, 19 Jan 2010 14:06:00 -0800 From: Andreas Raab andreas.raab@gmx.de Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org Newsgroups: gmane.comp.lang.smalltalk.squeak.general
Folks -
I think that one of the more clever hacks I did for the Android VM deserves proper generalization in the Interpreter. Since on Android the image is part of the (compressed) application package it's not easily possible to pass a file handle into the VM and have the VM read the image as an external file.
Instead, I'm preloading the image file into memory and then hacked an implementation of the sqImageFile* and sqAlloc* functions that would basically operate on the image in memory. I think we should formalize this idea for two reasons:
a) It allows preloading the image into memory on platforms where the image may not be in an external file.
b) It trivially allows mmap()ing the entire image into memory (I think John does that on the iPhone).
What I'm proposing is an additional VM entry point, called
Interpreter>>readImageFromHeap: heapStart size: heapSize
which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
The VMs would generally use this along the lines of:
memStart = self malloc(heapSize); fread(imageFile, memStart, fsize(imageFile)); readImageFromHeapSize(memStart, heapSize);
and of course a backwards compatible version of readImageFromFile:HeapSize:StartingAt: could do the same (although I'd rather drop that function altogether since it removes the need for supporting various sqImageFile* functions).
I'd also say that the same pattern should be used for *writing* image files, that is we assemble the entire image in memory then call sqImageFileWrite(fileName, heapStart, heapSize) and sqImageFileWrite is really just a combo of fopen(); fwrite(); fclose();
I think this scheme is much nicer, less complicated and offers more options for implementing image loading than what we currently have.
Comments?
Cheers, - Andreas
On 2010-01-19, at 2:30 PM, Andreas Raab wrote:
b) It trivially allows mmap()ing the entire image into memory (I think John does that on the iPhone).
So how is this different that what happens now?
In interp.h we have:
#ifndef allocateMemoryMinimumImageFileHeaderSize /* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */ #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize) #endif
#ifndef sqImageFileReadEntireImage /* Called by Interpreter>>sqImage:read:size:length: */ #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream) #endif
In platform specfic I change that
usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize); #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemoryMac(heapSize, minimumMemory, fileStream, headerSize)
#ifdef BUILD_FOR_OSX size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f); #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) #else #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif
The interp.c slang generated below on os-x the sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) does read the data from fileStream for the given length into memoryAddress. But on the iPhone we just return the length as per the #define above since the allocateMemoryMinimumImageFileHeaderSize has mmapped the file.
slang...
memory = allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, f, headerSize); if (memory == null) { insufficientMemoryAvailableError(); } ... sqImageFileSeek(f, headerStart + headerSize); /* begin sqImage:read:size:length: */ memoryAddress = pointerForOop(memory); bytesRead = sqImageFileReadEntireImage(memoryAddress, sizeof(unsigned char), dataSize, f); if (bytesRead != dataSize) { unableToReadImageError(); } -- =========================================================================== John M. McIntosh johnmci@smalltalkconsulting.com Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ===========================================================================
John M McIntosh wrote:
On 2010-01-19, at 2:30 PM, Andreas Raab wrote:
b) It trivially allows mmap()ing the entire image into memory (I think John does that on the iPhone).
So how is this different that what happens now?
That it removes hundreds of lines of completely unnecessary code. Yes, one can implement the same logic with the current interfaces, I just did that on Android :-) But at some point one must question the approach if the end result is only adding additional complexity.
That's why I'm saying move all of this out so that the interpreter doesn't have to deal with it. The task of allocating the initial amount of memory and loading the image into said memory should be done by support code *before* calling the interpreter the first time. Then, give the interpreter one entry point and then remove all the extra complexity. Which means:
#ifndef allocateMemoryMinimumImageFileHeaderSize
Gets removed.
/* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */ #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize)
Gets removed.
#ifndef sqImageFileReadEntireImage
Gets removed.
#define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream)
Gets removed.
In platform specfic I change that
usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize);
Gets removed.
#ifdef BUILD_FOR_OSX size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f); #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) #else #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif
Gets removed.
Some more things that get removed is in CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile* definitions in sq.h and more.
Cheers, - Andreas
For anonymous mmap I need fread(imageFile, memStart, fsize(imageFile)); to mmap allocate and read the file into the anonymous mmap
readImageFromHeapSize does nothing.
or for file based mmap I need fread(imageFile, memStart, fsize(imageFile)); to mmap the image file.
So I need the readImageFromHeapSize for?
On 2010-01-19, at 3:09 PM, Andreas Raab wrote:
John M McIntosh wrote:
On 2010-01-19, at 2:30 PM, Andreas Raab wrote:
b) It trivially allows mmap()ing the entire image into memory (I think John does that on the iPhone).
So how is this different that what happens now?
That it removes hundreds of lines of completely unnecessary code. Yes, one can implement the same logic with the current interfaces, I just did that on Android :-) But at some point one must question the approach if the end result is only adding additional complexity.
That's why I'm saying move all of this out so that the interpreter doesn't have to deal with it. The task of allocating the initial amount of memory and loading the image into said memory should be done by support code *before* calling the interpreter the first time. Then, give the interpreter one entry point and then remove all the extra complexity. Which means:
#ifndef allocateMemoryMinimumImageFileHeaderSize
Gets removed.
/* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */ #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize)
Gets removed.
#ifndef sqImageFileReadEntireImage
Gets removed.
#define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream)
Gets removed.
In platform specfic I change that usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize);
Gets removed.
#ifdef BUILD_FOR_OSX size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f); #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) #else #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif
Gets removed.
Some more things that get removed is in CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile* definitions in sq.h and more.
Cheers,
- Andreas
-- =========================================================================== John M. McIntosh johnmci@smalltalkconsulting.com Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ===========================================================================
John M McIntosh wrote:
For anonymous mmap I need fread(imageFile, memStart, fsize(imageFile)); to mmap allocate and read the file into the anonymous mmap
readImageFromHeapSize does nothing.
or for file based mmap I need fread(imageFile, memStart, fsize(imageFile)); to mmap the image file.
So I need the readImageFromHeapSize for?
To tell the VM where your memory is located (replacement of sqAlloc*), and how much you've pre-allocated. Like I wrote in the original post:
"The VMs would generally use this along the lines of:
memStart = malloc(heapSize); fread(imageFile, memStart, fsize(imageFile)); readImageFromHeapSize(memStart, heapSize);
etc."
Cheers, - Andreas
On 2010-01-19, at 3:09 PM, Andreas Raab wrote:
John M McIntosh wrote:
On 2010-01-19, at 2:30 PM, Andreas Raab wrote:
b) It trivially allows mmap()ing the entire image into memory (I think John does that on the iPhone).
So how is this different that what happens now?
That it removes hundreds of lines of completely unnecessary code. Yes, one can implement the same logic with the current interfaces, I just did that on Android :-) But at some point one must question the approach if the end result is only adding additional complexity.
That's why I'm saying move all of this out so that the interpreter doesn't have to deal with it. The task of allocating the initial amount of memory and loading the image into said memory should be done by support code *before* calling the interpreter the first time. Then, give the interpreter one entry point and then remove all the extra complexity. Which means:
#ifndef allocateMemoryMinimumImageFileHeaderSize
Gets removed.
/* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */ #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize)
Gets removed.
#ifndef sqImageFileReadEntireImage
Gets removed.
#define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream)
Gets removed.
In platform specfic I change that usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize);
Gets removed.
#ifdef BUILD_FOR_OSX size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f); #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) #else #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif
Gets removed.
Some more things that get removed is in CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile* definitions in sq.h and more.
Cheers,
- Andreas
--
John M. McIntosh johnmci@smalltalkconsulting.com Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ===========================================================================
2010/1/20 Andreas Raab andreas.raab@gmx.de:
John M McIntosh wrote:
For anonymous mmap I need fread(imageFile, memStart, fsize(imageFile)); to mmap allocate and read the file into the anonymous mmap
readImageFromHeapSize does nothing. or for file based mmap I need fread(imageFile, memStart, fsize(imageFile)); to mmap the image file. So I need the readImageFromHeapSize for?
To tell the VM where your memory is located (replacement of sqAlloc*), and how much you've pre-allocated. Like I wrote in the original post:
"The VMs would generally use this along the lines of:
memStart = malloc(heapSize); fread(imageFile, memStart, fsize(imageFile)); readImageFromHeapSize(memStart, heapSize);
etc."
RIGHT! Who said that image could be read only from file? It could be a memory location, it could be a socket stream, it could be anything.
So, i very like your idea with replacing an entry point by abstract
readImageFromObject: object size: heapSize
which could be just a 2-liner:
memStart := self sqMalloc: heapSize. self ioReadFrom: object Into: memStart Size: heapSize.
(and object is of type 'void *', which leaves a platform code to decide its meaning)
sqMalloc() and ioReadFromIntoSize(), also provided by platform-specific code)..
This makes an interpreter completely agnostic from data source, where actual image is stored.
Cheers, - Andreas
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
What I'm proposing is an additional VM entry point, called
Interpreter>>readImageFromHeap: heapStart size: heapSize
which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
A suggestion:
Interpreter>>installImage: address size: imageSize header: header
Where header is the address of the image file header, and address is the address of the uninitialized image beginning after the file header.
Rationale:
- "Install" is more appropriate that "read", since the contents of the image file have already been read.
- "Heap" is not relevant.
- The file header is logically distinct from from the contents of the image proper, and since someone else did the reading, it seems better to pass them as two distinct things (*). The pointer to header is presumed to be a memory block of at least 64 bytes, so in the usual case of a 32 bit image read into a block of malloc'ed space, we expect that address == header + 64.
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Dave
(*) I've noticed that the format of the file header for 64-bit images is not optimal, so a logical separation of the file header from the image proper just seems like the right thing to do.
David T. Lewis wrote:
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
What I'm proposing is an additional VM entry point, called
Interpreter>>readImageFromHeap: heapStart size: heapSize
which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
A suggestion:
Interpreter>>installImage: address size: imageSize header: header
I don't like that. It implies that the support code needs to be able to dissect the image file to find out how much is header and how much is data. That implies the support code needs to know about image format, byte reversal, and several other things. The interpreter already does that so why should we have to duplicate that in the support code again?
Rationale:
"Install" is more appropriate that "read", since the contents of the image file have already been read.
"Heap" is not relevant.
Fair enough. I don't really care what it's called.
- The file header is logically distinct from from the contents of the image proper, and since someone else did the reading, it seems better to pass them as two distinct things (*). The pointer to header is presumed to be a memory block of at least 64 bytes, so in the usual case of a 32 bit image read into a block of malloc'ed space, we expect that address == header + 64.
See above. You're making it artificially harder for the support code by requiring it to dissect the file and figure out what the header is. There is really no point to that.
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one.
Cheers, - Andreas
Ohh, even better.
Platform code invoking:
sqReadImageFrom(void *object).
then, an interpreter reading a header and determining a heap size. (a question, though, does image size encoded in header?)
once interpreter prepares a memory for image and decoding the header, it makes a second call to platform code to read the object memory, and we're done.
No file access, no squeakFileOffsetType mess. Nothing. A platform code deciding these details by own!
2010/1/20 Andreas Raab andreas.raab@gmx.de:
David T. Lewis wrote:
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
What I'm proposing is an additional VM entry point, called
Interpreter>>readImageFromHeap: heapStart size: heapSize
which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
A suggestion:
Interpreter>>installImage: address size: imageSize header: header
I don't like that. It implies that the support code needs to be able to dissect the image file to find out how much is header and how much is data. That implies the support code needs to know about image format, byte reversal, and several other things. The interpreter already does that so why should we have to duplicate that in the support code again?
Rationale:
- "Install" is more appropriate that "read", since the contents of the
image file have already been read.
- "Heap" is not relevant.
Fair enough. I don't really care what it's called.
- The file header is logically distinct from from the contents of the
image proper, and since someone else did the reading, it seems better to pass them as two distinct things (*). The pointer to header is presumed to be a memory block of at least 64 bytes, so in the usual case of a 32 bit image read into a block of malloc'ed space, we expect that address == header + 64.
See above. You're making it artificially harder for the support code by requiring it to dissect the file and figure out what the header is. There is really no point to that.
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one.
Cheers, - Andreas
Igor Stasenko wrote:
Platform code invoking:
sqReadImageFrom(void *object).
then, an interpreter reading a header and determining a heap size. (a question, though, does image size encoded in header?)
The image size is encoded in the file header. The heap size isn't. That's where the second argument came from that I provided.
once interpreter prepares a memory for image and decoding the header, it makes a second call to platform code to read the object memory, and we're done.
Why even do that? The platform code can allocate and read the thing before it passes it on to the interpreter. Read my example again:
main.c:
memory = malloc(heapSize); fread(imageFile, memory, fsize(imageFile)); readImageFromHeapSize(memory, heapSize); interpret();
The platform code reads the allocates memory and reads the image. It can do this by reading a file or (in the Android case) by doing a series of JNI calls that pass uncompressed portions of a chunked image file from the app package (yes, it is complicated on Android). Or the support code could read it from a socket connection. Whatever. There is no reason for the interpreter to be involved *at all* in reading the image file.
Cheers, - Andreas
No file access, no squeakFileOffsetType mess. Nothing. A platform code deciding these details by own!
2010/1/20 Andreas Raab andreas.raab@gmx.de:
David T. Lewis wrote:
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
What I'm proposing is an additional VM entry point, called
Interpreter>>readImageFromHeap: heapStart size: heapSize
which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
A suggestion:
Interpreter>>installImage: address size: imageSize header: header
I don't like that. It implies that the support code needs to be able to dissect the image file to find out how much is header and how much is data. That implies the support code needs to know about image format, byte reversal, and several other things. The interpreter already does that so why should we have to duplicate that in the support code again?
Rationale:
- "Install" is more appropriate that "read", since the contents of the
image file have already been read.
- "Heap" is not relevant.
Fair enough. I don't really care what it's called.
- The file header is logically distinct from from the contents of the
image proper, and since someone else did the reading, it seems better to pass them as two distinct things (*). The pointer to header is presumed to be a memory block of at least 64 bytes, so in the usual case of a 32 bit image read into a block of malloc'ed space, we expect that address == header + 64.
See above. You're making it artificially harder for the support code by requiring it to dissect the file and figure out what the header is. There is really no point to that.
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one.
Cheers,
- Andreas
2010/1/20 Andreas Raab andreas.raab@gmx.de:
Igor Stasenko wrote:
Platform code invoking:
sqReadImageFrom(void *object).
then, an interpreter reading a header and determining a heap size. (a question, though, does image size encoded in header?)
The image size is encoded in the file header. The heap size isn't. That's where the second argument came from that I provided.
once interpreter prepares a memory for image and decoding the header, it makes a second call to platform code to read the object memory, and we're done.
Why even do that? The platform code can allocate and read the thing before it passes it on to the interpreter. Read my example again:
main.c:
memory = malloc(heapSize); fread(imageFile, memory, fsize(imageFile)); readImageFromHeapSize(memory, heapSize); interpret();
The platform code reads the allocates memory and reads the image. It can do this by reading a file or (in the Android case) by doing a series of JNI calls that pass uncompressed portions of a chunked image file from the app package (yes, it is complicated on Android). Or the support code could read it from a socket connection. Whatever. There is no reason for the interpreter to be involved *at all* in reading the image file.
Well, the current code implies some logic when decoding a header, like allocating enough space + breathing room for object memory.
So, i thought that its better when interpreter controls the memory allocation size, not platform code alone.
Cheers, - Andreas
No file access, no squeakFileOffsetType mess. Nothing. A platform code deciding these details by own!
2010/1/20 Andreas Raab andreas.raab@gmx.de:
David T. Lewis wrote:
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
What I'm proposing is an additional VM entry point, called
Interpreter>>readImageFromHeap: heapStart size: heapSize
which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
A suggestion:
Interpreter>>installImage: address size: imageSize header: header
I don't like that. It implies that the support code needs to be able to dissect the image file to find out how much is header and how much is data. That implies the support code needs to know about image format, byte reversal, and several other things. The interpreter already does that so why should we have to duplicate that in the support code again?
Rationale:
- "Install" is more appropriate that "read", since the contents of the
image file have already been read.
- "Heap" is not relevant.
Fair enough. I don't really care what it's called.
- The file header is logically distinct from from the contents of the
image proper, and since someone else did the reading, it seems better to pass them as two distinct things (*). The pointer to header is presumed to be a memory block of at least 64 bytes, so in the usual case of a 32 bit image read into a block of malloc'ed space, we expect that address == header + 64.
See above. You're making it artificially harder for the support code by requiring it to dissect the file and figure out what the header is. There is really no point to that.
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one.
Cheers, - Andreas
Igor Stasenko wrote:
2010/1/20 Andreas Raab andreas.raab@gmx.de:
The platform code reads the allocates memory and reads the image. It can do this by reading a file or (in the Android case) by doing a series of JNI calls that pass uncompressed portions of a chunked image file from the app package (yes, it is complicated on Android). Or the support code could read it from a socket connection. Whatever. There is no reason for the interpreter to be involved *at all* in reading the image file.
Well, the current code implies some logic when decoding a header, like allocating enough space + breathing room for object memory.
That's a problem the support code already deals with. The support code already knows the limits on memory that the user specified (via -memory argument or implicitly via mmap() or similar means) and it also already needs to check that the image can actually fit into the heap (or else it'll blow up when trying to read it). So it's similarly trivial for the support code to say:
if(heapSize < fsize(imageFile) + headRoom) abort("Not enough memory to load image");
So, i thought that its better when interpreter controls the memory allocation size, not platform code alone.
The platform already completely controls the allocation. It's only that we allow the interpreter to make the first request and that request is based on what we've passed in as desiredHeapSize from the support code. Lots of useless complexity, the support code might as well just allocate it without involving the interpreter.
Cheers, - Andreas
Cheers,
- Andreas
No file access, no squeakFileOffsetType mess. Nothing. A platform code deciding these details by own!
2010/1/20 Andreas Raab andreas.raab@gmx.de:
David T. Lewis wrote:
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
What I'm proposing is an additional VM entry point, called
Interpreter>>readImageFromHeap: heapStart size: heapSize
which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
A suggestion:
Interpreter>>installImage: address size: imageSize header: header
I don't like that. It implies that the support code needs to be able to dissect the image file to find out how much is header and how much is data. That implies the support code needs to know about image format, byte reversal, and several other things. The interpreter already does that so why should we have to duplicate that in the support code again?
Rationale:
- "Install" is more appropriate that "read", since the contents of the
image file have already been read.
- "Heap" is not relevant.
Fair enough. I don't really care what it's called.
- The file header is logically distinct from from the contents of the
image proper, and since someone else did the reading, it seems better to pass them as two distinct things (*). The pointer to header is presumed to be a memory block of at least 64 bytes, so in the usual case of a 32 bit image read into a block of malloc'ed space, we expect that address == header + 64.
See above. You're making it artificially harder for the support code by requiring it to dissect the file and figure out what the header is. There is really no point to that.
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one.
Cheers,
- Andreas
2010/1/20 Andreas Raab andreas.raab@gmx.de:
Igor Stasenko wrote:
2010/1/20 Andreas Raab andreas.raab@gmx.de:
The platform code reads the allocates memory and reads the image. It can do this by reading a file or (in the Android case) by doing a series of JNI calls that pass uncompressed portions of a chunked image file from the app package (yes, it is complicated on Android). Or the support code could read it from a socket connection. Whatever. There is no reason for the interpreter to be involved *at all* in reading the image file.
Well, the current code implies some logic when decoding a header, like allocating enough space + breathing room for object memory.
That's a problem the support code already deals with. The support code already knows the limits on memory that the user specified (via -memory argument or implicitly via mmap() or similar means) and it also already needs to check that the image can actually fit into the heap (or else it'll blow up when trying to read it). So it's similarly trivial for the support code to say:
if(heapSize < fsize(imageFile) + headRoom) abort("Not enough memory to load image");
So, i thought that its better when interpreter controls the memory allocation size, not platform code alone.
The platform already completely controls the allocation. It's only that we allow the interpreter to make the first request and that request is based on what we've passed in as desiredHeapSize from the support code. Lots of useless complexity, the support code might as well just allocate it without involving the interpreter.
Support code may not know the image size beforehead. Or image file could be combined with something else into a single file , which makes it much larger than actually need for image.
You'll never know how much space you need before loading a header and determining the object memory size. The desizered heap size could be actually a maximum heap size available to allocate but not optimal nor minimal. That's why i think its worth to leave allocation at VM side, not in support code, as long as you leaving header reading there.
Cheers, - Andreas
Cheers, - Andreas
No file access, no squeakFileOffsetType mess. Nothing. A platform code deciding these details by own!
2010/1/20 Andreas Raab andreas.raab@gmx.de:
David T. Lewis wrote:
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote: > > What I'm proposing is an additional VM entry point, called > > Interpreter>>readImageFromHeap: heapStart size: heapSize > > which takes to arguments: The pointer to the beginning of the > (pre-allocated) memory portion where the image file has been loaded > or > mapped and a size for the allocated memory portion. The VM then > assumes > that beginning at heapStart there is an image provided that it needs > to > properly prepare (i.e., perform byte-reversal, pointer adjustment > etc).
A suggestion:
Interpreter>>installImage: address size: imageSize header: header
I don't like that. It implies that the support code needs to be able to dissect the image file to find out how much is header and how much is data. That implies the support code needs to know about image format, byte reversal, and several other things. The interpreter already does that so why should we have to duplicate that in the support code again?
Rationale:
- "Install" is more appropriate that "read", since the contents of the
image file have already been read.
- "Heap" is not relevant.
Fair enough. I don't really care what it's called.
- The file header is logically distinct from from the contents of the
image proper, and since someone else did the reading, it seems better to pass them as two distinct things (*). The pointer to header is presumed to be a memory block of at least 64 bytes, so in the usual case of a 32 bit image read into a block of malloc'ed space, we expect that address == header + 64.
See above. You're making it artificially harder for the support code by requiring it to dissect the file and figure out what the header is. There is really no point to that.
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one.
Cheers, - Andreas
Igor Stasenko wrote:
The platform already completely controls the allocation. It's only that we allow the interpreter to make the first request and that request is based on what we've passed in as desiredHeapSize from the support code. Lots of useless complexity, the support code might as well just allocate it without involving the interpreter.
Support code may not know the image size beforehead.
Yes it does. The support code does the reading after all. Since an entire image file needs to be read before it can be interpreted, there is no way that the support code wouldn't know its size. Even if you ship it over the wire (say in a http request) all protocols that I'm aware about have information that encode the size of its payload (Content-Length in http). The fact that the dataLength is also part of the header should be seen as a duplication of information that can be used by the interpreter to verify the integrity of the image. It doesn't mean that this will be the only source of information about the size of the image.
Or image file could be combined with something else into a single file , which makes it much larger than actually need for image.
If you need that, you would implement the means to recognize and reconcile the situation appropriately. For example, on Android I'm not loading the entire package into memory. I'm extracting portions of the zip file and load them into proper locations. I'm doing this because I'm in a non-standard situation that requires adjusting for its tradeoffs.
However, we know that the "standard situation" is one in which loading the entire image file is entirely appropriate. Moreoever, we know that regardless of the method of I/O we're using we always need to produce a complete image file first. Since we know that, and since the method of I/O will differ depending on the context in which your VM runs in, it is not meaningful to worry about situations that you describe - nobody would load more than they'd have to unless that's considered a reasonable tradeoff (there may be situations where this is actually advantageous).
There is really nothing to worry about here. The standard situation is that we can read an entire image file. The non-standard situations have to be dealt with on a case-by-case basis but in any case they'll have a means of identifying the image, loading into memory and providing it to the interpreter. There is no reason to worry about someone writing the support code just randomly passing excess bytes to the interpreter.
Cheers, - Andreas
2010/1/20 Andreas Raab andreas.raab@gmx.de:
Igor Stasenko wrote:
The platform already completely controls the allocation. It's only that we allow the interpreter to make the first request and that request is based on what we've passed in as desiredHeapSize from the support code. Lots of useless complexity, the support code might as well just allocate it without involving the interpreter.
Support code may not know the image size beforehead.
Yes it does. The support code does the reading after all. Since an entire image file needs to be read before it can be interpreted, there is no way that the support code wouldn't know its size. Even if you ship it over the wire (say in a http request) all protocols that I'm aware about have information that encode the size of its payload (Content-Length in http). The fact that the dataLength is also part of the header should be seen as a duplication of information that can be used by the interpreter to verify the integrity of the image. It doesn't mean that this will be the only source of information about the size of the image.
Or image file could be combined with something else into a single file , which makes it much larger than actually need for image.
If you need that, you would implement the means to recognize and reconcile the situation appropriately. For example, on Android I'm not loading the entire package into memory. I'm extracting portions of the zip file and load them into proper locations. I'm doing this because I'm in a non-standard situation that requires adjusting for its tradeoffs.
However, we know that the "standard situation" is one in which loading the entire image file is entirely appropriate. Moreoever, we know that regardless of the method of I/O we're using we always need to produce a complete image file first. Since we know that, and since the method of I/O will differ depending on the context in which your VM runs in, it is not meaningful to worry about situations that you describe - nobody would load more than they'd have to unless that's considered a reasonable tradeoff (there may be situations where this is actually advantageous).
There is really nothing to worry about here. The standard situation is that we can read an entire image file. The non-standard situations have to be dealt with on a case-by-case basis but in any case they'll have a means of identifying the image, loading into memory and providing it to the interpreter. There is no reason to worry about someone writing the support code just randomly passing excess bytes to the interpreter.
Well, i mainly meant some kind of a network startup scenario, when image data is feeded directly from socket. Platform code could pass the socket to interpreter (as void* object), and then wait for allocation & read requests which will follow. And so, you don't have to use some intermediate memory buffer to preload an image and only then pass it to interpreter entry point.
Cheers, - Andreas
Igor Stasenko wrote:
Well, i mainly meant some kind of a network startup scenario, when image data is feeded directly from socket. Platform code could pass the socket to interpreter (as void* object), and then wait for allocation & read requests which will follow. And so, you don't have to use some intermediate memory buffer to preload an image and only then pass it to interpreter entry point.
But Igor, unless you are *trying* to be obscure why would your network protocol (that the support code needs to implement anyway) not include any information about how many bytes are to come? See my previous example about http - you could trivially launch an image by giving it an HTTP url; download the image into memory and launch the interpreter without having to involve the interpreter at all.
This is *necessarily* true for *all* I/O operations that load an image. There simply is no point to pass a void* to the interpreter only have it passed back as a void* to the support code. If you're doing that, why not load the thing right away instead of unnecessarily passing pointers back and forth?
Cheers, - Andreas
2010/1/20 Andreas Raab andreas.raab@gmx.de:
Igor Stasenko wrote:
Well, i mainly meant some kind of a network startup scenario, when image data is feeded directly from socket. Platform code could pass the socket to interpreter (as void* object), and then wait for allocation & read requests which will follow. And so, you don't have to use some intermediate memory buffer to preload an image and only then pass it to interpreter entry point.
But Igor, unless you are *trying* to be obscure why would your network protocol (that the support code needs to implement anyway) not include any information about how many bytes are to come? See my previous example about http - you could trivially launch an image by giving it an HTTP url; download the image into memory and launch the interpreter without having to involve the interpreter at all.
This is *necessarily* true for *all* I/O operations that load an image. There simply is no point to pass a void* to the interpreter only have it passed back as a void* to the support code. If you're doing that, why not load the thing right away instead of unnecessarily passing pointers back and forth?
because of image header parsing logic, which ,as you said should stay in interpreter. Why one should wait , downloading the whole image, only after that to discover that image format is not compatible with interpreter, or not enough memory etc etc?
Cheers, - Andreas
Igor Stasenko wrote:
Why one should wait , downloading the whole image, only after that to discover that image format is not compatible with interpreter, or not enough memory etc etc?
Because your use case is purely hypothetical. You are talking about this as if it was the most common thing to run incompatible images over slow network connections. But it's not. The use case you're describing simply doesn't exist and won't for any foreseeable time.
I'm more than willing to revisit this issue once we have a *some* evidence that this is an actual issue but up until that point let's avoid solving problems we don't actually have.
Cheers, - Andreas
2010/1/21 Andreas Raab andreas.raab@gmx.de:
Igor Stasenko wrote:
Why one should wait , downloading the whole image, only after that to discover that image format is not compatible with interpreter, or not enough memory etc etc?
Because your use case is purely hypothetical. You are talking about this as if it was the most common thing to run incompatible images over slow network connections. But it's not. The use case you're describing simply doesn't exist and won't for any foreseeable time.
I'm more than willing to revisit this issue once we have a *some* evidence that this is an actual issue but up until that point let's avoid solving problems we don't actually have.
Reading a 10Mb file from flash takes more that 1 second. Reading 64bytes(header) from flash stick, takes much less time. Same for networked mounted drive etc etc. Take in account a failures, like broken image file, or supplying wrong file etc etc. While all of it is obscure cases, it should be handled effectively.
Also, i'm not a fan of burning CPU cycles & system resources in advance, if there's a possibility for shortcut. I am sympathizing a green-life movement ;)
Cheers, - Andreas
Yes ram, Igor see the copy of the pdf I sent you. I found that paging in 6 more MB takes 3 or so seconds. I don't have an exact number but the other thing that happens is that on the iphone things like the browser and email, phone services, itunes might get terminated in order to find free ram pages for the app to use.
It's very much like the 70's where pages are not free and paging in a page usually means a page has to go out. Pages that can go out are determined by the VM. On the iphone this is code, mmap files that are private read/write to writable space.
Technically I could for example copy the image from read/only directory to temp read/write, then mmap as private read/write then the file node Virtual memory manager would technically allow page in/out of the file. However the cost for doing this at startup is expensive since you have to make a copy of the read/only file. You can' t use a read/write image because at any point the file won't reflect the proper state since pages will be in memory as read, or as read/altered but not written to store. And aborting the app would mess things up. PS Not sure if you could flush the file on the the terminate. Obviously one could experiment and see.
On 2010-01-20, at 2:28 PM, Igor Stasenko wrote:
2010/1/21 Andreas Raab andreas.raab@gmx.de:
Igor Stasenko wrote:
Why one should wait , downloading the whole image, only after that to discover that image format is not compatible with interpreter, or not enough memory etc etc?
Because your use case is purely hypothetical. You are talking about this as if it was the most common thing to run incompatible images over slow network connections. But it's not. The use case you're describing simply doesn't exist and won't for any foreseeable time.
I'm more than willing to revisit this issue once we have a *some* evidence that this is an actual issue but up until that point let's avoid solving problems we don't actually have.
Reading a 10Mb file from flash takes more that 1 second. Reading 64bytes(header) from flash stick, takes much less time. Same for networked mounted drive etc etc. Take in account a failures, like broken image file, or supplying wrong file etc etc. While all of it is obscure cases, it should be handled effectively.
Also, i'm not a fan of burning CPU cycles & system resources in advance, if there's a possibility for shortcut. I am sympathizing a green-life movement ;)
Cheers,
- Andreas
-- Best regards, Igor Stasenko AKA sig.
-- =========================================================================== John M. McIntosh johnmci@smalltalkconsulting.com Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ===========================================================================
On 2010-01-19, at 6:04 PM, Igor Stasenko wrote:
2010/1/20 Andreas Raab andreas.raab@gmx.de:
Igor Stasenko wrote:
Platform code invoking:
sqReadImageFrom(void *object).
Ok, the other thing that goes on here is that I mmap the start of the bytecodes to 500MB on os-x and the iPhone
usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize)
the headersize that's passed in helps me to adjust where the bytecode start is
This is done so that I can avoid swizzling all pointers between restarts on the machines, on os-x and the iPhone byte codes start at 500MB...
Note this becomes more interesting if you can avoid swizzling all the pointers at startup time on FLASH based devices so you don't load the entire image into RAM. In my iPhone work for wikiserver the image is 1932 pages, but I only need to load 806 pages into ram to get the first page up. {Yes of course I had to remove the crappy allInstances in various places at startup time}. If you check the pointer swizzle doesn't do anything if the old start of memory offset matchs the new start of memory offset.
BTW Loading the extra 1100 pages actually *is* measured in seconds, so the gain is important.
-- =========================================================================== John M. McIntosh johnmci@smalltalkconsulting.com Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ===========================================================================
On Tue, Jan 19, 2010 at 05:34:25PM -0800, Andreas Raab wrote:
David T. Lewis wrote:
A suggestion:
Interpreter>>installImage: address size: imageSize header: header
I don't like that. It implies that the support code needs to be able to dissect the image file to find out how much is header and how much is data. That implies the support code needs to know about image format, byte reversal, and several other things. The interpreter already does that so why should we have to duplicate that in the support code again?
OK
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one.
I don't mean to debate it ;) My point is that it can be handled later when everyone has agreed and the necessary action has been taken to implement it. This may or may not happen quickly; meanwhile adding the new entry point is straightforward.
Dave
David T. Lewis wrote:
My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time.
Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one.
I don't mean to debate it ;) My point is that it can be handled later when everyone has agreed and the necessary action has been taken to implement it. This may or may not happen quickly; meanwhile adding the new entry point is straightforward.
Ah. Agreed :-)
Cheers, - Andreas
This proposal is tracked on Mantis for follow up: http://bugs.squeak.org/view.php?id=7459
Dave
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
Apologies, accidentally posted to squeak-dev instead of vm-dev.
- A.
-------- Original Message -------- Subject: Image file loading Date: Tue, 19 Jan 2010 14:06:00 -0800 From: Andreas Raab andreas.raab@gmx.de Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org Newsgroups: gmane.comp.lang.smalltalk.squeak.general
Folks -
I think that one of the more clever hacks I did for the Android VM deserves proper generalization in the Interpreter. Since on Android the image is part of the (compressed) application package it's not easily possible to pass a file handle into the VM and have the VM read the image as an external file.
Instead, I'm preloading the image file into memory and then hacked an implementation of the sqImageFile* and sqAlloc* functions that would basically operate on the image in memory. I think we should formalize this idea for two reasons:
a) It allows preloading the image into memory on platforms where the image may not be in an external file.
b) It trivially allows mmap()ing the entire image into memory (I think John does that on the iPhone).
What I'm proposing is an additional VM entry point, called
Interpreter>>readImageFromHeap: heapStart size: heapSize
which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
The VMs would generally use this along the lines of:
memStart = self malloc(heapSize); fread(imageFile, memStart, fsize(imageFile)); readImageFromHeapSize(memStart, heapSize);
and of course a backwards compatible version of readImageFromFile:HeapSize:StartingAt: could do the same (although I'd rather drop that function altogether since it removes the need for supporting various sqImageFile* functions).
I'd also say that the same pattern should be used for *writing* image files, that is we assemble the entire image in memory then call sqImageFileWrite(fileName, heapStart, heapSize) and sqImageFileWrite is really just a combo of fopen(); fwrite(); fclose();
I think this scheme is much nicer, less complicated and offers more options for implementing image loading than what we currently have.
Comments?
Cheers,
- Andreas
Andreas,
have you looked at what the SqueakNOS guys did? I haven't had time to look at your proposal in detail, yet, but have a feeling that there might be some overlap here.
-- Jecel
vm-dev@lists.squeakfoundation.org