PRIM_IO signature
signature PRIM_IO
structure BinPrimIO :> PRIM_IO
where type array = Word8Array.array
where type vector = Word8Vector.vector
where type elem = Word8.word
where type pos = Position.int
structure TextPrimIO :> PRIM_IO
where type array = CharArray.array
where type vector = CharVector.vector
where type elem = Char.char
structure WideTextPrimIO :> PRIM_IO (* OPTIONAL *)
where type array = WideCharArray.array
where type vector = WideCharVector.vector
where type elem = WideChar.char
The PRIM_IO signature is an abstraction of the fundamental system call operations commonly available on open file descriptors OS.IO.iodesc. Imperative and Stream I/O facilities do not access the operating system directly, but access the appropriate Primitive I/O reader and writer that accomplishes the required system call.
Several operations in the PRIM_IO interface will raise exceptions that have been left intentionally unspecified. The actual exception raised will usually be operating system-dependent, but may vary. For example, a reader connected to a prime number generator might raise Overflow. More typically, the close operation on a reader or writer may cause an exception to be raised if there is a failure in the underlying file system, such as the disk being full or the file server being unavailable. In addition, one would expect readVec and readVecNB to raise Size if the resulting vector would exceed the maximum allowed vector size or if the input parameter is negative. Similarly, one would expect readArr, readArrNB, writeArr, writeArrNB, writeVec, and writeVecNB to raise Subscript if array bounds are violated. Readers and writers should not, in general, raise the IO.Io exception. It is assumed that the higher levels will catch the exceptions raised at the Primitive I/O level, and appropriately construct and raise the IO.Io exception.
A reader is required to raise IO.Io if any of its functions, except close or getPos, is invoked after a call to close. A writer is required to raise IO.Io if any of its functions, except close, is invoked after a call to close. In both cases, the cause field of the exception should be IO.ClosedStream.
type elem
type vector
type vector_slice
type array
type array_slice
eqtype pos
val compare : pos * pos -> order
datatype reader
= RD of {
name : string,
chunkSize : int,
readVec : (int -> vector) option,
readArr : (array_slice -> int) option,
readVecNB : (int -> vector option) option,
readArrNB : (array_slice -> int option) option,
block : (unit -> unit) option,
canInput : (unit -> bool) option,
avail : unit -> int option,
getPos : (unit -> pos) option,
setPos : (pos -> unit) option,
endPos : (unit -> pos) option,
verifyPos : (unit -> pos) option,
close : unit -> unit,
ioDesc : OS.IO.iodesc option
}
datatype writer
= WR of {
name : string,
chunkSize : int,
writeVec : (vector_slice -> int) option,
writeArr : (array_slice -> int) option,
writeVecNB : (vector_slice -> int option) option,
writeArrNB : (array_slice -> int option) option,
block : (unit -> unit) option,
canOutput : (unit -> bool) option,
getPos : (unit -> pos) option,
setPos : (pos -> unit) option,
endPos : (unit -> pos) option,
verifyPos : (unit -> pos) option,
close : unit -> unit,
ioDesc : OS.IO.iodesc option
}
val openVector : vector -> reader
val nullRd : unit -> reader
val nullWr : unit -> writer
val augmentReader : reader -> reader
val augmentWriter : writer -> writer
type elem
elem type is an abstraction that represents the ``element'' of a file (or device, etc.). Typically, elements are either characters (char) or bytes (Word8.word).
type vector
type vector_slice
type array
type array_slice
vector type is an immutable vector of elements; the vector_slice type is a slice of a vector; the array type is an mutable array of elements; and the array_slice type is a slice of an array.
eqtype pos
compare (pos, pos')
LESS, EQUAL, or GREATER when pos is less than, equal to, or greater than pos', respectively, in some underlying linear ordering on pos values.
datatype reader
= RD of {
name : string,
chunkSize : int,
readVec : (int -> vector) option,
readArr : (array_slice -> int) option,
readVecNB : (int -> vector option) option,
readArrNB : (array_slice -> int option) option,
block : (unit -> unit) option,
canInput : (unit -> bool) option,
avail : unit -> int option,
getPos : (unit -> pos) option,
setPos : (pos -> unit) option,
endPos : (unit -> pos) option,
verifyPos : (unit -> pos) option,
close : unit -> unit,
ioDesc : OS.IO.iodesc option
}
reader is an abstraction for a source of items of type elem. Usually, it will correspond to a file or device opened for reading. It can also represent the output of some algorithm or function, not necessarily connected to the outside world, that produces elements. The resulting sequence of elements is potentially unbounded. In the description below, we will usually refer to the limit sequence as a ``file,'' as this is the most common instance.
name
reader, used in error messages shown to the user.
chunkSize
reader. This is typically set to the block size of the operating system's buffers. chunkSize = 1 strongly recommends unbuffered reads, but since buffering is handled at a higher level, it cannot guarantee it. chunkSize <= 0 is illegal.
readVec(n)
0). It blocks, if necessary, until end-of-stream is detected or at least one element is available.
It is recommended that implementations of this function raise the Size exception if n < 0.
readArr(slice)
0 (this function also returns 0 when slice is empty). It blocks, if necessary, until at least one element is available.
readVecNB(n)
SOME(v); or if end-of-stream is detected without blocking, returns SOME(fromList[]); or if a read would block, returns NONE.
readArrNB(slice)
NONE, otherwise it returns SOME(n), where n is the number of elements actually read (0 on end-of-stream)
block()
canInput()
true if and only if the next read can proceed without blocking.
avail()
NONE if it cannot be determined. For files or strings, this is the file or string size minus the current position; for most other input sources, this is probably NONE. This can be used as a hint by inputAll. Note that this is a byte count, not an element count.
getPos()
getPos function must be non-decreasing (in the absence of setPos operations or other interference on the underlying object).
setPos(i)
endPos()
verifyPos()
getPos, except that the latter may maintain its own notion of file position for efficiency, whereas verifyPos will typically perform a system call to obtain the underlying operating system's value of the file position.
close
reader closed and, if necessary, performs any cleanup and releases any operating system resources. Further operations on the reader (besides close and getPos) raise IO.ClosedStream.
ioDesc
readVec, readVecNB, readArr, or readArrNB must be provided. Providing more of the optional functions increases functionality and/or efficiency of clients.
readVec, readArr, and block means that blocking input is not possible.
readVecNB, readArrNB, and canInput means that non-blocking input is not possible.
readVecNB means that non-blocking input requires two system calls (using canInput and readVec).
readArr or readArrNB means that input into an array requires extra copying. Note that the ``lazy functional stream'' model does not use arrays at all.
setPos prevents random access.
avail return a value helps the client perform very large input more efficiently, with one system call and no copying.
If the reader can provide more than the minimum set of operations in a way that is more efficient then the obvious synthesis (see augmentReader), then by all means it should do so. Providing more than the minimum by just doing the obvious synthesis inside the Primitive I/O layer is not recommended because then clients will not get the ``hint'' about which are the efficient (``recommended'') operations. Clients concerned with efficiency will make use of the operations provided natively, and may need to choose algorithms depending on which operations are provided; clients not concerned with efficiency or requiring certain operations can use the reader constructed by augmentReader.
datatype writer
= WR of {
name : string,
chunkSize : int,
writeVec : (vector_slice -> int) option,
writeArr : (array_slice -> int) option,
writeVecNB : (vector_slice -> int option) option,
writeArrNB : (array_slice -> int option) option,
block : (unit -> unit) option,
canOutput : (unit -> bool) option,
getPos : (unit -> pos) option,
setPos : (pos -> unit) option,
endPos : (unit -> pos) option,
verifyPos : (unit -> pos) option,
close : unit -> unit,
ioDesc : OS.IO.iodesc option
}
writer is a file (device, etc.) opened for writing. A writer is an abstraction for a store of items of type elem. Usually, it will correspond to a file or device opened for writing. It can also represent input to some algorithm or function, not necessarily connected to the outside world, which consumes the output to guide its computations. The resulting store of elements is potentially unbounded. In the description below, we will usually refer to the store as a ``file,'' as this is the most common instance.
name
chunkSize
writer. This is typically set to the block size of the operating system's buffers. chunkSize = 1 strongly recommends unbuffered writes, but since buffering is handled at a higher level, it cannot guarantee it. chunkSize <= 0 is illegal.
writeVec(slice)
writeArr(slice)
writeVecNB(slice)
SOME(n), where n is the number of elements actually written. Otherwise, if it would block then it returns NONE without blocking.
writeVecNB(slice)
SOME(n), where n is the number of elements actually written. Otherwise, if it would block then it returns NONE without blocking.
block()
writer is guaranteed to be able to write without blocking.
canOutput()
true if and only if the next write can proceed without blocking.
getPos()
endPos()
setPos(i)
verifyPos()
getPos, except that the latter may maintain its own notion of file position for efficiency, whereas verifyPos will typically perform a system call to obtain the underlying operating system's value of the file position.
close()
writer closed and, if necessary, performs any cleanup and releases any operating system resources. Further operations (other than close) raise IO.ClosedStream.
ioDesc
The write operations return the number of full elements that have been written. If the size of an element is greater than 1 byte, it is possible that an additional part of an element might be written. For example, if one tries to write 2 elements, each of size 3 bytes, the underlying system write operation may report that only 4 of the 6 bytes has been written. Thus, one full element has been written, plus part of the second, so the write operation would return 1.
One of writeVec, writeVecNB, writeArr, or writeArrNB must be provided. Providing more of the optional functions increases functionality and/or efficiency of clients.
writeVec, writeArr, and block means that blocking output is not possible.
writeVecNB, writeArrNB, and canOutput means that non-blocking output is not possible.
writeArr or writeArrNB means that extra copying will be required to write from an array.
setPos prevents random access.
openVector v
val nullRd : unit -> reader
val nullWr : unit -> writer
nullRd acts like a reader that is always at end-of-stream. The writer nullWr serves as a sink; any data written using it is thrown away. Null readers and writers can be closed; if closed, they are expected to behave the same as any other closed reader or writer.
augmentReader rd
readVec, readArr, readVecNB, and readArrNB are provided, by synthesizing these from the operations of rd.
For example, augmentReader can synthesize readVec from readVecNB and block, synthesize vector reads from array reads, synthesize array reads from vector reads, as needed. The following table indicates how each synthesis can be accomplished.
| Synthesize: | From: |
|---|---|
readVec
|
readVec or readArr or (block and (readVecNB or readArrNB))
|
readArr
|
readArr or readVec or (block and (readArrNB or readVecNB))
|
readVecNB
|
readVecNB or readArrNB or (canInput and (readVec or readArr))
|
readArrNB
|
readArrNB or readVecNB or (canInput and (readArr or readVec))
|
augmentReader should do no harm: if a reader rd supplies some operation (such as readArr), then augmentReader(rd) provides the same implementation of that operation, not a synthesized one.
augmentWriter wr
writeVec, writeArr, writeVecNB, and writeArrNB are provided, by synthesizing these from the operations of wr.
The following table indicates how each synthesis can be accomplished.
| Synthesize: | From: |
|---|---|
writeVec
|
writeVec or writeArr or (block and (writeVecNB or writeArrNB))
|
writeArr
|
writeArr or writeVec or (block and (writeArrNB or writeVecNB))
|
writeVecNB
|
writeVecNB or writeArrNB or (canOutput and (writeVec or writeArr))
|
writeArrNB
|
writeArrNB or writeVecNB or (canOutput and (writeArr or writeVec))
|
BinIO,IMPERATIVE_IO,OS.IO,Posix.IO,PrimIO,STREAM_IO,TextIO
It may not be possible to use augmentReader or augmentWriter to synthesize operations in a way that is thread-safe in concurrent systems.
None of the function components in readers and writers should block, except for the obvious ones: readVec, readArray, writeVec, writeArray, and block.
The end-of-stream condition at the Stream I/O level is an artifact of a read operation at the Primitive I/O level returning 0 elements. Although this event is maintained in the stream, it is a transient condition at the Primitive I/O level. If a call to readVec returns an empty vector one time, it is quite possible that another call to readVec at the same file position will return elements. The value returned by an endPos function should indicate the position after the last element in the file at the time the function is called.
Implementation note:
The functions
getPos,setPos,endPos, andverifyPosare used to support arbitrary random access on the underlying I/O file or device. On most systems, this is well-supported only on static objects such as strings or regular files. Typical implementations will therefore setgetPosand the rest toNONEin both readers and writers for all other file types.In an implementation where an input stream is represented as a chain of buffers, each buffer may contain, along with its vector of data, a
posvalue indicating where the data came from in the underlying file. As each buffer corresponds to a read operation, the client is likely to callgetPoson each read operation. Thus, the reader should, if possible, maintain its own version of the file position, to be returned bygetPos, to avoid extra system calls.Unlike readers, which can expect their
getPosfunctions to be called frequently, writers need not implementgetPosin a highly efficient manner: a system call for eachgetPosis acceptable. Indeed, with a file opened for atomic append, the information cannot be obtained reliably except by a system call usingverifyPos.
Generated April 12, 2004
Last Modified July 1, 2002
Comments to John Reppy.
This document may be distributed freely over the internet as long as the copyright notice and license terms below are prominently displayed within every machine-readable copy.
|
Copyright © 2004 AT&T and Lucent Technologies. All rights reserved.
Permission is granted for internet users to make one paper copy for their
own personal use. Further hardcopy reproduction is strictly prohibited.
Permission to distribute the HTML document electronically on any medium
other than the internet must be requested from the copyright holders by
contacting the editors.
Printed versions of the SML Basis Manual are available from Cambridge
University Press.
To order, please visit
www.cup.org (North America) or
www.cup.cam.ac.uk (outside North America). |