C Programming/Stream IO
Introduction
The stdio.h
header declares a broad assortment of functions that perform input and output to files and devices such as the console. It was one of the earliest headers to appear in the C library. It declares more functions than any other standard header and also requires more explanation because of the complex machinery that underlies the functions.
The device-independent model of input and output has seen dramatic improvement over the years and has received little recognition for its success. FORTRAN II was touted as a machine-independent language in the 1960s, yet it was essentially impossible to move a FORTRAN program between architectures without some change. In FORTRAN II, you named the device you were talking to right in the FORTRAN statement in the middle of your FORTRAN code. So, you said READ INPUT TAPE 5
on a tape-oriented IBM 7090 but READ CARD
to read a card image on other machines. FORTRAN IV had more generic READ
and WRITE
statements, specifying a logical unit number (LUN) instead of the device name. The era of device-independent I/O had dawned.
Peripheral devices such as printers still had fairly strong notions about what they were asked to do. And then, peripheral interchange utilities were invented to handle bizarre devices. When cathode-ray tubes came onto the scene, each manufacturer of consoles solved problems such as console cursor movement in an independent manner, causing further headaches.
It was into this atmosphere that Unix was born. Ken Thompson and Dennis Ritchie, the developers of Unix, deserve credit for packing any number of bright ideas into the operating system. Their approach to device independence was one of the brightest.
The ANSI C <stdio.h>
library is based on the original Unix file I/O primitives but casts a wider net to accommodate the least-common denominator across varied systems.
Streams
Input and output, whether to or from physical devices such as terminals and tape drives, or whether to or from files supported on structured storage devices, are mapped into logical data streams, whose properties are more uniform than their various inputs and outputs. Two forms of mapping are supported: text streams and binary streams.
A text stream consists of one or more lines. A line in a text stream consists of zero or more characters plus a terminating new-line character. (The only exception is that in some implementations the last line of a file does not require a terminating new-line character.) Unix adopted a standard internal format for all text streams. Each line of text is terminated by a new-line character. That's what any program expects when it reads text, and that's what any program produces when it writes text. (This is the most basic convention, and if it doesn't meet the needs of a text-oriented peripheral attached to a Unix machine, then the fix-up occurs out at the edges of the system. Nothing in between needs to change.)
The string of characters that goes into, or comes out of, a text stream may have to be modified to conform to specific conventions. This results in a possible difference between the data that go into a text stream and the data that come out. For instance, in some implementations, when a space character precedes a new-line character in the input, the space character is removed from the output. In general, when the data consist only of printable characters and control characters like horizontal tab and new-line, the input and output of a text stream are equal.
Compared to a text stream, a binary stream is straightforward. A binary stream is an ordered sequence of characters that can transparently record internal data. Data written to a binary stream shall always equal the data that gets read out under the same implementation. Binary streams, however, may have an implementation-defined number of null characters appended to the end of the stream. There are no further conventions that need to be considered.
Nothing in Unix prevents the program from writing arbitrary 8-bit binary codes to any open file, or reading them back unchanged from an adequate repository. Thus, Unix obliterated the long-standing distinction between text streams and binary streams.
Standard Streams
When a C program starts, three standard streams are opened for it automatically, named
stdin
, stdout
, and stderr
. These are attached for every C program.
The first standard stream is used for input and the other two are used for output. These streams are sequences of bytes.
Consider the following program:
/* An example program. */
#include <stdio.h>

int main(void)
{
    int var;
    if (scanf("%d", &var) == 1) /* use stdin for scanning an integer from keyboard. */
        printf("%d\n", var); /* use stdout for printing the integer that was just scanned in. */
    return 0;
}
/* end program. */
By default stdin
points to the keyboard and stdout
and stderr
point to the screen. It is possible under Unix and may be possible under other operating systems to redirect input from or output to a file or both.
Pointers to streams
The C type that represents a stream is named FILE
rather than stream
. The <stdio.h>
header contains a definition for a type FILE
(usually via a typedef
) which is capable of recording all the information needed to exercise control over a stream, including its file position indicator, a pointer to the associated buffer (if any), an error indicator that records whether a read/write error has occurred, and an end-of-file indicator that records whether the end of the file has been reached.
It is considered bad form to access the contents of FILE
directly unless the programmer is writing an implementation of <stdio.h>
and its contents. Better access to the contents of FILE
is provided via the functions in <stdio.h>
. It can be said that the FILE
type is an early example of object-oriented programming.
Opening and Closing Files
To open and close files, the <stdio.h>
library has three functions: fopen
, freopen
, and fclose
.
Opening Files
#include <stdio.h>
FILE *fopen(const char *filename, const char *mode);
FILE *freopen(const char *filename, const char *mode, FILE *stream);
fopen
and freopen
open the file whose name is the string pointed to by filename
and associate a stream with it. Both return a pointer to the object controlling the stream, or, if the open operation fails, a null pointer. On success, the error and end-of-file indicators for the stream are cleared. On failure, POSIX systems also set errno to indicate the error; the C standard itself does not require this. freopen
differs from fopen
in that the file pointed to by stream
is closed first when already open and any close errors are ignored.
mode
for both functions points to a string beginning with one of the following sequences (additional characters may follow the sequences):
Mode | Meaning |
---|---|
r | open a text file for reading |
w | truncate to zero length or create a text file for writing |
a | append; open or create text file for writing at end-of-file |
rb | open binary file for reading |
wb | truncate to zero length or create a binary file for writing |
ab | append; open or create binary file for writing at end-of-file |
r+ | open text file for update (reading and writing) |
w+ | truncate to zero length or create a text file for update |
a+ | append; open or create text file for update |
r+b or rb+ | open binary file for update (reading and writing) |
w+b or wb+ | truncate to zero length or create a binary file for update |
a+b or ab+ | append; open or create binary file for update |
Opening a file with read mode ('r
' as the first character in the mode
argument) fails if the file does not exist or cannot be read.
Opening a file with append mode ('a
' as the first character in the mode
argument) causes all subsequent writes to the file to be forced to the then-current end-of-file, regardless of intervening calls to the fseek
function. In some implementations, opening a binary file with append mode ('b
' as the second or third character in the above list of mode
arguments) may initially position the file position indicator for the stream beyond the last data written, because of null character padding.
When a file is opened with update mode ('+
' as the second or third character in the above list of mode
argument values), both input and output may be performed on the associated stream. However, output may not be directly followed by input without an intervening call to the fflush
function or to a file positioning function (fseek
, fsetpos
, or rewind
), and input may not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.
When opened, a stream is fully buffered if and only if it can be determined not to refer to an interactive device.
Closing Files
#include <stdio.h>
int fclose(FILE *stream);
The fclose
function causes the stream pointed to by stream
to be flushed and the associated file to be closed. Any unwritten buffered data for the stream are delivered to the host environment to be written to the file; any unread buffered data are discarded. The stream is disassociated from the file. If the associated buffer was automatically allocated, it is deallocated. The function returns zero if the stream was successfully closed or EOF
if any errors were detected.
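The open, write, close sequence described above can be sketched as follows. This is a minimal illustration rather than a canonical pattern: the helper name save_line is hypothetical, and fputs and fputc are standard output functions not otherwise covered in this section.

```c
#include <stdio.h>

/* Hypothetical helper: write one line of text to the file at `path`,
   replacing any previous contents. Returns 0 on success, -1 on failure. */
int save_line(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");    /* "w": create or truncate a text file */
    int ok;

    if (fp == NULL)
        return -1;                  /* open failed; there is no stream to close */

    ok = (fputs(text, fp) != EOF) && (fputc('\n', fp) != EOF);

    /* fclose flushes buffered data, so a write error may only show up here. */
    if (fclose(fp) == EOF)
        ok = 0;

    return ok ? 0 : -1;
}
```

Checking the return value of fclose matters for output streams, because data that was still sitting in the buffer is only delivered to the host environment at that point.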
Stream buffering functions
The fflush
function
#include <stdio.h>
int fflush(FILE *stream);
If stream
points to an output stream or an update stream in which the most recent operation was not input, the fflush
function causes any unwritten data for that stream to be delivered to the host environment to be written to the file. The behavior of fflush is undefined for an input stream.
If stream
is a null pointer, the fflush
function performs this flushing action on all streams for which the behavior is defined above.
The fflush
function returns EOF
if a write error occurs, otherwise zero.
The reason for having a fflush
function is that streams in C can be buffered; that is, functions that write to a file actually write to a buffer associated with the FILE
object. When the buffer is filled to capacity, the write functions flush it, as if by calling fflush
, so that the data in the buffer is actually written to the file. Because this flushing
happens only every once in a while, calls to the operating system to do a raw write are minimized.
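A typical reason to call fflush explicitly is to make sure text appears right away, for instance a prompt that does not end in a new-line. The following sketch assumes a hypothetical helper named emit_now; it is an illustration, not part of the library.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to `out` and deliver it to the host
   environment immediately instead of waiting for the buffer to fill.
   Returns 0 on success, EOF on a write error. */
int emit_now(FILE *out, const char *text)
{
    if (fputs(text, out) == EOF)
        return EOF;
    return fflush(out);     /* fflush itself returns 0 or EOF */
}
```

A caller might write emit_now(stdout, "Enter a number: ") before reading the reply with scanf.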
The setbuf
function
#include <stdio.h>
void setbuf(FILE *stream, char *buf);
Except that it returns no value, the setbuf
function is equivalent to the setvbuf
function invoked with the values _IOFBF
for mode
and BUFSIZ
for size
, or (if buf
is a null pointer) with the value _IONBF
for mode
.
The setvbuf
function
#include <stdio.h>
int setvbuf(FILE *stream, char *buf, int mode, size_t size);
The setvbuf
function may be used only after the stream pointed to by stream
has been associated with an open file and before any other operation is performed on the stream. The argument mode
determines how the stream will be buffered, as follows: _IOFBF
causes input/output to be fully buffered; _IOLBF
causes input/output to be line buffered; _IONBF
causes input/output to be unbuffered. If buf
is not a null pointer, the array it points to may be used instead of a buffer allocated by the setvbuf
function. (The buffer must have a lifetime at least as great as the open stream, so the stream should be closed before a buffer that has automatic storage duration is deallocated upon block exit.) The argument size
specifies the size of the array. The contents of the array at any time are indeterminate.
The setvbuf
function returns zero on success, or nonzero if an invalid value is given for mode
or if the request cannot be honored.
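The ordering and lifetime rules above can be sketched like this. The names open_line_buffered and log_buffer are hypothetical, and the choice of line buffering is only an example.

```c
#include <stdio.h>

/* The buffer has static storage duration, so it outlives any stream using it. */
static char log_buffer[BUFSIZ];

/* Hypothetical helper: open `path` for writing and make the stream line
   buffered, with log_buffer as its buffer. Returns NULL on failure. */
FILE *open_line_buffered(const char *path)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return NULL;

    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(fp, log_buffer, _IOLBF, sizeof log_buffer) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

Because the sketch uses a single shared buffer, only one stream opened this way should be in use at a time.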
Functions that Modify the File Position Indicator
The stdio.h
library has five functions that affect the file position indicator besides those that do reading or writing: fgetpos
, fseek
, fsetpos
, ftell
, and rewind
.
The fseek
and ftell
functions are older than fgetpos
and fsetpos
.
The fgetpos
and fsetpos
functions
#include <stdio.h>
int fgetpos(FILE *stream, fpos_t *pos);
int fsetpos(FILE *stream, const fpos_t *pos);
The fgetpos
function stores the current value of the file position indicator for the stream pointed to by stream
in the object pointed to by pos
. The value stored contains unspecified information usable by the fsetpos
function for repositioning the stream to its position at the time of the call to the fgetpos
function.
If successful, the fgetpos
function returns zero; on failure, the fgetpos
function returns nonzero and stores an implementation-defined positive value in errno
.
The fsetpos
function sets the file position indicator for the stream pointed to by stream
according to the value of the object pointed to by pos
, which shall be a value obtained from an earlier call to the fgetpos
function on the same stream.
A successful call to the fsetpos
function clears the end-of-file indicator for the stream and undoes any effects of the ungetc
function on the same stream. After an fsetpos
call, the next operation on an update stream may be either input or output.
If successful, the fsetpos
function returns zero; on failure, the fsetpos
function returns nonzero and stores an implementation-defined positive value in errno
.
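As a small illustration of the two functions working together, the sketch below bookmarks the current position, reads a line, and then returns to the bookmark. The helper name peek_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read the next line into buf (at most n-1 characters)
   and then move the stream back to where it was, so the same line can be
   read again. Returns 0 on success, -1 on failure. */
int peek_line(FILE *fp, char *buf, int n)
{
    fpos_t mark;

    if (fgetpos(fp, &mark) != 0)    /* remember the current position */
        return -1;
    if (fgets(buf, n, fp) == NULL)  /* end-of-file or read error */
        return -1;
    if (fsetpos(fp, &mark) != 0)    /* go back to the bookmark */
        return -1;
    return 0;
}
```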
The fseek
and ftell
functions
#include <stdio.h>
int fseek(FILE *stream, long int offset, int whence);
long int ftell(FILE *stream);
The fseek
function sets the file position indicator for the stream pointed to by stream
.
For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset
to the position specified by whence
. Three macros in stdio.h
called SEEK_SET
, SEEK_CUR
, and SEEK_END
expand to unique values. If the position specified by whence
is SEEK_SET
, the specified position is the beginning of the file; if whence
is SEEK_END
, the specified position is the end of the file; and if whence
is SEEK_CUR
, the specified position is the current file position. A binary stream need not meaningfully support fseek
calls with a whence
value of SEEK_END
.
For a text stream, either offset
shall be zero, or offset
shall be a value returned by an earlier call to the ftell
function on the same stream and whence
shall be SEEK_SET
.
The fseek
function returns nonzero only for a request that cannot be satisfied.
The ftell
function obtains the current value of the file position indicator for the stream pointed to by stream
. For a binary stream, the value is the number of characters from the beginning of the file; for a text stream, its file position indicator contains unspecified information, usable by the fseek
function for returning the file position indicator for the stream to its position at the time of the ftell
call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
If successful, the ftell
function returns the current value of the file position indicator for the stream. On failure, the ftell
function returns -1L
and stores an implementation-defined positive value in errno
.
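A common use of ftell and fseek on a binary stream is to look at another part of the file and then come back. The following is a minimal sketch under that assumption; peek_byte_at is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: return the byte at `offset` from the beginning of a
   binary stream without disturbing the caller's position.
   Returns EOF if there is no such byte or if any call fails. */
int peek_byte_at(FILE *fp, long offset)
{
    long saved = ftell(fp);                 /* remember where we are */
    int c;

    if (saved == -1L)
        return EOF;
    if (fseek(fp, offset, SEEK_SET) != 0)
        return EOF;
    c = fgetc(fp);
    if (fseek(fp, saved, SEEK_SET) != 0)    /* restore the old position */
        return EOF;
    return c;
}
```

For a text stream the same idea works only with offsets previously obtained from ftell, as described above.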
The rewind
function
#include <stdio.h>
void rewind(FILE *stream);
The rewind
function sets the file position indicator for the stream pointed to by stream
to the beginning of the file. It is equivalent to
(void)fseek(stream, 0L, SEEK_SET)
except that the error indicator for the stream is also cleared.
Error Handling Functions
The clearerr
function
#include <stdio.h>
void clearerr(FILE *stream);
The clearerr
function clears the end-of-file and error indicators for the stream pointed to by stream
.
The feof
function
#include <stdio.h>
int feof(FILE *stream);
The feof
function tests the end-of-file indicator for the stream pointed to by stream
and returns nonzero if and only if the end-of-file indicator is set for stream
, otherwise it returns zero.
The ferror
function
#include <stdio.h>
int ferror(FILE *stream);
The ferror
function tests the error indicator for the stream pointed to by stream
and returns nonzero if and only if the error indicator is set for stream
, otherwise it returns zero.
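The indicators are usually consulted after a read function has reported EOF, to find out why reading stopped. A minimal sketch, with the hypothetical helper name count_chars:

```c
#include <stdio.h>

/* Hypothetical helper: count the characters remaining in a stream.
   Returns the count, or -1 if reading stopped because of a read error. */
long count_chars(FILE *fp)
{
    long n = 0;

    while (fgetc(fp) != EOF)
        n++;

    if (ferror(fp))
        return -1L;     /* EOF was reported because of a read error */

    /* Otherwise the end-of-file indicator is set: feof(fp) is nonzero. */
    return n;
}
```

After such a loop, clearerr can be used to reset both indicators before the stream is used again.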
The perror
function
#include <stdio.h>
void perror(const char *s);
The perror
function maps the error number in the integer expression errno
to an error message. It writes a sequence of characters to the standard error stream thus: first, if s
is not a null pointer and the character pointed to by s
is not the null character, the string pointed to by s
followed by a colon (:) and a space; then an appropriate error message string followed by a new-line character. The contents of the error message are the same as those returned by the strerror
function with the argument errno
, which are implementation-defined.
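A frequent use of perror is reporting why a file could not be opened. The sketch below assumes that fopen sets errno on failure, which POSIX guarantees but the C standard does not; on other systems the message may be unrelated to the failure. The helper name open_or_report is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: open a file, and if that fails, print the file name
   and the message for the current errno value on the standard error stream.
   Assumption: fopen sets errno on failure (true on POSIX systems). */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        perror(path);   /* writes "<path>: <message>" and a new-line to stderr */
    return fp;
}
```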
Other Operations on Files
The stdio.h
library has a variety of functions that do some operation on files besides reading and writing.
The remove
function
#include <stdio.h>
int remove(const char *filename);
The remove
function causes the file whose name is the string pointed to by filename
to be no longer accessible by that name. A subsequent attempt to open that file using that name will fail, unless it is created anew. If the file is open, the behavior of the remove
function is implementation-defined.
The remove
function returns zero if the operation succeeds, nonzero if it fails.
The rename
function
#include <stdio.h>
int rename(const char *old_filename, const char *new_filename);
The rename
function causes the file whose name is the string pointed to by old_filename
to be henceforth known by the name given by the string pointed to by new_filename
. The file named old_filename
is no longer accessible by that name. If a file named by the string pointed to by new_filename
exists prior to the call to the rename
function, the behavior is implementation-defined.
The rename
function returns zero if the operation succeeds, nonzero if it fails, in which case if the file existed previously it is still known by its original name.
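Because the result of renaming onto an existing file is implementation-defined, a program may prefer to refuse that case itself. The sketch below is one cautious approach, not a complete solution: the helper name rename_if_absent is hypothetical, a failed fopen does not prove that the file is absent (it may merely be unreadable), and another process could create the destination between the check and the rename.

```c
#include <stdio.h>

/* Hypothetical helper: rename `from` to `to` only if `to` does not appear
   to exist. Returns 0 on success, -1 if `to` exists or the rename fails. */
int rename_if_absent(const char *from, const char *to)
{
    FILE *probe = fopen(to, "r");

    if (probe != NULL) {        /* the destination can be opened, so it exists */
        fclose(probe);
        return -1;
    }
    return rename(from, to) == 0 ? 0 : -1;
}
```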
The tmpfile
function
#include <stdio.h>
FILE *tmpfile(void);
The tmpfile
function creates a temporary binary file that will automatically be removed when it is closed or at program termination. If the program terminates abnormally, whether an open temporary file is removed is implementation-defined. The file is opened for update with "wb+"
mode.
The tmpfile
function returns a pointer to the stream of the file that it created. If the file cannot be created, the tmpfile
function returns a null pointer.
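A temporary file is convenient as scratch space that is written once and then read back. The following sketch shows that pattern; the helper name scratch_with is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: create a temporary file holding `text` and return it
   positioned at the beginning, ready for reading. Returns NULL on failure.
   The file disappears when the caller closes the stream. */
FILE *scratch_with(const char *text)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return NULL;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);     /* a positioning call is required between output and input */
    return fp;
}
```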
The tmpnam
function
#include <stdio.h>
char *tmpnam(char *s);
The tmpnam
function generates a string that is a valid file name and that is not the name of an existing file.
The tmpnam
function generates a different string each time it is called, up to TMP_MAX
times. (TMP_MAX
is a macro defined in stdio.h
.) If it is called more than TMP_MAX
times, the behavior is implementation-defined.
The implementation shall behave as if no library function calls the tmpnam
function.
If the argument is a null pointer, the tmpnam
function leaves its result in an internal static object and returns a pointer to that object. Subsequent calls to the tmpnam
function may modify the same object. If the argument is not a null pointer, it is assumed to point to an array of at least L_tmpnam
characters (L_tmpnam
is another macro in stdio.h
); the tmpnam
function writes its result in that array and returns the argument as its value.
The value of the macro TMP_MAX
must be at least 25.
Reading from Files
Character Input Functions
The fgetc
function
#include <stdio.h>
int fgetc(FILE *stream);
The fgetc
function obtains the next character (if present) as an unsigned char
converted to an int
, from the stream pointed to by stream
, and advances the associated file position indicator for the stream (if defined).
The fgetc
function returns the next character from the stream pointed to by stream
. If the stream is at end-of-file or a read error occurs, fgetc
returns EOF
(EOF
is a negative value defined in <stdio.h>
, usually (-1)
). The routines feof
and ferror
must be used to distinguish between end-of-file and error. On POSIX systems, a read error also sets errno
to indicate the error; the C standard itself does not require this.
The fgets
function
#include <stdio.h>
char *fgets(char *s, int n, FILE *stream);
The fgets
function reads at most one less than the number of characters specified by n
from the stream pointed to by stream
into the array pointed to by s
. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
The fgets
function returns s
if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
Warning: Different operating systems may use different character sequences to represent the end of a line. For example, some systems use the terminator \r\n
in text files; when such a file is read on a system that does not translate line endings, fgets
retains the \n
as usual but also leaves the \r
just before it in s
. This spurious character should be removed from the string s
before the string is used for anything (unless the programmer doesn't care about it). Unix systems typically use \n
as their end-of-line sequence, MS-DOS and Windows use \r\n
, and Mac OS used \r
before OS X. Many compilers on operating systems other than Unix or Linux map newline sequences to \n
on input for text files; check your compiler's documentation to discover what it does in this situation.
/* An example program that reads from stdin and writes to stdout */
#include <stdio.h>
#define BUFFER_SIZE 100
int main(void)
{
char buffer[BUFFER_SIZE]; /* a read buffer */
while( fgets (buffer, BUFFER_SIZE, stdin) != NULL)
{
printf("%s",buffer);
}
return 0;
}
/* end program. */
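The end-of-line warning above can be handled by trimming the line terminator right after each fgets call. This is one possible sketch; strip_eol and read_clean_line are hypothetical names, and strcspn comes from <string.h>.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: cut the string at the first '\r' or '\n', if any. */
void strip_eol(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}

/* Hypothetical helper: like fgets, but the returned line carries no
   end-of-line characters. Returns NULL at end-of-file or on error. */
char *read_clean_line(char *buf, int n, FILE *fp)
{
    if (fgets(buf, n, fp) == NULL)
        return NULL;
    strip_eol(buf);
    return buf;
}
```

Note that a line longer than n-1 characters is still returned in pieces, exactly as with plain fgets.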
The getc
function
#include <stdio.h>
int getc(FILE *stream);
The getc
function is equivalent to fgetc
, except that it may be implemented as a macro. If it is implemented as a macro, the stream
argument may be evaluated more than once, so the argument should never be an expression with side effects (i.e., it should not contain an assignment, increment, or decrement operator, or a function call).
The getc
function returns the next character from the input stream pointed to by stream
. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getc
returns EOF
(EOF
is a negative value defined in <stdio.h>
, usually (-1)
). If a read error occurs, the error indicator for the stream is set and getc
returns EOF
.
The getchar
function
#include <stdio.h>
int getchar(void);
The getchar
function is equivalent to getc
with the argument stdin
.
The getchar
function returns the next character from the input stream pointed to by stdin
. If stdin
is at end-of-file, the end-of-file indicator for stdin
is set and getchar
returns EOF
(EOF
is a negative value defined in <stdio.h>
, usually (-1)
). If a read error occurs, the error indicator for stdin
is set and getchar
returns EOF
.
The gets
function
#include <stdio.h>
char *gets(char *s);
The gets
function reads characters from the input stream pointed to by stdin
into the array pointed to by s
until an end-of-file is encountered or a new-line character is read. Any new-line character is discarded, and a null character is written immediately after the last character read into the array.
The gets
function returns s
if successful. If the end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
This function and its description are included here only for completeness. Most C programmers nowadays shy away from using gets
, as there is no way for the function to know how big the buffer is that the programmer wants to read into.
Commandment #5 of Henry Spencer's The Ten Commandments for C Programmers (Annotated Edition) reads
Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest foo someone someday shall type supercalifragilisticexpialidocious.
It mentions gets
in the annotation:
As demonstrated by the deeds of the Great Worm, a consequence of this commandment is that robust production software should never make use of gets()
, for it is truly a tool of the Devil. Thy interfaces should always inform thy servants of the bounds of thy arrays, and servants who spurn such advice or quietly fail to follow it should be dispatched forthwith to the Land Of Rm, where they can do no further harm to thee.
The gets
function was first deprecated and then removed entirely in the 2011 version of the C standard. Programmers should use the fgets
function instead.
The ungetc
function
#include <stdio.h>
int ungetc(int c, FILE *stream);
The ungetc
function pushes the character specified by c
(converted to an unsigned char
) back onto the input stream pointed to by stream. The pushed-back characters will be returned by subsequent reads on that stream in the reverse order of their pushing. A successful intervening call (with the stream pointed to by stream
) to a file-positioning function (fseek
, fsetpos
, or rewind
) discards any pushed-back characters for the stream. The external storage corresponding to the stream is unchanged.
One character of pushback is guaranteed. If the ungetc
function is called too many times on the same stream without an intervening read or file positioning operation on that stream, the operation may fail.
If the value of c
equals that of the macro EOF
, the operation fails and the input stream is unchanged.
A successful call to the ungetc
function clears the end-of-file indicator for the stream. The value of the file position indicator for the stream after reading or discarding all pushed-back characters shall be the same as it was before the characters were pushed back. For a text stream, the value of its file-position indicator after a successful call to the ungetc
function is unspecified until all pushed-back characters are read or discarded. For a binary stream, its file position indicator is decremented by each successful call to the ungetc
function; if its value was zero before a call, it is indeterminate after the call.
The ungetc
function returns the character pushed back after conversion, or EOF
if the operation fails.
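The one guaranteed character of pushback is enough to look at the next character without consuming it. A minimal sketch, with the hypothetical helper name peek_char:

```c
#include <stdio.h>

/* Hypothetical helper: return the next character of the stream without
   consuming it, or EOF if no character is available. */
int peek_char(FILE *fp)
{
    int c = getc(fp);       /* int, so that EOF stays distinguishable */

    if (c != EOF)
        ungetc(c, fp);      /* push it back for the next read */
    return c;
}
```

This is the usual way for a hand-written scanner to decide, for example, whether a digit sequence continues.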
EOF pitfall
A common mistake when using fgetc
, getc
, or getchar
is to assign the result to a variable of type char
before comparing it to EOF
. The following code fragments exhibit this mistake, and then show the correct approach (using type int):
Mistake:
char c;
while ((c = getchar()) != EOF)
    putchar(c);
Correction:
int c;
while ((c = getchar()) != EOF)
    putchar(c);
Consider a system in which the type char
is 8 bits wide, representing 256 different values. getchar
may return any of the 256 possible characters, and it also may return EOF
to indicate end-of-file, for a total of 257 different possible return values.
When getchar
's result is assigned to a char
, which can represent only 256 different values, there is necessarily some loss of information—when packing 257 items into 256 slots, there must be a collision. The EOF
value, when converted to char
, becomes indistinguishable from whichever one of the 256 characters shares its numerical value. If that character is found in the file, the above example may mistake it for an end-of-file indicator; or, just as bad, if type char
is unsigned, then because EOF
is negative, it can never be equal to any unsigned char
, so the above example will not terminate at end-of-file. It will loop forever, repeatedly printing the character which results from converting EOF
to char
.
This looping failure mode does not occur if char is signed (C leaves the signedness of plain char implementation-defined),[1] assuming the commonly used EOF
value of -1. The fundamental issue remains, however: if the EOF
value lies outside the range of the char
type, then when it is assigned to a char
the value is truncated and no longer matches the full EOF
value necessary to exit the loop. On the other hand, if EOF
is within range of char
, this guarantees a collision between EOF
and a char value. Thus, regardless of how system types are defined, never use char
types when testing against EOF
.
On systems where int
and char
are the same size (i.e., systems incompatible with, at a minimum, the POSIX and C99 standards), even the "good" example will suffer from the indistinguishability of EOF
and some character's value. The proper way to handle this situation is to check feof
and ferror
after getchar
returns EOF
. If feof
indicates that end-of-file has not been reached, and ferror
indicates that no errors have occurred, then the EOF
returned by getchar
can be assumed to represent an actual character. These extra checks are rarely done, because most programmers assume that their code will never need to run on one of these "big char
" systems. Another way is to use a compile-time assertion to make sure that UINT_MAX > UCHAR_MAX
, which at least prevents a program with such an assumption from compiling in such a system.
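The advice in this section can be combined in one sketch: the result of getc is kept in an int, ferror is consulted after the loop, and a preprocessor check stands in for the compile-time assertion mentioned above. The function name copy_chars is hypothetical, and putc is the standard output counterpart of getc.

```c
#include <limits.h>
#include <stdio.h>

/* Refuse to compile on "big char" systems, where EOF could collide with a
   real character value. */
#if UINT_MAX <= UCHAR_MAX
#error "int is not wider than char; EOF may collide with a character"
#endif

/* Hypothetical helper: copy all remaining characters from `in` to `out`.
   Returns the number of characters copied, or -1 on a read or write error. */
long copy_chars(FILE *in, FILE *out)
{
    long n = 0;
    int c;                          /* int, not char */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1L;             /* write error */
        n++;
    }
    return ferror(in) ? -1L : n;    /* distinguish a read error from end-of-file */
}
```

With a char variable in place of the int, a byte with the value 0xFF in the input could end the loop early or, where char is unsigned, the loop could never end.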
Direct input function: the fread
function
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
The fread
function reads, into the array pointed to by ptr
, up to nmemb
elements whose size is specified by size
, from the stream pointed to by stream
. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.
The fread
function returns the number of elements successfully read, which may be less than nmemb
if a read error or end-of-file is encountered. If size
or nmemb
is zero, fread
returns zero and the contents of the array and the state of the stream remain unchanged.
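Since a short count from fread can mean either end-of-file or an error, callers normally check ferror afterwards. The following sketch reads an array of int values from a binary stream; load_ints is a hypothetical name, and the data is assumed to have been written by the same implementation (for example with fwrite).

```c
#include <stdio.h>

/* Hypothetical helper: read up to `max` int values from a binary stream.
   Stores the number of complete elements read in *count.
   Returns 0 on success (including a short read at end-of-file),
   -1 if a read error occurred. */
int load_ints(FILE *fp, int *dst, size_t max, size_t *count)
{
    *count = fread(dst, sizeof dst[0], max, fp);

    if (*count < max && ferror(fp))
        return -1;      /* short count caused by an error, not end-of-file */
    return 0;
}
```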
Formatted input functions: the scanf
family of functions
#include <stdio.h>
int fscanf(FILE *stream, const char *format, ...);
int scanf(const char *format, ...);
int sscanf(const char *s, const char *format, ...);
The fscanf
function reads input from the stream pointed to by stream
, under control of the string pointed to by format
that specifies the admissible sequences and how they are to be converted for assignment, using subsequent arguments as pointers to the objects to receive converted input. If there are insufficient arguments for the format, the behavior is undefined. If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored.
The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format is composed of zero or more directives: one or more white-space characters; an ordinary multibyte character (neither % nor a white-space character); or a conversion specification. Each conversion specification is introduced by the character %. After the %, the following appear in sequence:
- An optional assignment-suppressing character *.
- An optional nonzero decimal integer that specifies the maximum field width.
- An optional h, l (ell) or L indicating the size of the receiving object. The conversion specifiers d, i, and n shall be preceded by h if the corresponding argument is a pointer to short int rather than a pointer to int, or by l if it is a pointer to long int. Similarly, the conversion specifiers o, u, and x shall be preceded by h if the corresponding argument is a pointer to unsigned short int rather than unsigned int, or by l if it is a pointer to unsigned long int. Finally, the conversion specifiers e, f, and g shall be preceded by l if the corresponding argument is a pointer to double rather than a pointer to float, or by L if it is a pointer to long double. If an h, l, or L appears with any other format specifier, the behavior is undefined.
- A character that specifies the type of conversion to be applied. The valid conversion specifiers are described below.
The fscanf function executes each directive of the format in turn. If a directive fails, as detailed below, the fscanf function returns. Failures are described as input failures (due to the unavailability of input characters) or matching failures (due to inappropriate input).
A directive composed of white-space character(s) is executed by reading input up to the first non-white-space character (which remains unread) or until no more characters remain unread.
A directive that is an ordinary multibyte character is executed by reading the next characters of the stream. If one of the characters differs from one comprising the directive, the directive fails, and the differing and subsequent characters remain unread.
A directive that is a conversion specification defines a set of matching input sequences, as described below for each specifier. A conversion specification is executed in the following steps:
Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier. (The white-space characters are not counted against the specified field width.)
An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest matching sequence of input characters, unless that exceeds a specified field width, in which case it is the initial subsequence of that length in the sequence. The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure, unless an error prevented input from the stream, in which case it is an input failure.
Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails; this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the space provided, the behavior is undefined.
The following conversion specifiers are valid:
- d
- Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtol function with the value 10 for the base argument. The corresponding argument shall be a pointer to integer.
- i
- Matches an optionally signed integer, whose format is the same as expected for the subject sequence of the strtol function with the value 0 for the base argument. The corresponding argument shall be a pointer to integer.
- o
- Matches an optionally signed octal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 8 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
- u
- Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
- x
- Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
- e, f, g
- Matches an optionally signed floating-point number, whose format is the same as expected for the subject string of the strtod function. The corresponding argument shall be a pointer to floating.
- s
- Matches a sequence of non-white-space characters. (No special provisions are made for multibyte characters.) The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence and a terminating null character, which will be added automatically.
- [
- Matches a nonempty sequence of characters (no special provisions are made for multibyte characters) from a set of expected characters (the scanset). The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence and a terminating null character, which will be added automatically. The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) comprise the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all the characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right-bracket character is in the scanlist and the next right bracket character is the matching right bracket that ends the specification; otherwise, the first right bracket character is the one that ends the specification. If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.
- c
- Matches a sequence of characters (no special provisions are made for multibyte characters) of the number specified by the field width (1 if no field width is present in the directive). The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence. No null character is added.
- p
- Matches an implementation-defined set of sequences, which should be the same as the set of sequences that may be produced by the %p conversion of the fprintf function. The corresponding argument shall be a pointer to void. The interpretation of the input then is implementation-defined. If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value; otherwise the behavior of the %p conversion is undefined.
- n
- No input is consumed. The corresponding argument shall be a pointer to integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function. Execution of a %n directive does not increment the assignment count returned at the completion of execution of the fscanf function.
- %
- Matches a single %; no conversion or assignment occurs. The complete conversion specification shall be %%.
If a conversion specification is invalid, the behavior is undefined.
The conversion specifiers E, G, and X are also valid and behave the same as, respectively, e, g, and x.
If end-of-file is encountered during input, conversion is terminated. If end-of-file occurs before any characters matching the current directive have been read (other than leading white space, where permitted), execution of the current directive terminates with an input failure; otherwise, unless execution of the current directive is terminated with a matching failure, execution of the following directive (if any) is terminated with an input failure.
If conversion terminates on a conflicting input character, the offending input character is left unread in the input stream. Trailing white space (including new-line characters) is left unread unless matched by a directive. The success of literal matches and suppressed assignments is not directly determinable other than via the %n directive.
The fscanf function returns the value of the macro EOF if an input failure occurs before any conversion. Otherwise, the fscanf function returns the number of input items assigned, which can be fewer than provided for, or even zero, in the event of an early matching failure.
The scanf function is equivalent to fscanf with the argument stdin interposed before the arguments to scanf. Its return value is the same as that of fscanf.
The sscanf function is equivalent to fscanf, except that the argument s specifies a string from which the input is to be obtained, rather than from a stream. Reaching the end of the string is equivalent to encountering the end-of-file for the fscanf function. If copying takes place between objects that overlap, the behavior is undefined.
Writing to Files
Character Output Functions
The fputc function
#include <stdio.h>
int fputc(int c, FILE *stream);
The fputc function writes the character specified by c (converted to an unsigned char) to the stream pointed to by stream at the position indicated by the associated file position indicator (if defined), and advances the indicator appropriately. If the file cannot support positioning requests, or if the stream is opened with append mode, the character is appended to the output stream. The function returns the character written, unless a write error occurs, in which case the error indicator for the stream is set and fputc returns EOF.
The fputs function
#include <stdio.h>
int fputs(const char *s, FILE *stream);
The fputs function writes the string pointed to by s to the stream pointed to by stream. The terminating null character is not written. The function returns EOF if a write error occurs; otherwise, it returns a nonnegative value.
The putc function
#include <stdio.h>
int putc(int c, FILE *stream);
The putc function is equivalent to fputc, except that if it is implemented as a macro, it may evaluate stream more than once, so the argument should never be an expression with side effects. The function returns the character written, unless a write error occurs, in which case the error indicator for the stream is set and the function returns EOF.
The putchar function
#include <stdio.h>
int putchar(int c);
The putchar function is equivalent to putc with the second argument stdout. It returns the character written, unless a write error occurs, in which case the error indicator for stdout is set and the function returns EOF.
The puts function
#include <stdio.h>
int puts(const char *s);
The puts function writes the string pointed to by s to the stream pointed to by stdout, and appends a new-line character to the output. The terminating null character is not written. The function returns EOF if a write error occurs; otherwise, it returns a nonnegative value.
Direct output function: the fwrite function
#include <stdio.h>
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
The fwrite function writes, from the array pointed to by ptr, up to nmemb elements whose size is specified by size to the stream pointed to by stream. The file position indicator for the stream (if defined) is advanced by the number of characters successfully written. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. The function returns the number of elements successfully written, which will be less than nmemb only if a write error is encountered.
Formatted output functions: the printf family of functions
#include <stdarg.h>
#include <stdio.h>
int fprintf(FILE *stream, const char *format, ...);
int printf(const char *format, ...);
int sprintf(char *s, const char *format, ...);
int vfprintf(FILE *stream, const char *format, va_list arg);
int vprintf(const char *format, va_list arg);
int vsprintf(char *s, const char *format, va_list arg);
Note: Some length specifiers and format specifiers are new in C99. These may not be available in older compilers and versions of the stdio library, which adhere to the C89/C90 standard. Wherever possible, the new ones will be marked with (C99).
The fprintf function writes output to the stream pointed to by stream under control of the string pointed to by format that specifies how subsequent arguments are converted for output. If there are insufficient arguments for the format, the behavior is undefined. If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored. The fprintf function returns when the end of the format string is encountered.
The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format is composed of zero or more directives: ordinary multibyte characters (not %), which are copied unchanged to the output stream; and conversion specifications, each of which results in fetching zero or more subsequent arguments, converting them, if applicable, according to the corresponding conversion specifier, and then writing the result to the output stream.
Each conversion specification is introduced by the character %. After the %, the following appear in sequence:
- Zero or more flags (in any order) that modify the meaning of the conversion specification.
- An optional minimum field width. If the converted value has fewer characters than the field width, it is padded with spaces (by default) on the left (or right, if the left adjustment flag, described later, has been given) to the field width. The field width takes the form of an asterisk * (described later) or a decimal integer. (Note that 0 is taken as a flag, not as the beginning of a field width.)
- An optional precision that gives the minimum number of digits to appear for the d, i, o, u, x, and X conversions, the number of digits to appear after the decimal-point character for a, A, e, E, f, and F conversions, the maximum number of significant digits for the g and G conversions, or the maximum number of characters to be written from a string in s conversions. The precision takes the form of a period (.) followed either by an asterisk * (described later) or by an optional decimal integer; if only the period is specified, the precision is taken as zero. If a precision appears with any other conversion specifier, the behavior is undefined. Floating-point numbers are rounded to fit the precision; for example, printf("%1.1f\n", 1.19); produces 1.2.
- An optional length modifier that specifies the size of the argument.
- A conversion specifier character that specifies the type of conversion to be applied.
As noted above, a field width, or precision, or both, may be indicated by an asterisk. In this case, an int argument supplies the field width or precision. The arguments specifying field width, or precision, or both, shall appear (in that order) before the argument (if any) to be converted. A negative field width argument is taken as a - flag followed by a positive field width. A negative precision argument is taken as if the precision were omitted.
The flag characters and their meanings are:
- -
- The result of the conversion is left-justified within the field. (It is right-justified if this flag is not specified.)
- +
- The result of a signed conversion always begins with a plus or minus sign. (It begins with a sign only when a negative value is converted if this flag is not specified. The results of all floating conversions of a negative zero, and of negative values that round to zero, include a minus sign.)
- space
- If the first character of a signed conversion is not a sign, or if a signed conversion results in no characters, a space is prefixed to the result. If the space and + flags both appear, the space flag is ignored.
- #
- The result is converted to an "alternative form". For o conversion, it increases the precision, if and only if necessary, to force the first digit of the result to be a zero (if the value and precision are both 0, a single 0 is printed). For x (or X) conversion, a nonzero result has 0x (or 0X) prefixed to it. For a, A, e, E, f, F, g, and G conversions, the result always contains a decimal-point character, even if no digits follow it. (Normally, a decimal-point character appears in the result of these conversions only if a digit follows it.) For g and G conversions, trailing zeros are not removed from the result. For other conversions, the behavior is undefined.
- 0
- For d, i, o, u, x, X, a, A, e, E, f, F, g, and G conversions, leading zeros (following any indication of sign or base) are used to pad to the field width; no space padding is performed. If the 0 and - flags both appear, the 0 flag is ignored. For d, i, o, u, x, and X conversions, if a precision is specified, the 0 flag is ignored. For other conversions, the behavior is undefined.
The length modifiers and their meanings are:
- hh
- (C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
- h
- Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
- l (ell)
- Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; (C99) that a following c conversion specifier applies to a wint_t argument; (C99) that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.
- ll (ell-ell)
- (C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long long int or unsigned long long int argument; or that a following n conversion specifier applies to a pointer to a long long int argument.
- j
- (C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to an intmax_t or uintmax_t argument; or that a following n conversion specifier applies to a pointer to an intmax_t argument.
- z
- (C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a size_t or the corresponding signed integer type argument; or that a following n conversion specifier applies to a pointer to a signed integer type corresponding to size_t argument.
- t
- (C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a ptrdiff_t or the corresponding unsigned integer type argument; or that a following n conversion specifier applies to a pointer to a ptrdiff_t argument.
- L
- Specifies that a following a, A, e, E, f, F, g, or G conversion specifier applies to a long double argument.
If a length modifier appears with any conversion specifier other than as specified above, the behavior is undefined.
The conversion specifiers and their meanings are:
- d, i
- The int argument is converted to signed decimal in the style [−]dddd. The precision specifies the minimum number of digits to appear; if the value being converted can be represented in fewer digits, it is expanded with leading zeros. The default precision is 1. The result of converting a zero value with a precision of zero is no characters.
- o, u, x, X
- The unsigned int argument is converted to unsigned octal (o), unsigned decimal (u), or unsigned hexadecimal notation (x or X) in the style dddd; the letters abcdef are used for x conversion and the letters ABCDEF for X conversion. The precision specifies the minimum number of digits to appear; if the value being converted can be represented in fewer digits, it is expanded with leading zeros. The default precision is 1. The result of converting a zero value with a precision of zero is no characters.
- f, F
- A double argument representing a (finite) floating-point number is converted to decimal notation in the style [−]ddd.ddd, where the number of digits after the decimal-point character is equal to the precision specification. If the precision is missing, it is taken as 6; if the precision is zero and the # flag is not specified, no decimal-point character appears. If a decimal-point character appears, at least one digit appears before it. The value is rounded to the appropriate number of digits.
(C99) A double argument representing an infinity is converted in one of the styles [-]inf or [-]infinity; which style is implementation-defined. A double argument representing a NaN is converted in one of the styles [-]nan or [-]nan(n-char-sequence); which style, and the meaning of any n-char-sequence, is implementation-defined. The F conversion specifier produces INF, INFINITY, or NAN instead of inf, infinity, or nan, respectively. (When applied to infinite and NaN values, the -, +, and space flags have their usual meaning; the # and 0 flags have no effect.)
- e, E
- A double argument representing a (finite) floating-point number is converted in the style [−]d.ddde±dd, where there is one digit (which is nonzero if the argument is nonzero) before the decimal-point character and the number of digits after it is equal to the precision; if the precision is missing, it is taken as 6; if the precision is zero and the # flag is not specified, no decimal-point character appears. The value is rounded to the appropriate number of digits. The E conversion specifier produces a number with E instead of e introducing the exponent. The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent. If the value is zero, the exponent is zero.
(C99) A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
- g, G
- A double argument representing a (finite) floating-point number is converted in style f or e (or in style F or E in the case of a G conversion specifier), with the precision specifying the number of significant digits. If the precision is missing, it is taken as 6; if the precision is zero, it is taken as 1. The style used depends on the value converted; style e (or E) is used only if the exponent resulting from such a conversion is less than –4 or greater than or equal to the precision. Trailing zeros are removed from the fractional portion of the result unless the # flag is specified; a decimal-point character appears only if it is followed by a digit.
(C99) A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
- a, A
- (C99) A double argument representing a (finite) floating-point number is converted in the style [−]0xh.hhhhp±d, where there is one hexadecimal digit (which is nonzero if the argument is a normalized floating-point number and is otherwise unspecified) before the decimal-point character and the number of hexadecimal digits after it is equal to the precision. (Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble [4-bit] boundaries.) If the precision is missing and FLT_RADIX is a power of 2, then the precision is sufficient for an exact representation of the value; if the precision is missing and FLT_RADIX is not a power of 2, then the precision is sufficient to distinguish values of type double, except that trailing zeros may be omitted. (The precision p is sufficient to distinguish values of the source type if 16^(p−1) > b^n, where b is FLT_RADIX and n is the number of base-b digits in the significand of the source type. A smaller p might suffice depending on the implementation's scheme for determining the digit to the left of the decimal-point character.) If the precision is zero and the # flag is not specified, no decimal-point character appears. The letters abcdef are used for a conversion and the letters ABCDEF for A conversion. The A conversion specifier produces a number with X and P instead of x and p. The exponent always contains at least one digit, and only as many more digits as necessary to represent the decimal exponent of 2. If the value is zero, the exponent is zero.
A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
- c
- If no l length modifier is present, the int argument is converted to an unsigned char, and the resulting character is written.
(C99) If an l length modifier is present, the wint_t argument is converted as if by an ls conversion specification with no precision and an argument that points to the initial element of a two-element array of wchar_t, the first element containing the wint_t argument to the lc conversion specification and the second a null wide character.
- s
- If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. (No special provisions are made for multibyte characters.) Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.
(C99) If an l length modifier is present, the argument shall be a pointer to the initial element of an array of wchar_t type. Wide characters from the array are converted to multibyte characters (each as if by a call to the wcrtomb function, with the conversion state described by an mbstate_t object initialized to zero before the first wide character is converted) up to and including a terminating null wide character. The resulting multibyte characters are written up to (but not including) the terminating null character (byte). If no precision is specified, the array shall contain a null wide character. If a precision is specified, no more than that many characters (bytes) are written (including shift sequences, if any), and the array shall contain a null wide character if, to equal the multibyte character sequence length given by the precision, the function would need to access a wide character one past the end of the array. In no case is a partial multibyte character written. (Redundant shift sequences may result if multibyte characters have a state-dependent encoding.)
- p
- The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printable characters, in an implementation-defined manner.
- n
- The argument shall be a pointer to signed integer into which is written the number of characters written to the output stream so far by this call to fprintf. No argument is converted, but one is consumed. If the conversion specification includes any flags, a field width, or a precision, the behavior is undefined.
- %
- A % character is written. No argument is converted. The complete conversion specification shall be %%.
If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
In no case does a nonexistent or small field width cause truncation of a field; if the result of a conversion is wider than the field width, the field is expanded to contain the conversion result.
For a and A conversions, if FLT_RADIX is a power of 2, the value is correctly rounded to a hexadecimal floating number with the given precision.
It is recommended practice that if FLT_RADIX is not a power of 2, the result should be one of the two adjacent numbers in hexadecimal floating style with the given precision, with the extra stipulation that the error should have a correct sign for the current rounding direction.
It is recommended practice that for e, E, f, F, g, and G conversions, if the number of significant decimal digits is at most DECIMAL_DIG, then the result should be correctly rounded. (For binary-to-decimal conversion, the result format's values are the numbers representable with the given format specifier. The number of significant digits is determined by the format specifier, and in the case of fixed-point conversion by the source value as well.) If the number of significant decimal digits is more than DECIMAL_DIG but the source value is exactly representable with DECIMAL_DIG digits, then the result should be an exact representation with trailing zeros. Otherwise, the source value is bounded by two adjacent decimal strings L < U, both having DECIMAL_DIG significant digits; the value of the resultant decimal string D should satisfy L ≤ D ≤ U, with the extra stipulation that the error should have a correct sign for the current rounding direction.
The fprintf function returns the number of characters transmitted, or a negative value if an output or encoding error occurred.
The printf function is equivalent to fprintf with the argument stdout interposed before the arguments to printf. It returns the number of characters transmitted, or a negative value if an output error occurred.
The sprintf function is equivalent to fprintf, except that the argument s specifies an array into which the generated output is to be written, rather than to a stream. A null character is written at the end of the characters written; it is not counted as part of the returned sum. If copying takes place between objects that overlap, the behavior is undefined. The function returns the number of characters written in the array, not counting the terminating null character.
The vfprintf function is equivalent to fprintf, with the variable argument list replaced by arg, which shall have been initialized by the va_start macro (and possibly subsequent va_arg calls). The vfprintf function does not invoke the va_end macro. The function returns the number of characters transmitted, or a negative value if an output error occurred.
The vprintf function is equivalent to printf, with the variable argument list replaced by arg, which shall have been initialized by the va_start macro (and possibly subsequent va_arg calls). The vprintf function does not invoke the va_end macro. The function returns the number of characters transmitted, or a negative value if an output error occurred.
The vsprintf function is equivalent to sprintf, with the variable argument list replaced by arg, which shall have been initialized by the va_start macro (and possibly subsequent va_arg calls). The vsprintf function does not invoke the va_end macro. If copying takes place between objects that overlap, the behavior is undefined. The function returns the number of characters written into the array, not counting the terminating null character.
References
- C99 §6.2.5/15