printf() vs stream IO in C++
Before joining Dell I was mostly working in kernel writing in C programming language. At Dell I still work on mostly low level stuff, but this time it is user-mode, so I am not tied up to C anymore. We’re writing in C++ and I am learning C++. One of the less appealing things for me in C++ was streaming support and the way its input/output implemented. In particular I got used to printf() functions family and leaving those in favor of streams and cout was tough. What really strikes me is the fact that no C++ book explains this stuff. All C++ books just tell you – so, my dear, this is how this stuff is done in C++. It took me some time to realize how C++ style input/output is much more convenient and powerful than printf() family of functions. Here’s why.
The C++ way
So, In C++ you normally work with stream of some sort. Stream is an abstraction that may hide anything in the world. It can be string stream that writes things to string or debug output stream that sends debug printouts to some remote debug log server. Once you have a stream, you redefine operator << for various classes to write into stream instead of doing bit-shift. For input streams you define operator >> that reads data from string or from socket and initializes object of some class with values it read. All those operator << and >> seems kind of weird. And in fact, this can be any of the operators. For some reason, standard library uses bit-shift operators and this turned into a standard. Boost, popular C++ extensions library uses operator% to implement similar functionality, although in slightly different way.
Convenience
At first, defining operator << for every class I write seemed like a tedious task. But you just do it once and then you can use it forever. Moreover, you can use classes that already have it to implement more stuff. So you just do it once really.
It makes much more sense in object oriented language
Since we’re talking object oriented here, C++ way to output stuff actually makes much more sense. The way object represented as a string is a property is the object itself.
Performance
Here is the interesting part. In printf(), the standard C library has to parse format string every time you print something. So, at runtime, every time you print something, standard C library parses your format string to figure out what and how many parameters you passed to printf(). This is important for high performance systems. Intensively printing, just anything, whether this is a debug log or part of some communication protocol rises CPU consumption of your program to the roof.
I worked for one company where they came up with a really neat trick to overcome the problem of high CPU consumption. They had to implement debug logging in C for their high performance product. So, instead of formatting the string every time there is a printout, they saved format string and then every printout they just saved the arguments in their binary form. To read the debug log, they had to read format strings that were periodically saved in the debug log file. Then they read binary arguments for the debug printout and finally formatted the message. So instead of formatting the printouts at runtime, they formatted the printouts when debug log is read.
Another problem occurs when you by mistake pass wrong data type to printf() function and compiler misses it. Another problem is when you pass a string that is not properly null terminated. Etc. Etc. C++ way solves all these problems. First, data type resolving happens during compilation. It is up to compiler to detect what type of data you’re trying to print and to call right operator<<. This improves CPU consumption situation. For the same reason, it is much harder to print something that is of unexpected type. Compiler is usually pretty good at resolving data types.
Wrapping it up
So, it seems that C++ way of producing output both better conforms object oriented way of programming and yields higher quality results. The only problem is that it requires you to get used to, but then, you do it just once.
1. Really high performance stuff does not print anything at all – there are other ways of logging.
2. With printfs it is really easy to devise log_printf() functions that would not do anything unless appropriate log level is in effect. It is much harder to devise streams with similar properties and in any case the overhead may be significant because per class operator<<() will be called anyway.
So after 10+ years of C++ my personal choices for logging are: (a) define fixed set of debug events, collect logging data and parse format strings processed only when 'show log' command is executed, (b) use printf-like functions.
BTW, knowing how many times particular log was hit is very important information and it could be collected even when logs are disabled.
@Another Alex
I’ll appreciate if you share other ways of logging.
I disagree with your second point. It is similarly easy to make debug logging code that would do nothing unless appropriate log level is in effect.
Other ways of logging: I am sorry I meant debug events mentioned at the end of my post.
Believe me I did it it is a non-trivial exercise to write debug streams and by end of the day there still be noticeable overhead from uncontrolled calls to operator<<() – so IMHO it is not worth it to do debug streams.
I think that operators <> were chosen for IO because of being some kind of a visual cue for both in and out operations AND because of operator evaluation precedence.
If you want to learn more about C++ IOStreams you can check out this book:
http://www.amazon.com/Standard-IOStreams-Locales-Programmers-Reference/dp/product-description/0201183951
A really high performance logging mechanism would log every thing in binary and requires a custom tool to “visualize” the log.
If you want the log to be human-readable and still want the performance, you could do a memory-map of the log file. Every log would just be a memory operation. Periodically this memory-map gets flushed to disk.
@moe – Thanks. I’ll take a look.
@Lucas Tan – It depends on how much information you want to log.
I guess my opinions on logs should be viewed considering what I am working on. I am working on a large embedded system. So, we have a server with 8TB of storage just for logs.
Cout is slower than printf.
// g++ -O2 -W -Wall -Wextra printf-vs-cout.cpp -o printf-vs-cout.exe && ./printf-vs-cout.exe 1>/dev/null
#include
#include
#include // Write
#ifdef WIN32
# include
#else
# include
# include
#endif
using namespace std;
double get_time();
/*
Printf no args
4.88426
Printf 1 arg
18.5892
Puts no args
4.51563
Fputs no args
2.98327
Write no args
20.6939
Cout no args
5.32271
Cout 1 arg
27.0982
*/
enum { LOOP_SIZE = 100000000 };
#define makeLoop( name, … ) \
cerr << name"\n"; \
begin = get_time(); \
for( int i=0; i<LOOP_SIZE; ++i ) \
{ \
__VA_ARGS__; \
} \
cerr << '\t' << get_time()-begin << '\n';
int main()
{
double begin = 0;
makeLoop("Printf no args", printf("test\n"));
makeLoop("Printf 1 arg", printf("test %d\n", i));
cerr << '\n';
makeLoop("Puts no args", puts("test\n"));
makeLoop("Fputs no args", fputs("test\n", stdout));
cerr << '\n';
makeLoop("Write no args", write(1, "test\n", 5));
cerr << '\n';
makeLoop("Cout no args", cout << "test\n");
makeLoop("Cout 1 arg", cout << "test" << i << '\n');
return 0;
}
double get_time()
{
# ifdef WIN32
LARGE_INTEGER t, f;
QueryPerformanceCounter(&t);
QueryPerformanceFrequency(&f);
return (double)t.QuadPart/(double)f.QuadPart;
# else
struct timeval t;
struct timezone tzp;
gettimeofday(&t, &tzp);
return t.tv_sec + t.tv_usec*1e-6;
# endif
}
Try
std::ios::sync_with_stdio(false);
C++ tried very hard to be compatible with C at the cost of performance.
More: http://ask.fclose.com/262/is-cin-much-slower-than-scanf-in-c
Pastbin link for the code: http://pastebin.com/vR2KxvjV
C’s printf functions still has one feature which the std::cout and friends don’t offer – synchronized output. See here: http://stackoverflow.com/q/26961023/183120
The essence is that
printf(“something %d something\n”, var1);
gets printed on to the screen in one shot. While
std::cout << "something " << var1 << "something\n";
may not appear on screen as a single output if between "something " and var1, something else writes to stdout.
In this sense I still think that C++ approach is superior:
1. You do not copy data to internal buffer and then to standard output, but rather write multiple pieces to standard output directly.
2. You can still achieve same effect with C++. Create intermediate stringstream, fill it and then write to standard output.
Nevertheless, this is a good point. Thanks.
I used to make the same argument about the time to parse the format string. Then I stepped into the “<<" function for I/O streams on a windows system. A the time, the iostreams implementation generated a text format string, then called sprintf to do the actual formatting. I haven't checked the latest.
Things are a little better now.