WoodyLogger

  • Low Latency C++11 Asynchronous Logging Library.
  • Lock-Free Optimized for Mulit-Core system, so it is very fast. See Latency Benchmark
  • Supports multiple log levels: FAULT, ERROR, WARN, INFO, DEBUG, TRACE
  • Configurable for output to local file/stdout or both.
  • Colorful output on stdout for different log level on UNIX terminal.

Usage

#include "WoodyLogger.h"
#include <thread>
#include <unistd.h>

void PrintLogThread() {
    LOG_ERROR("PrintLogThread Join -->");
    LOG_ERROR("PrintLogThread END <---");
    usleep(1000);
}

int main() {
    WOODY_LOGGER_START();
    //WOODY_LOGGER_INIT("/dev/null", false);
    WOODY_LOGGER_LEVEL(woodycxx::DEBUG);

    LOG_TRACE("Test LOG_TRACE: %d", 12345);  // TRACE will output on DEBUG level
    LOG_DEBUG("Test LOG_DEBUG: %d", 23456);
    LOG_ERROR("Test LOG_ERROR: %d", 6543210);
    LOG_WARN ("Test LOG_WARN : %07d", 123);
    LOG_INFO ("Test LOG_INFO : %.5f", 1.0/3.0);
    LOG_FAULT("Test LOG_FAULT: %p", &amp;PrintLogThread);

    std::thread print_thread1 = std::thread(PrintLogThread);
    print_thread1.join();

    return 0;
}

API Description

Macro Define: WOODY_LOGGER

In “WoodyLogger.h”, there is a macro #define WOODY_LOGGER. User can use it to control the Log API calling comiling. If WOODY_LOGGER is not defined, any call of WOODY_LOGGER_XXXX will not be compilied. So user can put WoodyLogger in code for test and debug usage and then undefine the WOODY_LOGGER macro to turn off WoodyLogger.

WOODY_LOGGER_START

Start WoodyLogger, which starts the Consumer thread. It is just needed to be called once only. The consumer thread will keep running until WOODY_LOGGER_STOP() is called.

WOODY_LOGGER_START() should be called in main() function, and before any LOG_XXXX call.

WOODY_LOGGER_INIT

WOODY_LOGGER_INIT marco is used to configure the log output, it has 2 parameters:

("file_path_name", is_enalbe_output_to_stdout)

e.g.

WOODY_LOGGER_INIT("./log.txt");

Will configure the log output into “./log.txt” file and also the stdout. If user want to turn-off the stdout:

WOODY_LOGGER_INIT("./log.txt", false); //disable stdout

User can also configure the FILE handler (e.g. stdout, stderr):

WOODY_LOGGER_INIT(stderr);

User can direct the file name to "/dev/null" and disable the stdout, then the log will NOT output any:

WOODY_LOGGER_INIT("/dev/null", false);

This is used in benchmark test. If WOODY_LOGGER_INIT is not configured, it will output to stdout in default as the demo shows.

WOODY_LOGGER_LEVEL

There are 6 log levels as Strong to Week order:

FAULT, ERROR, WARN, INFO, DEBUG, TRACE

Logger will only output the level which is stronger than the current configured level.

e.g.

If the level is setup to INFO, then DEBUG and TRACE will not be output.

WOODY_LOGGER_LEVEL() can be called in a thread which means the log level can be controlled dynamically. User can write a thread listen to a port as a Server. The client process can connect to the server and configure the level dynamically.

LOG_XXXX

LOG_TRACE, LOG_DEBUG, LOG_INFO, LOG_WARN, LOG_ERROR, LOG_FAULT

The output format

DEBUG, TRACE Level:

[YYYY-MM-DD HH:mm:ss.nanoseconds] [LEVEL] [thread_id] [__FILE__: __LINE__] message

FAULT, ERROR, WARN, INFO Level:

[YYYY-MM-DD HH:mm:ss.nanoseconds] [LEVEL] [thread_id] message

WOODY_LOGGER_STOP

WoodyLogger will start a consumer thread print out the log message. The thread will keep running if user does not call WOODY_LOGGER_STOP()


Benchmark

When I search “Low-Latency C++ Log” in github, I find a very fast C++ log project: NanoLog. It has been proved much faster than spdlog and others by itself. So I am trying to use the benchmark test code in NanoLog so that If WoodyLogger is proved fater than NanoLog, WoodyLogger will be fater than the others. NanoLog’s design is so much like WoodyLogger, but WoodyLogger is optimized for multi-core system, it can take advantage in multi-thread environment.

Benchmark Code Compiling Option And System environment

The benchmark test code is run under my MacBook Pro Laptop:

Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
Total Number of Cores:  4
Threads:                8

Test Code

benchmark.cpp

WoodyLogger and NanoLog are both Compiled under -O2 optimization level.

Benchmark Test Result

ThreadsWoodyLogger(Avg, ns)NanLog(Avg, ns)
1106125
2122.5197
4144250.75
8274.375424.375
16526.9375936.4375

Design

WoodyLogger’s aim is to impletement a Log tool that can be applied in HFT (High Frequent Trading) System. So the design of WoodyLogger is focusing on Low Latency NOT High Throughput, which means WoodyLogger is not going to output as many logs into file as possible, but is going to reduces the latency to nanoseconds as short as possible.

Normal Design

What makes the high latency of a Log tool?

  • As a normal design of a “fast” log tool, I/O denseness firtly comes into developers’ mind.

    Usually, developers isolate the log writting operation and log calling operation into two threads. There is a ring buffer storing the messages which the Log API is called. Each caller is a Producer which writes out the log messages into the ring buffer. There is a Consumer thread reads the ring buffer and then print out the msg into file or stdout, etc.
    Each call of the log API will not be blocked by the I/O dely, so this is a great step to “speed up” (reduce the latency). But it is just the first step, there are more things to do.
  • Many developers did not realize that the String Formatter functions are the very important key of high latency. Such as std::stringstream in C++, sprintf in C, etc. Access FMT for reference.

    The key latency of string foratter is coverting digital numbers into string. Which is very slow in those function, espicially floating-point formatting.
    Although there are very quick Integer Converting and floating-point converting implementations: double-conversion, but they are not as quick as memory copy. So do NOT convert and format any digital numbers into string in Log calling, but do it in the reading/output thread.

Normally the Log tool is designed as following:

The key problem of latency is using the LOCK. If there are more caller threads, they will block each other even the LOCKER is a spin-lock.

WoodyLogger Design

Lock-Free Queue

The Locker can be removed by using a Lock-Free Queue. When thers are only one producer and one consumer, there is not going to need a lock anymore.

To achieve multi-thread support of a Logger and Lock-Free ring buffer, every thread need a thread_local Lock-Free ring buffer. The consumer thread orginize a std::vector to access every ring buffer (LockFreeQueue) in the threads. The life-cycle of LockFreeQueue is maintained by the thead, when the thread exit, the LockFreeQueue is also destoryed and be removed from the vector.

The WoodyLogger design is going to like the following diagram shows:

The first time I saw this kind of expression, I feel so strange with it:

file_buffer<uint8_t>::open(outputFileName, std::ios::out).then([=](streambuf<uint8_t> outFile) -> pplx::task<http_response>
{
    *fileBuffer = outFile; 

    // Create an HTTP request.
    // Encode the URI query since it could contain special characters like spaces.
    http_client client(U("http://www.bing.com/"));
    return client.request(methods::GET, uri_builder(U("/search")).append_query(U("q"), searchTerm).to_string());
})

 

So what does [=] (typename pram) -> typename { } exactly mean?

It is Lambda expression in C++11. A lambda expression represents a callable unit of code. It can be thought of as an unnamed, inline function. Like any function, a lambda has a return type, a parameter list, and a function body. Unlike a function, lambdas may be defined inside a function. A lamba expression has the form:

[capture list] (parameter list) -> return type { function body }

There is a detailed description of Lambda Expression Syntax on MSDN and CPPReference , I will not explain the syntax of Lambda Expression, I would like to introduce my understanding and usage of Lambda Expression.

As my understanding, Lambda Expression creates an Object of an Unnamed Functor (NOT a Function).

A functor is pretty much just a class which defines the operator(). That lets you create objects which “look like” a function (Stackoverflow):

// this is a functor
struct add_x {
  add_x(int x) : x(x) {}
  int operator()(int y) { return x + y; }

private:
  int x;
};

// Now you can use it like this:
add_x add42(42); // create an instance of the functor class
int i = add42(8); // and "call" it
assert(i == 50); // and it added 42 to its argument

std::vector<int> in; // assume this contains a bunch of values)
std::vector<int> out;
// Pass a functor to std::transform, which calls the functor on every element
// in the input sequence, and stores the result to the output sequence
std::transform(in.begin(), in.end(), out.begin(), add_x(1));
assert(out[i] == in[i] + 1); // for all i

 
As we have functor so, why do you need Lambda?

I think one important feature of Lambda is, it can create Anonymous Object, it can be Run On Defined.

Java programmers must be familiar with the code when they create an anonymous class, such expression was not supported in C++.  But Lambda express in C++11 can make a similar achievement. Java Programmers can define an anonymous Thread class:

public class A {
    public static void main(String[] arg)
    {
        new Thread()
        {
            public void run() {
                System.out.println("blah");
            }
        }.start();
    }
}

 

C++ now can directly pass a Lambda express into a function call, cause it just bass an object into that function. The grammar is different from anonymous class in Java:

void fillVector(vector<int>& v)
{
    // A local static variable.
    static int nextValue = 1;

    // The lambda expression that appears in the following call to
    // the generate function modifies and uses the local static
    // variable nextValue.
    generate(v.begin(), v.end(), [] { return nextValue++; });
    //WARNING: this is not thread-safe and is shown for illustration only
}

 
Programmer can directly pass a functor object with the function body expressions into a parameter, the code will run on define.

I am trying to use the C++11 to support the smart pointer, but I find there is no shard_array in <memory>, so I try to use it in this way, and I know this maybe WRONG:

shared_ptr<int> sp(new int[10]);

Then run it, it coredumped as I guessed:

$ smart_ptr/Test_shared_array
Destructing a Foo with x=0
*** Error in `smart_ptr/Test_shared_array': munmap_chunk(): invalid pointer: 0x0000000001d58018 ***
[1]    14128 abort (core dumped)  smart_ptr/Test_shared_array

Use GDB to see more information:

(gdb) run
Starting program: /home/nasacj/projects/woodycxx/smart_ptr/Test_shared_array
Destructing a Foo with x=0
*** Error in `/home/nasacj/projects/woodycxx/smart_ptr/Test_shared_array': munmap_chunk(): invalid pointer: 0x0000000000603018 ***

Program received signal SIGABRT, Aborted.
0x00007ffff7530cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7530cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff75340d8 in __GI_abort () at abort.c:89
#2  0x00007ffff756df24 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff767c6c8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007ffff7578c87 in malloc_printerr (action=<optimized out>, str=0x7ffff767ca48 "munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:4996
#4  0x0000000000400d9f in _M_release (this=0x603050) at /usr/include/c++/4.8/bits/shared_ptr_base.h:144
#5  ~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:546
#6  ~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:781
#7  ~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr.h:93
#8  test () at Test_shared_array.cpp:30
#9  0x0000000000400bc9 in main () at Test_shared_array.cpp:36
(gdb) quit

Then I realize that in Boost, user should provide a deleter to shared_ptr:

Then I find this in stackoverflow:

By default, shared_ptr will call delete on the managed object when no more references remain to it. However, when you allocate using new[] you need to call delete[], and not delete, to free the resource.

In order to correctly use shared_ptr with an array, you must supply a custom deleter.

template< typename T >
struct array_deleter
{
  void operator ()( T const * p)
  {
    delete[] p;
  }
};

Create the shared_ptr as follows

std::shared_ptr<int> sp( new int[10], array_deleter<int>() );

Now shared_ptr will correctly call delete[] when destroying the managed object.


With C++11, you can also use a lambda instead of the functor.

std::shared_ptr<int> sp( new int[10], []( int *p ) { delete[] p; } );

Also, unless you actually need to share the managed object, a unique_ptr is better suited for this task, since it has a partial specialization for array types.

std::unique_ptr<int[]> up( new int[10] ); // this will correctly call delete[]

Now there come the shared array STD version in practice:

//#include "shared_array.h"
#include <memory>
#include <iostream>

using namespace std;

struct Foo
{
    Foo() : x(0) {}
	Foo( int _x ) : x(_x) {}
	~Foo() { std::cout << "Destructing a Foo with x=" << x << "\n"; }
	int x;
	/* ... */
};

template< typename T >
struct array_deleter
{
  void operator ()( T const * p)
  {
    delete[] p;
  }
};

//typedef woodycxx::smart_prt::shared_array<Foo> FooArray;
typedef shared_ptr<Foo> FooArray;

void test()
{
	FooArray(new Foo[10], array_deleter<Foo>());
}

int main()
{
	test();
	return 0;
}

The Output:

$ ./Test_shared_array
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0