My previous post, “Use shared_ptr inheritance rightly when design and use interfaces”, has an obvious problem: there is a cyclic reference between AbstractSocketImpl and the socket I/O streams.

typedef shared_ptr<InputStream> InputStreamPtr;
typedef shared_ptr<OutputStream> OutputStreamPtr;
typedef shared_ptr<AbstractSocketImpl> AbstractSocketImplPtr;

class AbstractSocketImpl : public AbstractSocket
{
    InputStreamPtr    inputStreamPtr;
    OutputStreamPtr   outputStreamPtr;
};

class SocketInputStream : public InputStream
{
    AbstractSocketImplPtr socket_impl;
};

When two classes hold shared_ptr members that point at each other, the cyclic reference causes a memory leak: neither reference count can ever drop to zero, so neither destructor runs. When the AbstractSocketImpl and SocketInputStream objects are created, each one's reference count also includes the pointer held by the other, so each count ends up at 2.

auto socketImpl = make_shared<AbstractSocketImpl>(address);   //socketImpl.ref_count + 1, now socketImpl.ref_count == 1
auto inputstream = socketImpl->getInputStream();   //socketImpl.ref_count + 1 (caused by inputstream.socket_impl), now socketImpl.ref_count == 2.
//inputstream.ref_count == 2, because inputstream holds one reference and inputstream.socket_impl.inputStreamPtr holds inputstream.

The relationship between AbstractSocketImpl and SocketInputStream should be composition: a SocketInputStream is initialized from an AbstractSocketImpl, and no SocketInputStream operation is meaningful once the AbstractSocketImpl has been destroyed. My mistake was holding a strong reference (shared_ptr) from AbstractSocketImpl to the SocketInputStream it creates. That made AbstractSocketImpl try to manage the stream's lifetime and coupled the two classes tightly.

In fact, the SocketInputStream object is managed by the user, not by AbstractSocketImpl. AbstractSocketImpl's responsibility is to make sure only ONE SocketInputStream object exists at a time, NOT to reclaim it. So AbstractSocketImpl does not need to HOLD a SocketInputStream. It only needs to check whether the stream it created is still alive: if so, return a pointer to it; if not, create a new one from itself. A weak_ptr gives exactly this kind of non-owning reference. The implementation changes are as follows:

typedef weak_ptr<InputStream> InputStreamWeakPtr;
typedef weak_ptr<OutputStream> OutputStreamWeakPtr;

class AbstractSocketImpl : public AbstractSocket
{
    InputStreamWeakPtr   wkInputStreamPtr;
    OutputStreamWeakPtr  wkOutputStreamPtr;
};

InputStreamPtr AbstractSocketImpl::getInputStream()
{
    //return make_shared<SocketInputStream>(shared_from_this());
    if ( wkInputStreamPtr.expired() )
    {
        InputStreamPtr inputstrPtr = make_shared<SocketInputStream>(shared_from_this());
        wkInputStreamPtr = inputstrPtr;
        return inputstrPtr;
    }
    return wkInputStreamPtr.lock();
}
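
To see the ownership counts in isolation, here is a minimal sketch of the same idea. The names `Socket` and `Stream` are hypothetical stand-ins for AbstractSocketImpl and SocketInputStream, not the real classes. The stream holds a strong reference to its socket, while the socket only observes the stream through a weak_ptr:

```cpp
#include <memory>
#include <utility>

class Socket;  // hypothetical stand-in for AbstractSocketImpl

// Hypothetical stand-in for SocketInputStream.
class Stream
{
public:
    explicit Stream(std::shared_ptr<Socket> owner) : socket_(std::move(owner)) {}
private:
    std::shared_ptr<Socket> socket_;  // strong: the stream keeps its socket alive
};

class Socket : public std::enable_shared_from_this<Socket>
{
public:
    std::shared_ptr<Stream> getStream()
    {
        // lock() returns an empty shared_ptr if the stream is already gone.
        std::shared_ptr<Stream> stream = cached_.lock();
        if (!stream) {
            stream = std::make_shared<Stream>(shared_from_this());
            cached_ = stream;  // remember it without owning it
        }
        return stream;
    }
private:
    std::weak_ptr<Stream> cached_;    // weak: the socket only observes the stream
};
```

This sketch calls lock() once and tests the result instead of calling expired() and then lock(). The behaviour is the same in single-threaded code, and it avoids a window in which another thread could destroy the stream between the two calls.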

I was asked how to calculate a Fibonacci number at compile time rather than at run time in C++.

Initially I had no idea how to make the calculation happen at compile time. The key to the solution is templates.

Template metaprogramming generates classes at compile time. Templates can also provide static polymorphism, through the well-known Curiously Recurring Template Pattern (CRTP).

The solution is as follows:


template<int N>
class Fibonacci {
public:
    enum { value = Fibonacci<N-1>::value + Fibonacci<N-2>::value };
};

// Explicit specializations end the recursion.
template<>
class Fibonacci<1> {
public:
    enum { value = 1 };
};

template<>
class Fibonacci<0> {
public:
    enum { value = 0 };
};

int main() {
    int i =  Fibonacci<6>::value;
    return i;
}

Compile it to assembly with “g++ -O2 -S Fibonacci_template.cpp”:


_main:                                  ## @main
## BB#0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	movl	$8, %eax
	popq	%rbp


In the assembly output, the instruction movl $8, %eax shows that the compiler has already reduced Fibonacci<6>::value to the constant 8; no calculation is left for run time.
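
As an aside, C++11 offers a second route to the same result. This is a minimal sketch using a constexpr function, with the hypothetical name `fibonacci`; it is not part of the original answer. The static_assert forces the compiler to evaluate the call during compilation:

```cpp
// A constexpr function may be evaluated by the compiler when its
// argument is a constant expression.
constexpr int fibonacci(int n)
{
    return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

// Compilation fails here if the value is not 8, so the check itself
// happens at compile time.
static_assert(fibonacci(6) == 8, "fibonacci(6) should be 8");
```

Unlike the template version, the same function can also be called with a run-time argument.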

To use polymorphism in C++, we normally point a base-interface pointer at an object of an implementing class. This is easy with raw pointers. With a smart pointer, where a shared_ptr of the base class type holds the derived object, things get a little more complicated.

At first I thought it would be as simple as this:

//Point a Base pointer at a Derived object
//Class "Derived" inherits from class "Base"
typedef std::shared_ptr<Base> BasePtr;
BasePtr baseptr(new Derived());
//Then call baseptr->operations


The principle for preventing memory leaks with smart pointers is: always use a named smart pointer variable to hold the result of new.

But one thing must be noted very carefully: you cannot construct more than ONE shared_ptr from the same result of new:

int* ptr = new int;
shared_ptr<int> p1(ptr);
shared_ptr<int> p2(ptr); //logic error

Each time we construct a shared_ptr from a raw pointer, the new shared_ptr object maintains two pointers:
1. A T* pointer to the object you allocated on the heap with new;
2. A pointer to a “Smart Area” (sp_count, the control block) which holds the reference count of all the shared_ptr objects that share the T*.

Each time you use the copy constructor or operator=, shared_ptr increments or decrements the reference count in that shared control block. But if two shared_ptr objects are constructed separately from the same T*, each gets its own control block with a count of 1, so the T* will be deleted twice.
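
For contrast, here is a minimal sketch of the correct way to get a second owner: copy an existing shared_ptr instead of constructing a new one from the raw pointer. The function name `ownersAfterCopy` is made up for this illustration.

```cpp
#include <memory>

// Returns the owner count observed after creating a second owner by copy.
long ownersAfterCopy()
{
    std::shared_ptr<int> p1 = std::make_shared<int>(42);
    std::shared_ptr<int> p2 = p1;  // copy: both share one control block
    return p1.use_count();         // counts p1 and p2
}
```

Because p1 and p2 share one control block, the int is deleted exactly once, when the last of them goes away.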

The same problem arises when we need a shared_ptr to the this pointer. But there is a solution: use shared_from_this():

#include <memory>
#include <iostream>

struct Good: std::enable_shared_from_this<Good>
{
    std::shared_ptr<Good> getptr() {
        return shared_from_this();
    }
};

struct Bad
{
    std::shared_ptr<Bad> getptr() {
        return std::shared_ptr<Bad>(this);
    }
    ~Bad() { std::cout << "Bad::~Bad() called\n"; }
};

int main()
{
    // Good: the two shared_ptr's share the same object
    std::shared_ptr<Good> gp1(new Good);
    std::shared_ptr<Good> gp2 = gp1->getptr();
    std::cout << "gp2.use_count() = " << gp2.use_count() << '\n';

    // Bad, each shared_ptr thinks it's the only owner of the object
    std::shared_ptr<Bad> bp1(new Bad);
    std::shared_ptr<Bad> bp2 = bp1->getptr();
    std::cout << "bp2.use_count() = " << bp2.use_count() << '\n';
} // UB: double-delete of Bad


 Interface Design

I was trying to design an AbstractSocket that returns socket I/O streams, so that users can hold the I/O streams in smart pointers and receive/send messages through the socket, just like the “Java way”:


AbstractSocketImpl implements the AbstractSocket interface. Its getInputStream() and getOutputStream() return the SocketInputStream and SocketOutputStream. AbstractSocketImpl holds shared_ptrs to the InputStream and OutputStream objects it creates, and SocketInputStream and SocketOutputStream are constructed by passing a shared_ptr to the AbstractSocketImpl into their constructors. So when AbstractSocketImpl creates the socket I/O streams, it has to share its own this pointer. To do that correctly with shared_ptr, AbstractSocketImpl must inherit from std::enable_shared_from_this:


InputStreamPtr AbstractSocketImpl::getInputStream()
{
    if ( !inputStreamPtr )
        inputStreamPtr = make_shared<SocketInputStream>(shared_from_this());
    return inputStreamPtr;
}

OutputStreamPtr AbstractSocketImpl::getOutputStream()
{
    if ( !outputStreamPtr )
        outputStreamPtr = make_shared<SocketOutputStream>(shared_from_this());
    return outputStreamPtr;
}


You may notice that inputStreamPtr is a shared_ptr<InputStream>, while make_shared creates a shared_ptr<SocketInputStream>. The types differ, but neither the GNU compiler nor the Microsoft compiler reports an error in C++11 mode. This is expected: a shared_ptr<Derived> converts implicitly to a shared_ptr<Base> whenever Derived* converts to Base*. A more explicit way to write the conversion is static_pointer_cast<T> (dynamic_pointer_cast<T> is meant for checked downcasts):

inputStreamPtr = static_pointer_cast<InputStream>( make_shared<SocketInputStream>(shared_from_this()) );
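
Both directions of the conversion can be sketched with a tiny hierarchy. The types `Base`, `Derived` and `Other` and the two functions are hypothetical names used only for illustration:

```cpp
#include <memory>

struct Base { virtual ~Base() {} };
struct Derived : Base { int tag = 7; };
struct Other : Base {};

std::shared_ptr<Base> makeBase()
{
    // Implicit upcast: shared_ptr<Derived> converts to shared_ptr<Base>.
    return std::make_shared<Derived>();
}

int tagOf(const std::shared_ptr<Base>& base)
{
    // Checked downcast: the result is empty if base is not a Derived.
    std::shared_ptr<Derived> d = std::dynamic_pointer_cast<Derived>(base);
    return d ? d->tag : -1;
}
```

The pointer returned by the cast shares ownership with the original, so no second control block is created.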



I had some concern about making AbstractSocketImpl inherit from std::enable_shared_from_this. Why not make AbstractSocket inherit from it instead, since AbstractSocketImpl already inherits from AbstractSocket? But then how do we deal with shared_from_this(), given that the template argument is AbstractSocket while we need a pointer to AbstractSocketImpl?
The solution is as follows:

class AbstractSocket : boost::noncopyable, public enable_shared_from_this<AbstractSocket> { ... }

class AbstractSocketImpl : public AbstractSocket
{
    std::shared_ptr<AbstractSocketImpl> shared_from_this()
    {
        return std::static_pointer_cast<AbstractSocketImpl>(AbstractSocket::shared_from_this());
    }
};
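
The same pattern, reduced to a self-contained sketch. The names `Base` and `Impl` are hypothetical placeholders for AbstractSocket and AbstractSocketImpl:

```cpp
#include <memory>

class Base : public std::enable_shared_from_this<Base>
{
public:
    virtual ~Base() {}
};

class Impl : public Base
{
public:
    // Hides Base::shared_from_this() and returns the derived type instead.
    std::shared_ptr<Impl> shared_from_this()
    {
        return std::static_pointer_cast<Impl>(Base::shared_from_this());
    }
};
```

The static_pointer_cast is safe here because the function is a member of Impl, so this is known to point at an Impl.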


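Once a class uses enable_shared_from_this, an object must already be owned by a shared_ptr before shared_from_this() is called on it, because that is when the weak_ptr inside enable_shared_from_this gets initialized. In practice this means creating the object on the heap (for example with make_shared), NOT on the stack. Calling shared_from_this() on a stack object is undefined behavior before C++17 and throws std::bad_weak_ptr since C++17, and wrapping a pointer to a stack object in a shared_ptr causes an invalid delete: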

// AbstractSocketImpl socketImpl(address);  //----> This is NOT right!
AbstractSocketImplPtr socketImpl = make_shared<AbstractSocketImpl>(address);
InputStreamPtr inputstream = socketImpl->getInputStream();
OutputStreamPtr outputstream = socketImpl->getOutputStream();



There is a topic on Stack Overflow which describes the correct usage of multiple inheritance from enable_shared_from_this.

The first time I saw this kind of expression, it looked very strange to me:

file_buffer<uint8_t>::open(outputFileName, std::ios::out).then([=](streambuf<uint8_t> outFile) -> pplx::task<http_response>
{
    *fileBuffer = outFile;

    // Create an HTTP request.
    // Encode the URI query since it could contain special characters like spaces.
    http_client client(U(""));
    return client.request(methods::GET, uri_builder(U("/search")).append_query(U("q"), searchTerm).to_string());
});


So what exactly does [=] (typename param) -> typename { } mean?

It is a lambda expression, introduced in C++11. A lambda expression represents a callable unit of code. It can be thought of as an unnamed, inline function. Like any function, a lambda has a return type, a parameter list, and a function body. Unlike a function, lambdas may be defined inside a function. A lambda expression has the form:

[capture list] (parameter list) -> return type { function body }

There are detailed descriptions of the lambda expression syntax on MSDN and cppreference, so I will not explain the syntax here. Instead, I would like to share my understanding and usage of lambda expressions.

In my understanding, a lambda expression creates an object of an unnamed functor class (NOT a function).

A functor is pretty much just a class which defines the operator(). That lets you create objects which “look like” a function (Stackoverflow):

// this is a functor
struct add_x {
  add_x(int x) : x(x) {}
  int operator()(int y) { return x + y; }

  int x;
};

// Now you can use it like this:
add_x add42(42); // create an instance of the functor class
int i = add42(8); // and "call" it
assert(i == 50); // and it added 42 to its argument

std::vector<int> in; // assume this contains a bunch of values
std::vector<int> out(in.size()); // the output range must be as large as the input
// Pass a functor to std::transform, which calls the functor on every element
// in the input sequence, and stores the result to the output sequence
std::transform(in.begin(), in.end(), out.begin(), add_x(1));
assert(out[i] == in[i] + 1); // for all i
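
For comparison, here is a sketch of the same transform written with a lambda instead of the add_x functor. The wrapper function `addToAll` is a hypothetical name introduced only to make the snippet self-contained:

```cpp
#include <algorithm>
#include <vector>

// Adds x to every element of in and returns the results.
std::vector<int> addToAll(const std::vector<int>& in, int x)
{
    std::vector<int> out(in.size());
    // [x] captures x by value, playing the role of add_x's data member;
    // the lambda body plays the role of add_x::operator().
    std::transform(in.begin(), in.end(), out.begin(),
                   [x](int y) { return x + y; });
    return out;
}
```

The compiler generates an unnamed class for the lambda with roughly the same shape as add_x: one captured int and one call operator.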

Since we already have functors, why do we need lambdas?

I think one important feature of a lambda is that it creates an anonymous object right where it is defined, so the code can be written at the point where it is used.

Java programmers will be familiar with creating an anonymous class. C++ had no such expression before, but lambda expressions in C++11 achieve something similar. For example, a Java programmer can define an anonymous Thread subclass:

public class A {
    public static void main(String[] arg) {
        new Thread() {
            public void run() {
            }
        };
    }
}


C++ can now pass a lambda expression directly into a function call, because doing so just passes an object into that function. The syntax differs from an anonymous class in Java:

void fillVector(vector<int>& v)
{
    // A local static variable.
    static int nextValue = 1;

    // The lambda expression that appears in the following call to
    // the generate function modifies and uses the local static
    // variable nextValue.
    generate(v.begin(), v.end(), [] { return nextValue++; });
    //WARNING: this is not thread-safe and is shown for illustration only
}

The programmer can pass a functor object, complete with its function body, directly as an argument, defining the code right where it is used.

Reverse the bits of a given 32-bit unsigned integer.

For example, given input 43261596 (represented in binary as 00000010100101000001111010011100), return 964176192 (represented in binary as 00111001011110000010100101000000).

Follow up:
If this function is called many times, how would you optimize it?

This question is closely related to Number of 1 Bits.

Solution 1:

class Solution {
public:
    uint32_t reverseBits(uint32_t n)
    {
        uint32_t i;
        uint32_t value = 0;
        for (i = 0; i < 32; ++i)
        {
            // Take bit (31 - i) of n and place it at position i.
            uint32_t tmp = (uint32_t)(n & ((uint32_t)1 << (31 - i))) ? 1 : 0;
            value |= tmp << i;
        }
        return value;
    }
};

Solution 2:

uint32_t reverse(uint32_t x)
{
    // Swap adjacent bits, then pairs, nibbles, bytes, and finally 16-bit halves.
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    x = ((x >> 16) & 0xffffu) | ((x & 0xffffu) << 16);
    return x;
}
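
For the follow-up question, one common approach is to precompute the reversal of every possible byte once and then reverse a word with four table lookups. The following is a minimal sketch of that idea, not a solution from the original problem page; the names `reverseByte`, `byteTable` and `reverseBitsCached` are made up for this example.

```cpp
#include <array>
#include <cstdint>

// Reverse the 8 bits of a single byte with a plain loop.
static std::uint8_t reverseByte(std::uint8_t b)
{
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = static_cast<std::uint8_t>((r << 1) | ((b >> i) & 1u));
    }
    return r;
}

// Build the 256-entry table on first use and reuse it afterwards.
static const std::array<std::uint8_t, 256>& byteTable()
{
    static const std::array<std::uint8_t, 256> table =
        []() -> std::array<std::uint8_t, 256> {
            std::array<std::uint8_t, 256> t{};
            for (int v = 0; v < 256; ++v) {
                t[v] = reverseByte(static_cast<std::uint8_t>(v));
            }
            return t;
        }();
    return table;
}

// Reverse each byte via the table and swap the byte order.
std::uint32_t reverseBitsCached(std::uint32_t n)
{
    const std::array<std::uint8_t, 256>& t = byteTable();
    return (static_cast<std::uint32_t>(t[n & 0xffu]) << 24) |
           (static_cast<std::uint32_t>(t[(n >> 8) & 0xffu]) << 16) |
           (static_cast<std::uint32_t>(t[(n >> 16) & 0xffu]) << 8) |
            static_cast<std::uint32_t>(t[(n >> 24) & 0xffu]);
}
```

Whether the table beats the mask-and-shift version of Solution 2 depends on the CPU and on cache behaviour, so it is worth measuring both.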

Write a function that takes an unsigned integer and returns the number of '1' bits it has (also known as the Hamming weight).

For example, the 32-bit integer '11' has binary representation 00000000000000000000000000001011, so the function should return 3.

The usual solution looks like the following:

class Solution {
public:
    int hammingWeight(uint32_t n) {
        unsigned int count = 0;
        while (n != 0) {       // stop once no set bits remain
            count += n & 1;    // add the lowest bit
            n >>= 1;
        }
        return count;
    }
};

While searching Stack Overflow, I found an interesting solution:

This is known as the ‘Hamming Weight‘, ‘popcount’ or ‘sideways addition’.

The ‘best’ algorithm really depends on which CPU you are on and what your usage pattern is.

Some CPUs have a single built-in instruction to do it and others have parallel instructions which act on bit vectors. The parallel instructions will almost certainly be fastest, however, the single-instruction algorithms are ‘usually microcoded loops that test a bit per cycle; a log-time algorithm coded in C is often faster’.

A pre-populated table lookup method can be very fast if your CPU has a large cache and/or you are doing lots of these instructions in a tight loop. However it can suffer because of the expense of a ‘cache miss’, where the CPU has to fetch some of the table from main memory.

If you know that your bytes will be mostly 0’s or mostly 1’s then there are very efficient algorithms for these scenarios.

I believe a very good general purpose algorithm is the following, known as ‘parallel’ or ‘variable-precision SWAR algorithm’. I have expressed this in a C-like pseudo language, you may need to adjust it to work for a particular language (e.g. using uint32_t for C++ and >>> in Java):

int NumberOfSetBits(int i)
{
     i = i - ((i >> 1) & 0x55555555);
     i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
     return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

This is because it has the best worst-case behaviour of any of the algorithms discussed, so will efficiently deal with any usage pattern or values you throw at it.
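
Following the quoted advice about adjusting the pseudo code for C++, here is a sketch of the same SWAR algorithm using uint32_t, so that the shifts and the multiplication are well defined for every input. The function name `numberOfSetBits` is just a renamed copy for this example.

```cpp
#include <cstdint>

int numberOfSetBits(std::uint32_t i)
{
    // Count bits within each 2-bit group.
    i = i - ((i >> 1) & 0x55555555u);
    // Sum neighbouring 2-bit counts into 4-bit groups.
    i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);
    // Sum into bytes, then add all four bytes into the top byte.
    return static_cast<int>((((i + (i >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24);
}
```

With unsigned arithmetic the multiplication simply wraps modulo 2^32, which is what the algorithm relies on.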


I am trying to use the smart pointers in C++11, but I found there is no shared_array in <memory>. So I tried the following, knowing it might be WRONG:

shared_ptr<int> sp(new int[10]);

When I ran it, it dumped core as I guessed:

$ smart_ptr/Test_shared_array
Destructing a Foo with x=0
*** Error in `smart_ptr/Test_shared_array': munmap_chunk(): invalid pointer: 0x0000000001d58018 ***
[1]    14128 abort (core dumped)  smart_ptr/Test_shared_array

Use GDB to see more information:

(gdb) run
Starting program: /home/nasacj/projects/woodycxx/smart_ptr/Test_shared_array
Destructing a Foo with x=0
*** Error in `/home/nasacj/projects/woodycxx/smart_ptr/Test_shared_array': munmap_chunk(): invalid pointer: 0x0000000000603018 ***

Program received signal SIGABRT, Aborted.
0x00007ffff7530cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7530cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff75340d8 in __GI_abort () at abort.c:89
#2  0x00007ffff756df24 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff767c6c8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007ffff7578c87 in malloc_printerr (action=<optimized out>, str=0x7ffff767ca48 "munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:4996
#4  0x0000000000400d9f in _M_release (this=0x603050) at /usr/include/c++/4.8/bits/shared_ptr_base.h:144
#5  ~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:546
#6  ~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:781
#7  ~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr.h:93
#8  test () at Test_shared_array.cpp:30
#9  0x0000000000400bc9 in main () at Test_shared_array.cpp:36
(gdb) quit

Then I realized that in Boost, the user should provide a deleter to shared_ptr:

Then I found this on Stack Overflow:

By default, shared_ptr will call delete on the managed object when no more references remain to it. However, when you allocate using new[] you need to call delete[], and not delete, to free the resource.

In order to correctly use shared_ptr with an array, you must supply a custom deleter.

template< typename T >
struct array_deleter
{
  void operator ()( T const * p)
  {
    delete[] p;
  }
};

Create the shared_ptr as follows

std::shared_ptr<int> sp( new int[10], array_deleter<int>() );

Now shared_ptr will correctly call delete[] when destroying the managed object.

With C++11, you can also use a lambda instead of the functor.

std::shared_ptr<int> sp( new int[10], []( int *p ) { delete[] p; } );

Also, unless you actually need to share the managed object, a unique_ptr is better suited for this task, since it has a partial specialization for array types.

std::unique_ptr<int[]> up( new int[10] ); // this will correctly call delete[]
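
As a further variation that is not part of the quoted answer, the standard library already ships a deleter that calls delete[]: std::default_delete<T[]>. The sketch below passes it to shared_ptr and counts destructor calls to confirm that all ten elements are destroyed. `Tracked` and `destroyTenViaSharedPtr` are hypothetical names for this illustration.

```cpp
#include <memory>

// Counts how many instances have been destroyed.
struct Tracked
{
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

int destroyTenViaSharedPtr()
{
    Tracked::destroyed = 0;
    {
        // std::default_delete<Tracked[]> calls delete[] on the stored pointer.
        std::shared_ptr<Tracked> sp(new Tracked[10],
                                    std::default_delete<Tracked[]>());
    }   // sp goes out of scope here and the whole array is released
    return Tracked::destroyed;
}
```

Since C++17, std::shared_ptr<T[]> is also available and uses delete[] without a custom deleter.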

Now here is the shared array, std version, in practice:

//#include "shared_array.h"
#include <memory>
#include <iostream>

using namespace std;

struct Foo
{
	Foo() : x(0) {}
	Foo( int _x ) : x(_x) {}
	~Foo() { std::cout << "Destructing a Foo with x=" << x << "\n"; }
	int x;
	/* ... */
};

template< typename T >
struct array_deleter
{
  void operator ()( T const * p)
  {
    delete[] p;
  }
};

//typedef woodycxx::smart_prt::shared_array<Foo> FooArray;
typedef shared_ptr<Foo> FooArray;

void test()
{
	// The temporary FooArray is destroyed at the end of this statement.
	FooArray(new Foo[10], array_deleter<Foo>());
}

int main()
{
	test();
	return 0;
}

The Output:

$ ./Test_shared_array
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0
Destructing a Foo with x=0

Smart Pointer Programming Techniques

Using incomplete classes for implementation hiding
The “Pimpl” idiom
Using abstract classes for implementation hiding
Preventing delete px.get()
Using a shared_ptr to hold a pointer to an array
Encapsulating allocation details, wrapping factory functions
Using a shared_ptr to hold a pointer to a statically allocated object
Using a shared_ptr to hold a pointer to a COM object
Using a shared_ptr to hold a pointer to an object with an embedded reference count
Using a shared_ptr to hold another shared ownership smart pointer
Obtaining a shared_ptr from a raw pointer
Obtaining a shared_ptr (weak_ptr) to this in a constructor
Obtaining a shared_ptr to this
Using shared_ptr as a smart counted handle
Using shared_ptr to execute code on block exit
Using shared_ptr<void> to hold an arbitrary object
Associating arbitrary data with heterogeneous shared_ptr instances
Using shared_ptr as a CopyConstructible mutex lock
Using shared_ptr to wrap member function calls
Delayed deallocation
Weak pointers to objects not managed by a shared_ptr