Saturday, July 16, 2011

Performance - Functors vs Functions

I recently finished Scott Meyers' book Effective STL. In Item 46 (Consider function objects instead of functions as algorithm parameters) he provides several reasons why one should use function objects (functors) over functions when using algorithms in the STT, the one point that i wanted to test out was the performance aspect. Scott claims that functors perform 50% faster in the worst case and in the best case 160% faster than functions.

Scotts Test Case
His test case involves calling the std::sort() function on a vector of 1 million random doubles. The comparison function used for the sort method is greater() for the functor case and a function that returns left > right for the function case. He describes this at a high level with a little bit of code. Here is my full implementation of his test case:

#include <stdio.h> //printf
#include <iostream> //cout, endl
#include <time.h> //timespec, clock_gettime(), CLOCK_PROCESS_CPUTIME_ID
#include <vector> //vector
#include <algorithm> //generate, sort
#include <functional> //binary_function, greater

using namespace std;

class RandomDoubleGenerator {
public:
    double operator()() {
        return (double)rand()/(double)rand();
    }
};

class TimespecAverage : public unary_function<struct timespec, void> {
public:
    TimespecAverage() : numTimespecs(0) {
    }
    
    void operator()(struct timespec t) {
        ++numTimespecs;
        sum += t.tv_sec;
        sum += t.tv_nsec * 0.000000001; // nano is 1x10^-9 seconds
    }
    
    double average() const {
        return sum / numTimespecs;
    }
    
private:
    double numTimespecs;
    double sum;
};

timespec diff(struct timespec start, struct timespec end) {
    struct timespec diff;
    if ((end.tv_nsec - start.tv_nsec) < 0) { //compensate for something like: start - 12.9, end 13.1
        diff.tv_sec = end.tv_sec - start.tv_sec - 1;
        diff.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
    } else {
        diff.tv_sec = end.tv_sec - start.tv_sec;
        diff.tv_nsec = end.tv_nsec - start.tv_nsec;
    }
    return diff;
}

timespec time(void (*functionToTime)(vector<double>&), vector<double>& data) {
    struct timespec start;
    struct timespec stop;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    functionToTime(data);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop);
    return diff(start, stop);
}

void printElapsed(struct timespec functor, struct timespec function) {
    printf("%li.%.9li    %li.%.9li \n", functor.tv_sec, functor.tv_nsec,
            function.tv_sec, function.tv_nsec);
}

void printAverages(vector<timespec>& functorTimes, vector<timespec>& functionTimes) {
    TimespecAverage functorAverage = for_each(functorTimes.begin(), functorTimes.end(), TimespecAverage());
    TimespecAverage functionAverage = for_each(functionTimes.begin(), functionTimes.end(), TimespecAverage());
    cout << endl << "Average" << endl;
    printf("%1.9f    %1.9f \n", functorAverage.average(), functionAverage.average());
}

bool greaterThan(double left, double right) {
    return left > right;
}

void sortFunctor(vector<double>& data) {
    sort(data.begin(), data.end(), greater<double>());
}

void sortFunction(vector<double>& data) {
    sort(data.begin(), data.end(), greaterThan);
}

int main(int argc, char** argv) {
    const int LOOPS = 10;
    const int NUM_DATA = 10000000; //10million
    
    vector<timespec> functorTimes; 
    vector<timespec> functionTimes;
    functorTimes.reserve(LOOPS);
    functionTimes.reserve(LOOPS);

    /* timer */
    cout << "Functor        Function" << endl;
    for (int i = 0; i < LOOPS; ++i) {
        vector<double> functorData(NUM_DATA);
        generate(functorData.begin(), functorData.end(), RandomDoubleGenerator());
        //create a copy of our random data vector for use with our function sort()
        //this way both methods are using the exact same data set
        vector<double> functionData(functorData);

        /* functor */
        struct timespec functorElapsed = time(sortFunctor, functorData);
        functorTimes.push_back(functorElapsed);
        
        /* function */        
        struct timespec functionElapsed = time(sortFunction, functionData);
        functionTimes.push_back(functionElapsed);
        
        printElapsed(functorElapsed, functionElapsed);
    }
    
    printAverages(functorTimes, functionTimes);    
    return 0;
}

Worst Case
I compiled this code using g++ 4.5.2 and the following compiler options (no optimizations):
g++ -lrt -ansi -pedantic -g -Wall

Below are my results (in seconds):

Functor        Function
7.439764495    7.755845935 
7.735696621    7.713312266 
7.587360216    7.501062211 
7.576613316    8.058204506 
7.539563825    7.602603278 
7.471405721    7.776704292 
7.750183958    7.940997362 
7.721883637    7.679431332 
7.470727771    7.681166662 
7.588803904    7.776486039 


Average
7.588200346    7.748581388

As you can see the function and functor sort times are almost equal (functors are only 2% faster) and do not line up with the 50% worst case that Scott claims.

Best Case
I compiled the code again this time turning on full optimizations:
g++ -lrt -ansi -pedantic -O3 -Wall

Below are my results (in seconds):
Functor        Function
1.530773436    2.230420939
1.487986836    2.240578317
1.576760542    2.280175380
1.521323717    2.334807950
1.499050688    2.203657561
1.518878620    2.272559906
1.581513080    2.212788678
1.563577081    2.393493630
1.616584657    2.313904654
1.581532295    2.291148088

Average
1.547798095    2.277353510


As you can see the call to std::sort() where you pass a regular function as the comparator is on average 0.729555415 seconds (47.1%) slower than the call to std::sort() using the functor as the comparator. This fairly close to what Scott claimed in his book as the worst case.

As Scott explains in his book, the reason for the performance increase is due to inlining. The compiler is able to inline the comparator functor but it is unable to inline the comparator function. This is because when you give the comparator function to sort() you are really giving a pointer to that function and the compiler is unable to inline the invokation of the function pointer. Where in the functor case the compiler will happily inline the operater() function in the functor and since you are passing the functor by value to sort() there is no pointer dereferencing.

Conclusion
I was unable to produce Scott's best case where functors performed 160% faster than their function counterparts, however i was able to produce an increase of 50% by simply using a functor over a function. Either way, there is no doubt that functors do produce more performant code than their function counterparts, but compiler optimizations need to be turned on in order to notice any real difference.

No comments:

Post a Comment