Tuesday, March 1, 2011

Sorting in awk

I became interested in playing around with simple sorting algorithms and learning awk at the same time. Here is a quick roundup of the first three.

$ bash make_stats.sh
Doing 10 rounds of bubble_sort.awk with a dataset of 100 ints
Average real time in seconds: 0.0151

Doing 10 rounds of quick_sort.awk with a dataset of 100 ints
Average real time in seconds: 0.0476

Doing 10 rounds of selection_sort.awk with a dataset of 100 ints
Average real time in seconds: 0.0048

Doing 10 rounds of bubble_sort.awk with a dataset of 500 ints
Average real time in seconds: 0.3663

Doing 10 rounds of quick_sort.awk with a dataset of 500 ints
Average real time in seconds: 0.2443

Doing 10 rounds of selection_sort.awk with a dataset of 500 ints
Average real time in seconds: 0.0795

Doing 10 rounds of bubble_sort.awk with a dataset of 2500 ints
Average real time in seconds: 9.7188

Doing 10 rounds of quick_sort.awk with a dataset of 2500 ints
Average real time in seconds: 0.335

Doing 10 rounds of selection_sort.awk with a dataset of 2500 ints
Average real time in seconds: 2.1556

Divide and conquer rules the roost especially with large datasets.
Selection sort is obviously much faster than bubblesort.
Probable reasons:
  1. Selection sort works on progressively smaller arrays with each iteration
  2. Comparisons in bubble sort are expensive compared to moving an element to the end of the array
  3. My implementation of bubble sort sucks.. (I'll check it again :( )
The make_stats script is in the same repository. It would be nice to update it to graph these stats using gnuplot


Update: Graphing can now be done by graph_performance.sh. It generates a couple of stats files and the graph in tmp. It expects to find gnuplot in your path. Sort size can be changed by editing make_stats.sh directly.

    No comments:

    Post a Comment