Split, Apply, Merge in D

#dlang #tutorial

I was looking for a groupby: a way to iterate over a list in groups (a list of lists). During that search I came across this article about split, apply, merge for data tables. It looked like what I wanted, but its data-science framing had me confused.

In D these functions are chunkBy, map, and joiner. The pattern of consistency continues: once our list is sorted, we just need to specify what to group on.

import std.algorithm;

void main()
{
    auto data = [1,1,2,2];
    // Neighbouring elements stay in one chunk while the predicate holds.
    assert(data.chunkBy!((a, b) => a==b)
               .equal!equal([[1,1],[2,2]]));
}

Unlike previous lambdas, this one takes two arguments, which allows elements to be grouped in interesting ways.

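import std.algorithm;

// Orders evens before odds, then ascending within each parity.
bool evenGrouping(int a, int b) {
    if(a%2 == b%2)
        return a < b;
    return a%2 < b%2;
}

void main()
{
    auto data = [1,1,2,2,3,3];
    // equal!equal is needed because each chunk is itself a range.
    assert(data.sort!evenGrouping
               .chunkBy!((a,b) => a%2==b%2)
               .equal!equal([[2,2],[1,1,3,3]]));
}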
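To see why the sort matters, here is a small sketch of what chunkBy does with the same values before and after sorting. The helper name chunkRuns is made up for this illustration and is not part of Phobos; it only copies each chunk into an array so the result is easy to compare.

```d
import std.algorithm : chunkBy, map, sort;
import std.array : array;

// Hypothetical helper: chunk runs of equal neighbours and copy them into arrays.
int[][] chunkRuns(int[] data)
{
    return data.chunkBy!((a, b) => a == b)
               .map!(chunk => chunk.array)
               .array;
}

void main()
{
    auto data = [1, 2, 1, 2];

    // Unsorted: equal values are not adjacent, so each forms its own chunk.
    assert(chunkRuns(data) == [[1], [2], [1], [2]]);

    // Sorted in place first: equal values now sit together and share a chunk.
    data.sort();
    assert(chunkRuns(data) == [[1, 1], [2, 2]]);
}
```

chunkBy only ever compares neighbours, so it cannot bring scattered values together on its own.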

As mentioned, sorting needs to happen first.

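import std.algorithm;
import std.range;

void main()
{
    auto data = [3,3,1,1,2,2];

    // Sorting by parity alone leaves the order inside each chunk unspecified,
    // so each chunk is sorted again before comparing.
    assert(data.sort!((a, b) => a%2 < b%2)
               .chunkBy!((a,b) => a%2==b%2)
               .map!(x => x.array.sort())
               .equal!equal([[2,2],[1,1,3,3]]));
}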

For this contrived example I decided it best to run the code through a compiler. That was a good thing, as I found a difference in behavior. I'll save map for another day.

Two types of lambda functions are supplied to these functions. One takes a single argument and is referred to as a unary predicate; the other takes two and is referred to as a binary predicate.
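As a rough sketch of both shapes in one pipeline, the code below gives sort and chunkBy two-argument lambdas, gives map a one-argument lambda, and then uses joiner for the merge step. The function name evensThenOdds is invented for this example, and the .dup is only there so the caller's array is left untouched.

```d
import std.algorithm : chunkBy, joiner, map, sort;
import std.array : array;

// Hypothetical helper: split by parity, sort each chunk, merge back together.
int[] evensThenOdds(int[] data)
{
    return data.dup
               .sort!((a, b) => a % 2 < b % 2)      // binary: evens before odds
               .chunkBy!((a, b) => a % 2 == b % 2)  // split: binary lambda
               .map!(chunk => chunk.array.sort())   // apply: unary lambda
               .joiner                              // merge: flatten the chunks
               .array;
}

void main()
{
    assert(evensThenOdds([3, 3, 1, 1, 2, 2]) == [2, 2, 1, 1, 3, 3]);
}
```

Everything before the final .array is lazy, so the chunks are only walked when the result is collected.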

When a unary predicate is supplied to chunkBy, each element of the result is a tuple of the key found (the predicate's result) and the values that share it. This is an interesting optimization, but this overload should live with group, which already has this behavior.
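A minimal sketch of the two tuple shapes side by side, assuming parity as the key. The helper parityChunks is hypothetical; it copies each chunk into an array so the pairs can be compared directly.

```d
import std.algorithm : chunkBy, equal, group, map;
import std.array : array;
import std.typecons : tuple;

// Hypothetical helper: materialise each (key, chunk) pair for comparison.
auto parityChunks(int[] data)
{
    return data.chunkBy!(a => a % 2)
               .map!(pair => tuple(pair[0], pair[1].array))
               .array;
}

void main()
{
    // Unary chunkBy: every element is a tuple of (key, chunk).
    assert(parityChunks([1, 3, 2, 4, 6])
        .equal([tuple(1, [1, 3]), tuple(0, [2, 4, 6])]));

    // group: every element is a tuple of (value, run length).
    assert([1, 1, 2, 2, 2].group
        .equal([tuple(1, 2u), tuple(2, 3u)]));
}
```

Both hand back tuples, but group pairs a value with a count while unary chunkBy pairs a key with the chunk itself.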
