In the last post we looked at how we could make Array
more expressive. By which I mean we added a way for people to extend the data structure to support more use cases. But while third parties now can add missing functionality, there are certain functions which should be supported out of the box, both for performance and convenience reasons.
Elm has four built-in collection types: List
, Array
, Dict
and Set
. With the exception of Dict
, these types look similar from a birds eye view. You can insert an item into the collection, get it back out, and even perform some task for every item stored. Of course, they do have different trade-offs which make them suitable for different situations.
The List
is great when adding and retrieving items at the beginning (left side) of the collection. Array
is great when you need fast access to any element, and Set
is good when you need a sorted collection with no duplicates.
The API for these collections are different as well. Sometimes this makes sense, but not always. For instance, there is no sort
function for Set
, which is reasonable as Set
is a sorted collection. However, there also isn't a sort
function for Array
, which means that you have to convert it to a List
or a Set
and then back again if you want it sorted.
Converting from one collection type to another is a costly operation. If you use an Array
because it suits your use case most of the time, you don't want to convert to another collection just to perform that one operation.
Let's look at how we can improve the API of List
, Array
and Set
to reduce the need for converting from one collection type to another to support the functionality you need.
Differences which should remain
Certain operations aren't supported on a collection type with good reason. For instance, there is not get
or set
operation on List
because the performance would be bad. If one need to get
or set
a random item in a collection, Array
should be used. Likewise, union
, intersect
and diff
should only be supported for Set
. Here are some functions that are not supported by all three collection types, which I believe shouldn't be supported either:
get
, set
, insert
, remove
, ::
and push
can only be supported efficiently by their respective collections. If you're in need of one of these functions, it's a tell-tale sign of which collection you should use. However, with the improvements I've mentioned in earlier posts, Array
could support all of these functions efficiently, which is yet another reason as to why Array
should be the default collection type.
indexedMap
and toIndexedList
make no sense for Set
, as it doesn't operate by indexes and doesn't retain insertion order. repeat
doesn't make much sense for Set
either, as it cannot contain duplicates.
union
, intersect
and diff
can only be efficiently implemented by Set
.
head
can be supported efficiently by the other collections, but the operation itself really only makes sense for List
. Array
would use get 0
, and it isn't quite clear what head
means in the case of Sets
, is it the first/lowest/minimal element or the root/middle element?
sort
, sortBy
and sortWith
doesn't make sense for Set
, but Array
should definitely get these functions.
reverse
can be implemented efficiently for Array
, but makes little sense for Set
.
Where we could bridge the gap
Below is a list of functions which can be easily supported across all three collections:
- singleton
- initialize
- range
- concat
- intersperse
- member
- map2, map3, map4, map5
- concatMap
- filterMap
- partition
- take
- drop
- unzip
- slice
- sum
- product
- maximum
- minimum
- all
- any
- scanl
While these functions could be supported across all three collections, does that mean they should? Since we're already messing about, let's ask ourselves if there are certain functions that shouldn't be in core to begin with.
sum
and product
are essentially shortcuts for List.foldl (+) 0
and List.foldl (*) 1
. Are those operations common enough to grant them dedicated functions in the core library? If we introduce slice
for all three collections, do we need take
and drop
?
Do people actually use map4
and map5
? How about scanl
and unzip
? I mean, there probably are, but are those operations so common that one would expect to find them in core? Not that it necessarily does any harm, but there is something to be said of having a small, but powerful, package where you can read the entire documentation in a short sitting.
Anyway, here's a list of which functions I, personally, see no reason for having in core. I expect people will tell me that these functions definitely belong, but I think it would be nice to at least have the discussion.
- intersperse
- map4
- map5
- concatMap
- unzip
- sum
- product
- scanl
- take
- drop
Finishing touches
Here are some minor things I think shouldn't be left out of a API renovation:
List
and Array
has a length
function, Set
has size
. This is likely due to the fact that Set
is a thin abstraction on top of Dict
and so has just copied the name. I would rename Set.size
to Set.length
out of consistency if nothing else.
Now that I'm on the topic, since Set
is thin abstraction on top of Dict
, maybe everything discussed up til this point applies to Dict
as well?
List
has ::
and tail
. The analogous for Array
is push
and pop
. pop
is missing from the API, so I would add it.
I would add find
for all three collections. List
and Set
already supports member
, which is essentially find
except that it returns a boolean instead of the value for which the predicate function returns true. It's a more useful function in general, and so if member
should be a part of the API (which I believe it should) then find
belongs as well.
For Set
and Dict
I would throw in min
and max
to retrieve the minimum/first and maximum/last element.
I think Stack
is a better name than List
and so would rename that type. It's a name that more clearly explains what that collection type is good at.
Finally, as mentioned in the previous post, I think stoppableFoldl
and stoppableFoldr
, or something with the same functionality but with better names, should be added to the API of Array
. I think Dict
and Set
could be well served with that functionality as well.
What do you think?
The goal I'm trying to reach is to make the Array
type more general purpose, but when improving the Array
API, it makes sense to look at the other collection types as well. This article lays out what I believe those APIs should look like.
First, I want to say that I'm not authority on what makes it into Elm. If I actually code up the suggestions in this article and submit it as a proposal, all of it could make it in, or none of it.
Second, all these suggestions are based on my own experience. What I need is your experience. Does these suggestions make sense, or am I off my rockers? Let me know in the discussion thread on the Elm Discourse.
And now, for something quite different
So, is this the end of this series? Not quite. There's a summary left the be published, but that will have to wait until after we go over the top (you didn't think I'd forgot, did you?) in the next post, which is about the literal representation of List
and Array
.
Top comments (6)
Functional languages tend to get the distinction between arrays and lists right because List is generally understood to mean "linked list" and array is generally understood to mean "array-backed list." They also tend not to expose random-access methods on linked lists, unlike Java which is happy to provide this particular foot-gun. In other words, I feel like imperative/OO languages should change their nomenclature, not functional languages.
You're talking about my interest in renaming
List
toStack
?Generally understood amongst whom? Most people who come to Elm come from JS. Depending on what other language they have experience with,
List
is a very ambiguous term. In python and C#,List
actually refers toArray
. Also, when a JS developer sees[1, 2, 3]
it's reasonable to expectList
to work as anArray
.Stack
is not only more descriptive of what performance one can expect, it also avoids confusion. WhileList
often means linked list in other functional languages, most people new to Elm have no experience with other functional languages, and so they are more likely to expect thatList
works like anArray
.Yep.
I probably could have phrased it better: Those who work in functional languages tend to have a less ambiguous understanding of what a list is, and are unlikely to conflate linked lists with arrays or array-backed lists.
A more pragmatic approach to introducing people to functional languages is to make this distinction clear to them from the start, as opposed to introducing ambiguous terminology in the one paradigm where little ambiguity exists for these terms.
Funnily enough it is called ArrayList in C#1.0.
That type is still there, although most will now opt to use the newer List<> which is a generic type.
Once everyone is conditioned to think "A list is an array" it's OK for Microsoft to shorten the name in the newer type.
In Elm LinkedList would be a good type name to avoid confusion.
To me it doesn't feel like a stack. I mean technically it is. But then what's with the (++) operation? That's a weird operator for stacks, right? It feels weird to use any infix and would rather have definitions like this:
push : a -> Stack a -> Stack a
peek : Stack a -> Maybe a
pop : Stack a -> (a, Stack a)
You’re right, (++) is not an efficient operation for a single-linked-list, and so one could make a (good, in my oppinion) argument for it’s removal.
Thing is, Haskell’s
List
also has (++), and Elm hasn’t had a stableArray
implementation (it gets one in 0.19).If my proposal is accepted, it would make sense to remove bad operations from
List
.Interesting topic, thanks for the post