A Unit Of Code
A subroutine is a callable unit of code.
It may surprise you to find that not all languages name their subroutines "functions". Pascal - not that anyone writes Pascal anymore - distinguished between "Procedures" and "Functions". The latter always returned a value, the former could not. Other languages, like BASIC, stuck with "subroutine", giving us GOSUB.
But whatever the name, the key thing is that you can call a function as many times as you like, and from the caller's perspective, it's just like an operator or statement. When it completes, execution picks up where it was called.
Sometimes these functions return a value. Sometimes they accept values - called "parameters" or "arguments".
They usually have a name - a function identifier - but sometimes identifying a function takes more than a simple name.
This is a deep dive into functions, how they work, and what to do with them.
The low level
At a low level, in languages like C, something like this happens on a function call:
First, the caller puts the arguments somewhere the function code can find them. Next, it stores a hidden argument recording where the function was called from - a Program Counter value or equivalent.
Then the actual call occurs, and execution moves from the call site to the function body. Most CPUs actually provide an instruction for this and the later return, which will handle the Program Counter storage for you.
The function then does its stuff, getting the function arguments, processing them, and calculating a return value if any. Then finally, it returns.
The return process is the reverse of the calling process - the return value is placed somewhere, and the Program Counter is restored. Execution then continues from where it left off at the call site.
In general, the place where the function call arguments, return values, and local variables are placed is called a "stack frame". This naturally gives a variable scope for the function, and a clean lifetime for any values created during the function call.
Each call adds a new stack frame to the end, and each return removes it again. In a lot of languages, the program simply terminates once the stack is empty of frames. Too many stack frames will fill the stack and cause a fatal error.
Even where languages don't use actual stack frames, this terminology remains - hence we talk about "the call stack", "stack traces", and so on in all languages.
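To make the call stack a bit more tangible, here's a minimal Python sketch. The helper names (stack_depth, inner, outer, recurse_forever, overflows) are made up for illustration; inspect.stack() and RecursionError are standard Python. It assumes CPython's usual behaviour of raising RecursionError once the recursion limit is reached, rather than crashing outright.

```python
import inspect


def stack_depth():
    """Return how many frames are on the call stack, including this one."""
    # context=0 skips reading source lines; we only want the frame count.
    return len(inspect.stack(0))


def inner():
    return stack_depth()


def outer():
    # outer -> inner -> stack_depth: two more frames than a direct call.
    return inner()


def recurse_forever(n=0):
    # No base case: every call adds a frame until the interpreter gives up.
    return recurse_forever(n + 1)


def overflows():
    """Report whether unbounded recursion is stopped with RecursionError."""
    try:
        recurse_forever()
    except RecursionError:
        return True
    return False
```

Calling outer() reports a depth two frames deeper than calling stack_depth() directly from the same place: one frame for outer, one for inner.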
Call me by my name, oh, call me by my value...
In a language like C, a copy of the variable or expression is placed in the stack frame. This means that any change to the function argument within the function won't propagate back to the caller:
int called(int a) {
    a += 2;
    return a;
}
void caller() {
    int b = 0;
    int c = called(b);
    c == 2; // c picks up the return value here.
    b == 0; // b is left unchanged; we passed a copy.
}
This is known as "call by value".
Because C has reference types - types which hold a reference to some other value, rather than the value itself - we can also pass in the reference by value, giving the function the same reference, and allowing it to use the same value.
int called(int * a) {
    // a is a "pointer to int", a reference type.
    *a += 2; // "*a" dereferences, reaching the value.
    return *a;
}
void caller() {
    int b = 0;
    int c = called(&b); // Pass a reference to b, not b's value.
    c == 2; // As before.
    b == 2; // This time, we've changed the value.
}
This behaviour is called "call by reference", and it allows a function to manipulate the values passed into it.
Some languages - including JavaScript, Python, and several others - implicitly use reference types in many (or even all) cases. This means a function can always modify the objects passed to it, sometimes when you least expect it:
function fn(oo) {
    oo.foo = 1;
}
function fn2(ii) {
    ii += 2;
    return ii;
}
const o = {foo: 0};
let i = 0;
fn(o); // Implicitly call by reference.
o.foo; // 1, because fn changed it.
fn2(i); // Returns 2
i; // still 0, because primitives are passed by value.
There are other possibilities - Swift has in-out parameters giving you "call by value-result", but in practice these are generally doing "call by reference" underneath so you needn't pay that much attention. "Call by reference" is, of course, really "call by value" with a fake moustache and a reference type, but the distinction is important.
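Python makes the fake moustache fairly visible. Here's a rough sketch, with hypothetical function names: every argument is a reference passed by value, so rebinding the parameter does nothing to the caller, while mutating the object it refers to does.

```python
def add_two(n):
    n += 2  # Rebinds the local name n; the caller's int is untouched.
    return n


def append_one(items):
    items.append(1)  # Mutates the list object the caller also refers to.


def replace_list(items):
    items = [99]  # Rebinds the local name only; the caller never sees this.
    return items


i = 0
result = add_two(i)      # result is 2, i is still 0

shared = []
append_one(shared)       # shared is now [1]
replaced = replace_list(shared)  # replaced is [99], shared is still [1]
```

The difference between append_one and replace_list is the whole story: the reference is copied into the function, but the object it points at is not.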
Returning a Value
When a function returns a value, the distinction between returning a value or a reference can be extremely important.
In C, all reference types are explicit, but also the local variables are likely to vanish - returning a reference to a local variable gives you a dangling reference, which will cause some impressive crashes (or worse).
But you can still return a reference to some value that isn't a local one.
In languages where objects are always handled by reference, the language takes care of this for you. Examples include JavaScript, Python, and others.
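As a small illustration in Python, returning a local object is safe because the object lives on for as long as anything refers to it. What you do still need to watch is whether you're handing back the original or a copy. The Inventory class and its method names below are invented for this sketch.

```python
def make_list():
    local = [1, 2, 3]
    return local  # Safe: the list outlives the call because the caller holds it.


class Inventory:
    def __init__(self):
        self._items = ["sword"]

    def items_shared(self):
        # Returns a reference to the internal list; callers can modify it.
        return self._items

    def items_copy(self):
        # Returns a new list; changes to it don't affect the inventory.
        return list(self._items)


inv = Inventory()
inv.items_shared().append("shield")  # Changes the inventory itself.
inv.items_copy().append("potion")    # Changes only the throwaway copy.
```

After both calls, the inventory holds the sword and the shield, but not the potion.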
Returning some Values
Usually, you can only return a single value from a function, but there are two solutions to this limitation.
Firstly, you can return some aggregate type. A typical Python idiom is to use a tuple, and then unpack the tuple at the call site, all of which can be done transparently:
from typing import Tuple

def fn() -> Tuple[int, str]:
    return 1, 'A string'
i, s = fn()
In other languages, you might need a record type or an array.
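Python can also do the record-type version, for what it's worth. This is only a sketch: the Result type and its field names are assumptions made for the example, built on the standard typing.NamedTuple.

```python
from typing import NamedTuple


class Result(NamedTuple):
    count: int
    label: str


def fn() -> Result:
    # Still a tuple underneath, but each field now has a name.
    return Result(count=1, label="A string")


r = fn()
by_name = (r.count, r.label)  # Access fields by name...
count, label = fn()           # ...or unpack positionally, as before.
```

Named fields help once a function returns more than two or three things and callers start losing track of the order.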
JavaScript allows you to do something broadly similar to the Python case with destructuring and other shorthands:
function fn() {
    const i = 1;
    const s = 'A string';
    return { i, s };
}
const { i, s } = fn();
The alternative is a solution we've already touched upon - call by reference allows the function to provide the results by manipulating the arguments. This is often used by C for this purpose - there's an idiom involving passing reference types to reference types in order to get back a reference to a newly created value:
bool create(int **f) {
    *f = (int *)malloc(...); // Allocate memory
    // Initialize (*f).
    (**f) = 1; // Dereference twice to get to the actual int...
    return true;
}
void caller() {
    int *f = NULL; // Pointer to nothing.
    if (create(&f)) {
        (*f) == 1; // True at this point.
    }
}
Don't worry too much about the syntax there (and I accept that double-pointers like that are confusing).
While this deliberate manipulation of arguments seems painfully complicated, it's actually very useful, and is how - in practice - most object methods work.
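Python is unusually open about this. As a sketch (the Counter class and the free function increment are hypothetical), a method is just a function whose first argument is a reference to the object, and it does its work by manipulating that argument:

```python
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self, step=1):
        # self is an ordinary argument: a reference to the object.
        self.count += step


def increment(counter, step=1):
    # A plain function doing exactly the same job.
    counter.count += step


c = Counter()
c.increment()            # The usual method call syntax.
Counter.increment(c, 2)  # The same method, with the object passed explicitly.
increment(c, 3)          # The free-function equivalent.
```

All three calls change the same object, leaving c.count at 6.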
Not Returning Values
Most modern languages have chosen to unify functions and procedures. C did this by having a special non-type, void, which cannot have any value. A function "returning void" actually returns nothing, and an attempt to assign the return value gives a compile error.
JavaScript and Python always return a value, however - it's just that it might be a special placeholder value. JavaScript uses undefined here (both a primitive type and a value), whereas Python uses None (the sole possible value of the type NoneType).
The distinction isn't that confusing in practice, but it does mean that in both cases, you can still assign the return value, though it's not likely to be useful - and might be an error.
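Here's a short Python sketch of how that can bite. The function names are invented for the example; list.sort() and sorted() are the standard ones.

```python
def log(message, sink):
    sink.append(message)  # No return statement, so the call evaluates to None.


def sorted_copy_wrong(values):
    # Bug: list.sort() sorts in place and returns None.
    return values.sort()


def sorted_copy(values):
    # sorted() builds and returns a new list instead.
    return sorted(values)


sink = []
nothing = log("hello", sink)       # Perfectly legal; nothing is None.
oops = sorted_copy_wrong([3, 1, 2])
fine = sorted_copy([3, 1, 2])
```

Nothing stops you assigning the None, which is exactly why mistakes like sorted_copy_wrong tend to surface later than you'd like.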
Naming and signatures
When we call a function, the compiler or interpreter needs to do several things.
First, it needs to find the function declaration. Functions are much like variables - indeed, in many languages they are variables. As such, they are declared somewhere, and in most languages that declaration will also include a definition - in other words, the function's declaration includes the function body containing the actual code. In C and C++, the declaration and definition are usually distinct.
Secondly, in a statically typed language, it will need to examine the types involved.
Functions have a return type, and each argument has a type as well - in a dynamically typed language these aren't present.
The arguments you're using, and the way you store the return value, will have to be resolved against the function arguments. In statically typed languages, this might result in implicit conversions. Many languages also have optional arguments, which have defaults when omitted.
These details - the types, arguments, defaults and so on - are called the function signature. In a dynamically typed language, the signatures are of course vastly simpler - really, just the name and the "arity", or number of arguments.
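Python will happily show you a signature at runtime. The sketch below uses the standard inspect.signature; the greet and describe functions are made up for illustration, and describe deliberately ignores complications like *args and keyword-only arguments.

```python
import inspect


def greet(name: str, greeting: str = "Hello") -> str:
    return f"{greeting}, {name}!"


def describe(fn):
    """Summarise a function's name, arity, and defaults."""
    params = inspect.signature(fn).parameters
    empty = inspect.Parameter.empty
    return {
        "name": fn.__name__,
        "arity": len(params),
        "required": [n for n, p in params.items() if p.default is empty],
        "defaults": {n: p.default for n, p in params.items()
                     if p.default is not empty},
    }


summary = describe(greet)
```

Here summary records the name greet, an arity of 2, one required argument (name), and one default (greeting is "Hello").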
Overloading
Some languages provide overloading, where a single function name may have multiple signatures, and the language is free to pick the one that suits best. These are typically picked by name first, then number of arguments, and finally argument types. The obvious exemplar language is C++:
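#include <iostream>
#include <string>
void called(int arg) {
    std::cout << "I was called with " << arg << std::endl;
}
void called(std::string const & arg) {
    std::cout << "I was called with " << arg << std::endl;
}
void caller() {
    called(10);   // Picks the int overload.
    called("10"); // Picks the std::string overload via conversion.
}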
called here has multiple declarations with distinct types, and each declaration also has a definition, or "implementation". If you're seeing a common interface with multiple implementations and thinking "polymorphism", you're not wrong.
Overloading gets a bad rap in some quarters but used well it's amazingly useful - in the code above, we're saving inefficient conversions and adding flexibility for the caller. But if we'd done something entirely different between the two overloads, that'd be very confusing.
Functional languages often allow overloading based on more than just types - certain values, and the "shape" of the data, can be used to overload too.
For example, here's a bit of Erlang which - if I've got this right - will run different implementations of the function depending on whether the list passed in is empty or not, eventually counting the members of the list in a wonderfully pointless and inefficient way:
array_count([]) ->
    0;
array_count([ S | R ]) ->
    1 + array_count(R).
JavaScript does not do overloading - but with a little effort you can do it yourself using a "dispatch function" pattern:
function caller_number(i) {
    console.log("Number variant", i);
}
function caller_string(s) {
    console.log("String variant", s);
}
function caller(arg) {
    if (typeof arg == 'number') {
        return caller_number(arg);
    } else {
        return caller_string(arg + ''); // Convert to string
    }
}
TypeScript does do overloading, but only with the signatures, and not the implementation. To the above, we'd prepend something like:
function caller(arg: string): void;
function caller(arg: number): void;
But this is not true overloading, just a way to tell TypeScript how to manage the static typing involved.
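Python sits in a similar spot: no built-in overloading, but the standard library's functools.singledispatch packages up the dispatch function pattern for the type of the first argument. A minimal sketch, where the function name called and the returned strings are just borrowed from the examples above:

```python
from functools import singledispatch


@singledispatch
def called(arg):
    # Fallback implementation: anything unregistered is treated as a string.
    return f"String variant {arg!s}"


@called.register(float)
@called.register(int)
def _(arg):
    # Chosen when the first argument is an int or a float.
    return f"Number variant {arg}"


from_number = called(10)
from_string = called("10")
from_float = called(1.5)
```

It only looks at the first argument's type, so it's a long way short of C++ overload resolution, but it saves writing the if/else chain by hand.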
Operators
Operators are functions, too, of a sort.
In some languages - like C - the operators represent purely mathematical operations which roughly correspond to machine code instructions - they generally won't get compiled into calls like a traditional function call. Nevertheless, they possess many of the same attributes as a function.
They have a name, such as +. They have some arguments, which have types. They return a value, which, too, has a type.
In higher-level languages, they're often heavily overloaded. Look at this JavaScript, for example:
'Hello ' + 'World!'; // Concatenates the strings.
1 + 2; // Adds the numbers.
Some languages, like Python and C++, allow you to write your own special functions which are then used in overload lookup. For example, in C++ we could write:
std::string operator+(std::string const & a, std::string const & b) {
    std::string r = a;
    r.append(b);
    return r;
}
This would then allow two strings to be concatenated just like JavaScript. In fact, C++ has done this for us anyway in the standard library - but unlike JavaScript this is some "ordinary" C++ code in the library (and you can go read it if you like).
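The Python equivalent is a specially named method. In this sketch the Vec2 class is hypothetical; __add__, NotImplemented, and operator.add are standard Python.

```python
import operator


class Vec2:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for "self + other". Returning NotImplemented lets Python
        # try the other operand, then raise TypeError if nothing fits.
        if not isinstance(other, Vec2):
            return NotImplemented
        return Vec2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return isinstance(other, Vec2) and (self.x, self.y) == (other.x, other.y)


total = Vec2(1, 2) + Vec2(3, 4)                   # Uses Vec2.__add__.
same = operator.add(Vec2(1, 2), Vec2(3, 4))       # The operator as a function.
```

operator.add is a nice reminder that the operator really is a function: both lines produce an equal Vec2.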
Variables
And just as operators can be functions, it turns out that functions can be variables, in turn - or at least, you can keep a function in a variable and pass it around.
In the venerable C, this is done by treating the function name as a variable holding the memory address of the function implementation. The type of the variable is the function signature, sans name.
JavaScript makes this simpler, as do a lot of languages, by having what amounts to a function literal. When we define a function, we're just defining a variable holding the function, a bit like:
const fn = function(a) {
    return a * 2;
};
Recent JavaScript has a simplified form (which has a few limitations):
const fn = a => a * 2;
This is particularly helpful for using small anonymous functions as arguments to other functions, like filter or map. In these cases, such functions are normally known as "lambda functions", or simply "lambdas". Most modern languages have them, though they often have some limitations.
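The same ideas carry over to Python more or less unchanged. In this sketch the names double, triple, and apply_all are invented; map, filter, and lambda are built in. Python's lambdas are a good example of "some limitations", since they can only hold a single expression.

```python
def double(a):
    return a * 2


fn = double                # A function is a value; fn is another name for it.
triple = lambda a: a * 3   # A lambda: an anonymous, single-expression function.


def apply_all(funcs, value):
    """Call each function in funcs with value and collect the results."""
    return [f(value) for f in funcs]


results = apply_all([fn, triple], 5)
evens = list(filter(lambda n: n % 2 == 0, range(6)))
doubled = list(map(double, evens))
```

Passing double to map works exactly like passing the lambda to filter; whether the function has a name makes no difference to the caller.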
Functional Programming
Of course, I've managed an entire article on functions and barely mentioned functional programming.
But that's because functional programming isn't about functions as in subroutines, but functions as in lambda calculus. Functional techniques can be (and often should be) used in any language, and modern languages capable of "procedural programming" can comfortably handle most of these.
Summary
Functions are the way we break code down into manageable, and reusable, units. Different languages provide different capabilities, like overloading, and they inherit features like static typing from their variables, too.
A firm idea of how functions work is important. If you're reading this, you likely knew a lot of it already, but I hope this has helped settle things a bit.