(Headline photo from nixcraft's post to which I was reacting below)
TLDR:
- Shell scripting is real programming
- If you want to write shell script learn it properly
- Use programming best practices like you would with other languages
- Source-control it
- It's weirdly idiosyncratic and needs extra attention
- If the above doesn't suit you, you'll be better served by a less idiosyncratic language
So the below was a rant I posted in response to some pushback - someone suggested using Python instead of bash, and a few people complained about how it's overkill, how there are two versions of Python you need to get right, or you have to get it onto the machines in the first place, or suchlike.
I love shell scripting, and I still use it a lot, but I'm no fool. There are so many issues with it that blindly defending it for all use cases is foolhardy.
Not least because in company settings, most other people you will work with haven't the slightest idea what they're doing with shell scripts. They already get brownie points for thinking of putting the scripts in source control, kind a like getting grades for writing your name on your test.
SHELL SCRIPTING IS REAL PROGRAMMING. It should be source controlled, code reviewed and written to clean, maintainable standards. Because that code is meant for production.
So this is what I retorted:
If it's a personal machine, install Python - or other language of choice.
If it's the enterprise machines, have a policy to ensure Python - or other language of choice.
Shell is good and great, I'm a command line junkie myself, and I still turn to shell scripts for a lot of my work, but a shell script that's for more than wrapping a long pipe or two quickly becomes madness unless you actually put in the time and effort to learn the language PROPERLY. (The article feature pic is an example of utterly shitty code and their problem is not the filename spaces YET, but their RUBBISH handling of variables, and the lack of any effort to write cleanly)
I am an extensive bash scripter, and became so historically only for the reasons listed around here, which I once saw as valid reasons. They can all be worked around - and the overhead of solving "language X not on our machine farm" is often better return on effort than the years of unmaintainable brittle shell scripts you've been writing dozens of and never maintaining, or your colleagues cannot fathom and won't touch for love nor money. Unfoathomable shell scripts that run the backbones of deployments, builds, farm management and more - the backbones of many a company.
I've seen bash scripts by seasoned developers, and those are also utter trash. Your code may be cleaner than the average, but that's an extremely low bar.
Shell scripting is good and powerful in its own right, it is true, and I have advocated that people give it a proper try and actually learn it, but the sad reality is that nobody does. If it's any "proper" language, they learn the ins and outs gladly, by peer expectation or inherited bias; but shell, even if you learn it properly, someone else will come f/ck up your clean code because they can't be arsed.
I am endlessly pushing for developers and admins to actually learn to use bash/shell properly, make use of functions, encapsulate steps of logic, write clean code. "It's just a shell script," "that's overkill," "it's fine like this," "I don't want to sink time into learning this, it's not a real language anyway."
The other truth is that, in the sysadmins space, most can't even write clean Python/Ruby/Perl/PHP/JavaScript/chosen-lang either, and given the number of gotchas and things shell lets you get away with until you hit a catastrophic bug (of the coder's carelessness, the misunderstood shell behaviour is documented) (Steam bug anyone?), they'd be better off in a safer environment than shell scripting.
Shell scripting perennial issues that are not the fault of the coder:
- will gladly let you get away by default with undefined variables (unless you explicitly
set -ue
) - comparison vs assignment is the only place where space matters (
a=b
anda = b
are COMPLETELY different statements, wtf) - you cannot return arrays from functions, only a stream of text (the power and the Achille's Heel) (this one issue compounds many of the others, by preventing workaround functions from being written)
- arrays cannot be passed down to functions as distinct items alongside other arguments (you can use references as a way around, but how many bash scripters know those?) (easier to use global variables right? yuck)
- variables are global by default. unless you make your iteration counter local, you stand to see some weeeeird bugs
- string splitting is done around an inherent part of the string, not as a a function operating on it (do we all know about IFS, does everybody know how to use it? didn't think so)
- Is it really the shell you thought it you were running? Ever deployed bash scripts only to find that the only interpreter on the machine is
sh
? Or the environment forces you intosh
by default? Or that in fact you're not runningbash
butash
? Or maybe the system default isdash
. Anyhow, you have to write everything now in plainsh
and lose any improvements thatbash
ever brought that make the task more bearable. - Inconsistent environments for common commands are rife. Your script uses the "mail" function? GNU or BSD, which options to use? You use
netcat
? Which variant, which options? You usetar
,grep
,rsync
? You using GNU, BSD or Busybox implementations? (these variations happen endlessly when mixing Ubuntu, CentOS and Alpine deployments, and that's just the surface) - (I scoff at any pushback of "ensuring the right version of Python on the company systems")
- attempting anything remotely event-driven yields a nasty pile of workarounds (I've tried, with muted success) Granted, this is a space which shell is definitely not designed for, but that's to say how far I tried to do everything in shell at one point. It's possible, but it's damn hard work where another language would have been better.
Most fundamentally, the view of shell as "not a proper language" hampers any impetus at large to learn it correctly and extensively, and understand its own idiosyncracies. At least with one of the other languages, developers have an inherited mindset that their skill in that language needs continual improvement, and will work towards this.
I still write tons of bash scripting. I love it. But recommending other people use another language is much more sane. Personally, I chose Python too. But in the end, I wouldn't recommend it unless you are going to do your darndest to learn. It. Properly.
Top comments (68)
I saw once a Go library designed to mimic a UNIX shell and some of UNIX utilities. The idea was to use Go for what would traditionally be shell scripts.
The main two strengths of a UNIX shell are effortless pipes (try that with Python!) and external command execution as a first-class citizen.
Most modern languages allow function composition (pipes between functions). And they have extensive, stable repositories of libraries.
This boils down to shipping the right packages with the distribution - something Linux distro's already have, of course.
Python's implementation support for pipes is reasonable and not that hard to use, but it's painful compared to actually using pipes directly in bash/tcsh/etc. Even using FIFOs on the command line feels more natural than the way you have to compose them in Python. The Unix way of composing operations is probably the best implementation of functional/stream programming ever.
And I am a Python lover, so don't think I'm knocking Python!
Absolutely right if you're going in and out of Python to create a pipeline of full fledged processes (I suppose you refer to the
subprocess
module and alike).What I meant is: you can stay in the environment that supports 'tacit/point-free programming', and make sure you have everything you need:
Could be just as easy:
provided you have these functions lying around somewhere.
Granted: Python has these FP concepts built-in, but not as nicely as the unix way. There are better languages for that: Haskell, F#, erlang...
Interestingly enough, someone already thought up a Haskell shell: Turtle
Yes, I was talking about composing processes like you would at the shell.
I see what you're saying about point-free programming, though. I still think the Unix style is the cleanest, most natural implementation of point-free programming, and I think the fact that it is a genuine stream of processing is a big point in its camp. However, if your Haskell example is accurate, I like it. The examples the Wikipedia article give seem less intuitive and a lot more LISP-y.
I think most programmers would probably find the use of
compose
in Python a lot less intuitive than nested generator functions, and it's certainly an inelegant implementation of point-free programming. I also wonder if it can eliminate some of the advantages of the generators? It probably doesn't based on the sample implementation, but I'd have to think carefully about if applyingpartial
like that would have unintended consequences, at least in some cases.I think so, too. You can make a nice 'fluent' DSL out of it, though.
..... ingenious.
I'm still trying to brain this, its possibilities and its limitations but... wow.
A bit of commentary would be very welcome :-)
I'll expand it in a full fledged post :) Or 'leave it as an exercise'?
I either love this or hate it, I can't decide. Bravo, sir!
Somehow, I have hit the 'publish' button on dev.to/xtofl/i-want-my-bash-pipe-34i2.
I've only skimmed the article so far, but it looks like a good one. I like the title! 😁
@xtofl , @Cliff , I have finally gotten round to this, I think you will be gleefully dismayed.
dev.to/taikedz/shellpipe-shellpipe...
Yeah, doing pipes is a nightmare in anything other than shells, and the reason why I still love using shell scripts - chaining tools.
But there's a lot of logic that can often be sub-moduled out to other languages to make a cohesive whole. Some systems people don't like the idea of a program not being self-contained in a single file and you end up with 2000+ lines of sub-optimal code at best, more often than not downright horrendous though... take a peek at the install scripts of some of your favorite software to see what I mean...
In regards to having to choose between 2 versions of python, no not anymore. 2.7 is end of live and at minimum everyone should be using 3.6+. But you should be on the latest stable, currently 3.8
Yeah, but legacy scripts are totally a thing. Some outfits don't even know you can have both installed side-by-side so you can do a progressive script migration. So they did no migration. And they mandate all Python2 even now. It's not pretty. Doing my best on my side to further educate my colleagues, but that's just my corner...
Yea legacy is a different thing I agree. But we should all be encouraging to get people to upgrade.
As an Oracle Engineer for a large coporation 95% of the automation I write I use ksh, no python3 in RHEL 7 where I work and old legacy hosts dont even have bash so ksh works everywhere. So much of the automation my team writes is all ksh for a whole host of tasks and while that last 5% is a bit of Python I can't see it dynamic changing anytime soon as SHELL scripting is so imbedded into the way the team and the business functions.
Once a technology is adopted it's hard to unstick it... That being said in your case it sounds like everybody gets the training required to write clean ksh? So long as they've learnt ksh fully and not "just the easy bits," then my argument remains the same: so long as there is impetus and requirement to learn the language properly, there's no (or much less of a) problem!
Bash is real programming. I wrote tons of modular complex logics and functions with just Bash running on top of many critical backbone servers. Bash is a beautiful language to me. Once come to the Linux environment on Server or IoT platforms, Bash is a mandatory language. If I need my program doing floating-point maths. I'll use C or C++ to create a tiny efficient engine with added APIs for the Bash script to sit on top of it. You know... Bash loves piggyback anything. It's cute.
Yep it's real programming alright. That said, I've seen many cases (and been guilty of a few) where the logic would have been better moved to other languages, in their own succinct modules, and used bash to tie the pieces together. It all depends on the use case of course :-)
I usually start with a bash script. Because it's just easy. Then I need to do something too complicated like concatenate a string or use arrays, hashmaps etc and I end up rewriting the entire thing in python.
I'm sure there are many people that are amazing at writing bash scripts but it's never been very readable. Once you get just a bit complicated languages like PERL are starting to look pretty by comparison.
Also my biggest petpeave bash sed and awk are not compatible between Linux, mac and likely other variants.
No there are only a handful :-D (I'm only half kidding)
My biggest gripe with shell scripting is the tooling dependency.
I used to quip "in bash, ANY language is your library!". The obverse of course is true: any tool can subsume any other in any given environment, and you won;t know til your script crashes.
macOS uses the BSD Utils by default, as do the BSDs in general ; Fedora uses GNU Coreutils, except when they use a BSD adaptation ; Ubuntu is GNU Coreutils most of the time; Alpine uses BusyBox (a great tool in and of itself, but a thorn for cross-platform shell scripting) ; ...
it's just limited. It makes it harder to write tests and follow most coding patterns.
Granted there are tools like this: github.com/sstephenson/bats but not sure if anyone uses them. Also.. Libraries!! How many times do we need to re-write the same fix to the same problem.
So.. you're saying that it's a very repeatable and consistent ecosystem? O.o
Yeah that's part of the issue. It's easy, it's everywhere just run bash foobar.sh, except when it doesn't work and you have to write 7 versions to support all the various edge cases.
It's not as easy to write, but I'm really liking go. It's way more complicated and verbose than bash, but at the end of the day i end up with 1 file to copy around.
Yeah libraires... is why I started my bash-builder project and its sibling bash-libs. Build the script and have... a single file to copy around ;-)
The backbone of most of my bash scripting nowadays...
"you cannot return arrays from functions, only a stream of text"
Never tell an engineer they can't do something. Please see:
dev.to/jrbrtsn/returning-an-array-...
That's not "returning" an array, but passing a reference to be manipulated in-place :-)
Still, a useful way of passing arrays down/up. You just.... need to know the technique... forget about recursive functions though. The name in the receiving function must be different from the one in the caller function.
Passing a result buffer by reference into a function is exactly what C++ does under the hood to present the illusion that functions may return something too large to fit in a CPU register. Recursive functions in Bash are a bit trickier, but I'll post a solution to that later today ;-)
I dunno, last I heard (and I am not versed in C so I could be talking complete nonesense), C functions return memory addresses... and of course, it is up to the programmer to know what any one function is passing back, be it an actual int, or a reference to a data structure of any kind...
I look forward to whatever solution you have to enable getting around the name clash during recursion with reference variables :-)
Tai,
C functions can only "return" something which can fit in a CPU register, because that is an efficient place to stash information which can be retrieved after the function "returns" - meaning the content of the CPU register gets copied into a stack variable or some other place useful to the programmer. C++ presents the illusion of returning something larger than a CPU register by silently creating space on the stack (a return buffer, if you will) before the function gets called, and then silently passing a reference to this return buffer into the function where it will get populated. Semantic shenanigans if you ask me.
That is wrong indeed.
C functions return typed values. Indeed once the structures grow bigger, programmers tend to revert to output arguments, which are often pointers to structures. Also, for lack of exceptions in C, functions often return error codes, and accept 'output' arguments as well.
In C++, types are much more used to advance the correctness of the program. Exceptions, variants, ... all make this a lot easier.
But... you can pass mere
void*
around. Severe pain should be your punishment - though there are legitimate cases for it.xtofl - you may wish to review the assembly code produced by a function call in C before declaring that I have made an error ;-)
I'm sure the assembler does emit low level stuff like that. The semantics of the language are copying a value out of the function, though.
In C the semantics what a function can return are limited to something which which will fit in a CPU register because that is what the term return means. The C compiler provides an additional service of promoting or truncating this returned value to match a simple return type. You can't successfully return any arbitrary local-to-the-called-function stack based struct from a C function, because the content of this struct is undefined as soon as the function returns. You can pass the address of any arbitrary struct into a C function, where it may then get populated.
In other words, for C there is a bright line of distinction found in what may be returned, while in C++ this line is syntactically blurred with some compiler tricks.
I am very confused now. It's off topic here, but what does 6.9.1/3 of n1256 mean, then?
And then there is paragraph 12...
Maybe I should stick to C++ then.
If you don't care about the assembly generated by your compiler, stick with C++ is a good choice. That said, I picked up C 32 years ago, and it remains one of the most popular programming languages. I am roughly 5x as productive in C as C++, and I've been programming in C++ for 27 years. It's getting worse with each subsequent C++ standard.
Not debating language superiority.
But I'm truly dazzled you would not be allowed to return a perfectly valid object. Let the compiler inject the correct assembly to accomodate it.
What about this one stackoverflow.com/a/9653083/6610? Wrong, too?
The stackoverflow example returns a value which is the size of an int. That fits just dandy into a CPU register. One of the focuses of the C++ standards committee has been trying to squeeze out all the superfluous copying which has been implemented to support clear semantics - so there's your reason.
Sorry, I keep finding counterarguments to your statement, unless I misunderstood.
This article clarifies some things, too: uninformativ.de/blog/postings/2020....
Could you be talking about 'early versions' of C?
I'll try compiler explorer tomorrow - now's nap time here.
From the Wikipedia page:
There's your compiler trick of silently allocating stack space, and then passing it by reference to the function to get populated. After the function returns, C++ compilers historically have then copied the result from the invisible-to-the-programmer return buffer into something the programmer knows about. For large objects this is a significant performance penalty to pay for some syntactic sugar.
Can you put a date on that? "Historically", that must mean pre-2003: since then, every major C++ compiler does copy elision. C++ has wildly drifted away from what C used to be, to follow its credo "don't pay for what you don't use".
But thanks. It is indeed very interesting to see how you're drawing totally opposite conclusions depending on your viewpoint. Here's a (reasonably) honest analysis of the schysm between C and C++, by someone in the standardization world: cor3ntin.github.io/posts/c/index.html
struct
returningI conclude that
Btw. with godbolt.org/z/a4exrM, you can compare the assembly generated by a large number of compilers.
I'll concede that my understanding of C++ compilers is dated - in 2003 I had already been coding in C++ for 10 years.
The crux of this disagreement is the definition of the word "return". In my opinion, creating the semantics of returning an object by silently passing in the reference to a possibly significantly sized return buffer of which the programmer may or may not be aware is not "returning" an object at all. Passing return buffers to a function by reference has been possible and clear since 1972, but I still cannot write in C or C++:
However, I have always been able to write:
So, is the second example "returning" 3 objects? If not, then why?
By implementing the semantic charade of returning a single arbitrary object, modern compilers have accommodated a single case where it appears in source code that a function can return something which will not fit in CPU registers:
So, where (stack|heap|static data segment) does rtn exist? How does the contents of rtn make it back to the caller? How is this not superfluous copying?
I contend that the following code is much clearer and guaranteed to be at least as efficient :
"Returning" is indeed an abstract concept. Each platform makes it concrete in its own way. To the CPU, there's no such thing as 'returning'. I don't think opinions should matter here; it's always a choice.
To me, returning a tuple is fare more readable than mixing input- and output arguments of a function. Modern languages accomodate this, and together with 'destructuring', it leads to code that most developers can readily understand.
As an application programmer (less so as a driver implementer or kernel hacker), I value this s expressiveness in a language. Any compiler that does not know this 'magic' forces me to work around it.
Hey, as an aside - cor3ntin wrote a nice overview of what divides C and C++ worlds: cor3ntin.github.io/posts/c/. He shouldn't have called it 'The Problem", but I like his analysis, which goes way beyond the technical.
I had already read the article, thanks for sharing. "Expressiveness" is an entirely subjective term which merits little discussion. In my opinion, any pointer passed in which is not marked const refers to a return buffer. If you study libc's prototypes, you can see the established convention of passing in the address of the return buffer(s) first.
Back in 1993 I would have asserted confidently that C++ will overtake C in a decade or so. Live and learn.
I have noticed that in most circles I have communicate with (I started around 2000), abstractions are welcomed (within limits), while in the embedded domain, the feel with the hardware is appreciated more. There seems to be a scale ranging from bare metal to functional programming.
It would be an interesting exercise to see which concepts make code more expressive for you, and which for me, and if there are certain 'clusters' of developers that benefit from the same kind of style. Probably there are some studies about it already - I'll have to look around.
I support you, I love bashScript is my favorite code language, it is very powerful, in my begin with bash, my due to my big ego I did it my bashScripts complex, you know, to prove that I knew coding with bash, right now I always try to do my code very, very simple, for anyone can understand it easily, I like a lot the one liners and use the
test
[[ ]]
and&&
for the flow control, in one line, but that do it more complex of read my code, so the best is use indent😁 and do your code of the most simplest way😁👍✌️I use bash day in and day out. It's my favorite shell. I've been using it for a long time.
But... whenever I need to do something for production, I reach for Python 3.x. I use git for source control. I have my code peer reviewed.
I've converted some other peoples bash scripts into Python. I've converted some other peoples Perl scripts into Python. I've converted some other peoples JavaScript on Node.js into Python.
Because I love Python? (Well, I am fond of it, true.) No, because the other code was hard to understand and hard to maintain and hadn't been code reviewed and wasn't abiding by the approved scripting engines. (Node.js is actually approved, but the other points hold.)
The "hard" part isn't a shortcoming of the language (after all, these aren't PHP), it's a shortcoming of the programmer making a tasty pot of spaghetti code -- and spaghetti code can be written in any language.
Yes, but some languages (and their baggage) can be more conducive to spaghetti ;-)
If you look for examples of good Python, Java, JavaScript, C, Golang, etc, you can find them, and there are LOTS of people trying to demonstrate how to do it properly. Examples are everywhere.
Shell (and to a lesser extent Perl) is plentiful in the wild - and it is mostly the awful stuff that is most readily available. (this was my point about peer expectations to improve skills in some languages but not others).
If you did the conversions in a company, and everyone else can write clean Python then great :) Although perhaps getting people to write clean lang-x in the first place would have been just as productive. I speak from experience when I say trying to make people learn and write clean shell is a pain.
That is a valid counterpoint, and I concur. Good code can be written in any language, except PHP of course. (I would have said PHP or Perl, but I've actually seen good code in Perl, so I know it is actually possible.)
I really can't say I find much to disagree with here.
If I can think of how to do a task 80-90% of the way on the command line without much head-scratching, I almost immediately reach for Bash. If it's a clearly OS- or file-system-centric or especially stream-oriented task, same. But as soon as it seems like there's a lot of in-place mutation needed and certainly if I feel like an array/list/hashmap will be necessary -- arrays are such a pain in Bash -- then I reach for Python. If it seems like it's going to be complex with optional arguments or different types of runs, straight to Python. As soon as what was a straight-forward Bash script starts getting requests to do more stuff, rewrite it in Python.
I once write a 650+ line Bash script -- including the generous comments and formatting -- to generate reports to streamline security auditing on a network of user systems and servers, because one of the sys admins didn't know Python and said it would be easier if they ever needed to modify it. It was solid, effective, fast, and even configurable with the ability to add new checks using regular expressions. But that was the point at which I said "never again" for something like that. The last I heard, it's still in use a decade later and has hardly been touched except for a little tweak here or there due to some network changes. They actually use it to vet any new tool they bring in. So I'm pretty proud of that, even if I'd never write such a monstrosity now.
I know the pain of such pride :-)
I did a couple projects where a fair bit from one was being re-used in the other (generic stuff, like output control, string validations etc) and I started putting bits into individual files etc and eventually ended up putting together a build tool that allows using scripts from a "library" and writing
#%include
lines in the main script. There's some extra management in there to prevent double-including files (two dependencies with a shared third dependency) etc. Everything gets built into a single script that can then be passed around/deployed.Some of my scripts are now beyond the 1000+ line but that's only because they pull in a few "external" libs :-)
And yeah, your heuristic for transition point sounds eminently sensible :)
Ha! Bash can definitely give you a bit of a Frankenstein complex, simultaneously proud and horrified at what you've created.
(Not to be confused with Asimov's definition of "Frankenstein complex".)
Ha! Bash can definitely give you a bit of a Frankenstein complex, simultaneously proud and horrified at what you've created.
(Not to be confused with Asimov's definition of "Frankenstein complex".)
I'd say that's pretty much 99% correct. It is a language with a specific use case in mind, that of chaining tools. The mistake often made is trying to, as you say, venture into incompatible areas, where problems quickly pile up.
POSIX compatibility is a bit of a gripe of mine - it's there to ensure cross-compatibility, so that's definitely a bonus, but those specifications are so restrictive that they cripple a number of capabilities various shells have come up with to "resolve" the issues of POSIX.
One of them is array iteration - decently well fixed in bash, and probably workable in most other shells, but quite horrific to handle in POSIX sh. With properly setup server management practices, it should be possible to mandate any software required, so bash would be my choice to mandate, but other teams prefer other shells, and that's fine too.
Mind that I'm not saying that
bash
and shell scripting is bad - on the contrary I love it. I'm mostly saying that people generally don't bother to learn to use it well and for its use-case, and so we end up with a default quality of scripting across the board that is horrific - and that level of quality is passed on because that's what expected. And so my message became a distorted one: "for pity's sake.. learn it properly... stop adding to the problem..." ;-)