R is a language for "statistical" computing. I'm not generally a fan of the category, and think you'd be much better off using a general purpose language like Python with some "statistical" packages, but let's take a look.
Hello, World!
R is normally used in an interactive environment like Jupyter Notebooks ("Jupyter" being named after Julia, Python, and R, even though it's mostly Python, Python, and Python).
You can also run R from the command line. It starts super spammy unless you pass the -q flag:
$ R -q
> print("Hello, World!")
[1] "Hello, World!"
>
Save workspace image? [y/n/c]: n
And finally, you can also write a standalone script and run it with the Rscript binary:
#!/usr/bin/env Rscript
print("Hello, World!")
$ ./hello.r
[1] "Hello, World!"
What is going on here with this output? What's the [1]?
R is extremely array-oriented, so much so that it treats everything as an array. So "Hello, World!" is really a 1-element array with "Hello, World!" as its first and only element. The [1] is the index of the first element printed on that line.
You can see this if you create an array of all values from 200 to 300. It's going to be printed like this:
> seq(200,300)
[1] 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217
[19] 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235
[37] 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
[55] 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271
[73] 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289
[91] 290 291 292 293 294 295 296 297 298 299 300
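A quick way to check the "everything is an array" claim from the REPL is to ask for lengths and indexes directly. This is a minimal sketch using only base R functions (length, identical, c); the variable names are made up for illustration.

```r
# A bare string is already a 1-element character vector.
greeting = "Hello, World!"
length(greeting)                    # 1
greeting[1]                         # "Hello, World!"
identical(greeting, c(greeting))    # TRUE: c() of one value is the same vector

# The sequence from the text has 101 elements (200 through 300 inclusive),
# which is why the last printed row starts at index [91].
nums = seq(200, 300)
length(nums)                        # 101
nums[91]                            # 290
```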
Anyway, this short demonstration aside, this is the actual Hello, World! program:
#!/usr/bin/env Rscript
cat("Hello, World!\n")
$ ./hello2.r
Hello, World!
It's called cat because it concatenates the elements of the input, similar to the Unix cat command. In both cases, you can use it for just a single input, in which case the name can be fairly confusing.
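To make the difference between print and cat concrete, here is a small sketch that captures what each one writes. It only uses base R (print, cat, capture.output); the variable names printed and catted are illustrative.

```r
# print() shows the vector with its index prefix and quotes.
print("Hello, World!")

# cat() writes the raw text; with several arguments it joins them,
# using a single space as the default separator.
cat("Hello,", "World!\n")

# capture.output() collects what would have been printed, one string per line,
# which makes the difference easy to inspect.
printed = capture.output(print("Hello, World!"))
catted = capture.output(cat("Hello,", "World!\n"))
printed   # "[1] \"Hello, World!\""
catted    # "Hello, World!"
```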
FizzBuzz
We can do the classic FizzBuzz:
#!/usr/bin/env Rscript
for (i in seq(1, 100)) {
  if (i %% 15 == 0) {
    cat("FizzBuzz\n")
  } else if (i %% 3 == 0) {
    cat("Fizz\n")
  } else if (i %% 5 == 0) {
    cat("Buzz\n")
  } else {
    cat(i)
    cat("\n")
  }
}
That however would be completely missing the point of R. R is array-oriented, and we're operating one element at a time.
Let's give it another try:
#!/usr/bin/env Rscript
i = seq(1, 100)
x = i
x[i %% 3 == 0] = "Fizz"
x[i %% 5 == 0] = "Buzz"
x[i %% 15 == 0] = "FizzBuzz"
cat(x, sep="\n")
What's going on here?
- seq(1, 100) is an array of integers from 1 to 100.
- i = seq(1, 100) assigns that to i.
- x = i might be a bit of a surprise, as it copies i; it doesn't just reference the same array again.
- i %% 3 is an array of remainders of i divided by 3, so it goes in the cycle 1 2 0 1 2 0 and so on.
- i %% 3 == 0 is an array of boolean values, so it goes in the cycle FALSE FALSE TRUE FALSE FALSE TRUE and so on.
- x[i %% 3 == 0] = "Fizz" assigns "Fizz" to those elements of x where the corresponding i %% 3 == 0 is TRUE.
- And analogously for Buzz and FizzBuzz.
- And finally we concatenate the results, using newline as a separator. It's called "separator" but it's also used after the final element.
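The steps above are easier to follow on a short vector. The sketch below replays them for 1 to 6 instead of 1 to 100; it is the same technique as the script, just cut down so every intermediate value fits on one line.

```r
i = seq(1, 6)
x = i                      # x starts as a copy of i

i %% 3                     # 1 2 0 1 2 0
i %% 3 == 0                # FALSE FALSE TRUE FALSE FALSE TRUE

x[i %% 3 == 0] = "Fizz"    # only positions 3 and 6 are replaced
x                          # "1" "2" "Fizz" "4" "5" "Fizz"
i                          # still 1 2 3 4 5 6
```

Note that x silently turns into an array of strings after the assignment, because an R vector holds values of a single type. i is unaffected, which is the copy behaviour mentioned in the list.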
Fibonacci
Let's first write a function as if R was a regular language:
#!/usr/bin/env Rscript
fib = function(n) {
  if (n <= 2) {
    1
  } else {
    fib(n - 1) + fib(n - 2)
  }
}
for (i in seq(1, 20)) {
  cat("fib(", i, ") = ", fib(i), "\n", sep="")
}
$ ./fib.r
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765
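The loop above calls fib once per value. A slightly more R-flavoured way to get the same numbers is to map the function over a vector with sapply. fib is repeated here so the snippet runs on its own; this is a sketch, not part of the original script.

```r
fib = function(n) {
  if (n <= 2) {
    1
  } else {
    fib(n - 1) + fib(n - 2)
  }
}

# fib() itself handles one number at a time (the if needs a single TRUE/FALSE),
# so sapply() applies it to each element and collects the results in a vector.
fibs = sapply(seq(1, 10), fib)
fibs    # 1 1 2 3 5 8 13 21 34 55
```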
Fibonacci with matrices
As R is supposed to be an array-oriented language, it's a reasonable expectation that it would have full support for matrices like Octave, Julia and so on. However, it does not.
Matrices have super painful syntax, and the usual arithmetic operators are not matrix operations: if you try to multiply two matrices with *, it will just do element-wise multiplication. There's %*% for matrix multiplication, but there's no matrix exponentiation in base R.
Even Ruby has Matrix[[1,1],[1,0]] ** 10 in its standard library, and that's not exactly a "scientific" language.
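To see the difference between the two kinds of multiplication, and what a matrix power looks like without any package, here is a minimal base R sketch. mat_pow is a hypothetical helper written for this example; it is not a base R function and not how any particular package implements it.

```r
m = matrix(c(1, 1, 1, 0), ncol = 2)

m * m      # element-wise: rows are (1 1) and (1 0)
m %*% m    # matrix product: rows are (2 1) and (1 1)

# Hypothetical helper (not from the article or any package):
# multiply n copies of m together with %*%. Only meant for n >= 1.
mat_pow = function(m, n) {
  Reduce(`%*%`, replicate(n, m, simplify = FALSE))
}

mat_pow(m, 10)[1, 2]    # 55, the 10th Fibonacci number
```

This does n - 1 multiplications, so it is fine for small n but it is not exponentiation by squaring.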
All right, let's do install.packages("matrixcalc") from the R REPL. Annoyingly, that asks me which server from a list of 84 I want to use to download a few MBs, like it's the early 1990s and any of that matters.
#!/usr/bin/env Rscript
require(matrixcalc)
fib = function(n) {
  m = matrix(c(1,1,1,0), ncol=2)
  matrix.power(m, n)[1,2]
}
for (i in seq(1, 20)) {
  cat("fib(", i, ") = ", fib(i), "\n", sep="")
}
Not amazing, but let's give it a go:
$ ./fib2.r
Loading required package: matrixcalc
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765
We reached another baffling thing. Why the hell did R think it's reasonable to inform me that a script loaded some package? Imagine if JavaScript was doing that and starting an app dumped 1000 entries to the console.
We need to do something silly to get rid of that message (that particular "Loading required package" line is printed by require(); library() does not print it):
#!/usr/bin/env Rscript
suppressPackageStartupMessages(require(matrixcalc))
fib = function(n) {
  m = matrix(c(1,1,1,0), ncol=2)
  matrix.power(m, n)[1,2]
}
for (i in seq(1, 20)) {
  cat("fib(", i, ") = ", fib(i), "\n", sep="")
}
Fetch some JSON
Let's get slightly out of R's comfort zone, and try to fetch some JSON data, and iterate somewhere within it.
First we need install.packages("httr"):
#!/usr/bin/env Rscript
suppressPackageStartupMessages(require(httr))
# JSON looks like this:
# {
#   "temperature": "+8 °C",
#   "wind": "17 km/h",
#   "description": "Partly cloudy",
#   "forecast": [
#     {
#       "day": "1",
#       "temperature": "+7 °C",
#       "wind": "17 km/h"
#     },
#     {
#       "day": "2",
#       "temperature": "+7 °C",
#       "wind": "9 km/h"
#     },
#     {
#       "day": "3",
#       "temperature": "+8 °C",
#       "wind": "9 km/h"
#     }
#   ]
# }
url = "https://goweather.herokuapp.com/weather/London"
data = content(GET(url))
for (day in data$forecast) {
  cat("Forecast for", day$day, "is", day$temperature, "\n")
}
$ ./weather.r
Forecast for 1 is +7 °C
Forecast for 2 is +7 °C
Forecast for 3 is +8 °C
R doesn't have dictionaries, but its lists can have names associated with their elements, which is close enough for this. data$forecast is like data["forecast"] in a more usual language. httr detects JSON, and converts it appropriately, which is always nice.
If you try to print data, it looks like a disaster (multiple empty lines preserved), but if you know the structure and you're just reading, it works well enough:
> data
$temperature
[1] "+8 °C"
$wind
[1] "17 km/h"
$description
[1] "Partly cloudy"
$forecast
$forecast[[1]]
$forecast[[1]]$day
[1] "1"
$forecast[[1]]$temperature
[1] "+7 °C"
$forecast[[1]]$wind
[1] "17 km/h"
$forecast[[2]]
$forecast[[2]]$day
[1] "2"
$forecast[[2]]$temperature
[1] "+7 °C"
$forecast[[2]]$wind
[1] "9 km/h"
$forecast[[3]]
$forecast[[3]]$day
[1] "3"
$forecast[[3]]$temperature
[1] "+8 °C"
$forecast[[3]]$wind
[1] "9 km/h"
>
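The named-list access described above can be tried without any network call. The sketch below builds a trimmed, hand-written copy of the structure (it is not the live API response) and reads it with $, [[ ]], and names(). str() is a base R function that gives a more compact view than the default print.

```r
# Hand-written stand-in for the parsed response, trimmed to a few fields.
data = list(
  wind = "17 km/h",
  description = "Partly cloudy",
  forecast = list(
    list(day = "1", wind = "17 km/h"),
    list(day = "2", wind = "9 km/h")
  )
)

names(data)                 # "wind" "description" "forecast"
data$description            # "Partly cloudy"
data[["description"]]       # same element, looked up by a string key
data$forecast[[2]]$wind     # "9 km/h"

# Compact, indented overview of the whole nested structure.
str(data)
```

In R itself, single brackets as in data["forecast"] return a one-element list rather than the element, so data[["forecast"]] is the direct equivalent of data$forecast.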
Should you use R?
I'd advise against it. You're much better off with Python or Julia.
R is only designed for a very specific style of computing, and if you step outside that style, it starts struggling and being awkward real fast. And you'll do that a lot for any real project. Even in data science, all the boring stuff like fetching data, parsing it, cleaning it up, and formatting the results generally consumes more of the project than the analysis itself, and Python and Julia simply handle such parts better.
I haven't done much in-depth research on that, but from a quick look it doesn't look like R has any ecosystem advantage over Python or Julia. The statistical packages you'd expect are there for all of them, and for the non-statistical ones, R is quite behind.
If you're a developer, all this should be clear enough: R, like other scientific languages, has very limited appeal to developers.
If you're a data scientist or a researcher, R might tempt you, but I'd strongly recommend learning a general purpose language like Python (or Julia, which is close enough to general purpose). It might be a bit more complicated, but it will give you a lot more power and flexibility than learning an overly specialized language like R.
Code
All code examples for the series will be in this repository.