Corentin Bettiol

How to efficiently download thousands of files?

Hello.

I need to copy ~40,000 files from a server to my computer, and I'm wondering what the best approach is.

using scp

  • slow
  • consumes lots of bandwidth
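
For reference, a plain recursive scp of the folder would be something like this (host@server and ~/path/to/folder/ are placeholders):

# copy the whole folder over in a single recursive scp
scp -r host@server:~/path/to/folder/ .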

using rsync

  • slow
  • consumes less bandwidth
  • can resume the copy after a network problem
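
A direct rsync pull would look roughly like this; -z gives the lower bandwidth use and --partial the ability to resume:

# -a copies the tree with metadata, -z compresses in transit,
# --partial keeps interrupted files so the copy can resume
rsync -az --partial host@server:~/path/to/folder/ .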

using tar then scp

  • less slow
  • consumes less bandwidth
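
Roughly, the tar + scp variant (archive on the server, copy the single file, unpack locally):

# on server: bundle the folder into one compressed archive
tar czf files.tar.gz ~/path/to/folder/

# on local machine: copy the archive, then unpack it
scp host@server:files.tar.gz .
tar xzf files.tar.gz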

using tar then rsync

  • less slow
  • consumes less bandwidth
  • can resume the copy after a network problem
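
The tar + rsync variant only changes the transfer step; --partial keeps partial data so the archive download can resume after a network problem:

# on local machine: fetch the archive with a resumable rsync instead of scp
rsync --partial host@server:files.tar.gz .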

using tar then split then parallel with scp

  • fast
  • consumes less bandwidth
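
The split + parallel variants differ only in the command parallel runs for each chunk; with scp it would be something like this (the full rsync recipe is in the edit below):

# fetch each chunk with its own scp process, several at a time
ssh host@server 'ls -1 fragment_*' | parallel scp host@server:{} .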

using tar then split then parallel with rsync

  • fast
  • consumes less bandwidth
  • can resume the copy after a network problem

I think I will opt for the last one, but what would you do in my case?


Edit: bash commands for using tar then split then parallel with rsync:

Prerequisite: install parallel and silence its citation notice:

sudo apt install parallel && echo "will cite" | parallel --citation &>/dev/null
# on server: archive the folder, then split the archive into 20 MB chunks
tar czf files.tar.gz ~/path/to/folder/
split -b 20M files.tar.gz fragment_

# on local machine: list the remote chunks, fetch them in parallel, then reassemble
ssh host@server 'ls -1 fragment_*' | parallel rsync -z host@server:{} .
cat fragment_* > files.tar.gz
tar xvf files.tar.gz

Edit 2: In the end I used a single simple rsync command, since it can compress files on the fly and resume from where the transfer stopped.

Since rsync already uses all of the available bandwidth, the transfer isn't a bottleneck that parallel can fix.
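
Assuming the same placeholders as above, that command would be along these lines (--progress is optional, just to watch the transfer):

# -a archive mode, -z compress on the fly, --partial lets an interrupted transfer resume
rsync -az --partial --progress host@server:~/path/to/folder/ .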
