We had a big pile of NGINX access.log
files for our site and wanted a quick way to list all of the unique paths that had been requested.
If your access.log
file(s) follow a reasonably standard format that looks like this:
127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"
.. then you can use this solution:
awk -F\" '{print $2}' access.log | awk '{print $2}' | sort | uniq -c | sort -g
The output will look like this:
[lots of stuff here]
104 /xmlrpc.php
114 /wp-includes/wlwmanifest.xml
121 /robots.txt
161 /feed/
336 /
3056 //xmlrpc.php
53786 /wp-login.php
So what's going on?
awk -F\" '{print $2}' access.log
splits each line on double-quote characters and prints the second field, which is the request line (e.g. GET / HTTP/1.1).
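You can see this stage in isolation by piping the sample log line from earlier through it (the echo is just for illustration):

```shell
# -F\" makes awk split on double quotes, so $2 is everything
# between the first and second quote: the request line.
echo '127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"' \
  | awk -F\" '{print $2}'
# prints: GET / HTTP/1.1
```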
awk '{print $2}'
then splits that request line on whitespace, skipping the HTTP verb (GET/POST/PUT/etc.) and printing the path, which is the second field.
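Again in isolation, feeding in the request line from the previous stage:

```shell
# With awk's default whitespace splitting, $1 is the verb,
# $2 is the path, $3 is the protocol version.
echo 'GET / HTTP/1.1' | awk '{print $2}'
# prints: /
```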
sort
sorts the output so that identical paths end up on adjacent lines, which..
uniq -c
then collapses into a list of the unique paths only. The -c
prefixes each line with a count of how many times it occurred.
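A tiny made-up example (the paths here are just placeholders) shows why the sort has to come first: uniq only collapses adjacent duplicate lines.

```shell
# sort groups identical lines together; uniq -c then counts each group.
printf '/a\n/b\n/a\n/a\n/b\n' | sort | uniq -c
# prints each unique line prefixed by its count
# (counts are space-padded): 3 /a, then 2 /b
```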
sort -g
then sorts the lines numerically by count, smallest first (sort -n would also work here).
Want the result in descending numeric order? Use sort -gr
instead.
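If you prefer a single process, the whole pipeline (minus the final sort) can also be done in one awk invocation with an associative array. This is a sketch that is equivalent in spirit, not the one-liner above:

```shell
# Split each line on quotes to get the request line, then split that
# on spaces to get the path, and tally paths in an array.
awk -F\" '{split($2, req, " "); count[req[2]]++}
          END {for (p in count) print count[p], p}' access.log | sort -g
```

Note that awk's for-in loop emits keys in no particular order, so the trailing sort -g is still needed.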