Have you ever needed to process text in logs, searching for something in a massive amount of them? Have you ever tried to search logs that were compressed? Did you try to search them by uncompressing them first? Did the log files have patterns from which you had to extract some text?
This blog post covers some helpful commands and their usage in Linux for text processing.
- Viewing all the text files together in the directory.
The text or log files of a software component are often split across multiple files, since in large systems it's common for logs to be rotated on an hourly or even minute-by-minute basis. While searching logs, you want to view them all aggregated together instead of going through each file one by one.
cat * | less
or
cat logs_files* | less
cat with a wildcard pattern would help with that.
But what if the files are compressed? The cat command will not work then.
- zcat
Use the command below to view all the text files appended together. This works even if the files are compressed: zcat decompresses them in memory and shows the output text.
zcat * | less
or
To view all the log files starting with logs_pattern:
zcat logs_pattern* | less
- grep
The grep command is one of the most useful commands in a Linux terminal environment. The name grep stands for "global regular expression print", which means you can use grep to check whether the input it receives matches a specified pattern.
So if you want to search for a specific pattern or text in a file, you can use grep to find it.
grep "pattern" file
#if you want to make it search across all files
cat * | grep "pattern"
or
zcat * | grep "pattern"
The grep tool is not only used for matching patterns in files; it can also be used to filter out lines that match certain patterns.
grep -v pattern myfile
The -v option tells grep to print only those lines which do not match the pattern.
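These commands compose naturally in a pipeline. As a rough sketch (the "ERROR" and "healthcheck" patterns below are just placeholders for whatever you are searching for), you could combine zcat, grep and grep -v to search compressed logs while filtering out noise:
#search compressed logs for a pattern, dropping lines that match another pattern
zcat logs_pattern* | grep "ERROR" | grep -v "healthcheck" | less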
Now that we are done with finding the text we were looking for in the text files or logs, what if we want to extract some information from it?
- sed and awk
While it's difficult to cover all the different types of usage in a single blog post, since each of these commands would require a blog post of its own, I will still try to give an overview of how they can be helpful.
sed is a useful text processing utility of GNU/Linux; the full form of sed is Stream Editor. Unix provides sed and awk as two text processing utilities that work on a line-by-line basis.
The sed program (stream editor) works well with character-based processing, and the awk program (Aho, Weinberger, Kernighan) works well with delimited field processing.
awk '{print $3}' filename
This command will display only the third column from filename
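By default, awk splits each line on whitespace. If your data is separated by something else, the -F option lets you specify the delimiter. A minimal sketch, assuming a hypothetical comma-separated file data.csv:
#print the third comma-separated field of each line in data.csv
awk -F',' '{print $3}' data.csv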
sed -n '/hello/p' file1
This command will display all the lines which contain hello.
sed 's/hello/HELLO/' file1
This command will substitute the first occurrence of hello on each line with HELLO; the 's' in the command is for substitution.
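To replace every occurrence of hello on a line rather than just the first one, add the g (global) flag:
#substitute every occurrence of hello on each line with HELLO
sed 's/hello/HELLO/g' file1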
- tr - translate
This command is used to translate the characters in a file into some other form, such as squeezing repeated characters into a single occurrence or deleting a character.
tr -d ',' < file1
This command will delete all occurrences of the comma (",") from the file file1.
tr -d "hello" < file1
This command will delete all occurrences of any of the characters h, e, l or o from the file file1.
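The squeezing behaviour mentioned above uses the -s option. For example, this collapses runs of repeated spaces in file1 into a single space:
#squeeze repeated spaces into a single space
tr -s ' ' < file1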
- jq
The jq command can be used in many different ways; it can be used directly on a JSON file and can also be combined with several other commands to interpret JSON data.
jq is very powerful when working with JSON data, which could be obtained from a REST API or from the logs of a server.
Let's say we have this example JSON in a file called users.json:
{"users":{"name": "Dhiren","id": "001"}}
To extract the inner map of users, we would use:
jq '.users' users.json
To extract the name, we would use the following:
jq '.users.name' users.json
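Since jq also reads JSON from standard input, it fits at the end of a pipeline. As a rough sketch (the URL below is just a placeholder for whatever REST API you are querying), you could pipe a curl response straight into jq:
#fetch JSON from a placeholder REST API endpoint and extract the name field
curl -s https://example.com/api/users | jq '.users.name'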
That's all for today! I'm a Linux enthusiast and will cover more blogs related to Linux, Cloud Computing and Software development in general. I post daily on my Twitter account.
Let's connect!
✨ Github