It happens sometimes, mainly in documentation or comments, that a word slips in twice. Not the end of the world, but if there is a quick and easy solution, why leave the mistake unfixed?
Let’s fix “a a word” to “a word” without even knowing there is a problem
Well, the first problem is identifying such cases. It won’t be easy to spot a a word repeated twice in a row in a long document (did you see the “a a”?)
Then, if it happens once, it will happen again. What about other occurrences of the problem, or similar ones? It can happen a lot when using the dark magic of copy-pasting.
Some background knowledge about regex
Regex is quite powerful (at least in spec). If you have never used it, the key notion here is grouping and capturing. Every time you wrap part of the expression in (), that part is captured: grouped and numbered in increasing order, so you can refer to it later in the expression (without knowing the value in advance).
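To make capturing concrete, here is a minimal sketch in Python's `re` module. The pattern and the helper name `captured_half` are made up for illustration: group 1 captures a word, and `\1` then has to match that same text again after a hyphen.

```python
import re

# Group 1 captures the first word; \1 must then match that same text again.
REPEATED_PAIR = re.compile(r"(\w+)-\1")

def captured_half(text):
    """Return the text captured by group 1, or None if there is no match."""
    match = REPEATED_PAIR.search(text)
    return match.group(1) if match else None

print(captured_half("bye-bye"))    # bye
print(captured_half("tick-tock"))  # None
```

Note that the pattern never spells out "bye": the backreference matches whatever the group happened to capture.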
So, to define our problem: we are looking for full words (at least one word character, meaning a letter, digit, or underscore), the words should be bounded (so “a ab” won’t count as a duplicate “a”), and there should be at least one whitespace character between the words (otherwise it is still the same word, of course).
\b(\w+)\s+\1\b
This regex does exactly what we want: \b anchors the match at a word boundary, \w+ gives us a word, and \s+ catches at least one whitespace character. But what is the \1? And why is \w+ wrapped in parentheses? Well, these are exactly the captured groups we talked about before. (\w+) matches the first occurrence of a word and captures it; \1 matches whatever the first captured group matched, which in our case is the exact same word!
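If you want to try the pattern outside the editor, the following is a small sketch using Python's `re` module (the function name `find_duplicates` and the sample sentence are only illustrative). It shows both a real duplicate and the “a ab” case that the trailing \b is there to reject.

```python
import re

# Same pattern as above: a bounded word, whitespace, then the same word again.
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b")

def find_duplicates(text):
    """Return every word that is immediately followed by itself."""
    return [match.group(1) for match in DUPLICATE_WORD.finditer(text)]

sample = "It is hard to spot a a repeated word, but a ab is fine."
print(find_duplicates(sample))  # ['a']
```

Only the “a a” pair is reported. In “a ab” the \1 does match the leading “a” of “ab”, but the final \b fails because another word character follows.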
VS Code usage
VS Code has a nice feature in its search tool: it can search using regular expressions. Press cmd+f (on a Mac, or ctrl+f on Windows) to open the search tool, and then press cmd+option+r (alt+r on Windows) to enable regex search.
Using this, you can find duplicate consecutive words easily in any document. You can also search across all your documents at once.
Now, we just need to replace the duplicate with one instance of the word. For that, toggle the replace mode (click the right arrow) and type $1 in the replace field. The $1 references the same captured group that \1 referred to inside the pattern.
Now click replace and watch the magic happen!
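The same find-and-replace can be sketched in a script, for example to clean a string before it ever reaches the editor. This is only an illustration in Python, with `collapse_duplicates` as a hypothetical helper name; note that Python's `re.sub` writes the group reference as `\1` in the replacement, where VS Code uses `$1`.

```python
import re

DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b")

def collapse_duplicates(text):
    """Replace each 'word word' pair with a single 'word'."""
    # r"\1" in the replacement plays the role of $1 in VS Code.
    return DUPLICATE_WORD.sub(r"\1", text)

print(collapse_duplicates("Lets fix a a word in the the docs"))
# Lets fix a word in the docs
```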