Using Regex To Extract Links.

#todayilearned #learn #python #beginners

Did you know we can use this regular expression to extract links?

(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+

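This matches URLs in a piece of text, with or without an http, https, or ftp scheme, so we can write a short Python script to extract them. Because the scheme is optional, it also matches other dotted tokens such as file names or version numbers.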

import re

text = "<CONTAINING URLS>"
# Raw string so the backslashes reach the regex engine unchanged.
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
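
To see what the pattern actually picks up, here is a minimal sketch that runs it over a made-up sentence. The names `URL_PATTERN`, `extract_links`, and `sample` are only for illustration and are not part of the original snippet, and the scheme filter at the end is just one possible way to trim the results.

```python
import re

# The pattern from the post, compiled once as a raw string.
URL_PATTERN = re.compile(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+')


def extract_links(text):
    """Return every substring of text that the pattern matches (hypothetical helper)."""
    return URL_PATTERN.findall(text)


# Made-up sample text, for illustration only.
sample = (
    "See https://example.com/docs?page=2 and ftp://files.example.org/pub "
    "or example.net for version 3.14"
)

matches = extract_links(sample)
print(matches)
# ['https://example.com/docs?page=2', 'ftp://files.example.org/pub', 'example.net', '3.14']

# Keep only the matches that carry an explicit scheme.
with_scheme = [m for m in matches if "://" in m]
print(with_scheme)
# ['https://example.com/docs?page=2', 'ftp://files.example.org/pub']
```

Note that `3.14` is matched too, since the scheme is optional and digits count as word characters. If that is too loose for your input, filtering on `://` as above is a simple way to drop those matches, at the cost of also dropping bare domains like `example.net`.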

Top comments (2)

Ben Sinclair • Oct 8 '19

Or why it matches ftp (so we're not just talking web addresses) but not any other schemes, and how to expand it to do so?

Sundeep • Oct 8 '19

github.com/madisonmay/CommonRegex would be better suited for such tasks. It has methods for various tasks like extracting links, times, dates, phone numbers, etc.

DEV Community
