Did you know we can use this regular expression to extract links?
(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+
This matches the URLs in a piece of text, and we can write a short Python script to extract them. Because the scheme part is optional, it also matches other dotted strings such as file names.
import re

text = "<CONTAINING URLS>"
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
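To see what the pattern actually picks up, here is a minimal sketch that runs it on a small sample string. The helper name extract_urls and the sample text are made up for illustration; only the pattern comes from the post.

```python
import re

# Pattern from the post, written as a raw string so the backslashes
# reach the regex engine unchanged.
URL_PATTERN = re.compile(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+')


def extract_urls(text):
    """Return every substring of text that matches URL_PATTERN (hypothetical helper)."""
    return URL_PATTERN.findall(text)


# Made-up sample input, not taken from the post.
sample = "Docs: https://example.com/a?b=1 and ftp://files.example.org/pub, see notes.txt"
print(extract_urls(sample))
# ['https://example.com/a?b=1', 'ftp://files.example.org/pub', 'notes.txt']
```

Note that notes.txt is returned too: since the scheme group is optional, anything of the form "characters, dot, characters" counts as a match, so the results may need filtering depending on the input.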
Top comments (2)
Or why it matches ftp (so we're not just talking web addresses) but not any other schemes, and how to expand it to do so?
github.com/madisonmay/CommonRegex would be better suited for such tasks. It has methods for various tasks like extracting links, times, dates, phone numbers, etc.