Develop a crawler that collects the email addresses in the visited web pages. You can write a function emails() that takes a document (as a string) as input and returns the set of email addresses (i.e., strings) appearing in it. You should use a regular expression to find the email addresses in the document.

1 Answer


Answer:

See the explanation below.

Step-by-step explanation:

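import re


def emails(document):
    """
    Find the email addresses in the given document using a regular expression.

    :param document: the document text, as a string
    :return: set of email addresses found in the document
    """
    # regex for matching emails: local part, "@", domain name, then a dot and a top-level domain
    pattern = r'[\w\.-]+@[\w\.-]+\.\w+'

    # finding all emails
    emails_in_doc = re.findall(pattern, document)

    # returning set of emails (duplicates removed)
    return set(emails_in_doc)


# Testing the above function
# prints a set containing 'ert@xyz.com' and 'pop@xyz.com' (set order may vary)
print(emails('random text ert@xyz.com yu\\pop@xyz.com random another'))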
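The answer above covers only emails(); the question also asks for the crawler that visits pages. The following is a minimal sketch of that part, assuming only the Python standard library: it does a breadth-first crawl from a start URL, runs the same regex over each page's raw HTML, and follows <a href> links. The names crawl, LinkParser, fetch, max_pages and fetch_page are hypothetical helpers for illustration. It assumes pages decode as UTF-8 and does not handle robots.txt, rate limiting or restricting the crawl to one domain.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


def emails(document):
    """Return the set of email addresses in document (same regex as the answer above)."""
    return set(re.findall(r'[\w\.-]+@[\w\.-]+\.\w+', document))


class LinkParser(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    # resolve relative links and drop any #fragment
                    link, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(link)


def fetch(url):
    """Download a page as text (assumption: UTF-8, undecodable bytes are replaced)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode('utf-8', errors='replace')


def crawl(start_url, max_pages=20, fetch_page=fetch):
    """Breadth-first crawl from start_url; return every email address found."""
    found = set()
    visited = set()
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            page = fetch_page(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to load (URLError is an OSError)
        found |= emails(page)
        parser = LinkParser(url)
        parser.feed(page)
        for link in parser.links:
            # only follow http(s) links; mailto: addresses are already caught by the regex
            if link.startswith('http') and link not in visited:
                queue.append(link)
    return found
```

For example, crawl('https://example.com/', max_pages=5) would visit at most five pages. The fetch_page parameter is there so the crawler can be tried offline with a dictionary of fake pages; running the regex on the raw HTML (not just visible text) also picks up addresses that appear only in mailto: links.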

User Roet