To extract all URLs in a web page, you can use [[https://github.com/KamarajuKusumanchi/rutils/blob/master/python3/get_urls.py|get_urls.py]] (github.com/KamarajuKusumanchi).

Sample usage:

<code bash>
$ get_urls.py https://news.ycombinator.com/item?id=25271676
...
https://github.com/dddrrreee/cs140e-20win/
https://cs140e.sergio.bz/syllabus/
https://tc.gts3.org/cs3210/2020/spring/lab.html
https://github.com/dddrrreee/cs140e-20win/
http://ggp.stanford.edu/
...
</code>

The important snippet is

<code python>
import re

import requests
from bs4 import BeautifulSoup

def get_urls(url):
    # Fetch the page and parse it with BeautifulSoup.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Collect the href of every <a> tag whose href starts with
    # http:// or https://.
    urls = [
        x.get('href')
        for x in soup.find_all(name='a', attrs={'href': re.compile('^https?://')})
    ]
    return urls
</code>
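Note that the regex only matches absolute http:// and https:// links, so relative hrefs such as /syllabus/ are dropped. If you want those too, one option (a sketch, not part of get_urls.py; get_all_urls is a hypothetical name) is to resolve every href against the page URL with urllib.parse.urljoin:

<code python>
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_all_urls(url):
    # Hypothetical variant of get_urls(): instead of filtering out
    # relative hrefs, resolve them against the page URL.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # href=True matches every <a> tag that has an href attribute;
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    urls = [
        urljoin(url, x.get('href'))
        for x in soup.find_all(name='a', attrs={'href': True})
    ]
    return urls
</code>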