
All posts (291)

regex training site
https://regexone.com/
RegexOne - Learn Regular Expressions - Lesson 1: An Introduction, and the ABCs. Regular expressions are extremely useful in extracting information from text such as code, log files, spreadsheets, or even documents. And while there is a lot of theory behind formal languages, the following lessons and examples will explore the more prac..
After going through the Lesson part, the Problem part ..
2019. 7. 7.

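crawl_site() with itertools
code:
    import itertools

    def crawl_site(url):
        # Request url + 1, url + 2, ... and stop at the first failed download.
        # download() is the function from the download() posts below.
        for page in itertools.count(1):
            print(page)
            pg_url = '{}{}'.format(url, page)
            html = download(pg_url)
            if html is None:
                break
            print(pg_url)
2019. 7. 6.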
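A minimal, offline sketch of the same ID-iteration idea, so it can be run without a network connection. The `FAKE_SITE` dictionary, the `fake_download` helper, the `example.com` URLs, and the extra `download` parameter are assumptions made for illustration; they are not part of the original post.

```python
import itertools

# Hypothetical stand-in for a real site: only pages 1..3 exist.
FAKE_SITE = {
    'http://example.com/view/1': '<html>page 1</html>',
    'http://example.com/view/2': '<html>page 2</html>',
    'http://example.com/view/3': '<html>page 3</html>',
}

def fake_download(url):
    """Hypothetical download(): return the page text, or None if it is missing."""
    return FAKE_SITE.get(url)

def crawl_site(url, download):
    """Request url + 1, url + 2, ... until download() returns None."""
    crawled = []
    for page in itertools.count(1):  # 1, 2, 3, ... with no upper bound
        pg_url = '{}{}'.format(url, page)
        html = download(pg_url)
        if html is None:
            break  # first missing page ends the crawl
        crawled.append(pg_url)
    return crawled

found = crawl_site('http://example.com/view/', fake_download)
print(found)
```

Stopping at the first missing page is simple, but it would also stop early on a site whose IDs have gaps; tolerating a few consecutive failures before giving up is one possible refinement.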

crawl_sitemap() with re.findall()
re.findall() extracts the values between the tags from the HTML of the requested url, and each extracted value is then used to send a new request. + download() is also improved.
code:
    import urllib.request
    from urllib.error import URLError, HTTPError, ContentTooShortError
    import re

    def download(url, user_agent='wswp', num_retries=2, charset='utf-8'):
        print('Downloading:', url)
        request = urllib.request.Request(url)
        request.add_header('User-agent', user_agent)
        try:
            resp = urllib.request.urlopen(req..
2019. 7. 6.
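The excerpt above is cut off before crawl_sitemap() itself, so the following is only a sketch of how such a function might look. It assumes the links sit between `<loc>` and `</loc>` tags, as in a standard sitemap.xml; the tag name, the `FAKE_PAGES` data, the `fake_download` helper, and the extra `download` parameter are all assumptions for illustration.

```python
import re

# Hypothetical sitemap and pages, used instead of real network requests.
FAKE_PAGES = {
    'http://example.com/sitemap.xml': (
        '<urlset>'
        '<url><loc>http://example.com/a</loc></url>'
        '<url><loc>http://example.com/b</loc></url>'
        '</urlset>'
    ),
    'http://example.com/a': '<html>a</html>',
    'http://example.com/b': '<html>b</html>',
}

requested = []

def fake_download(url):
    """Hypothetical download(): record the request and return the page text or None."""
    requested.append(url)
    return FAKE_PAGES.get(url)

def crawl_sitemap(url, download):
    """Download the sitemap, pull out every <loc> value, then request each link."""
    sitemap = download(url)
    if sitemap is None:
        return []
    # Non-greedy group so each <loc>...</loc> pair yields one separate match.
    links = re.findall(r'<loc>(.*?)</loc>', sitemap)
    for link in links:
        download(link)
    return links

links = crawl_sitemap('http://example.com/sitemap.xml', fake_download)
print(links)
```

The non-greedy `(.*?)` matters here: with a greedy `(.*)` the whole span from the first `<loc>` to the last `</loc>` would come back as a single match.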

download() with num_retries
Code that behaves differently depending on the result of the request.
code:
    import urllib.request
    from urllib.error import URLError, HTTPError, ContentTooShortError

    def download(url, num_retries=2):
        print('Downloading:', url)
        try:
            html = urllib.request.urlopen(url).read()
        except (URLError, HTTPError, ContentTooShortError) as e:
            print('Download error:', e.reason)
            html = None
            if num_retries > 0:
                if hasattr(e, 'code') and 500 ..
2019. 7. 6.
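The excerpt breaks off in the middle of the retry condition. The sketch below shows one common way to finish this pattern: retry only when the error carries a 5xx status code, since those are server-side failures that may be temporary. The exact `500 <= e.code < 600` range, the extra `fetch` parameter, and the `flaky_fetch` helper are assumptions added so the retry logic can be exercised without a network.

```python
import io
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2, fetch=None):
    """Return the page body, or None on failure; retry on 5xx errors only."""
    if fetch is None:
        # Default: a real request, as in the original excerpt.
        fetch = lambda u: urllib.request.urlopen(u).read()
    print('Downloading:', url)
    try:
        html = fetch(url)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            # Assumed condition: only server errors (5xx) are worth retrying.
            if hasattr(e, 'code') and 500 <= e.code < 600:
                return download(url, num_retries - 1, fetch)
    return html

# Hypothetical fetcher: fails twice with 503, then succeeds.
calls = []

def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise HTTPError(url, 503, 'Service Unavailable', None, io.BytesIO())
    return b'ok'

result = download('http://example.com/', num_retries=2, fetch=flaky_fetch)
print(result, len(calls))
```

A 4xx error such as 404 is not retried in this sketch, because repeating the same request is unlikely to change the outcome.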

