{"id":2893,"date":"2025-03-18T00:01:52","date_gmt":"2025-03-17T16:01:52","guid":{"rendered":"https:\/\/www.laixuexila.com\/?p=2893"},"modified":"2025-03-18T00:01:52","modified_gmt":"2025-03-17T16:01:52","slug":"python-urllib-%e8%af%a6%e7%bb%86%e8%a7%a3%e6%9e%90","status":"publish","type":"post","link":"https:\/\/www.laixuexila.com\/index.php\/2025\/03\/18\/python-urllib-%e8%af%a6%e7%bb%86%e8%a7%a3%e6%9e%90\/","title":{"rendered":"Python urllib in Detail"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">1. Introduction to <code>urllib<\/code><\/h2>\n\n\n\n<p><code>urllib<\/code> is a package in Python's standard library for working with URLs. It covers fetching web pages, parsing URLs, and encoding or decoding URL components. <code>urllib<\/code> consists of several submodules:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>urllib.request<\/code>: opens and reads URLs.<\/li>\n\n\n\n<li><code>urllib.response<\/code>: wraps HTTP response content (usually not used directly).<\/li>\n\n\n\n<li><code>urllib.parse<\/code>: parses and builds URLs.<\/li>\n\n\n\n<li><code>urllib.error<\/code>: defines the exceptions that <code>urllib.request<\/code> may raise.<\/li>\n\n\n\n<li><code>urllib.robotparser<\/code>: parses <code>robots.txt<\/code> rules to determine whether a URL may be crawled.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
<code>urllib.request<\/code> (Sending HTTP Requests)<\/h2>\n\n\n\n<p><code>urllib.request<\/code> is mainly used to open and read URLs (HTTP and HTTPS are supported).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2.1 Sending a GET Request<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.request\n\nurl = \"https:\/\/www.example.com\"\nresponse = urllib.request.urlopen(url)\nhtml = response.read().decode(\"utf-8\")\nprint(html)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Notes<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>urlopen(url)<\/code> sends the request and returns an <code>HTTPResponse<\/code> object.<\/li>\n\n\n\n<li><code>read()<\/code> reads the response body as bytes.<\/li>\n\n\n\n<li><code>decode(\"utf-8\")<\/code> converts the bytes into a string, assuming the page is UTF-8 encoded; decoding with the wrong encoding produces garbled text.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">2.2 Sending a GET Request (with Parameters)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.request\nimport urllib.parse\n\nbase_url = \"https:\/\/www.example.com\/search\"\nparams = {\"q\": \"python urllib\", \"page\": 1}\nquery_string = urllib.parse.urlencode(params)\nurl = f\"{base_url}?{query_string}\"\n\nresponse = urllib.request.urlopen(url)\nhtml = response.read().decode(\"utf-8\")\nprint(html)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Notes<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>urllib.parse.urlencode(params)<\/code> converts the dictionary into a query string, such as <code>q=python+urllib&amp;page=1<\/code>.<\/li>\n\n\n\n<li><code>f\"{base_url}?{query_string}\"<\/code> joins the base URL and the query string.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 
class=\"wp-block-heading\">2.3 Sending a POST Request<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.request\nimport urllib.parse\n\nurl = \"https:\/\/www.example.com\/login\"\ndata = {\n    \"username\": \"admin\",\n    \"password\": \"123456\"\n}\n\ndata_encoded = urllib.parse.urlencode(data).encode(\"utf-8\")  # URL-encode, then convert to bytes\nreq = urllib.request.Request(url, data=data_encoded, method=\"POST\")\nresponse = urllib.request.urlopen(req)\n\nprint(response.read().decode(\"utf-8\"))<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Notes<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>data_encoded = urllib.parse.urlencode(data).encode(\"utf-8\")<\/code>: converts the POST data into URL-encoded bytes.<\/li>\n\n\n\n<li><code>urllib.request.Request(url, data=data_encoded, method=\"POST\")<\/code>: creates the POST request object.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">2.4 Adding Headers (Simulating a Browser Request)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.request\n\nurl = \"https:\/\/www.example.com\"\nheaders = {\n    \"User-Agent\": \"Mozilla\/5.0 (Windows NT 10.0; Win64; x64)\"\n}\n\nreq = urllib.request.Request(url, headers=headers)\nresponse = urllib.request.urlopen(req)\n\nprint(response.read().decode(\"utf-8\"))<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Notes<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Setting a browser-like <code>User-Agent<\/code> through <code>headers<\/code> helps avoid rejection by servers that block the default <code>Python-urllib<\/code> client.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">2.5 Handling HTTP 
Exceptions (<code>urllib.error<\/code>)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.request\nimport urllib.error\n\nurl = \"https:\/\/www.example.com\/notfound\"\n\ntry:\n    response = urllib.request.urlopen(url)\n    print(response.read().decode(\"utf-8\"))\nexcept urllib.error.HTTPError as e:\n    print(\"HTTP Error:\", e.code, e.reason)\nexcept urllib.error.URLError as e:\n    print(\"URL Error:\", e.reason)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Notes<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>HTTPError<\/code>: raised for HTTP 4xx and 5xx responses. It is a subclass of <code>URLError<\/code>, so it must be caught first.<\/li>\n\n\n\n<li><code>URLError<\/code>: raised when the request cannot be completed, for example because of DNS failures or other network problems.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. <code>urllib.parse<\/code> (Parsing and Building URLs)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">3.1 Parsing a URL<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.parse\n\nurl = \"https:\/\/www.example.com\/search?q=python&amp;lang=en\"\n\nparsed_url = urllib.parse.urlparse(url)\nprint(parsed_url)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Output<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>ParseResult(scheme='https', netloc='www.example.com', path='\/search', params='', query='q=python&amp;lang=en', fragment='')<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Notes<\/strong><\/h4>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><code>scheme<\/code>: the protocol (<code>https<\/code>).<\/li>\n\n\n\n<li><code>netloc<\/code>: the network location, here the domain name (<code>www.example.com<\/code>).<\/li>\n\n\n\n<li><code>path<\/code>: the path (<code>\/search<\/code>).<\/li>\n\n\n\n<li><code>query<\/code>: the query string (<code>q=python&amp;lang=en<\/code>).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">3.2 Parsing Query Parameters<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.parse\n\nurl = \"https:\/\/www.example.com\/search?q=python&amp;lang=en\"\nquery_params = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)\nprint(query_params)<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Output<\/strong><\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>{'q': &#91;'python'], 'lang': &#91;'en']}<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">3.3 URL Encoding<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.parse\n\nparams = {\"q\": \"Python \u7f16\u7a0b\", \"page\": 2}\nencoded_params = urllib.parse.urlencode(params)\nprint(encoded_params)  # q=Python+%E7%BC%96%E7%A8%8B&amp;page=2<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
<code>urllib.robotparser<\/code> (Parsing <code>robots.txt<\/code>)<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.robotparser\n\nrp = urllib.robotparser.RobotFileParser()\nrp.set_url(\"https:\/\/www.example.com\/robots.txt\")\nrp.read()\n\nprint(rp.can_fetch(\"*\", \"https:\/\/www.example.com\/page\"))<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Notes<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>rp.can_fetch(\"*\", URL)<\/code>: checks whether <code>robots.txt<\/code> allows the given user agent (here <code>*<\/code>, meaning any) to fetch the specified URL.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. <code>urllib.response<\/code> (Handling HTTP Responses)<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.request\n\nurl = \"https:\/\/www.example.com\"\nresponse = urllib.request.urlopen(url)\n\nprint(response.status)  # HTTP status code\nprint(response.getheaders())  # All response headers\nprint(response.getheader(\"Content-Type\"))  # A specific header<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. 
Proxy Settings<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.request\n\nproxy = urllib.request.ProxyHandler({\"http\": \"http:\/\/proxy.example.com:8080\"})\nopener = urllib.request.build_opener(proxy)\nurllib.request.install_opener(opener)\n\nurl = \"http:\/\/www.example.com\"\nresponse = urllib.request.urlopen(url)\n\nprint(response.read().decode(\"utf-8\"))<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Notes<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>ProxyHandler<\/code> routes requests through a proxy. The mapping in this example only covers <code>http<\/code> URLs; an <code>https<\/code> entry is needed to proxy HTTPS requests.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Timeout Settings<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>import socket\nimport urllib.error\nimport urllib.request\n\nurl = \"https:\/\/www.example.com\"\ntry:\n    response = urllib.request.urlopen(url, timeout=5)  # Time out after 5 seconds\n    print(response.read().decode(\"utf-8\"))\nexcept urllib.error.URLError as e:\n    # Covers connection failures, including a timeout while connecting\n    print(\"Request failed:\", e.reason)\nexcept socket.timeout:\n    # A timeout while reading the response body is raised directly\n    print(\"Request timed out\")<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Feature<\/th><th>Module<\/th><th>Key APIs<\/th><\/tr><\/thead><tbody><tr><td>Sending HTTP requests<\/td><td><code>urllib.request<\/code><\/td><td><code>urlopen()<\/code>, <code>Request()<\/code><\/td><\/tr><tr><td>Handling HTTP exceptions<\/td><td><code>urllib.error<\/code><\/td><td><code>HTTPError<\/code>, <code>URLError<\/code><\/td><\/tr><tr><td>Parsing\/building URLs<\/td><td><code>urllib.parse<\/code><\/td><td><code>urlparse()<\/code>, <code>urlencode()<\/code><\/td><\/tr><tr><td>Parsing 
<code>robots.txt<\/code><\/td><td><code>urllib.robotparser<\/code><\/td><td><code>RobotFileParser()<\/code><\/td><\/tr><tr><td>Proxy and timeout settings<\/td><td><code>urllib.request<\/code><\/td><td><code>ProxyHandler()<\/code>, <code>timeout<\/code><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><code>urllib<\/code> is well suited to simple HTTP requests. If you need more powerful features (such as session management or JSON parsing), consider the third-party <code>requests<\/code> library. See the other related articles for more details.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Introduction to urllib urllib is a package in Python's standard library for working with URLs, providing functions for manipulating  [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2894,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[79],"tags":[],"class_list":["post-2893","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python-3-"],"_links":{"self":[{"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/posts\/2893","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/comments?post=2893"}],"version-history":[{"count":1,"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/posts\/2893\/revisions"}],"predecessor-version":[{"id":2895,"href":"https:\/\/www.laixu
exila.com\/index.php\/wp-json\/wp\/v2\/posts\/2893\/revisions\/2895"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/media\/2894"}],"wp:attachment":[{"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/media?parent=2893"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/categories?post=2893"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.laixuexila.com\/index.php\/wp-json\/wp\/v2\/tags?post=2893"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}