Python Programming/Internet
The urllib package, which is bundled with Python, can be used for web interaction. Its urllib.request module provides a file-like interface for web URLs.
Getting page text as a string
An example of reading the contents of a web page:
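import urllib.request
# urlopen() returns a file-like response; read() yields bytes, so decode them to get a str
with urllib.request.urlopen("http://www.spam.org/eggs.html") as response:
    pageText = response.read().decode("utf-8")
print(pageText)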
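The example above assumes that the page is encoded as UTF-8. As a minimal sketch, the helper below asks the response headers for the declared character set and falls back to UTF-8 when none is given. The name fetch_text is hypothetical and not part of urllib, and a data: URL stands in for a real web page so that the snippet runs without network access.

```python
import urllib.request

def fetch_text(url, default_charset="utf-8"):
    """Return the body of url as a str (hypothetical helper, not part of urllib)."""
    with urllib.request.urlopen(url) as response:
        # The headers object reports the charset declared in Content-Type, if any
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset)

# A data: URL is used here only so the sketch runs offline;
# an http:// or https:// URL is passed in the same way.
page_text = fetch_text("data:text/plain;charset=iso-8859-1,caf%E9")
print(type(page_text).__name__, len(page_text))
```

Because the sample body is declared as ISO-8859-1, decoding it as UTF-8 would fail, which is why reading the charset from the headers can be worthwhile.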
Processing page text line by line:
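import urllib.request
with urllib.request.urlopen("https://en.wikibooks.org/wiki/Python_Programming/Internet") as response:
    for line in response:
        # Each line is a bytes object; decode it to get text
        print(line.decode("utf-8"), end="")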
The GET and POST methods can be used, too.
import urllib.parse
import urllib.request
params = urllib.parse.urlencode({"plato": 1, "socrates": 10, "sophokles": 4, "arkhimedes": 11})
# Using GET method: append the query string to the URL
pageText = urllib.request.urlopen("http://international-philosophy.com/greece?%s" % params).read()
print(pageText)
# Using POST method: pass the encoded parameters as bytes in the data argument
pageText = urllib.request.urlopen("http://international-philosophy.com/greece", params.encode("ascii")).read()
print(pageText)
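To see how the presence of a data argument decides between GET and POST without contacting a server, the request can be built first and inspected. This is only a sketch: urllib.request.Request is not used in the text above, although it belongs to the same module, and the shortened parameter dictionary is chosen for illustration.

```python
import urllib.parse
import urllib.request

url = "http://international-philosophy.com/greece"
params = urllib.parse.urlencode({"plato": 1, "socrates": 10})

# No data argument: urlopen() would send this as a GET request
get_request = urllib.request.Request(url + "?" + params)

# With a bytes data argument, the same URL becomes a POST request
post_request = urllib.request.Request(url, data=params.encode("ascii"))

print(params)
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

Either request object can then be passed to urllib.request.urlopen() in place of the URL string.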
Downloading files
To save the content of a page on the internet directly to a file, you can read() it and write the resulting bytes to a file object:
import urllib.request
# Not recommended for very large files (1 GB+): read() holds the entire download in RAM.
data = urllib.request.urlopen("http://upload.wikimedia.org/wikibooks/en/9/91/Python_Programming.pdf").read()
with open("pythonbook.pdf", "wb") as file:
    file.write(data)
This downloads the file and saves it as "pythonbook.pdf" in the current working directory.
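For large files, one possible alternative to reading everything into memory is to copy the response to disk a block at a time. The sketch below uses shutil.copyfileobj from the standard library for this. The helper name download is hypothetical, and the data: URL and temporary directory are assumptions that keep the snippet runnable offline.

```python
import os
import shutil
import tempfile
import urllib.request

def download(url, filename):
    """Copy the response body to filename in chunks (hypothetical helper)."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out_file:
        # copyfileobj reads and writes one block at a time instead of the whole body
        shutil.copyfileobj(response, out_file)

# A data: URL and a temporary directory keep the sketch runnable offline;
# replace them with a real URL and target path.
target = os.path.join(tempfile.mkdtemp(), "pythonbook.pdf")
download("data:text/plain,Hello%20file", target)
print(os.path.getsize(target))
```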
Other functions
The urllib.parse module includes other functions that may be helpful when writing programs that use the internet:
>>> import urllib.parse
>>> plain_text = "This isn't suitable for putting in a URL"
>>> print(urllib.parse.quote(plain_text))
This%20isn%27t%20suitable%20for%20putting%20in%20a%20URL
>>> print(urllib.parse.quote_plus(plain_text))
This+isn%27t+suitable+for+putting+in+a+URL
The urlencode function, described above, converts a dictionary of key-value pairs into a query string to pass to a URL. The quote and quote_plus functions encode ordinary strings; quote_plus uses plus signs for spaces, which suits data submitted from form fields. The unquote and unquote_plus functions do the reverse, converting URL-encoded text back to plain text.
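The decoding direction can be illustrated with a short round trip. This is a sketch for Python 3, where these functions live in urllib.parse; the short "a+b%20c" string is made up for the demonstration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

plain_text = "This isn't suitable for putting in a URL"

# Each decoder undoes the matching encoder
round_trip = unquote(quote(plain_text))
round_trip_plus = unquote_plus(quote_plus(plain_text))
print(round_trip == plain_text, round_trip_plus == plain_text)

# Only unquote_plus treats "+" as an encoded space
print(unquote("a+b%20c"))
print(unquote_plus("a+b%20c"))
```

The last two lines differ only in how the plus sign is handled: unquote leaves it in place, while unquote_plus turns it into a space.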
Email
With Python, MIME-compatible emails can be sent. This requires access to an SMTP server; the example assumes one is running on the local machine.
import smtplib
from email.mime.text import MIMEText
msg = MIMEText(
"""Hi there,
This is a test email message.
Greetings""")
me = 'sender@example.com'
you = 'receiver@example.com'
msg['Subject'] = 'Hello!'
msg['From'] = me
msg['To'] = you
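# Connect to the SMTP server on the local machine (default port 25)
s = smtplib.SMTP()
s.connect()
# sendmail() takes the sender, a list of recipients, and the message as a string
s.sendmail(me, [you], msg.as_string())
s.quit()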
This sends the sample message from 'sender@example.com' to 'receiver@example.com'.
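The same steps can also be arranged as two small functions, which separates building the message from sending it. This is a sketch rather than the only way to do it: build_message and send are hypothetical names, the host and port defaults assume a local SMTP server, and the send call is left commented out so the snippet runs without one.

```python
import smtplib
from email.mime.text import MIMEText

def build_message(sender, recipient, subject, body):
    """Return a MIMEText message with the basic headers set (hypothetical helper)."""
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = recipient
    return msg

def send(msg, host="localhost", port=25):
    """Send msg through the SMTP server at host:port (assumed to be reachable)."""
    # Leaving the with block ends the SMTP session and closes the connection
    with smtplib.SMTP(host, port) as server:
        # send_message() reads the sender and recipients from the message headers
        server.send_message(msg)

msg = build_message('sender@example.com', 'receiver@example.com',
                    'Hello!', 'This is a test email message.')
print(msg['To'])
# send(msg)  # uncomment once an SMTP server is available
```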
External links
- urllib.request, docs.python.org
- HOWTO Fetch Internet Resources Using The urllib Package, docs.python.org
- urllib2 for Python 2, docs.python.org
- HOWTO Fetch Internet Resources Using urllib2 — Python 2.7, docs.python.org