Reading Web Data From Python >> Using Python to Access Web Data

Reading Web Data From Python >> Using Python to Access Web Data

1. Which of the following Python data structures is most similar to the value returned in this line of Python:

x = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
  • socket
  • dictionary
  • regular expression
  • list
  • file handle

2.In this Python code, which line actually reads the data?

import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode()
mysock.send(cmd)
while True:
data = mysock.recv(512)
if (len(data) < 1):
break
print(data.decode())
mysock.close()
  • mysock.recv()
  • socket.socket()
  • mysock.close()
  • mysock.connect()
  • mysock.send()

3. Which of the following regular expressions would extract the URL from this line of HTML:

<p>Please click <a href="http://www.dr-chuck.com">here</a></p>
  • href=”(.+)
  • href=”.+”
  • http://.*
  • <.*>

4. In this Python code, which line is most like the open() call to read a file:

import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\n\n'.encode()
mysock.send(cmd)
while True:
data = mysock.recv(512)
if (len(data) < 1):
break
print(data.decode())
mysock.close()
  • mysock.connect()
  • import socket
  • mysock.recv()
  • mysock.send()
  • socket.socket()

5. Which HTTP header tells the browser the kind of document that is being returned?

  • ETag:
  • HTML-Document:
  • Metadata:
  • Document-Type:
  • Content-Type:

6. What should you check before scraping a web site?

  • That the web site supports the HTTP GET command
  • That the web site returns HTML for all pages
  • That the web site allows scraping
  • That the web site only has links within the same site

7. What is the purpose of the BeautifulSoup Python library?

  • It builds word clouds from web pages
  • It repairs and parses HTML to make it easier for a program to understand
  • It optimizes files that are retrieved many times
  • It allows a web site to choose an attractive skin
  • It animates web operations to make them more attractive

8. What ends up in the “x” variable in the following code:

html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
x = soup('a')
  • A list of all the anchor tags (<a..) in the HTML from the URL
  • True if there were any anchor tags in the HTML from the URL
  • All of the externally linked CSS files in the HTML from the URL
  • All of the paragraphs of the HTML from the URL

9. What is the most common Unicode encoding when moving data between systems?

  • UTF-64
  • UTF-128
  • UTF-16
  • UTF-8
  • UTF-32

10. What is the decimal (Base-10) numeric value for the upper case letter “G” in the ASCII character set?

  • 71
  • 7
  • 103
  • 25073
  • 14

11. What word does the following sequence of numbers represent in ASCII:
108, 105, 110, 101

  • tree
  • func
  • line
  • lost
  • ping

12. How are strings stored internally in Python 3?

  • EBCDIC
  • Unicode
  • UTF-8
  • ASCII
  • Byte Code

13. When reading data across the network (i.e. from a URL) in Python 3, what method must be used to convert it to the internal format used by strings?

  • encode()
  • upper()
  • trim()
  • find()
  • decode()

Leave a Comment