Commit 36ac09a

add soup build
1 parent fe3235d commit 36ac09a

5 files changed

Lines changed: 25 additions & 7 deletions

chapter3/Soup.py

Lines changed: 0 additions & 7 deletions
This file was deleted.

chapter3/Unit13_SoupBuild.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+from bs4 import BeautifulSoup
+from urllib.request import urlopen
+
+# Build a soup from a string
+soup1 = BeautifulSoup("<HTML><HEAD> 《headers》</HEAD> 《body》</HTML>")
+
+# Build a soup from a local file
+soup2 = BeautifulSoup(open("data/myDoc.html"))
+
+# Build a soup from a web document
+# Remember that urlopen() does not add "http://"
+soup3 = BeautifulSoup(urlopen("http://hanwen.me/"))
+
+# get_text() returns the text of the markup document with all tags removed
+result = soup2.get_text()
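
The constructions in this file could also be written with an explicit parser and a context manager. The following is a minimal sketch and not part of the commit: the markup strings, the temporary file standing in for data/myDoc.html, and the `example.com` address are all illustrative assumptions. Naming `"html.parser"` avoids the warning bs4 emits when no parser is specified, and the `with` block closes the file handle that a bare `open()` would leave open.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

from bs4 import BeautifulSoup

PARSER = "html.parser"  # parser from the standard library; no extra install needed

# Soup from a string (illustrative markup, not the string used in the commit)
soup_from_string = BeautifulSoup(
    "<html><head><title>Demo</title></head><body><p>Hello, soup</p></body></html>",
    PARSER,
)

# Soup from a local file; a temporary file stands in for data/myDoc.html
with tempfile.TemporaryDirectory() as tmp:
    doc_path = Path(tmp) / "myDoc.html"
    doc_path.write_text("<html><body><p>Local file</p></body></html>", encoding="utf-8")
    with open(doc_path, encoding="utf-8") as fh:
        soup_from_file = BeautifulSoup(fh, PARSER)

# urlopen() does not add a scheme: a bare host name is rejected
# with ValueError before any network request is made
scheme_error = None
try:
    urlopen("example.com")
except ValueError as exc:
    scheme_error = str(exc)

file_text = soup_from_file.get_text()
```

Here `soup_from_file` is fully parsed inside the `with` block, so it remains usable after the file is closed.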

chapter3/Unit13_SoupExample.py

Whitespace-only changes.

chapter3/Unit13_SoupExcept.py

Whitespace-only changes.

chapter3/data/myDoc.html

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<title>$Title$</title>
+</head>
+<body>
+$END$
+</body>
+</html>
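
Since Unit13_SoupBuild.py calls `get_text()` on a soup built from this file, it may help to see roughly what that call would yield here. The sketch below is an illustration under assumptions, not code from the commit: it inlines a copy of the template as a string instead of reading data/myDoc.html, and it names `"html.parser"` explicitly. The unfilled `$Title$` and `$END$` placeholders survive as plain text, while the tags themselves are dropped.

```python
from bs4 import BeautifulSoup

# Inline copy of the template added in this commit (assumed identical to data/myDoc.html)
template = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>$Title$</title>
</head>
<body>
$END$
</body>
</html>
"""

soup = BeautifulSoup(template, "html.parser")

# get_text() keeps the text nodes, including the newlines between tags
text = soup.get_text()

# Drop the blank lines left behind where the tags used to be
non_empty_lines = [line.strip() for line in text.splitlines() if line.strip()]
```

Because the raw result is mostly whitespace, filtering out empty lines as above (or calling `get_text(strip=True)`) tends to give a more usable value.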
