A Deep Dive into BeautifulSoup's NavigableString

Date: 2021-03-03


1. Introduction to NavigableString

Strings usually live inside tags. Beautiful Soup wraps the text inside a tag in the NavigableString class, as the following code shows:

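from bs4 import BeautifulSoup
html = '''
<html>
<body>
<p>
hello world!
</p>
</body>
</html>
'''

# Parse with the lxml parser (requires the lxml package)
soup = BeautifulSoup(html, "lxml")
tag = soup.p
# <p> holds a single string, so .string returns it, newlines included
print(tag.string)
print(type(tag.string))  # <class 'bs4.element.NavigableString'>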
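
Since a NavigableString is also an ordinary Python string, it may help to see both sides at once. The following is a minimal sketch, not part of the original example: it assumes the built-in "html.parser" instead of lxml (only to avoid the extra dependency), and the variable names are chosen for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumption: "html.parser" is used here so the sketch runs without lxml
soup = BeautifulSoup("<p>hello world!</p>", "html.parser")
text = soup.p.string

# NavigableString subclasses str, so it passes both checks
is_navigable = isinstance(text, NavigableString)
is_plain_str = isinstance(text, str)

# Unlike a plain str, it remembers where it sits in the parse tree
parent_name = text.parent.name

# str() gives a plain string that no longer references the tree
plain = str(text)

print(is_navigable, is_plain_str, parent_name, type(plain))
```

Converting with str() is useful when the text needs to outlive the parse tree, because a NavigableString keeps a reference to the tree it came from.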

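2. How to get a NavigableString object

The usual way to get a NavigableString is through a Tag object's .string attribute. Note that .string returns a NavigableString only when the tag has exactly one child (a string, or a single tag that itself resolves to one string); otherwise it returns None, as the following example shows:

Example 1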

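from bs4 import BeautifulSoup
html = '''
<html>
<body>
<p>
<strong>hello world!</strong>
</p>
</body>
</html>
'''

soup = BeautifulSoup(html, "lxml")
tag = soup.p
# <p> has three children here (newline, <strong>, newline),
# so .string cannot pick a single string and returns None
print(tag.string)        # None
print(type(tag.string))  # <class 'NoneType'>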
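
To make the behaviour of .string easier to compare, here is a small sketch that runs it against three variations of the same paragraph. The helper string_of_p is hypothetical, and "html.parser" is assumed in place of lxml so the snippet has no extra dependency.

```python
from bs4 import BeautifulSoup

def string_of_p(markup):
    """Hypothetical helper: parse the markup and return the .string of its <p>."""
    soup = BeautifulSoup(markup, "html.parser")
    return soup.p.string

# One child, and it is a string: .string returns it
only_text = string_of_p("<p>hello world!</p>")

# One child, and it is a tag with a single string: .string recurses into it
nested_only = string_of_p("<p><strong>hello world!</strong></p>")

# Three children (newline, <strong>, newline): .string is None
mixed = string_of_p("<p>\n<strong>hello world!</strong>\n</p>")

print(repr(only_text), repr(nested_only), repr(mixed))
```

The only difference between the second and third case is the whitespace around the strong tag, which is enough to turn the result into None.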

Example 2: first get the `Tag` object's `.children`, then read each child's `.string` attribute

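from bs4 import BeautifulSoup
html = '''
<html>
<body>
<p>
<strong>hello world!</strong>
</p>
</body>
</html>
'''

soup = BeautifulSoup(html, "lxml")
tag = soup.p
# Walk the direct children of <p>: strings and tags are mixed together
for child in tag.children:
    print(type(child))
    # For a string child this is the string itself;
    # for <strong> it is the single string it contains
    print(child.string)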

At this point, the parser sees the content of the p tag as:

<p>\n<strong>hello world!</strong>\n</p>

So the children map to BeautifulSoup objects as follows:

The first \n: <class 'bs4.element.NavigableString'>
<strong>hello world!</strong>: <class 'bs4.element.Tag'>
The second \n: <class 'bs4.element.NavigableString'>
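
The mapping above can be checked directly, and the whitespace-only strings can be skipped when they are not wanted. This is a rough sketch under the assumption that "html.parser" is acceptable in place of lxml; the variable names are illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<p>\n<strong>hello world!</strong>\n</p>", "html.parser")

# Class name of every direct child of <p>, in document order
kinds = [type(child).__name__ for child in soup.p.children]

# Keep only the real tags, dropping the newline strings
tags_only = [child for child in soup.p.children if isinstance(child, Tag)]

# .stripped_strings yields the text under <p> with whitespace-only strings skipped
texts = list(soup.p.stripped_strings)

print(kinds)
print([t.name for t in tags_only])
print(texts)
```

Filtering with isinstance is handy when only the element children matter, while .stripped_strings is the shorter route when only the text matters.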

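3. The `.string` attribute of `Tag` objects vs. the `get_text()` method

As shown above, a Tag's .string attribute returns a NavigableString only when the tag contains a single string. The result keeps whitespace such as newlines, and it is None when the tag has several children. Neither is usually what we want, so get_text() is normally used instead of .string: it joins all the text under the tag into a plain str, and get_text(strip=True) also trims the surrounding whitespace.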
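
The difference is easiest to see side by side. The snippet below is a minimal sketch on the same paragraph used in Example 2, assuming "html.parser" rather than lxml so that it runs with Beautiful Soup alone.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>\n<strong>hello world!</strong>\n</p>", "html.parser")
tag = soup.p

# .string gives up because <p> has more than one child
via_string = tag.string

# get_text() joins every string under <p>, newlines included
raw_text = tag.get_text()

# strip=True trims each piece and drops the whitespace-only ones
clean_text = tag.get_text(strip=True)

print(repr(via_string), repr(raw_text), repr(clean_text))
```

Note that get_text() without arguments still keeps the newlines; it is the strip=True argument that removes them.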
