Python文本处理

阅读RSS提要

阅读RSS提要详细操作教程
RSS(丰富站点摘要)是一种用于提供定期更改的Web内容的格式。 许多与新闻相关的网站,网络日志和其他在线发布商将其内容作为RSS Feed联合到任何想要它的人。 在python中,借助以下包来读取和处理这些提要。
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-23
pip install feedparser
Feed结构
在下面的示例中,我们获取了Feed的结构,以便可以进一步分析要处理的Feed的哪些部分。
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-23
import feedparser
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
entry = NewsFeed.entries[1]
print entry.keys()
执行上面示例代码,得到以下结果 -
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-23
['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
Feed标题和帖子
在下面的示例中,我们读取了rss feed的标题和头部。
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-23
import feedparser
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
print 'Number of RSS posts :', len(NewsFeed.entries)
entry = NewsFeed.entries[1]
print 'Post Title :',entry.title
执行上面示例代码,得到以下结果 -
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-23
Number of RSS posts : 5
Post Title : Cong-JD(S) in SC over choice of pro tem speaker
Feed详情
基于上面的输入结构,可以使用python程序从feed中导出必要的细节,如下所示。 由于项目是字典,所以可利用其键来产生所需的值。
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-23
import feedparser
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
entry = NewsFeed.entries[1]
print entry.published
print "******"
print entry.summary
print "------News Link--------"
print entry.link
当我们运行上面的程序时,得到以下输出 -
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-23
Fri, 18 May 2018 20:13:13 GMT
******
Controversy erupted on Friday over the appointment of BJP MLA K G Bopaiah as pro tem speaker for the assembly, with Congress and JD(S) claiming the move went against convention that the post should go to the most senior member of the House. The combine approached the SC to challenge the appointment. Hearing is scheduled for 10:30 am today.
------News Link--------
https://timesofindia.indiatimes.com/india/congress-jds-in-sc-over-bjp-mla-made-pro-tem-speaker-hearing-at-1030-am/articleshow/64228740.cms
昵称: 邮箱:
Copyright © 2022 立地货 All Rights Reserved.
备案号:京ICP备14037608号-4