Python Text Processing

Text Summarization

Text Summarization: A Detailed Tutorial
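Text summarization means generating a short summary from a large body of text, one that conveys the gist of the original. The example below uses the gensim module and its summarize function to do this. Note that the gensim.summarization module is only available in gensim releases before 4.0; it was removed in gensim 4.0. Install the following package to get started.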
# Filename : example.py
# Copyright : 2020 By Lidihuo
# Author by : www.lidihuo.com
# Date : 2020-08-23
pip install gensim_sum_ext
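Because gensim.summarization is only present in gensim releases before 4.0, it can help to confirm that the import works before running the examples. The following is a minimal sketch; the helper name has_gensim_summarization is hypothetical and is not part of gensim.

```python
import importlib


def has_gensim_summarization():
    """Return True if gensim.summarization can be imported, else False.

    Hypothetical helper for illustration only; it is not part of gensim.
    """
    try:
        importlib.import_module("gensim.summarization")
    except ImportError:
        # Covers both a missing gensim install and gensim >= 4.0,
        # where the summarization module no longer exists.
        return False
    return True


if __name__ == "__main__":
    if has_gensim_summarization():
        print("gensim.summarization is available")
    else:
        print("gensim.summarization is missing; a gensim release before 4.0 is needed")
```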
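The paragraph below describes the plot of a film. The summarize function builds the summary by selecting a few sentences from the body of the text itself.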
# Requires gensim < 4.0 (gensim.summarization was removed in gensim 4.0).
from gensim.summarization import summarize

# Plot synopsis used as the input text.
text = "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones " + \
       "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando), " + \
       "the head of the Corleone Mafia family, is known to friends and associates as Godfather. " + \
       "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors " + \
       "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding " + \
       "day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician " + \
       "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she " + \
       "refused their advances; the men received minimal punishment from the presiding judge. " + \
       "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's " + \
       "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, " + \
       "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees " + \
       "to have his men punish the young men responsible (in a non-lethal manner) in return for " + \
       "future service if necessary."

# Print the summary, which consists of sentences taken from the text.
print(summarize(text))
Running the program above produces the following output:
He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding day.
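To make the idea of building a summary from sentences of the text itself more concrete, here is a minimal standard-library sketch of extractive summarization. It is not gensim's algorithm (gensim's summarizer is based on a variant of TextRank); this sketch substitutes a much simpler word-frequency score. The names simple_summarize, split_sentences, tokenize, and the small STOP_WORDS set are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not gensim's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for",
              "on", "as", "at", "by", "are", "was", "his", "he", "who"}


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def simple_summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average frequency of the sentence's non-stop words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = ("The Don hears requests at the wedding. "
              "The Don agrees to help the undertaker. "
              "Guests dance in the garden.")
    print(simple_summarize(sample, n_sentences=2))
```

Like the gensim function, this sketch returns sentences copied from the input rather than newly written text; only the scoring method differs.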
Extracting Keywords
The keywords function in the gensim library can also be used to extract keywords from a body of text, as shown below.
# Requires gensim < 4.0 (gensim.summarization was removed in gensim 4.0).
from gensim.summarization import keywords

# Plot synopsis used as the input text.
text = "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones " + \
       "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando), " + \
       "the head of the Corleone Mafia family, is known to friends and associates as Godfather. " + \
       "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors " + \
       "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding " + \
       "day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician " + \
       "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she " + \
       "refused their advances; the men received minimal punishment from the presiding judge. " + \
       "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's " + \
       "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, " + \
       "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees " + \
       "to have his men punish the young men responsible (in a non-lethal manner) in return for " + \
       "future service if necessary."

# Print the extracted keywords, one per line.
print(keywords(text))
Running the program above produces the following output:
corleone
men
corleones daughter
wedding
summer
new
vito
family
hagen
robert
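For intuition about what keyword extraction does, the following standard-library sketch ranks words by how often they occur. It is not gensim's method (the gensim keywords function is based on a variant of TextRank); plain frequency counting is swapped in here for simplicity. The name simple_keywords, the top_n parameter, and the STOP_WORDS set are assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not gensim's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for",
              "on", "as", "at", "by", "are", "was", "his", "he", "who"}


def simple_keywords(text, top_n=5):
    """Return up to top_n most frequent words, skipping stop words
    and words shorter than three letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words
                     if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    sample = "The Don agrees. The Don and the men. Men at the wedding."
    # Print one keyword per line, mirroring the layout of the output above.
    print("\n".join(simple_keywords(sample, top_n=2)))
```

Unlike the gensim output above, this sketch only returns single words and cannot produce multi-word keywords such as "corleones daughter".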