Python Crawlers: Scrapy's LinkExtractor Returns Link Objects

Published: 2021-11-23  Views: 619

LinkExtractor

from scrapy.linkextractors import LinkExtractor

Link

from scrapy.link import Link

The four attributes of Link

url text fragment nofollow

If you need the link text to be parsed out, add the attrs parameter to the LinkExtractor arguments:

link_extractor = LinkExtractor(attrs=('href','text'))
links = link_extractor.extract_links(response)

Usage example

import scrapy
from scrapy import cmdline
from scrapy.linkextractors import LinkExtractor


class DemoSpider(scrapy.Spider):
    name = 'spider'

    start_urls = [
        "https://book.douban.com/"
    ]

    def parse(self, response):
        # The argument is a regular expression
        link_extractor = LinkExtractor(allow="https://www.tianyancha.com/brand/b.*")

        links = link_extractor.extract_links(response)

        for link in links:
            print(link.text, link.url)


if __name__ == '__main__':
    # Run the spider from this script instead of the command line
    cmdline.execute("scrapy crawl spider".split())
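To see what the allow regular expression keeps, the filter can be approximated with the standard library alone. This is a rough sketch, not Scrapy's implementation: it assumes the pattern is searched against each absolute URL, and the candidate URLs are hypothetical.

```python
import re

# Pattern taken from the spider above; the candidate URLs are hypothetical.
allow = re.compile(r"https://www.tianyancha.com/brand/b.*")

candidates = [
    "https://www.tianyancha.com/brand/b1234",
    "https://www.tianyancha.com/company/5678",
    "https://book.douban.com/tag/",
]

# Keep only the URLs in which the pattern is found.
kept = [url for url in candidates if allow.search(url)]
print(kept)
```

Under this assumption, a start page on another domain yields links only if it actually contains anchors pointing at URLs that match the pattern.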