Python: Scraping Taobao Search Results

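Preface

When we want to buy something, we can search for it on Taobao, put the items we like into the cart, and then compare them. As programmers, though, we could scrape the search results instead, parse out the shop rating, item price, shop level, number of buyers and so on, compare everything, and even generate a report automatically with a Python plotting library such as Matplotlib. And only then buy! Of course not, nobody shops in such a silly way. This post is mainly for learning purposes!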

Search results

 

Scraped results

Detailed code

The detailed steps are written in the comments.

First, fetch the page's HTML

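from selenium.webdriver.common.by import By  # locator constants; Selenium 4 removed find_element_by_*

# Fetch function
def start_search_result(keys):

    # Create the browser object
    browner = webdriver.Chrome()
    try:
        browner.get('https://www.taobao.com/')

        # Type the keyword into the search box (element id "q")
        kw = browner.find_element(By.ID, "q")
        kw.send_keys(keys)

        # Simulate a click on the search button
        iconfont = browner.find_element(By.CLASS_NAME, 'search-button')
        iconfont.click()

        # Scroll to the bottom of the page so the whole list gets rendered
        browner.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(5)

        # Grab the page source
        html = browner.page_source
    finally:
        # Always close the browser, even if one of the steps above fails
        browner.quit()

    # Parse the source into an lxml element tree
    html_etree = etree.HTML(html)

    return html_etree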
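
The fixed `time.sleep(5)` above is either wasted time or too short on a slow connection. Selenium ships `WebDriverWait` for exactly this purpose; the sketch below shows the same idea as a plain polling loop, so it can be tried without a browser. The helper name `wait_for_items`, the CSS selector (chosen to mirror the `//div[@class="items"]/div` XPath used later) and the timeout values are my assumptions, not part of the original code.

```python
import time


def wait_for_items(driver, selector="div.items > div", timeout=10.0, interval=0.5):
    """Poll until at least one element matches `selector`, or give up.

    `driver` only needs a Selenium-style find_elements(by, value) method.
    Returns the matched elements, or an empty list on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the locator strategy string behind By.CSS_SELECTOR
        found = driver.find_elements("css selector", selector)
        if found:
            return found
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval)
```

In the fetch function this could stand in for the fixed sleep, e.g. `wait_for_items(browner)` right after the scroll, assuming the result list really uses that markup.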

Parse the scraped result

def analysis_html(html_etree):
    # Get the list of product nodes
    goods_list = html_etree.xpath('//div[@class="items"]/div')

    # List that stores one dict per product
    goods_map_list = []

    for k_value in goods_list:
        # Image URL (the code prepends the scheme to the src attribute)
        pic_src = "https:" + k_value.xpath('./div[@class="pic-box J_MouseEneterLeave J_PicBox"]/div[@class="pic-box-inner"]/div[@class="pic"]/a/img/@src')[0]

        # Price
        price = "¥" + k_value.xpath('./div[2]/div[1]/div[@class="price g_price g_price-highlight"]/strong/text()')[0]

        # Title: join the text fragments and drop newlines
        title_list = k_value.xpath('./div[2]/div[2]/a/text()')
        title = "".join(title_list).strip().replace("\n", "")

        # Shop name
        shop_elem = k_value.xpath('./div[2]/div[3]/div[@class="shop"]/a/text()')
        shop_name = "".join(shop_elem).replace("\n", "").strip()
        if not shop_name:
            # Fall back to the nested span when the link has no direct text
            shop_elem = k_value.xpath('./div[2]/div[3]/div[@class="shop"]/a/span[2]/text()')
            shop_name = "".join(shop_elem).replace("\n", "").strip()

        infor_map = {
            'pic': pic_src,
            'price': price,
            'title': title,
            'shop': shop_name,
        }
        goods_map_list.append(infor_map)

    return goods_map_list
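
One weak spot in the parser: indexing an XPath result with `[0]` raises `IndexError` as soon as a single result card lacks the expected node, which aborts the whole run. A small guard could look like the sketch below. The names `first_or` and `clean_text` are hypothetical helpers of mine, not part of the original code or of lxml.

```python
def first_or(results, default=""):
    """Return the first XPath result, or `default` when the list is empty."""
    return results[0] if results else default


def clean_text(parts):
    """Join XPath text() fragments, drop newlines, strip surrounding spaces."""
    return "".join(parts).replace("\n", "").strip()
```

With these, a lookup such as `first_or(k_value.xpath(...))` yields an empty string for a card without an image instead of crashing, and `clean_text` replaces the repeated join/replace/strip chain.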

Calling it

# Main function
def main(search_key):

    # First fetch the HTML of the search result page
    html_etree = start_search_result(search_key)

    # Parse the HTML and collect the products into a list
    goods_list = analysis_html(html_etree)

    for item in goods_list:
        print("Title: " + item["title"])
        print("Price: " + item["price"])
        print("Image: " + item["pic"])
        print("Shop: " + item["shop"])

        print('-' * 100)

Just pass in whatever you want to search for. For example, to search for "iPhone":

# Search for iPhone and scrape the results
main("iPhone")
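
To get closer to the "compare everything" idea from the preface, the list of dicts returned by `analysis_html` can be sorted by price and dumped to CSV. This is only a sketch: it assumes the price strings look like `¥5999.00`, and `parse_price`, `rank_by_price` and `write_csv` are hypothetical helpers, with made-up sample data.

```python
import csv
import io


def parse_price(price_text):
    """Convert a string such as '¥5999.00' to a float; None if unparsable."""
    try:
        # Strip both the half-width and full-width yen sign, plus thousands commas
        return float(price_text.lstrip("¥￥").replace(",", "").strip())
    except ValueError:
        return None


def rank_by_price(goods):
    """Return the goods that have a parsable price, cheapest first."""
    priced = [g for g in goods if parse_price(g["price"]) is not None]
    return sorted(priced, key=lambda g: parse_price(g["price"]))


def write_csv(goods, out):
    """Write the goods dicts to the file-like object `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=["title", "price", "shop", "pic"])
    writer.writeheader()
    writer.writerows(goods)


# Example usage with made-up data in the shape analysis_html returns
sample = [
    {"title": "B", "price": "¥6999.00", "shop": "s2", "pic": "p2"},
    {"title": "A", "price": "¥5,999.00", "shop": "s1", "pic": "p1"},
    {"title": "C", "price": "¥", "shop": "s3", "pic": "p3"},
]
buffer = io.StringIO()
write_csv(rank_by_price(sample), buffer)
```

Passing an open file instead of `io.StringIO()` writes the table to disk, where a spreadsheet or Matplotlib script can pick it up.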

Imports

import time
from selenium import webdriver
from lxml import html

etree = html.etree

Demo

DEMO: https://download.csdn.net/download/u014220518/11257850

Closing remarks

Comments and suggestions are very welcome, and feel free to join group 365152048 to discuss!

 

 
