
Scrape Amazon Product Reviews using Python

August 27, 2018 | Category: Blog

Sun Tzu said in The Art of War, “If you know the enemy and know yourself, you need not fear the result of a hundred battles.” This holds true for almost every business, especially e-commerce. You may have the right idea, the right people, the right USP, the right products, and the right prices, and somehow it is still not enough. You also need enough competition data and insights to determine whether your business model will succeed. This is why almost every upcoming eCommerce company first crawls Amazon to see how it measures up against the Goliath of e-commerce.

Why Scrape Amazon Product Data?

Apart from having a huge and deep range of product categories, Amazon delivers to almost every corner of the world and has thirteen country-specific websites. Scraping product data from Amazon can give you interesting product and business insights.

Automating this code with the help of a service provider lets you crawl Amazon product data brand-wise and category-wise and build your own database of products before you set up your eCommerce shop. It can save considerable man-hours and money if you are just starting your business.

Web Scraping Amazon Reviews Using Python

In the “How to extract hotel data from travel site” article, we showed you how to set up the web scraping environment. If you are new to Python, just follow those steps; everything remains the same. Install Atom and Python, use pip to install BeautifulSoup, then copy and paste this program into the editor and save it as amazon_data_extractor.py.

If you have difficulty copying the code, you can also download the file from here and open it in Atom.

[code language="python"]
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib.request
import urllib.parse
import urllib.error
from bs4 import BeautifulSoup
import ssl
import json

# For ignoring SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter Amazon Product Url- ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
html = soup.prettify('utf-8')
product_json = {}

# This block of code will help extract the brand of the item
for divs in soup.find_all('div', attrs={'class': 'a-box-group'}):
    try:
        product_json['brand'] = divs['data-brand']
        break
    except KeyError:
        # This div carries no data-brand attribute; try the next one
        pass

# This block of code will help extract the product title of the item
for spans in soup.find_all('span', attrs={'id': 'productTitle'}):
    name_of_product = spans.text.strip()
    product_json['name'] = name_of_product
    break

# This block of code will help extract the price of the item in dollars
for divs in soup.find_all('div'):
    try:
        price = str(divs['data-asin-price'])
        product_json['price'] = '$' + price
        break
    except KeyError:
        # This div carries no data-asin-price attribute; try the next one
        pass

# This block of code will help extract the image URL of the item
for divs in soup.find_all('div', attrs={'id': 'rwImages_hidden'}):
    for img_tag in divs.find_all('img', attrs={'style': 'display:none;'}):
        product_json['img-url'] = img_tag['src']
        break

# This block of code will help extract the average star rating of the product
for i_tags in soup.find_all('i', attrs={'data-hook': 'average-star-rating'}):
    for spans in i_tags.find_all('span', attrs={'class': 'a-icon-alt'}):
        product_json['star-rating'] = spans.text.strip()
        break

# This block of code will help extract the number of customer reviews of the product
for spans in soup.find_all('span', attrs={'id': 'acrCustomerReviewText'}):
    if spans.text:
        review_count = spans.text.strip()
        product_json['customer-reviews-count'] = review_count
        break

# This block of code will help extract top specifications and details of the product
product_json['details'] = []
for ul_tags in soup.find_all(
        'ul', attrs={'class': 'a-unordered-list a-vertical a-spacing-none'}):
    for li_tags in ul_tags.find_all('li'):
        for spans in li_tags.find_all('span', attrs={'class': 'a-list-item'},
                                      string=True, recursive=False):
            product_json['details'].append(spans.text.strip())

# This block of code will help extract the short reviews of the product
product_json['short-reviews'] = []
for a_tags in soup.find_all(
        'a',
        attrs={'class': 'a-size-base a-link-normal review-title a-color-base a-text-bold'}):
    short_review = a_tags.text.strip()
    product_json['short-reviews'].append(short_review)

# This block of code will help extract the long reviews of the product
product_json['long-reviews'] = []
for divs in soup.find_all('div', attrs={'data-hook': 'review-collapsed'}):
    long_review = divs.text.strip()
    product_json['long-reviews'].append(long_review)

# Saving the scraped html file
with open('output_file.html', 'wb') as file:
    file.write(html)

# Saving the scraped data in json format
with open('product.json', 'w') as outfile:
    json.dump(product_json, outfile, indent=4)

print('----------Extraction of data is complete. Check json file.----------')
[/code]

What Will This Web Scraping Program Do?

Let me explain with an example. I will show you a product page on Amazon with reviews, and what the program returns when its URL is fed to the program.

Let’s take this Acer Chromebook on Amazon. When you run the program, it will print “Enter Amazon Product Url- ”.

When it does, just copy and paste the URL given above. The following JSON will then be generated under the name product.json in your current directory, and you can match your own JSON against it.

[code language="python"]
{
    "brand": "Acer",
    "name": "2018 Newest Acer 14-inch HD Chromebook LED Anti-glare Display, Intel Dual-Core Celeron 3855u 1.6GHz processor, 4GB RAM, 16GB SSD, HDMI, USB 3.0, Webcam, 802.11a Wifi, Bluetooth, Google Chrome OS",
    "price": "$229.00",
    "img-url": "https://images-na.ssl-images-amazon.com/images/I/41nlp137qeL._SX300_QL70_.jpg",
    "star-rating": "4.2 out of 5 stars",
    "customer-reviews-count": "79 customer reviews",
    "details": [
        "14\" Anti-Glare HD WLED Backlit (1366×768) Display with Acer ComfyView Technology, Built-in media reader",
        "Intel dual-core Skylake Celeron 3855U 1.60 GHz processor 2M Cache, Intel HD Graphics 510, Built-in HD webcam with microphone",
        "4GB LPDDR3 Memory, 16 GB eMMC Flash Memory, Built-in cloud support-easily save your files to your Google Drive account for secure access wherever you go",
        "High Speed 802.11a WiFi, Bluetooth, HDMI, 2x USB 3.0, 1x USB 3.1 Type-C, 1 x Headphone/Microphone Combo Jack",
        "Google Chrome OS, Up to 10 hours Battery life, Color: Black"
    ],
    "short-reviews": [
        "Best and safer computer to surf the Web and Watch Videos of all kinds.",
        "… only had this for a couple days but I love it. I went from a macbook to this …",
        "Great for school work….and Netflix",
        "This is NOT a 2018 Newest Acer – manufacture date is 08/2016",
        "AWESOME PRODUCT…SIMPLE TO USE",
        "I love this laptop",
        "… it is the biggest screen – Mom seems to love it.",
        "easy to use"
    ],
    "long-reviews": [
        "I already have an 11.5\" Acer Chromebook, bought few years ago and based on the success of this first one, I decided to go ahead and buy another one of a larger size. I just LOVE, LOVE my two Acer Chromebooks. In the past I made the mistake of buying regular Microsoft Windows type of computer, and few of them still have at home, but after discovering the Acer Chromebook, I must say, I wish I knew earlier about the superiority of a Chromebook over a Windows, Microsoft product….I do not want to open a Microsoft Windows type of computer war vs an Acer Chromebook here….I am just saying…if you are happy with your Microsoft Windows product, I am not criticizing your purchase, what I am saying is that having both products at home…I Love the the simplicity of the Acer Chromebook over the \"OTHERS\"….I am fully aware that a Microsoft Windows type of computer can do other things that a Chromebook cannot do, but if you are just surfing the web and consuming much web media, Facebook, or Yahoo Mail, YouTube, Netflix….Nothing beats the Acer Chromebook for reliability and speed. Acer Chromebook turns on at the speed of light, and no viruses of any kind to worry about. Also, it does not slow down when updating the system like Microsoft Windows constantly does….Yes, at 15.6 inches screen, it is indeed a little jewel adn the price was very good….Thank You Amazon. And Thank You ACER.",
        "I have only had this for a couple days but I love it. I went from a macbook to this and expected to be disappointed. I didn’t have to adjust any of the settings or anything with this laptop. I logged into my gmail and everything was perfect. So far I have no complaints. I can’t believe how inexpensive it was.",
        "Had it for over a month now. It’s fast, easy-to-use machine with a large screen. Bought it for my daughter to use for homework. Our school, like many use Google Docs for much of the work that is done online. A chromebook is all they need – no need for Microsoft! Of course my daughter likes the large screen for watching Netflix when the homework is completed!",
        "NOT a 2018 Chromebook as stated in the title. I was expecting a Chromebook manufactured in late 2017 or 2018. This particular Chromebook was manufactured in August of 2016. Very misleading title……definitely not the \"2018 Newest\". The Chromebook does seem to be pretty decent though.",
        "I was tired of Windows 10 faulty updates, and needed a reliable computer. The Acer was recommended to me by several friends. I got it last week and love it. Easy to set up and understand, no heat from hard drive, quick, less hassle, nice quality, etc. I still have my Dell with Windows 10, but for right now this Acer Chromebook is my first choice. It takes a little while getting used to a keyboard with fewer keys, and more closely spaced together, but I have no other qualms about this Chromebook. I am learning more about the Chromebook daily, and appreciate its simplicity and no need for added security. Excellent value, and performance. I was pleased with the Acer’s price and features. Easy to hook up to Ethernet with adapter if needed.",
        "I love this laptop! While I am used to Macbooks, my beloved recently died. I was not on the market for another macbook (not by choice) and needed a new laptop quick. I did some quick research and saw good reviews on this laptop. I ordered on Amazon prime and received within 3 days. I’m impressed! Chomebook has made my life extremely easy. Everything is connected! Something about the keyboard makes it easy to type. I LOVE IT!",
        "A little heavy but it is the biggest screen – Mom seems to love it… no more desktop screen clutter… Thanks",
        "bought for an elderly relative. easy to use!"
    ]
}
[/code]
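Every value in product.json is stored as plain text, so the rating and review count need a little parsing before you can compare products numerically. The following is a minimal sketch of one way to do that with the standard library only. The function names `summarize_product` and `load_product` are illustrative and not part of the program above, and the regular expressions assume the text looks like the sample output ("4.2 out of 5 stars", "79 customer reviews").

```python
import json
import re


def summarize_product(product):
    """Return a small summary with a numeric rating and review count.

    Fields that the scraper could not find come back as None
    instead of raising KeyError.
    """
    rating_text = product.get("star-rating", "")
    count_text = product.get("customer-reviews-count", "")

    # "4.2 out of 5 stars" -> first number in the string
    rating_match = re.search(r"\d+(?:\.\d+)?", rating_text)
    # "79 customer reviews" or "1,234 customer reviews" -> leading digits
    count_match = re.search(r"\d[\d,]*", count_text)

    return {
        "name": product.get("name"),
        "brand": product.get("brand"),
        "rating": float(rating_match.group()) if rating_match else None,
        "review_count": (
            int(count_match.group().replace(",", "")) if count_match else None
        ),
        "num_long_reviews": len(product.get("long-reviews", [])),
    }


def load_product(path="product.json"):
    """Load the JSON file written by the scraper."""
    with open(path, encoding="utf-8") as infile:
        return json.load(infile)
```

After running the scraper, calling `summarize_product(load_product())` should give you a dictionary that is easier to sort or filter than the raw strings.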

You will also see that the scraped HTML page has been saved under the name output_file.html in the same working directory. Here is the link to the scraped HTML document for this particular product page. You can try scraping Amazon product data from this HTML using BeautifulSoup.
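Working from the saved file lets you experiment with BeautifulSoup without sending another request to Amazon. Here is a rough sketch of that idea, reusing three of the lookups from the program above. The function names `extract_basic_fields` and `extract_from_file` are made up for this illustration, and the lookups will only match if the saved page still uses the same ids and attributes.

```python
from bs4 import BeautifulSoup


def extract_basic_fields(html):
    """Pull the title, star rating and review count out of product page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}

    title = soup.find("span", attrs={"id": "productTitle"})
    if title is not None:
        fields["name"] = title.get_text(strip=True)

    rating = soup.find("i", attrs={"data-hook": "average-star-rating"})
    if rating is not None:
        alt = rating.find("span", attrs={"class": "a-icon-alt"})
        if alt is not None:
            fields["star-rating"] = alt.get_text(strip=True)

    count = soup.find("span", attrs={"id": "acrCustomerReviewText"})
    if count is not None:
        fields["customer-reviews-count"] = count.get_text(strip=True)

    return fields


def extract_from_file(path="output_file.html"):
    """Parse the HTML file that the scraper saved to disk."""
    # The scraper wrote this file as bytes, so read it back in binary mode.
    with open(path, "rb") as infile:
        return extract_basic_fields(infile.read())
```

Any field that cannot be found is simply left out of the returned dictionary, which mirrors how the original program behaves.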

Things to Remember while Scraping Product Data

1. Using this product scraper, you will get the following things from an Amazon product page:

    • a. Brand
    • b. Name
    • c. Price
    • d. Image url
    • e. Star rating
    • f. Number of customer reviews
    • g. Important details
    • h. Short reviews
    • i. Long reviews

Each data point will be present in the JSON (one or two might be missing if not present for a product, or if the scraper is unable to locate it). Each data point will also come with its respective label.

2. We cannot guarantee that every product page URL will be processed smoothly with this code. Amazon changes its page code regularly, and not all items or subcategories share the same HTML and CSS formatting.

3. If you run this program multiple times within a short span of time, you may encounter HTTP error 503. It is a server-side error, but in this case it means that Amazon is blocking your attempts to crawl data. So when you scrape Amazon for professional use, it is advisable to get help from web scraping services like PromptCloud. Such services can set up a system that automatically crawls all the data you need, so that you can focus on your business without worrying about the data.

4. The program is valid only for www.amazon.com and has not been tested on any of its country-specific websites.
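Points 3 and 4 can be guarded against in code to some extent. The sketch below is one possible approach, not part of the original program: `is_amazon_com` rejects URLs that are not on www.amazon.com, and `fetch_with_backoff` waits progressively longer after each HTTP 503 before trying again. The function names, the `retries` and `base_delay` values, and the injectable `fetch` and `sleep` parameters are all assumptions made for illustration. Slowing down is no guarantee that Amazon will stop returning 503.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlparse


def is_amazon_com(url):
    """True only for URLs whose host is www.amazon.com."""
    host = (urlparse(url).hostname or "").lower()
    return host == "www.amazon.com"


def _default_fetch(url):
    # Plain urllib fetch, as in the main program (without the SSL context).
    return urllib.request.urlopen(url).read()


def fetch_with_backoff(url, fetch=None, retries=3, base_delay=5.0,
                       sleep=time.sleep):
    """Fetch a page, waiting longer after each HTTP 503 before retrying.

    `fetch` is any callable that takes a URL and returns bytes.
    `retries`, `base_delay` and `sleep` are illustrative knobs.
    """
    if fetch is None:
        fetch = _default_fetch

    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Retry only on 503, and give up after the last attempt.
            if err.code != 503 or attempt == retries:
                raise
            # Double the wait each time: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
```

In the main program you could check `is_amazon_com(url)` right after reading the input and use `fetch_with_backoff(url)` in place of the direct `urlopen` call.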

Conclusion

New eCommerce sites set up shop almost every single day. Amid such tough competition, sustaining a business and making a profit become difficult in the absence of competition insights. Amazon product data could be the benchmark and starting point for you. Web scraping Amazon can get you the product insights that will help you build and scale the right business strategy and product category.



Disclaimer: The code provided in this tutorial is only for learning purposes. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code.


© Promptcloud 2009-2020 / All rights reserved.