Submit Your Requirement

Download Web Data Acquisition Framework

Did you know that there are 12 factors to be considered while acquiring data from the web? If no, fret not! Download our free guide on web data acquisition to get started!

Scroll down to discover

How to Effectively Scrape Content from Quora

September 2, 2020Category : Blog Marketing
How to Effectively Scrape Content from Quora

Scrape Content From Quora

When it comes to searching for answers, most people go to Quora, be it using their website, or the app, and type their questions. However, you can write your web-scraper to scrape Quora using Python by entering questions, and this is what we will be showing you today.

Where Is The Code to Scrape Quora?

Like a lot of other DIY web scraping articles that we have published before, we shall be using Python3.7 and the BeautifulSoup library to crawl Quora data and save it in a JSON file. Using this code you would be able to scrape and extract Quora answers and questions easily. The only other thing that you will be needing is a decent text editor. We have used PyCharm, which is a full-blown IDE, but you can also use Atom since it comes with multiple plugins and is more lightweight.

So to start with the code, we begin by importing the libraries that we will be needing, both internal and external. Once done, we need to make sure that we set the SSL certificate’s verify-mode to “CERT_NONE”, and check hostname to False, to avoid getting SSL certificate errors when we start scraping data. Once this is done, our setup is complete, and we can accept a question from the user. For this demo, we supplied the following value when this question was asked.



We create the Quora URL using this question. Does a simple string manipulation convert the question to this URL 

This string manipulation is required since Quora formats its URLs in this manner.

Once we have created the URL, we use the inbuilt Request function from urllib to hit the webpage and make sure that we add Firefox in the header, so that the website is not able to track that we are accessing it from a piece of code. This part is important since most websites block scrapers and if you miss the header. Your IP will likely be blocked, and further actions can be initiated against you. 

Scrape Content



Scrape Quora

After we have obtained the webpage in HTML format and stored it in a variable. We need to convert it to a BeautifulSoup object so that it is easier to parse and extract data from. Then extract the question on the webpage from the first “title” tag on the page. We need to remove “ – Quora” from it since all titles come with the following string. Scraping the answer is slightly more complicated. You need to extract the JSON stored in the element of type “script” having the value for “type” as “application/ld+json”. Once you have obtained this JSON, you shall find a list of answers with multiple fields. While few fields are given for each answer. We have extracted the most important ones-  

  • The date on which the answer was written.
  • The answer itself.
  • The number of upvotes that it received.

Once the data extraction is completed, we can append it to a list of answers and save the final list in a JSON file.

Understanding The Output

The JSON file given below contains some of the answers that were scraped from the HTML page when we ran the code with the question mentioned in the last section. As you can see, the JSON has two fields, the question, and the answers. Each answer consists of the three parameters that we mentioned earlier. While the number of answers scraped for this particular question were many. We have only shown a few of them below. Feel free to run the code yourself and check all the answers to this question, or any other.


The Limitations Of Scraping Content From Quora

While this might look like a perfect solution to finding the answers to any question on Quora. Like every other piece of DIY code, it comes with multiple limitations. One important aspect is that not every question you type will exist in Quora. You will have your code break every time you type a question that does not exist. At the same time, you might need to type your question multiple times to find which version exists. A better implementation would be to find the question that matches the one you entered closest.

Another aspect to consider is one related to the qualms of scraping Quora data and how you choose to use it. You need to make sure that you go through the robot.txt file and scrape data, and use it accordingly. Any commercial use of this code can lead you to legal issues. And using the data collected for anything other than research purposes may also cause problems.


Web Scraping Service CTA

Leave a Reply

Your email address will not be published. Required fields are marked *

Generic selectors
Exact matches only
Search in title
Search in content
Filter by Categories
eCommerce and Retail
Real Estate
Research and Consulting
Web Scraping

Get The Latest Updates

© Promptcloud 2009-2020 / All rights reserved.
To top