YouTube has emerged as the clear winner among video-sharing websites, and its audience is growing every day. The platform is said to be valued at more than $160 billion. The number of people making a living through the website is also massive, and so is the income they earn from creating videos. These content creators join the YouTube Partner Program and monetize their content, earning money through display advertisements and referral ads. YouTube data is useful for a wide range of use cases, as listed below:
Listing down keywords
When you search YouTube for particular words, you will see many informational videos in the results. By scraping data points such as the likes, dislikes, views, and titles of each of those videos, you can build a list of keywords that, when used in your own video titles, may lead to better revenue.
By comparing likes and views on videos with a particular hashtag, you can get a better idea of which hashtags make a video more popular, and which types of hashtags go well with the video title and even the content.
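The hashtag comparison described above could be sketched roughly as follows. This is only an illustration: the sample records are made up, and the ‘LIKES’ and ‘VIEWS’ keys are names we assume here for already-scraped numeric counts (only ‘HASH_TAGS’ is a key used later in this article).

```python
from collections import defaultdict


def likes_per_view_by_hashtag(videos):
    """Sum likes and views per hashtag and return the likes/views ratio."""
    totals = defaultdict(lambda: [0, 0])  # hashtag -> [likes, views]
    for video in videos:
        for tag in video.get("HASH_TAGS", []):
            totals[tag][0] += video["LIKES"]
            totals[tag][1] += video["VIEWS"]
    # Skip hashtags with zero views to avoid dividing by zero.
    return {
        tag: likes / views
        for tag, (likes, views) in totals.items()
        if views > 0
    }


# Made-up sample records standing in for scraped results.
videos = [
    {"HASH_TAGS": ["#python", "#tutorial"], "LIKES": 50, "VIEWS": 1000},
    {"HASH_TAGS": ["#python"], "LIKES": 150, "VIEWS": 1000},
    {"HASH_TAGS": ["#vlog"], "LIKES": 10, "VIEWS": 1000},
]

ratios = likes_per_view_by_hashtag(videos)
for tag, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag}: {ratio:.3f}")
```

A ratio is used instead of raw likes so that hashtags attached to heavily viewed videos do not dominate simply because of their reach.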
Finding popular channels
Extracting the top videos on YouTube can help you create a frequency graph of the channel names that appear for a search query, which lets you find the top channels that people enjoy watching. This process, in turn, will also help you understand the kinds of topics that are most popular among YouTube viewers.
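As a rough sketch of the frequency count behind such a graph, assuming the search results have already been scraped into a list of dictionaries (the ‘CHANNEL_NAME’ key and the sample data are our own assumptions for illustration):

```python
from collections import Counter


def top_channels(search_results, n=3):
    """Count how often each channel appears and return the n most frequent."""
    counts = Counter(result["CHANNEL_NAME"] for result in search_results)
    return counts.most_common(n)


# Made-up search results for a single query.
search_results = [
    {"TITLE": "Intro to scraping", "CHANNEL_NAME": "Channel A"},
    {"TITLE": "Scraping with Python", "CHANNEL_NAME": "Channel B"},
    {"TITLE": "Advanced scraping", "CHANNEL_NAME": "Channel A"},
    {"TITLE": "Scraping tips", "CHANNEL_NAME": "Channel A"},
    {"TITLE": "Parsing HTML", "CHANNEL_NAME": "Channel B"},
    {"TITLE": "Web data basics", "CHANNEL_NAME": "Channel C"},
]

for channel, appearances in top_channels(search_results):
    print(channel, appearances)
```

The resulting pairs of channel name and count can be passed directly to any plotting library to draw the frequency graph.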
Tracking channel popularity
By extracting the data of newly uploaded videos from a specific YouTube channel, you can tell whether the channel’s popularity is increasing, decreasing, or stagnant. You can also find information about the videos that are leading the charts.
Recording video views
By scraping data from a video at regular time intervals, you can create a graph with time on the x-axis and likes, dislikes, or views on the y-axis. Since we already explained the installation and initialization process in the previous article, “How to scrape data from wiki”, you should be able to run the code with the python command and enter a YouTube video URL when prompted.
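Recording a metric at regular intervals, as described in this section, could look something like the sketch below. It only covers storing and reloading the snapshots. The CSV layout, the function names, and the field names are assumptions made for this example, and the scraping step that produces the counts is left out.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ["timestamp", "views", "likes", "dislikes"]


def record_snapshot(csv_path, stats, timestamp=None):
    """Append one timestamped row of counts to a CSV file."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    # Write the header only when the file is new or empty.
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"timestamp": timestamp, **stats})


def load_series(csv_path, metric):
    """Return (timestamps, values) for one metric, ready for plotting."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    return [row["timestamp"] for row in rows], [int(row[metric]) for row in rows]
```

Calling `record_snapshot` from a scheduled job (for example, once an hour) builds up the file, and `load_series(path, "views")` returns the x and y values for the graph.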
Using YouTube crawler code
As usual, we begin by scraping the HTML code from the web page and saving it to a file in our local directory, so that we can analyze it and find data points that are both easy to extract and valuable. Most of the study of data points in the HTML page has to be done manually, by searching for specific keywords or values and finding where they occur.
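A minimal sketch of this first step, using only Python’s standard library, might look like this. The function names, the User-Agent value, and the file name are our own choices for illustration, and YouTube may serve different markup depending on the request, so treat this as a starting point and not a definitive implementation.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    # A browser-like User-Agent is an assumption; some sites reject the default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_html(html, path):
    """Write the HTML to a local file so it can be inspected by hand."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


# Example (performs a network request, so it is left commented out):
# html = fetch_html("https://www.youtube.com/watch?v=VIDEO_ID")
# save_html(html, "video_page.html")
```

Once the file is saved, you can open it in a text editor and search for a known value, such as the video title, to see which element contains it.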
Using BeautifulSoup (BS4) for extracting data points
- The span element with the class ‘watch-title’ holds the title of the video.
- The script element with the type ‘application/ld+json’ contains the channel name.
- The div element with the class ‘watch-view-count’ gives the number of views of the video.
- The button element with the title ‘I like this’ holds the number of likes on the video.
- The button element with the title ‘I dislike this’ holds the number of dislikes on the video.
- The span element with the class ‘yt-subscription-button-subscriber-count-branded-horizontal yt-subscriber-count’ holds the number of subscribers of the channel that uploaded the video.
Finding the hashtags associated with a video is slightly more complicated than the other data points. First, extract all the spans with the class ‘standalone-collection-badge-renderer-text’, and from those extract all the a-tags with the class ‘yt-uix-sessionlink’. By extracting the text of all those a-tags into an array, you get a list of hashtags. This array can be added to the result JSON under the key ‘HASH_TAGS’, so that the final JSON result has a structured format.
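Put together, the selectors described in this section could be used roughly as in the sketch below. The element names, classes, and titles are the ones listed in this section. The result keys other than ‘HASH_TAGS’, the sample HTML, and in particular the path used to read the channel name out of the JSON-LD data are assumptions made for illustration. The markup of live YouTube pages changes over time, so the selectors may need adjusting.

```python
import json

from bs4 import BeautifulSoup

# Made-up markup that mimics the elements described in this section.
SAMPLE_HTML = """
<span class="watch-title">Sample video</span>
<script type="application/ld+json">
{"itemListElement": [{"item": {"name": "Sample channel"}}]}
</script>
<div class="watch-view-count">1,234 views</div>
<button title="I like this">56</button>
<button title="I dislike this">7</button>
<span class="yt-subscription-button-subscriber-count-branded-horizontal yt-subscriber-count">8.9K</span>
<span class="standalone-collection-badge-renderer-text">
  <a class="yt-uix-sessionlink">#demo</a>
  <a class="yt-uix-sessionlink">#test</a>
</span>
"""


def text_or_none(element):
    """Return the stripped text of an element, or None if it was not found."""
    return element.get_text(strip=True) if element is not None else None


def channel_name_from_json_ld(soup):
    """Read the channel name from the JSON-LD script, if present."""
    script = soup.find("script", attrs={"type": "application/ld+json"})
    if script is None or not script.string:
        return None
    try:
        data = json.loads(script.string)
        # Assumption: the name sits under itemListElement[0].item.name.
        return data["itemListElement"][0]["item"]["name"]
    except (ValueError, KeyError, IndexError, TypeError):
        return None


def extract_data_points(html):
    """Extract the data points described above into a dictionary."""
    soup = BeautifulSoup(html, "html.parser")
    result = {
        "TITLE": text_or_none(soup.find("span", class_="watch-title")),
        "CHANNEL_NAME": channel_name_from_json_ld(soup),
        "VIEWS": text_or_none(soup.find("div", class_="watch-view-count")),
        "LIKES": text_or_none(soup.find("button", attrs={"title": "I like this"})),
        "DISLIKES": text_or_none(soup.find("button", attrs={"title": "I dislike this"})),
        "SUBSCRIBERS": text_or_none(soup.find("span", class_="yt-subscriber-count")),
    }
    # Hashtags: a-tags with class yt-uix-sessionlink inside the badge spans.
    hashtags = []
    for span in soup.find_all("span", class_="standalone-collection-badge-renderer-text"):
        for link in span.find_all("a", class_="yt-uix-sessionlink"):
            hashtags.append(link.get_text(strip=True))
    result["HASH_TAGS"] = hashtags
    return result


print(json.dumps(extract_data_points(SAMPLE_HTML), indent=2))
```

Every lookup falls back to None (or an empty list for hashtags) when an element is missing, so a page without hashtags still produces a complete result.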
Data points you can scrape from YouTube
Using the Python script, you can scrape certain data points from any YouTube video, as long as you have its URL. Only the hashtags field may be absent for some videos, since it is not a compulsory field on YouTube video pages. The data points that can be scraped are as follows:
The title is the most important data point, and it is the first one we extract. It carries a lot of information about the video, and without it the other data points would make little sense.
Right after the title, the channel name is important for associating the title with its creator, since it tells you who produced the content. On YouTube, videos are attributed to channels and not to individual creators, because in many cases more than one person works on the videos of a single channel.
Number of views
The simplest measure of a video’s reach is the number of views it has received. This is also the most important metric associated with a YouTube video, and in many ways it determines how much revenue the video creator will make.
The number of likes on a YouTube video is simply the number of viewers who liked the video enough to click the thumbs-up button below it. Similarly, the number of dislikes is the number of clicks on the dislike button for the video.
While likes, dislikes, and views paint a picture of the popularity of a single YouTube video, the number of subscriptions gives a better idea of how popular the YouTube channel is. It is the only channel-level data point we extract, and the higher it is, the more popular the channel in question.
Hashtags have become a popular way of making content searchable across different media. Be it Facebook posts or Instagram pictures, people use hashtags with many types of online content today so that related content can be grouped together. That is why ‘trending hashtags’ is a thing today.
While the Python code can only extract some specific data points from a YouTube video page, exploring the HTML of different YouTube pages can help you find more data points that occur under similar HTML elements. Web scraping has no hard and fast rules, since websites themselves keep changing. Hence, learning what data to scrape and how to scrape it comes only from experience, by scraping different web pages and handling different formats of data.