Extracting Spotify Data
Spotify, a Swedish music streaming and media company that launched on 7th October 2008, is a household name today. While you may think of Spotify only as a streaming platform, it is also a boon for developers who want to build services on top of music data. It exposes its APIs to developers, and one can even submit an application built on top of Spotify to get it published by them. Today we will show you how to extract data about music tracks from Spotify using the Spotipy library in Python.
Where Is The Code Scraping Spotify?
Besides the tools we usually use for extracting data from websites, today we will also need Spotipy, a lightweight Python library for the Spotify Web API. On top of this, you must generate Client Credentials from Spotify's developer page. You will need two values: the Client ID and the Client Secret.
We have stored these in a separate file and imported them into our code, for security purposes.
Coming to the code itself: once you have made the required imports, you first have to create an object of the Spotify class using the credentials that you obtained from the developer page of Spotify. After that, you need to write a few functions to extract the data. The first one is get_track_ids, which returns all the track ids for a given playlist id.
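As a rough sketch, creating the client object described above could look like the following. It assumes Spotipy is installed and uses its SpotifyClientCredentials helper. The function name build_client and the placeholder strings are ours, for illustration only, and are not taken from the original code.

```python
def build_client(client_id, client_secret):
    """Return a Spotipy client authenticated with the Client Credentials flow."""
    # Imported inside the function so this sketch can be loaded even if
    # Spotipy is not installed yet (pip install spotipy).
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
    return spotipy.Spotify(auth_manager=auth)


# Hypothetical usage: replace the placeholders with your own values,
# ideally imported from a separate credentials file kept out of version control.
# sp = build_client("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
```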
You can use the sp.playlist function to get the ids. They come back in a tree-like format, though, so you will need to walk through the JSON to extract only the ids. The ids are appended to an array and returned.
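A minimal sketch of such a function is shown here. It assumes the playlist response nests tracks under tracks, then items, then track, and it takes the client as a parameter (our choice, so the function can be tried with any object that has a playlist method). For long playlists a single call may return only the first page of tracks, which this sketch does not handle.

```python
def get_track_ids(sp, playlist_id):
    """Collect the id of every track returned for a playlist."""
    playlist = sp.playlist(playlist_id)
    track_ids = []
    for item in playlist["tracks"]["items"]:
        track = item.get("track")
        # Skip entries that have no track object or no id.
        if track and track.get("id"):
            track_ids.append(track["id"])
    return track_ids
```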
The second function that we have written is get_track_data. This function takes the id of a single track as input and returns certain data points related to it as output (in a JSON format).
The sp.track function can be used to fetch all the data points related to a track that Spotify exposes to developers, by passing the track id. You must then extract the data points that you need and manipulate them to fit your requirements.
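One way to write this function is sketched below. The fields read from the response (name, album release_date, duration_ms) are part of Spotify's track object, but the choice of fields, the output key names, and passing the client as a parameter are our assumptions rather than the original code.

```python
def get_track_data(sp, track_id):
    """Return a small dict of selected data points for one track."""
    meta = sp.track(track_id)
    return {
        "name": meta["name"],
        "release_date": meta["album"]["release_date"],
        # Spotify reports duration in milliseconds; convert to minutes.
        "duration_in_mins": round(meta["duration_ms"] / 60000, 2),
    }
```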
Once you have these two functions ready, you can accept a playlist id. You can extract the playlist id from the URL of a playlist. It is an alphanumeric string that can look like this: “6SklPNt6XKJRW5ZFMTxxE6”. Once you have entered the playlist id, we extract all the track ids using the function we wrote before and print the ids as well as the number of ids we extracted (which should be equal to the number of songs on your playlist).
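If you would rather paste the whole URL than copy the id by hand, a small helper can pull it out. This is a sketch that assumes playlist URLs have the form https://open.spotify.com/playlist/<id>, optionally followed by a query string. The helper name is hypothetical.

```python
from urllib.parse import urlparse


def playlist_id_from_url(url):
    """Return the path segment that follows 'playlist' in a playlist URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "playlist" in parts:
        idx = parts.index("playlist")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    raise ValueError(f"No playlist id found in: {url}")


print(playlist_id_from_url(
    "https://open.spotify.com/playlist/6SklPNt6XKJRW5ZFMTxxE6?si=example"
))
```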
After this, we loop over the track id list and extract the data points for each. One thing to note is that we are using the sleep functionality to give a small gap between the extraction of data points for each track.
This was done so that we do not make too many hits on Spotify in quick succession and end up getting blocked. The data points extracted for each song are put in a JSON object and appended to a list that is finally saved in a file for your usage.
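The loop described above can be sketched roughly as follows. The function name, the half-second default delay, and the output file name are illustrative assumptions. The fetch_track argument stands for whatever function returns the data points for a single track id.

```python
import json
import time


def collect_tracks(track_ids, fetch_track, out_path, delay_seconds=0.5):
    """Fetch data for each track id with a pause in between, then save as JSON."""
    tracks = []
    for track_id in track_ids:
        tracks.append(fetch_track(track_id))
        # Small gap between requests so we do not hit the API too quickly.
        time.sleep(delay_seconds)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(tracks, f, indent=4)
    return tracks


# Hypothetical usage, given a client sp and a list of ids:
# collect_tracks(track_ids, lambda tid: get_track_data(sp, tid), "tracks.json")
```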
Understanding The Output:
The output of this DIY code is pretty simple. You can see that we have extracted the following data points for each song:
- Release Date
- Duration in minutes
Of these data points, only the duration had to be processed, since it comes in milliseconds. So we converted it to minutes and rounded it off to two decimal places to make it more consumable. Our playlist had around 50 songs, so we got a list of 50 such JSON blocks, but we have shown just a few here for your understanding. Feel free to create your own playlists with hundreds of songs and extract their data.
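As a quick worked example of the conversion, the snippet below divides milliseconds by 60,000 and rounds to two decimal places. The helper name is ours. Note that the result is in decimal minutes, so 3.5 means three and a half minutes, not 3 minutes 50 seconds.

```python
def ms_to_minutes(duration_ms):
    """Convert a duration in milliseconds to minutes, rounded to 2 decimals."""
    return round(duration_ms / 60000, 2)


print(ms_to_minutes(210000))  # 3.5
print(ms_to_minutes(200000))  # 3.33
```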
As more and more top websites offer developer support, it will become easier for the open-source community to build apps and features on top of popular websites. At present, many sites like Instagram and Twitter also provide API access to developers after collecting certain information from them. While websites that provide developer access make life easier for all of us, others require web scraping services to get hold of their data. And while web scraping does give you more flexibility in terms of what data you want and how you want it, it is considerably more difficult than using APIs.