What started as Rap Genius in 2009 has evolved into Genius, a music knowledge-sharing media company that serves more than a hundred million people each month. Most internet users who are into music will be familiar with this fast-growing website, which goes beyond the lyrics to explore what is behind a song and what the artist was thinking while writing it. Even well-known artists like Ariana Grande and Kendrick Lamar collaborate with Genius to give the world deeper insight into their art.
Genius has grown into a music community where contributors, musicians, and editors come together to deconstruct songs, and this is why it has become the biggest music lyrics library in the world. Genius has also partnered with music streaming companies to increase its user base. On one hand, it has joined hands with Apple to provide lyrics for Apple Music; on the other, it lets Spotify users display lyrics from Genius while they play songs.
If anyone wants to scrape song lyrics, whatever their purpose, Genius is the website to go to. Its database of over twenty-five million songs, albums, artists, and annotations makes it the biggest database of song lyrics anywhere in the world.
Genius is one of the biggest names in the music industry today, but not everyone is trying to be that big. Some just want to gather lyrics for their own reasons, while others are building a group around a certain genre of music. For all of them, Genius is the place to scrape lyrics and other data from: not only does it give you a huge repository of lyrics, it also gives you access to user comments, that is, user sentiment. So you could predict which genre is currently a hit among new users and what type of songs are more in sync with the latest happenings.
When it comes to web scraping, a few languages are up to the job, but among them Python is the easiest to learn and comes in handy for many kinds of web scraping projects. Along with Python, you will need some third-party Python packages as well as a text editor. You can follow the instructions on this page, since the setup is the same no matter which website you are scraping. Once you have set up your system, you can continue reading.
So far we have discussed song lyrics, Genius, and the basic setup. Now let us give you a glimpse of the code before we show you how to run it and how it works:
#!/usr/bin/python
# -*- coding: utf-8 -*-

import json
import ssl
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# For ignoring SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Input from user
url = input('Enter Genius song lyrics Url- ')

# Making the website believe that you are accessing it using a Mozilla browser
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req, context=ctx).read()

# Creating a BeautifulSoup object of the HTML page for easy extraction of data
soup = BeautifulSoup(webpage, 'html.parser')
html = soup.prettify('utf-8')

song_json = {}
song_json["Lyrics"] = []
song_json["Comments"] = []

# Extract the title of the song
for title in soup.findAll('title'):
    song_json["Title"] = title.text.strip()

# Extract the release date of the song
for span in soup.findAll('span', attrs={'class': 'metadata_unit-info metadata_unit-info--text_only'}):
    song_json["Release date"] = span.text.strip()

# Extract the comments on the song
for div in soup.findAll('div', attrs={'class': 'rich_text_formatting'}):
    comments = div.text.strip().split("\n")
    for comment in comments:
        if comment != "":
            song_json["Comments"].append(comment)

# Extract the lyrics of the song
for div in soup.findAll('div', attrs={'class': 'lyrics'}):
    song_json["Lyrics"].append(div.text.strip().split("\n"))

# Save the JSON created with the file name as title + .json
with open(song_json["Title"] + '.json', 'w', encoding='utf-8') as outfile:
    json.dump(song_json, outfile, indent=4, ensure_ascii=False)

# Save the HTML content into an HTML file with name as title + .html
with open(song_json["Title"] + '.html', 'wb') as file:
    file.write(html)

print('----------Extraction of data is complete. Check json file.----------')
To run the code given above, save it in a file with the .py extension, for example songLyricsExtractor.py, and then run this command from the terminal:
python songLyricsExtractor.py
When you run it, you will be prompted to enter a URL. This can be a link to any song on the Genius website. You can use this link for Lana Del Rey’s Looking for America. We have scraped this song’s lyrics from Genius to show you how the code works in action, and that is covered in a later section.
Before we run the code and look at what it gives you, let’s try to understand the code itself. As always, we are using BS4 (Beautiful Soup), a library that makes parsing an HTML page simple and scraping data from it easy. At the very beginning, we fetch the webpage and convert it into a Beautiful Soup object, from which we can pick out divs, spans, titles, and other tags with specific attributes. We use these techniques to scrape the lyrics, the comments, the title of the webpage, and the date the song was released. Once we have scraped this data, we save it in a JSON file named after the title of the page, with a .json extension. We also save the HTML under the same title with a .html extension, so that the page can be analyzed and more data points can be found in the future.
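One caveat with naming the output files after the page title: a title such as "Lana Del Rey - Looking For America Lyrics | Genius Lyrics" contains characters like | that some file systems (Windows, for example) reject. Here is a minimal sketch of a workaround, assuming two hypothetical helpers, safe_filename and save_song, that are not part of the script above:

```python
import json
import os
import re


def safe_filename(title, extension):
    """Turn a page title into a file name that common file systems accept."""
    # Replace characters that Windows and some other file systems reject.
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", title)
    # Collapse whitespace runs and trim stray dots or spaces at the ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a generic name if nothing usable is left.
    return (cleaned or "song") + extension


def save_song(song_json, directory="."):
    """Write the scraped dictionary to a JSON file and return the file path."""
    name = safe_filename(song_json.get("Title", ""), ".json")
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(song_json, outfile, indent=4, ensure_ascii=False)
    return path
```

Swapping the two open() calls in the script for something like this would keep the title-based naming while avoiding failures on titles with awkward characters.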
On running the code and giving it the link to a song on Genius, you will get a JSON file that looks something like the one below, which is what we get for Looking for America by Lana Del Rey. We have presented only one JSON since they are so large, but you can run the code against your favorite songs as well to extract the lyrics, save them, print them, or do anything you want with them.
{
    "Lyrics": [
        [
            "[Verse 1]",
            "Took a trip to San Francisco",
            "All our friends said we would jive",
            "Didn't work, so I left for Fresno",
            "It was quite a scenic drive",
            "Pulled over to watch the children in the park",
            "We used to only worry for them after dark",
            "",
            "[Chorus]",
            "I'm still looking for my own version of America",
            "One without the gun, where the flag can freely fly",
            "No bombs in the sky, only fireworks when you and I collide",
            "It's just a dream I had in mind",
            "It's just a dream I had in mind",
            "It's just a dream I had in mind",
            "",
            "[Verse 2]",
            "I flew back to New York City",
            "Missed that Hudson River line",
            "Took a train up to Lake Placid",
            "That's another place and time, where",
            "I used to go to drive-ins and listen to the blues",
            "So many things that I think twice about before I do, no",
            "",
            "[Chorus]",
            "I'm still looking for my own version of America",
            "One without the gun, where the flag can freely fly",
            "No bombs in the sky, only fireworks when you and I collide",
            "It's just a dream I had in mind",
            "It's just a dream I had in mind",
            "It's just a dream I had in mind",
            "It's just a dream I had in mind"
        ]
    ],
    "Comments": [
        [
            "\"Looking for America\" is a song Lana wrote on August 5th, 2019 regarding the mass shootings throughout the US, once she got back to L.A. The message of the song relies upon Lana dreaming of a better situation for American people, a topic she has sung before in tracks like \"Coachella - Woodstock in My Mind\" and \"When The World Was at War We Kept Dancing\". Del Rey released the song via streaming platforms on August 9th, 2019 as a single.",
            "The title of the song might be a reference to \"America\" by Simon & Garfunkel:",
            "They've all come to look for America",
            "Del Rey took her Instagram to share a video of her singing in the studio with friend and producer Jack Antonoff."
        ],
        [
            "Del Rey shared the song via Instagram adding:",
            "Hi folks came back early from Montecito with my brother this morning and asked Jack Antonoff to come into town because I had a song on my mind that I wanted to write. Now I know I'm not a politician and I'm not trying to be so excuse me for having an opinion- but in light of all of the mass shootings and the back to back shootings in the last couple of days which really affected me on a cellular level I just wanted to post this video that our engineer Laura took 20 minutes ago. I hope you like it. I'm singing love to the choruses I recorded this morning. I'm going to call it 'Looking for America'."
        ]
    ],
    "Title": "Lana Del Rey - Looking For America Lyrics | Genius Lyrics",
    "Release date": "August 9, 2019"
}
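Because the lyrics come back as plain lists of lines with section markers such as [Verse 1] and [Chorus], a small post-processing step can group them by section. The sketch below is illustrative only: the function name group_by_section is hypothetical, and it assumes the nested-list layout of the "Lyrics" field in the sample output above.

```python
import re


def group_by_section(lyrics_blocks):
    """Group lyric lines under the most recent [Section] header."""
    sections = []
    current = {"section": None, "lines": []}
    for block in lyrics_blocks:  # "Lyrics" is a list of line lists
        for line in block:
            line = line.strip()
            if not line:
                continue  # skip blank separator lines
            header = re.fullmatch(r"\[(.+)\]", line)
            if header:
                # Close the previous section before starting a new one.
                if current["lines"] or current["section"] is not None:
                    sections.append(current)
                current = {"section": header.group(1), "lines": []}
            else:
                current["lines"].append(line)
    if current["lines"] or current["section"] is not None:
        sections.append(current)
    return sections


sample = [[
    "[Verse 1]",
    "Took a trip to San Francisco",
    "All our friends said we would jive",
    "",
    "[Chorus]",
    "I'm still looking for my own version of America",
]]

for section in group_by_section(sample):
    print(section["section"], len(section["lines"]))
```

Lines that appear before any header are kept under a section of None, so nothing is silently dropped if a page has no markers.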
While the solution we provided is for a single song, you could create a list of links to song-lyrics pages on Genius and then run the code on the list iteratively. You could also use a regex to match the Genius pages that contain song lyrics and then scrape multiple pages in one go, so that your code itself recognizes the pages that have lyrics.
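The regex idea could be sketched roughly as follows. This assumes that Genius song URLs take the form https://genius.com/<artist-and-title>-lyrics, which is an assumption rather than something the site guarantees. The pattern, the function names, and the example URL in the usage note are illustrative, and scrape_song stands in for the single-song logic, wrapped in a function that takes a URL.

```python
import re
import time

# Assumed URL shape for song pages; adjust it if Genius uses a different scheme.
LYRICS_URL = re.compile(r"^https?://genius\.com/[A-Za-z0-9-]+-lyrics/?$")


def is_lyrics_url(url):
    """Return True when the URL looks like a Genius song-lyrics page."""
    return bool(LYRICS_URL.match(url.strip()))


def scrape_many(urls, scrape_song, delay_seconds=1.0):
    """Call scrape_song(url) for every lyrics URL, pausing between requests."""
    results = {}
    for url in urls:
        url = url.strip()
        if not is_lyrics_url(url):
            continue  # skip artist pages, album pages, and other sites
        try:
            results[url] = scrape_song(url)
        except Exception as error:  # keep going if one page fails
            results[url] = {"error": str(error)}
        time.sleep(delay_seconds)  # be polite to the server
    return results
```

For example, scrape_many(["https://genius.com/Lana-del-rey-looking-for-america-lyrics"], scrape_song) would call scrape_song once, while a list containing only artist or album links would return an empty dictionary.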
However, these types of DIY solutions are good for the budding hobbyist or someone with a one-time need. In case your requirement is more commercial and you have a web scraping problem where you need features like-
Then you should go with a DaaS provider like PromptCloud. Our team at PromptCloud prides itself on providing enterprise-grade web scraping solutions to business teams across the world, enabling them to use data within their business workflows and make data-driven decisions.