Trick or treat?
Yes, the time has come to dress up in Halloween costumes, go trick-or-treating, and sit by the fireplace discussing the horrific ghost stories that people have heard since their childhood. That said, did you know that Halloween is the second-highest commercial holiday in the US, with total expenditure going up to $9 billion? So, considering the love between the US and Halloween, it’d be interesting to dig deep and find out the spooky elements of the country. We’ll find out whether your city falls in the list of haunted places and whether you should be extra careful this Halloween.
For this study, we extracted data from a website called Shadow Lands (feeling spooked yet?) to build the data set. Not only does it list haunted locations in the US, but it also mentions the history behind each place. Visitors to the site can add their own haunted place in case it is missing! Using the data from the website, several data fields related to each location were captured. Here is the list:
Here is what we’re unraveling from the analysis:
The chart shows the top thirty cities according to the number of haunted places. We see that Los Angeles, San Antonio and Honolulu take the top spots when it comes to haunted places.
It is interesting to note that the descriptions of spooky locations in Los Angeles refer to Hollywood twenty-five times and Universal Studios twice. The following locations in LA are also prevalent:
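A ranking like this comes down to counting rows per city. The snippet below is a minimal sketch of that idea, not the code used for the study: the field names `city` and `state`, the helper `top_locations`, and the sample rows are all made up for illustration.

```python
from collections import Counter

def top_locations(records, field, n=30):
    """Return the n most common values of `field` together with their counts."""
    counts = Counter(r[field] for r in records if r.get(field))
    return counts.most_common(n)

# Made-up sample rows for illustration only (not taken from the data set)
sample = [
    {"city": "Los Angeles", "state": "California"},
    {"city": "Los Angeles", "state": "California"},
    {"city": "San Antonio", "state": "Texas"},
]

top_cities = top_locations(sample, "city")
top_states = top_locations(sample, "state")
```

The same helper works for states by switching the field name, which is why it takes `field` as a parameter.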
Be careful in these areas!
Of all the states, California tops the list (not that it is a surprise), but it is closely followed by Texas and Pennsylvania. In case you’d rather stay in a less spooky state with a lower number of “incidents”, I would recommend Montana, Delaware and Alaska since they are the least haunted states.
The charts for haunted cities and states give a fair idea, but is there another way to visualize how the haunted places are spread across the US? That’s when a heatmap comes into play to give an idea of the density of the locations.
Clearly, the East Coast has denser clusters of haunted places than the West Coast (where only epicenters like LA, San Francisco, and Seattle contribute to the spookiness). Apart from that, we see that the Southern US is more haunted than the Northwestern US.
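The density behind a heatmap can be approximated by binning locations into grid cells and counting how many fall into each one. Here is a rough sketch of that binning. It assumes each location has a latitude and longitude, which the article does not confirm, and the function name `density_grid`, the one-degree cell size, and the approximate sample coordinates are illustrative choices.

```python
from collections import Counter

def density_grid(points, cell_deg=1.0):
    """Count points per grid cell, where each cell spans cell_deg degrees."""
    grid = Counter()
    for lat, lon in points:
        # Floor division maps every coordinate to the index of its cell
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        grid[cell] += 1
    return grid

# Approximate, illustrative coordinates (two near Los Angeles, one near San Antonio)
sample_points = [(34.05, -118.24), (34.10, -118.30), (29.42, -98.49)]
grid = density_grid(sample_points)
```

Cells with higher counts would be drawn in warmer colors. A smaller `cell_deg` gives a finer but noisier picture.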
Now, we will look at the most frequently used words in the description text of the data set. The following word cloud shows the top 300 terms:
It shows that words such as ‘night’, ‘people’, ‘old’, ‘see’, ‘house’, ‘ghost’, ‘room’, ‘building’, etc. are prevalent. Some of the interesting findings are the following:
Although we have figured out the frequently used words, it’d be much more insightful to find out the relationships between the words used in the description text. Here we will focus on bi-grams (pairs of consecutive written units) and visualize the relationships via a network graph.
This network graph shows some interesting connections. For instance, there is a cluster of words related to soldiers and the Civil War, which suggests that some of the haunted places emerged from the death and destruction caused by the war. The larger cluster at the bottom associates ghost with haunt, hunters and stories (which makes sense). We also see that words such as shadowy, ghostly and dark are associated with figures, which in turn is connected to walking. Check out how the word poltergeist (noisy ghost) is associated with paranormal activity! This is mostly because of the nature of poltergeists: they are known to levitate objects and horrify people by pinching, hitting and tripping them.
So, that was some fun use of analytics and data sourcing via web scraping. Now it’s time for you to carve a pumpkin and impress people with these talking points at your Halloween party.
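The two text steps above, term counts for the word cloud and bi-gram counts for the network graph, can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than the pipeline used for the study: the stop-word list, the function names and the sample sentences are invented for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the list used in the study is not given)
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "was", "to", "by", "at"}

def tokenize(text):
    """Lowercase the text and keep word tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def term_counts(descriptions, n=300):
    """Return the n most frequent terms across all descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(tokenize(text))
    return counts.most_common(n)

def bigram_counts(descriptions):
    """Count pairs of consecutive tokens within each description.

    Stop words are removed first, so a pair may bridge a dropped word.
    """
    counts = Counter()
    for text in descriptions:
        words = tokenize(text)
        counts.update(zip(words, words[1:]))
    return counts

# Made-up descriptions for illustration only
sample = [
    "A shadowy figure walks the halls at night.",
    "A ghostly figure is seen at night in the old house.",
]
top_terms = term_counts(sample)
pairs = bigram_counts(sample)
```

Each counted pair can then serve as a weighted edge between two words, which is the input a network graph needs. Pairs are counted within a single description so that the last word of one entry is never linked to the first word of the next.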
Have a spooktacular Halloween and may the holy ghost bless you!