How Content Discovery Platforms Can Fight Fake News via Web Scraping and AI
Gone are the days when people had to depend on traditional media for news; today they are bombarded with stories from countless online outlets. It amounts to information overload for the average person, who has limited time to catch up on news and stories. Social media now acts as a medium for news, and it even improves the experience by customizing each feed to suit the user's reading habits. However, this massive proliferation of social media and web publishing comes with its own downsides.
The widespread availability of easy-to-use content management systems such as WordPress has made it easy for anyone to become a web publisher. This means literally anyone can write and publish anything, no questions asked. It's true that this has opened up a wide range of possibilities for content publishing networks and bloggers. However, as with all powerful things, the ready availability of publishing technology is being misused by many to spread fake news with malicious motives. Fake news is a bigger problem than it appears to be on the surface. It has the potential to wreak havoc in society and to negatively affect businesses and other establishments.
How big is the problem?
The proliferation of fake news is said to have tipped the scales in favor of Donald Trump in the recent American presidential election. Whatever the truth of those allegations, fake news can undoubtedly sway mass opinion in an unhealthy way. Its spread creates mistrust in society, a slow poison that can become the root cause of many other social evils. Fake news can, for example, promote communal violence and create an unsettling atmosphere in people's lives.
Certain topics lend themselves readily to fake news: abuse of power, fear of alienation, questions of war and peace, and the like can spread like wildfire, causing irreparable damage.
There have also been instances of businesses running smear campaigns to bring competitors down, spreading false rumors designed to make the targeted company lose customers.
It was quite recently that a Syrian refugee sued Facebook after fake news stories linking him to terrorism spread on the social network. Facebook later took down the posts, but the damage was already done.
Content discovery platforms and social media sites could themselves face lawsuits if such incidents keep happening. Recurring fake news would also damage the reputation of the platforms where it spreads, leading to decreased user engagement. With all these repercussions, fake news is a huge problem that needs to be nipped in the bud.
Can AI help?
Detecting and combating fake news is a challenging undertaking, no doubt about that. Employing humans to review every post shared on content discovery platforms and evaluate its authenticity is certainly not viable. Thankfully, we no longer live in an era where humans have to do all the hard work.
Artificial intelligence has come a long way from the science fiction concept that it once used to be. We now have powerful voice, image and pattern recognition algorithms and the computing power to run them.
Combating fake news using artificial intelligence and machine learning is the way to go, considering the depth of the problem. To enable machines to detect fake news, we first have to identify the common characteristics of fake news posts. Let's see how this can be achieved.
Website reputation
A website's reputation is one of the key signals for evaluating the authenticity of an article published on it. Google, the search engine giant, does a great job of ranking webpages on its SERPs according to their reputation. Although we can't use Google's proprietary algorithm to detect fake news, we can use other ranking signals, such as domain authority (DA), Alexa rank, and domain age, to score a webpage in our own fake news detection system. Older sites that rank well on Alexa are more likely to be trusted sources, while the reverse may indicate a shallow website.
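As a rough illustration, the signals above could be blended into a single reputation score. This is a minimal sketch, assuming we already have each site's domain age and Alexa rank; the weights, cap, and squashing constant are illustrative assumptions, not tuned values.

```python
def reputation_score(domain_age_years: float, alexa_rank: int) -> float:
    """Combine domain age and Alexa rank into a 0..1 reputation score."""
    # Older domains earn more trust, capped at 10 years (assumed cap).
    age_component = min(domain_age_years, 10.0) / 10.0
    # A lower Alexa rank (closer to 1) means more traffic; squash the rank
    # onto 0..1 so rank 1 scores near 1.0 and very large ranks score near 0.
    rank_component = 1.0 / (1.0 + alexa_rank / 100_000)
    # Equal weights are an assumption; a real system would tune these on data.
    return 0.5 * age_component + 0.5 * rank_component

print(reputation_score(12, 500))         # established, high-traffic site: near 1.0
print(reputation_score(0.2, 4_000_000))  # young, low-traffic site: near 0.0
```

In a real pipeline this score would be one feature among many, not a verdict on its own.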
Natural language processing
Natural Language Processing (NLP), in its simplest definition, is the ability of a machine to understand human language and process it the way a human does. NLP engines are built by feeding machine learning algorithms with large text corpora. For fake news detection, the engine must be trained on huge amounts of text from genuine articles as well as fake ones. From there, the algorithm can learn the patterns that distinguish the two, enabling machines to flag fake news with decent accuracy. Here are two cues the algorithm can use to spot fake news posts.
a) Internal consistency
Fake or misleading articles often show a great deal of inconsistency between different parts of the post: the title, body text, snippet, and so on. An NLP system can scan an article and evaluate whether the facts presented within it are consistent throughout or conflicting.
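One cheap proxy for this check is comparing the title and body as bag-of-words vectors and flagging articles whose title shares almost no vocabulary with the body. This is a rough sketch; the 0.1 threshold is an assumption, and real consistency checking would need semantic comparison rather than word overlap.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts as bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def title_body_consistent(title: str, body: str, threshold: float = 0.1) -> bool:
    """Flag titles that share almost no vocabulary with the body."""
    return cosine_similarity(title, body) >= threshold

print(title_body_consistent(
    "city council approves new park budget",
    "the city council voted on tuesday to approve the budget for a new park",
))  # overlapping vocabulary → True
```

A clickbait title bolted onto an unrelated body would score near zero and fall below the threshold.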
b) Look for sensational words
Overly sensational articles often turn out to be fake. A natural language processing system can quantify how sensational an article is from the sensational words used in it.
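The simplest version of this is a lexicon score: the fraction of an article's words that come from a list of sensational terms. The short word list below and any cut-off applied to the score are illustrative assumptions, not a curated lexicon.

```python
# Tiny illustrative lexicon; a real system would use a much larger, curated list.
SENSATIONAL_WORDS = {
    "shocking", "unbelievable", "miracle", "outrageous", "secret",
    "exposed", "bombshell", "explosive", "horrifying", "banned",
}

def sensationalism_score(text: str) -> float:
    """Fraction of words in the text that appear in the sensational lexicon."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?:;") in SENSATIONAL_WORDS for w in words) / len(words)

print(sensationalism_score("Shocking secret exposed: the banned miracle cure!"))
print(sensationalism_score("Parliament passed the budget on Tuesday."))
```

The first headline scores high and the second scores zero; the score would then feed the classifier as one more signal rather than acting as a standalone filter.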
The role of web scraping
An artificial intelligence engine that detects fake news will obviously require huge amounts of data to train the machine learning algorithm. Extracting data from the web shouldn't be an issue, given the advanced technologies available for efficient web scraping. However, since detecting fake news is a challenge in itself, it's recommended to use a data-as-a-service (DaaS) solution like PromptCloud to acquire the data from media outlets, both genuine and fake. Since we take end-to-end responsibility for the data extraction process, you can skip the complexities associated with web scraping and get ready-to-use data at a significantly lower cost than in-house scraping.
Adding a manual layer
As the machine identifies cues and flags the posts it believes to be fake, a small human layer can validate the findings. This is easy once the AI system has done the heavy lifting. With the manual layer in place, the system would be able to detect fake news with very high accuracy. For content discovery platforms and social media sites, the ability to weed out fake news is essential to keep users engaged; otherwise, over time, users will lose trust in the news circulating on those platforms. The potential of AI and web data extraction in this regard is immense and should be harnessed to combat this evil at the earliest.
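The routing logic for that manual layer can be sketched in a few lines: only ambiguous predictions go to human reviewers, while clear-cut cases are handled automatically. The probability field, the 0.9/0.1 thresholds, and the action names are assumptions for illustration.

```python
def route(article_id: str, fake_probability: float) -> tuple[str, str]:
    """Decide how to handle an article given the model's fake-news probability."""
    if fake_probability >= 0.9:
        return (article_id, "auto-remove")    # model is confident the post is fake
    if fake_probability <= 0.1:
        return (article_id, "auto-approve")   # model is confident the post is genuine
    return (article_id, "human-review")       # ambiguous case → queue for a person

for article_id, p in [("a1", 0.97), ("a2", 0.03), ("a3", 0.55)]:
    print(route(article_id, p))
```

Tightening the thresholds sends more posts to reviewers and fewer to automation, which is the knob a platform would tune as trust in the model grows.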