Back when Larry Page and Sergey Brin started a research project at Stanford, before it ever moved into a garage, the information universe was small. The world still had a lot of information; it just wasn’t as easily accessible as it is today.
A tree’s branches have little significance unless a common trunk holds them together and provides the ecosystem that new branches need to survive and evolve. We cannot hop from one branch to another if the trunk is missing. That trunk of “information retrieval” was missing until they started crawling the web and organized the way information is retrieved and generated.
Imagine a student who knows little about a subject and is trying to wriggle his way out of a college assignment. Today, he sits in front of his computer, connects to the internet, and finishes the assignment before his coffee is cold. It wasn’t that easy 20 years ago.
It is certainly not that the world has more information today than it had 20 years ago. What has changed drastically is the way we retrieve and register information. Our model of the transaction has changed, thanks to the systematic crawling, indexing, and organization of data by search engines and other websites such as Facebook and Twitter.
The Evolution of Web Crawling
They built a robotic ant and called it a spider. They gave the spider a list of addresses and told it:
“Look, visit these addresses, lick whatever you can, come back and tell us what you licked.”
The spider came back, and this time they told it:
“Thank you for telling us what you licked. Now visit these addresses one more time and find all the other addresses listed on each page. Lick those additional addresses and come back to us.”
The spider came back again, and this time they told it to repeat both steps recursively, visiting a page and then collecting the addresses on it, for every new address it could find on any page.
This is the story of how a primitive spider first crawled the internet. The spider evolved and graduated to become the backbone of the search engines we use today, such as Google.
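The spider's instructions map onto a breadth-first traversal: visit an address, record the page, collect the addresses found on it, and repeat for every address not seen before. The sketch below is only an illustration of that idea, not how any real search engine is built. To stay runnable without a network connection it crawls a tiny in-memory stand-in for the web; FAKE_WEB, fetch, LinkExtractor, and crawl are all hypothetical names invented for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in for the web: address -> page content.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://a.example/about">About A</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Return the page at url, or None if it does not exist (assumed helper)."""
    return FAKE_WEB.get(url)


def crawl(seeds, max_pages=100):
    """Visit the seed addresses, then every address discovered along the way."""
    queue = deque(seeds)
    seen = set(seeds)  # addresses already queued, so no page is visited twice
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html  # "tell us what you licked"
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative addresses
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling crawl(["http://a.example/"]) starts from a single seed and ends up visiting all three pages. A queue with a "seen" set is used here instead of literal recursion so that pages linking back to each other do not trap the spider in a loop.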
The Evolution of the Internet of Things (IoT)
The modern internet was born. The world started getting smaller.
Since then, world history has recorded another three and a half decades, and by now the internet has crawled out of our laptops and into the devices that are an integral part of our daily lives.
Today, a smart alarm clock goes off ahead of schedule if there is a traffic jam. I can ask my Google Glass to find my favorite recipe or tell me how tall the Eiffel Tower is, without lifting a finger. Even my dumb medicine container can tell me in a scolding tone that I forgot to take my pill on time.
“Hey, it’s not dumb anymore.”
It knows when to act because it gets the necessary commands from the web. These devices now administer every moment of our lives, and they know perfectly well when to make their owners feel blessed by their sharpness. From hiring a cab to being reminded of an office meeting, we are tethered by an invisible digital leash. It’s the internet.
No more do we live among dead things. It’s now the internet of things, where things talk to each other and everything we know of is actively connected to everything else.
Web Data Crawling is the Linchpin of IoT
Behind the plush smartness of IoT lies one cardinal element: seamless data exchange between the web and our active devices. This endless feeding of data to every internet-connected smart device on the planet is possible because web crawlers do the heavy lifting of collecting and indexing every bit of data, down to our latest updates.
This morning, when you were glancing through your favorite newspaper on your tablet over a steaming coffee, you hardly gave a thought to the fact that web crawlers had searched, collected, and indexed that content and made it ready, before your query, to feed your tablet.
The key steps the whole process followed:
● You picked up your tablet and tapped on your news app
● The app understood your query and processed it on its server (or ran a general search on the web)
● The query engine on that server compared your request with its stored information to find what you were looking for
● There was a relevant match
● The query engine pulled out the data and sent it to your tablet
● The data processor on your tablet turned that chunk of data into a well-organized newspaper
● The newspaper surfaced on your tablet’s display
Before this whole process began, a web crawler had fished that information out of a sea of web pages, one that keeps growing in volume, and indexed it so that your server could access it.
Every moment, this is happening on every web-connected smart device in every corner of the planet. Healthcare, transportation, education, finance, global transactions, business: name anything our society is made of, and it’s connected to the internet.
If the future is connectivity, which it is, then active data exchange is the prime reason for it, and web crawlers act as the central nervous system of that exchange. Without their relentless effort, the whole sphere of the internet of things would slump to nothing. It seems likely that by the end of 2015 active physical sensors alone, born of the internet of things, will account for 30% of total web traffic, and the future looks even brighter.
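The point that indexing happens before any query arrives can be shown with a toy inverted index: a table that maps each word to the pages containing it, built once ahead of time and then consulted at query time. This is a minimal sketch under simple assumptions (lowercase word matching, every query word must appear). The sample pages and the names tokenize, build_index, and search are hypothetical and do not come from any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: page id -> text content.
PAGES = {
    "news/1": "Traffic jam delays morning commute",
    "news/2": "Morning coffee prices rise",
    "news/3": "City eases traffic rules",
}


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Done ahead of time: map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Done at query time: return the pages that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
```

With this index in place, search(INDEX, "morning traffic") returns only "news/1", the one page that contains both words. The query never touches the page text itself, only the prebuilt index, which is why the answer can come back quickly.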
Custom Management of Structured Data
First, there was SaaS (Software as a Service). I have a business and I need to generate invoices. I will just use a provider that creates those invoices for me, without having to reinvent the wheel all by myself. That is SaaS.
Then came PaaS (Platform as a Service). I have a business and I want to sell things to customers, but the physical market is too crowded and requires inventory and infrastructure. Let me find a platform where I can sell my products easily (eBay, Alibaba, SnapDeal). That’s PaaS.
Up next is DaaS (Data as a Service). I have a business and I want to sell things to customers. I need data and insights into my customer segment, and I want to know what works and what does not. There is gold in data, and I want someone to churn and chew that data for me and give me actionable insights that I can put to work in my business.
On any given day, more than 2 million blog posts are published on the internet, and God only knows how many Facebook posts and tweets. This humongous amount of information is a key resource for businesses that want to tap into emerging markets that have never been explored before.
Today, I can start a service, use Twitter, and sell my offering to people I don’t even know, without having to open a shop or meet anyone. I just sit there, build my thing, put it on the internet, and wait for people to come and buy it.
The key things I (as a business owner) need are:
● Access to information on who might buy what I sell
● Access to information on who my competitors are
● Access to information on how I am going to distribute my service
● Access to information on how my customers will pay for the service
The cloud of information I need is always expanding and becoming more unmanageable with time. Before we can figure out the key metrics from the cloud, it has already grown bigger and more complex. Consumer internet companies and enterprises will eventually need a service provider that can simplify their data needs through a continuous crawling mechanism that distills the data and presents it in a structured, more readable way.
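To make "structured" concrete, here is one small sketch of turning a raw crawled page into a record with named fields that a business could load into a table or a report. The choice of fields (title, description, headings) is an assumption made for illustration, and PageSummary and structure are hypothetical names; a real data service would extract whatever fields its customers care about.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Pulls a few named fields out of raw HTML (illustrative field choice)."""

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "description": "", "headings": []}
        self._current = None  # the tag whose text is being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1", "h2"):
            self._current = tag
        elif tag == "meta" and attrs.get("name") == "description":
            self.record["description"] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return  # ignore text outside the tags of interest
        if self._current == "title":
            self.record["title"] += text
        else:
            self.record["headings"].append(text)


def structure(url, html):
    """Turn one crawled page into a flat record with named fields."""
    parser = PageSummary()
    parser.feed(html)
    return {"url": url, **parser.record}


# Hypothetical crawled page used to try the function out.
RAW_PAGE = (
    "<html><head><title>Blue Widget</title>"
    '<meta name="description" content="A sturdy widget."></head>'
    "<body><h1>Blue Widget</h1><p>Ships in 2 days.</p><h2>Specs</h2></body></html>"
)
```

Calling structure("http://shop.example/widget", RAW_PAGE) yields a dictionary with the page address, its title, its description, and a list of its headings, while the free-form paragraph text is left out. Records like this are easy to store, compare, and aggregate, which is the step that turns a pile of crawled pages into something a business can query.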