Standard Data Formats- Our Right to Data

September 4, 2012 | Category: Blog

As we continue to eat, sleep and live the www, it’s more than necessary to migrate real-world human rights to the virtual world as well. Introducing the “Right to Data”, by which all humans on the web are entitled to access publicly available data in an automated manner so that their machines crib less. OK, that was definitely a made-up definition, and just to clarify, this right exists only in the minds of PromptClouders. But what if all webmasters did pay respect to microformats (μF)? For those of us who do not deal with large-scale data on the web, μF is an effort to standardize data formats across all sources on the web so that machines can process the data more easily, without sacrificing readability for end users.

Real use of data comes in only when it can be automatically processed by a machine.

There are already semantic markups for contact info, geo details, etc., which help search engines and crawlers better identify the data within. But here’s the problem: a standard is not a standard until everyone uses it. Microformats need to expand to all verticals so that data from all domains can be dug out of its archaeological deposits. Let’s prove this by contradiction. Say a company is interested in collecting reviews of a single product from 4 different sources on the web. What do you expect here? Write one script and let your machine do the rest on its own? Unfortunately not. You have to intervene for every source of interest, since each of these sites wrote its HTML as it pleased. Ultimately, you’d end up with 4 different setups to extract the required information. Hence proved.

Instead, if reviews on all the pages across sources looked like this:

[Image: data-api-assembly line]

You get it! There’s an obvious advantage to uniformity (and its derivatives) when using data formats. Extracting large-scale data becomes much easier, and the intended use of the data is visible to the naked eye. What’s more, data transfer between two points on the web becomes extremely smooth. Besides, a lot of hard-to-reach information, such as legal data or other government records, comes within reach with the use of open data formats. What about mashups? If data formats were standardized, I see a lot of them coming in without much hassle and being thrown into public use. And hey, you get better search results even when you’re just Googling.
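To make the uniformity point concrete, here is a minimal sketch in Python. It assumes both sources mark up their reviews with a few hReview class names (hreview, fn, rating, description). The two sample pages, the product name and the helper names (HReviewParser, extract_reviews) are made up for illustration, and real-world pages would need a sturdier parser than this.

```python
from html.parser import HTMLParser

# Assumed subset of hReview property class names; real pages may carry more.
REVIEW_FIELDS = {"fn", "rating", "summary", "description"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HReviewParser(HTMLParser):
    """Collects one dict per element whose class list contains 'hreview'."""

    def __init__(self):
        super().__init__()
        self.reviews = []  # one dict per review found so far
        self._stack = []   # one entry per open tag: "ROOT", a field name, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags have no end tag, so keep them off the stack
        classes = (dict(attrs).get("class") or "").split()
        if "hreview" in classes:
            self.reviews.append({})
            self._stack.append("ROOT")
            return
        inside_review = "ROOT" in self._stack
        field = next((c for c in classes if c in REVIEW_FIELDS), None)
        self._stack.append(field if inside_review else None)

    def handle_endtag(self, tag):
        # Assumes well-formed markup: every non-void start tag is closed.
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or "ROOT" not in self._stack:
            return
        # Credit the text to the innermost open field, if there is one.
        for entry in reversed(self._stack):
            if entry == "ROOT":
                break
            if entry is not None:
                review = self.reviews[-1]
                review[entry] = (review.get(entry, "") + " " + text).strip()
                break


def extract_reviews(html):
    """Runs the same extractor over any page, whatever its layout."""
    parser = HReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


# Two hypothetical sources: different HTML layouts, same class names.
SITE_A = """
<div class="hreview">
  <h3 class="item"><span class="fn">Acme Kettle</span></h3>
  <p>Rating: <span class="rating">4</span> out of 5</p>
  <p class="description">Boils fast, lid is a bit loose.</p>
</div>
"""

SITE_B = """
<table><tr class="hreview">
  <td class="rating">2</td>
  <td class="item"><b class="fn">Acme Kettle</b></td>
  <td class="description">Stopped working after a month.</td>
</tr></table>
"""

if __name__ == "__main__":
    for page in (SITE_A, SITE_B):
        print(extract_reviews(page))
```

One extractor covers both pages because it keys on the shared class names rather than on each site's tag structure. Without that shared vocabulary, each source would need its own hand-written rules, which is the 4-setups problem described earlier.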

If your intentions on the web are social, then you are surely at an advantage by using μF in your HTML. Just like any other right to information or right to education, the right to data makes sense for those in the Big Data world. And why shouldn’t it, when data is surging to become a magic wand for all businesses?
