Data mining is the practice of identifying patterns in the data generated by your systems or your business. It helps you make better business decisions by leaning on that data: it reveals trends invisible to the naked eye, as well as opportunities in the data that you can exploit to benefit your business.
Web mining, on the other hand, is data mining applied to the web. It is based on extracting various forms of data from the web and finding patterns in them. Web content mining, however, is no longer restricted to algorithm-based pattern discovery; it also covers price comparison, automated form filling, product data capture, brand monitoring and more.
Both have their own issues and implementation difficulties. Any big company collects loads of data in different formats. For best results in data mining, however, the data from all the various sources should be in a format that allows it to be brought together, so that probabilistic models can be run on it or some other interpretation can be made from the data as a whole. In today’s world, a single sheet of data has almost no value unless data collected from other sources backs it up.
Let us take up a single case study. Say you are a car maker and you provide a three-year guarantee on your car parts. Currently, you provide this guarantee to all your customers, irrespective of how rashly they drive, how they store their car, and how many kilometers they drive daily. This is unfair in many ways, and there is a strong argument against it. You are also seeing a few car parts fail more often than others. But since your inferences are not backed by concrete data, you are unable to act; you can’t get into an argument with your supplier based on a hunch. You already have sensors in your cars collecting data all the time, but this data is stored in non-uniform formats. Data from cars in different cities is also stored on local servers, so there is no single location where all the data can be accessed at the same time.
So, in this case, the first thing you need to do is create a central repository where data from all the remote repositories is synced and stored at periodic intervals, be it daily or weekly. Next, you need to convert the data into a structured format and normalize it, so that algorithms can make sense of it. Now you can try different machine learning algorithms, such as random forests and deep neural networks, to find which parts fail far more often than others. You can also determine whether driving and car-storage habits actually have an adverse effect on certain car parts. Based on what you learn, you can add clauses to your guarantee covering rash driving and bad storage, and you can give your suppliers a limit on the percentage of parts from each batch that may fail within a stipulated period of time.
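The consolidation and normalization steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two city "servers", their field names, the sample records and the 50% supplier limit are all hypothetical, and a plain failure-rate tally stands in for the random forest or neural network you would try on real data.

```python
from collections import defaultdict

# Hypothetical raw records from two city servers, each in its own format.
CITY_A = [
    {"part": "brake_pad", "km_per_day": 40, "failed": True},
    {"part": "brake_pad", "km_per_day": 15, "failed": False},
    {"part": "alternator", "km_per_day": 60, "failed": False},
]
CITY_B = [
    {"component": "Brake Pad", "miles_per_day": 30.0, "status": "FAILED"},
    {"component": "Alternator", "miles_per_day": 10.0, "status": "OK"},
]

KM_PER_MILE = 1.609344


def normalize_a(rec):
    """City A already uses the target field names; only coerce the types."""
    return {
        "part": rec["part"],
        "km_per_day": float(rec["km_per_day"]),
        "failed": bool(rec["failed"]),
    }


def normalize_b(rec):
    """City B uses other names, miles and a status string; map them over."""
    return {
        "part": rec["component"].strip().lower().replace(" ", "_"),
        "km_per_day": rec["miles_per_day"] * KM_PER_MILE,
        "failed": rec["status"] == "FAILED",
    }


def build_central_repository():
    """Merge both sources into one list that shares a single schema."""
    return [normalize_a(r) for r in CITY_A] + [normalize_b(r) for r in CITY_B]


def failure_rate_by_part(records):
    """Return the fraction of records marked as failed, per part."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for rec in records:
        totals[rec["part"]] += 1
        failures[rec["part"]] += int(rec["failed"])
    return {part: failures[part] / totals[part] for part in totals}


def parts_over_limit(rates, limit):
    """List the parts whose failure rate exceeds the agreed supplier limit."""
    return sorted(part for part, rate in rates.items() if rate > limit)


records = build_central_repository()
rates = failure_rate_by_part(records)
flagged = parts_over_limit(rates, limit=0.5)  # 0.5 is an assumed limit
```

Once every record shares one schema, the same list can be fed to any model you like; the tally here is just the simplest thing that answers "which parts fail more than the others".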
The car-maker case is a good example of data mining used to benefit your business and move it in a positive direction with a long-term, data-backed solution.
Web Content Mining
Coming to web content mining, your problem statement can vary widely. Let’s look at the common scenarios in which web content mining might come in handy.
E-commerce websites have changed the market completely. The everyday customer no longer compares prices across local stores but across e-commerce platforms, with the click of a button. On such a cut-throat, competitive battlefield, you might lose a customer if an item on your website is priced even a dollar higher than on a competitor’s. Hence more and more websites keep a record of the prices of various items across competing websites, so that they can price their own products accordingly.
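As a rough sketch of that kind of price record, assume the competitor prices have already been extracted from the web and stored per shop. The shop names, items and prices below are made up for illustration, and the extraction step itself is left out.

```python
# Hypothetical prices: ours, and those already collected from competitors.
OUR_PRICES = {"usb_cable": 9.99, "headphones": 49.00, "mouse": 19.50}
COMPETITOR_PRICES = {
    "shop_a": {"usb_cable": 8.99, "headphones": 52.00},
    "shop_b": {"usb_cable": 9.49, "mouse": 19.50},
}


def overpriced_items(ours, competitors):
    """Map each item we sell above the cheapest competitor to that offer."""
    report = {}
    for item, our_price in ours.items():
        # Collect every competitor offer that exists for this item.
        offers = {
            shop: prices[item]
            for shop, prices in competitors.items()
            if item in prices
        }
        if not offers:
            continue  # nobody else lists this item
        cheapest_shop = min(offers, key=offers.get)
        if offers[cheapest_shop] < our_price:
            report[item] = (cheapest_shop, offers[cheapest_shop])
    return report


report = overpriced_items(OUR_PRICES, COMPETITOR_PRICES)
```

With this sample data only the USB cable is flagged, because one competitor undercuts it; the mouse is merely matched and the headphones are already the cheapest.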
Having the best prices doesn’t always mean that you will attract more customers. Once a platform makes a name for itself with a level of customer service that sets it apart from the rest of the field, people will prefer it irrespective of the price difference. Certain e-commerce websites are neither the only ones out there nor the ones with the best prices at all times, yet you would blindly place your order with them, because you know that their customer service is the best and that there will be no delivery hassles. Hiring a thousand people is not the only way to maintain such good customer service ratings. You can instead make data work for you by mining pages with reviews of your items and your website, social forums where your company name is mentioned, and more. You can mine all this data to find the customers who are unhappy, establish a common cause or trend, and find a solution that gives more and more customers a positive image of your brand. This extra effort, leaning on data, will take you a long way.
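Finding a common cause among unhappy customers can start very simply. The sketch below assumes the reviews have already been collected and that each carries a star rating; the sample reviews, the topic keywords and the "two stars or fewer means unhappy" rule are all assumptions made for illustration, and plain keyword matching stands in for real sentiment analysis.

```python
from collections import Counter

# Hypothetical reviews already mined from review pages and forums.
REVIEWS = [
    {"rating": 1, "text": "Delivery was late and the box was damaged."},
    {"rating": 2, "text": "Late delivery again, support never replied."},
    {"rating": 5, "text": "Great product, fast delivery."},
    {"rating": 1, "text": "Refund took weeks and support was rude."},
]

# Assumed mapping from complaint topic to keywords that signal it.
COMPLAINT_TOPICS = {
    "delivery": ("late", "damaged"),
    "support": ("support",),
    "refund": ("refund",),
}


def complaint_trends(reviews, max_rating=2):
    """Count how many unhappy reviews touch each complaint topic."""
    counts = Counter()
    for review in reviews:
        if review["rating"] > max_rating:
            continue  # skip customers who are not unhappy
        text = review["text"].lower()
        for topic, keywords in COMPLAINT_TOPICS.items():
            if any(word in text for word in keywords):
                counts[topic] += 1
    return counts


trends = complaint_trends(REVIEWS)
```

Calling `trends.most_common()` then ranks the topics, which points you at the cause worth fixing first.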
Last but not least is data gathered from varying sources on a subject, for when you need to do a detailed survey of it and check the accuracy of certain claims before making a big business decision. Why did I mention varying sources? Because numbers are not everything today. Even the tiniest rumors can have a heavy impact on the stock market, so you need to be on the lookout.
After reading about both processes, you can see that both are very important for driving your business in the twenty-first century. If you do not currently have a data team, you should take the help of a service provider to start leveraging your data and keep your company afloat amid the competition.