Several data exploration and visualization tool vendors promise to convert company data into insights instantly. Unfortunately, this promise remains unfulfilled, for several reasons. First, not all correlations are insights! You need trained human eyes to separate the signal from the noise, and tools can only be an aid. Second, enterprises complain that the tools deliver a deluge of information without pointing out the one or two steps an organization should take for a noticeable business impact. Third, humans have biases. Relying on just one or two opinions or perspectives, even from the best data scientists, is sometimes not sufficient. You need more certainty before betting big dollars on their recommendations.
These issues often lead to disappointment and an inability to extract the expected returns from huge investments in big data tools.
Like it or not, there is no magic wand. There are, however, a few clear steps that, if taken, can lead to success each time.
Step 1: Clearly define business goals and translate them into specific data analysis requests
It sounds simple, but this is where most data analysis projects fail. Data scientists and business executives speak very different languages, and time needs to be invested in clearly outlining the business goals and mapping each goal to a specific data modeling exercise. Both groups need to agree on the expected outcome of the exercise before any data analysis activity begins.
Step 2: Identify, transform and integrate company-owned and externally available data sources to correctly represent your industry dynamics and the business goal
You've heard this before. Your data resides in silos, and it is not of much use unless those silos are transformed and combined correctly. Not every attribute is relevant to an analysis, and picking the right attributes is part art and part science, requiring reliance on human judgment.
Also, most company-specific data holds limited information, and unless you lead your market with 50% or more share, the dataset needs to incorporate broader business indicators relevant to your industry. This requires sifting through hundreds of publicly available data sources and extracting the attributes that matter. Errors during the modeling step can be significantly reduced if the "attributes" (synonymous with "features", "business drivers", "data indicators", "factors") selected during this step are exhaustive and accurate. Ideally, multiple feature sets need to be created and tested on different data models for the most reliable outcomes.
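As a rough illustration of the integration step, the sketch below joins company-owned records with one external indicator on a shared time period. All names and numbers here (the `company_sales` records, the `industry_index` series, the `integrate` helper) are hypothetical and exist only for illustration; real projects would pull from many more sources.

```python
# Hypothetical company-owned data, keyed by quarter.
company_sales = {
    "2023-Q1": {"units_sold": 120, "ad_spend": 15.0},
    "2023-Q2": {"units_sold": 135, "ad_spend": 18.0},
    "2023-Q3": {"units_sold": 128, "ad_spend": 16.5},
}

# Hypothetical external indicator, keyed the same way.
# Note that Q3 is missing and Q4 has no company record.
industry_index = {"2023-Q1": 98.5, "2023-Q2": 101.2, "2023-Q4": 104.0}


def integrate(internal, external, external_name):
    """Inner-join internal records with one external series by period."""
    merged = {}
    for period, record in internal.items():
        if period in external:
            # Copy the internal record and add the external attribute.
            merged[period] = {**record, external_name: external[period]}
    return merged


dataset = integrate(company_sales, industry_index, "industry_index")
for period in sorted(dataset):
    print(period, dataset[period])
```

Only the periods present in both sources survive the join, which is one of the places where human judgment comes in: someone has to decide whether dropping, filling, or re-sourcing the missing periods best represents the business.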
Step 3: Apply multiple data analysis approaches and triangulate the top business drivers
Many problems can be solved using a simple regression analysis if steps 1 and 2 are executed well. However, to know whether a regression model is sufficient, you need to test multiple approaches on a variety of feature sets. Based on our experience of executing several projects using the crowdsourcing approach, we are convinced that testing multiple combinations of feature sets and models is essential to determining the most reliable and accurate approach and to avoiding confirmation bias in the analysis.
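To make the idea of testing feature-set and model combinations concrete, here is a minimal sketch, not a description of how any particular project was run. It scores every pairing of three hypothetical feature sets with two simple models (a mean baseline and an ordinary least-squares regression) using leave-one-out cross-validation. The column names, the synthetic sales figures and the choice of mean absolute error as the metric are all assumptions made for the example.

```python
from itertools import product

# Synthetic data: sales depend on ad_spend and industry_index, not store_visits.
AD_SPEND = [1, 2, 3, 4, 5, 6, 7, 8]
STORE_VISITS = [3, 1, 4, 1, 5, 9, 2, 6]
INDUSTRY_INDEX = [5, 3, 6, 2, 7, 4, 8, 1]
NOISE = [0.1, -0.05, 0.05, -0.1, 0.05, -0.05, 0.1, -0.1]
ROWS = [
    {"ad_spend": a, "store_visits": v, "industry_index": i}
    for a, v, i in zip(AD_SPEND, STORE_VISITS, INDUSTRY_INDEX)
]
SALES = [10 + 2 * a + 3 * i + e for a, i, e in zip(AD_SPEND, INDUSTRY_INDEX, NOISE)]

# Hypothetical feature sets to compare.
FEATURE_SETS = {
    "internal_only": ["ad_spend", "store_visits"],
    "external_only": ["industry_index"],
    "combined": ["ad_spend", "industry_index"],
}


def fit_mean(X, y):
    """Baseline model: always predict the training mean."""
    mean = sum(y) / len(y)
    return lambda x: mean


def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coef = [A[i][n] / A[i][i] for i in range(n)]
    return lambda x: coef[0] + sum(c * v for c, v in zip(coef[1:], x))


MODELS = {"mean_baseline": fit_mean, "linear": fit_linear}


def loo_mae(fit, X, y):
    """Leave-one-out cross-validation, scored by mean absolute error."""
    errors = []
    for i in range(len(y)):
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(abs(predict(X[i]) - y[i]))
    return sum(errors) / len(errors)


results = []
for (fs_name, cols), (model_name, fit) in product(FEATURE_SETS.items(), MODELS.items()):
    X = [[row[c] for c in cols] for row in ROWS]
    results.append((fs_name, model_name, loo_mae(fit, X, SALES)))

results.sort(key=lambda r: r[2])  # lowest error first
for fs_name, model_name, mae in results:
    print(f"{fs_name:14s} {model_name:14s} MAE={mae:.3f}")
```

Because the synthetic sales were built from `ad_spend` and `industry_index`, the combined feature set with the linear model should come out on top here. On real data the ranking is not known in advance, which is the reason for running the comparison at all.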
Step 4: Perform business sanity checks on the analysis and tune the data models further
Statistical and machine learning models merely identify the drivers that matter most to your business.
To narrow these drivers further, two types of sanity checks need to be performed by business executives:
1. Check for spurious and meaningless correlations and remove them
2. Check which of the drivers identified by the model can be controlled by the company to impact future business scenarios. For example, "average smartphone prices" may be an important indicator of business success, but you might not have a lot of control over that factor. Such a factor may need to be eliminated, as it would not lead to actionable recommendations. At the end of this step, in most cases, you will be left with 5-10 business drivers that really matter.
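The two checks above can be sketched as a simple filter over the model's output. This is only an illustration under stated assumptions: the `Driver` record, the `shortlist` helper, the importance scores and every driver other than "average smartphone prices" are hypothetical, and the `plausible` and `controllable` flags stand in for judgments that business executives would make by hand.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    importance: float    # score reported by the model (hypothetical values)
    plausible: bool      # reviewer judgment: is the correlation meaningful?
    controllable: bool   # reviewer judgment: can the company influence it?


def shortlist(drivers, max_drivers=10):
    """Keep plausible, controllable drivers, ranked by model importance."""
    kept = [d for d in drivers if d.plausible and d.controllable]
    kept.sort(key=lambda d: d.importance, reverse=True)
    return kept[:max_drivers]


candidates = [
    Driver("average smartphone prices", 0.30, plausible=True, controllable=False),
    Driver("support response time", 0.15, plausible=True, controllable=True),
    Driver("discount depth", 0.22, plausible=True, controllable=True),
    Driver("office plant count", 0.12, plausible=False, controllable=True),
]

for driver in shortlist(candidates):
    print(driver.name, driver.importance)
```

The model supplies the ranking, but both flags come from people, so the filter encodes the business review rather than replacing it.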
Step 5: Integrate the models into an interactive dashboard and tune the same for a variety of future scenarios
Once the top 5-10 drivers have been identified, a recommendation always takes the same form: what can I do to influence one of the top drivers and improve business results? These recommendations can then be integrated into strategic plans, and the outcomes can be fed back into the models so that they improve with each iteration.
By following these steps, organizations can turn their data into actionable decisions.
- Divyabh Mishra is Founder & CEO of CrowdANALYTIX, a crowdsourced analytics service that helps global clients unleash the potential hidden within their data. On CrowdANALYTIX, you can launch a project for free and gain from the collective wisdom of a global community of data scientists. You can read more about this here - www.crowdanalytix.com/data-to-insights