The skills you need, the resources to use and the steps to take to learn predictive analytics and data science from the perspective of a BI expert.
Data science, a continuation of the predictive analytics and data mining spaces, is an interdisciplinary science that combines supervised learning with engineering, business-systems, and statistical methods to discover patterns in data.
From the perspective of a business intelligence expert, this is complicated and overwhelming. While business intelligence and data science follow similar methodologies and add value by improving decision making and decision management, they differ in the skills, tools, and knowledge required. In terms that we understand: instead of reporting on and analyzing the past, predictive analytics forecasts the likelihood of future events.
Predictive analytics is becoming increasingly popular because we now have the hardware and processing power to complement the statistics, theories, and expert knowledge we’ve had for a long time. This post will serve as a basic overview of what I know, what I’ve learned, what motivates me to learn more, and the areas where we need to grow as BI experts to move into this space.
Since data science is a science, not just data analysis, it relies on the scientific method. The scientific method is an iterative process centered on the ability to reproduce findings. It involves formulating a question, generating hypotheses, gathering data, testing hypotheses, and communicating results. For many of us, this is a familiar concept because of the iterative nature of the agile methodology we have all come to know and love.
The industry-standard data science/data mining methodology is the Cross Industry Standard Process for Data Mining (CRISP-DM). This methodology looks very familiar to anyone acquainted with the general agile methodology:
The methodologies described above, when put in the data science context, require many skills. A thorough knowledge of the business, data, and audience is needed to generate questions, formulate hypotheses, gather data, and effectively communicate results or take action. Equally important, strong technical and programming knowledge is required to prepare data and create models. And last but not least, a strong understanding of statistics and math is vital to correctly analyze and test data, ensuring that the models created are reproducible and significant.
Taking action on any model will only be beneficial if the right question was asked, the right hypothesis was tested, the right data was selected and prepared in the right way, and (arguably most importantly) the right methods were used to test the model. See below for an excellent breakdown of the scientific method vs. data science skills. I also highly recommend reading the article Figure 2 comes from.
Many of the skills we have practiced as business intelligence experts lend themselves well to the data science realm, such as product design, programming, structured and unstructured data, big data, visual display of information, cloud management, and general business development. From what I have found, if you’re ready to dive into the predictive world, there are a few skills you need in your tool belt that fall outside the business intelligence tool belt:
|Programming & Technology||Math & Modeling||Statistics|
- Machine learning
- Bayesian Statistics
|- Data Mining
- Scientific Method and Experimental Design
- Statistical modeling
- Forecasting models
Table 1: A breakdown of the skills summarized in Figure 2 from a survey conducted of 490 data science professionals.
A couple of months ago I was fortunate to attend John Elder’s Introduction to Data Mining training, and I have woven many of the concepts he presented into this blog. The diagram below helped me visualize not only the data mining skills he kept referencing but also the distribution of knowledge from each sector of the interdisciplinary approach.
Figure 3: Discipline Interlock
Source: Elder Research Introduction to Data Mining Training
1. UNDERSTAND THE METHODOLOGY
It’s really easy to get absorbed in data and models and tools. The CRISP-DM and/or other iterative methodology should be a constant reminder of the underlying question or hypothesis.
2. MAKE AN INTERDISCIPLINARY TEAM
Not all of us can have all the skills outlined in Table 1 (although I would not object to being a clairvoyant polymath!). Together we can create a good profile; see Figure 3 for a representation of the areas of expertise a team can cover.
3. START WITH FREE TOOLS
Perhaps start with some of the open source and/or free tools and languages out there like R, Python, WEKA, and KNIME.
4. ATTEND AND PARTICIPATE IN CONFERENCES
Check out some of these upcoming ones listed below. There are so many!
5. PRACTICE, PRACTICE, PRACTICE, AND LEARN!
Way back in the day, I worked at a coffee shop in college. For a project in one of my classes, I wanted to use stochastic forecasting to determine the optimal brewing volumes for the new brewing equipment we had just received. The model forecasted coffee demand by volume, accounting for the fluctuation of customers by time of day, day of the year, weather, etc., using about 2 years of data from the point-of-sale system.
The optimization portion of the project took into account the opportunity loss of a false negative (running out of coffee and giving a customer a more expensive drink for the price of a coffee) and a false positive (having too much coffee and dumping it out). As you may have noticed, the coffee industry is VERY particular about coffee. A few of the constraints of this problem: coffee could only sit out for less than 2 hours, and it took anywhere from 4 to 8 minutes to brew a pot of coffee (depending on the volume).
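To make the trade-off concrete, here is a minimal Python sketch of that kind of cost comparison. It is not the model I built for the class: the demand numbers, the two cost values, the candidate volumes, and the function names (`expected_cost`, `best_volume`) are all made up for illustration, and it ignores the freshness and brew-time constraints entirely. It simply scores each candidate brew volume against simulated demand and picks the cheapest one on average.

```python
import random

def expected_cost(volume, demand_samples, shortage_cost, waste_cost):
    """Average cost of brewing `volume` cups across the sampled demands."""
    total = 0.0
    for demand in demand_samples:
        if demand > volume:
            # False negative: we ran out and comped a pricier drink.
            total += shortage_cost * (demand - volume)
        else:
            # False positive: leftover coffee gets dumped.
            total += waste_cost * (volume - demand)
    return total / len(demand_samples)

def best_volume(candidates, demand_samples, shortage_cost, waste_cost):
    """Return the candidate volume with the lowest average cost."""
    return min(
        candidates,
        key=lambda v: expected_cost(v, demand_samples, shortage_cost, waste_cost),
    )

# Hypothetical demand for one time window: roughly 40 cups, give or take 10.
rng = random.Random(42)
demand_samples = [max(0.0, rng.gauss(40, 10)) for _ in range(5000)]

# Hypothetical per-cup costs: running out hurts more than dumping.
candidates = range(10, 81, 5)
choice = best_volume(candidates, demand_samples, shortage_cost=1.50, waste_cost=0.25)
print("Brew volume with the lowest average cost:", choice)
```

Because running out is priced higher than dumping in this sketch, the chosen volume lands above average demand. Flip the two costs and it lands below.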
I delivered a recommendation of the 3 pre-set volume amounts for the new brewing equipment, and they still use them to this day. If I were to do this again today, knowing more BI principles and having learned more about predictive analytics, I would consolidate this information into not just a single deliverable but a model that continues to grow as their data, customer base, and demand change.
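As one possible illustration of a model that keeps growing with the data, here is a small Python sketch using simple exponential smoothing. This is my own assumption about what such a model could look like, not something the coffee shop runs: the class name `DemandForecast`, the `alpha` value, and the sample numbers are hypothetical. Each new observation nudges the forecast toward recent demand instead of freezing it at the time of the original study.

```python
class DemandForecast:
    """Running demand estimate updated with simple exponential smoothing."""

    def __init__(self, alpha=0.2):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in the interval (0, 1]")
        self.alpha = alpha  # weight given to the newest observation
        self.level = None   # current forecast; None until the first update

    def update(self, observed):
        """Fold one new observation into the forecast and return it."""
        if self.level is None:
            self.level = float(observed)
        else:
            self.level += self.alpha * (observed - self.level)
        return self.level


# Hypothetical cups sold in the same morning window on consecutive days.
forecast = DemandForecast(alpha=0.5)
for cups_sold in [40, 50, 45]:
    current = forecast.update(cups_sold)
print("Forecast for the next window:", current)
```

A higher `alpha` reacts faster to shifts in demand but is noisier; a lower one is steadier but slower to notice change. In practice you would probably keep one forecast per time window and feed its output back into the brew-volume decision.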