Demystifying Data Science | Machine Learning


What is it and why is it so hard?

My goal in this blog is to step back from the hype, pull back the curtain on Machine Learning (ML), and define what it is and what it does for our world. I hope to lower the barrier to entry and the intimidation factor so that the “data analysis newbie” and/or the “data scientist wannabe” can start exploring the exciting world of ML.

ML is an evolutionary discipline that grew out of the data mining space. ML is intended to give us real insight into the reasons behind our successes or failures through analysis of our data. It helps us understand our customers, products, mission and plan in ways that simply weren’t possible or practical before. ML is not a panacea for making sense of all your big data. It is not a magic wand to wave at your datasets to pop out intuitive insights that lead to the proverbial pot of gold profit margins for your business, nor will it ever be. ML is not perfect or exact.

Traditional data analysis is concerned with finding the answers to the “known questions” the business routinely asks. An example is the total profit made in Q3 of 2017 in the western region of North America. This kind of question can usually be answered using traditional business intelligence technologies.
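To make the idea of a “known question” concrete, here is a minimal sketch of the Q3 2017 example as a plain filter-and-sum. The sales records, the region labels, and the helper names `quarter` and `total_profit` are all hypothetical and exist only for illustration; a real business intelligence tool would run the equivalent query against a data warehouse.

```python
from datetime import date

# Hypothetical sales records: (sale date, region, profit in USD).
SALES = [
    (date(2017, 7, 14), "West", 1200.0),
    (date(2017, 8, 2), "West", 800.0),
    (date(2017, 9, 30), "East", 950.0),
    (date(2017, 10, 1), "West", 400.0),  # falls in Q4, so it is excluded
    (date(2017, 6, 30), "West", 300.0),  # falls in Q2, so it is excluded
]


def quarter(d):
    """Return the calendar quarter (1-4) that a date falls in."""
    return (d.month - 1) // 3 + 1


def total_profit(records, year, qtr, region):
    """Sum profit for one region in one quarter: a fully specified question."""
    return sum(
        profit
        for sale_date, sale_region, profit in records
        if sale_date.year == year
        and quarter(sale_date) == qtr
        and sale_region == region
    )


q3_west = total_profit(SALES, 2017, 3, "West")
print(f"Q3 2017 profit, West region: {q3_west:.2f}")
```

The point of the sketch is that the question fully determines the answer: once the filter is written, there is nothing left to discover.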

By contrast, ML is meant to give us several highly likely or possible answers to questions about our data that we have not thought to ask. ML uses algorithm-based approaches to extract insights and knowledge from data as it exists at a point in time. More specifically, it applies algorithmic analysis methods to explore large amounts of data in search of meaningful patterns and implied rules. Because the data reflects a point in time, the results are neither permanent nor enduring. ML is more akin to weather forecasting than to exact sciences such as chemistry or physics.
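As a rough illustration of searching data for patterns nobody asked about, the sketch below groups customers by monthly spend using a tiny one-dimensional k-means. K-means is my choice for the example, not a method this blog prescribes, and the spend figures and the function name `kmeans_1d` are hypothetical. The code assumes `k` is at least 2.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters around their means (assumes k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assign every value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers, groups


# Hypothetical monthly spend per customer, in USD.
monthly_spend = [12, 15, 14, 11, 95, 102, 98, 13, 101]
centers, groups = kmeans_1d(monthly_spend, k=2)
print(centers)
print(groups)
```

Nobody asked “are there two kinds of customer?”, yet the grouping suggests it. Whether that pattern is actionable is still a human judgment, and it holds only for the data as it stood when the analysis ran.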

The patterns discovered in the data are only meaningful to us if they give us actionable insights.

ML cartoon

What ML truly represents, in my opinion, is a paradigm shift. With the commoditization of big data and the cloud infrastructures that made that possible, acquiring and analyzing large datasets is no longer the herculean effort it was just three or four years ago. Huge amounts of data are readily available for collection, storage, and analysis. The problem has shifted from “how do we get the data?” to “how do we use the data we’ve got?”

Roles of a Machine Learning Expert

To quote Tracy Teal, co-founder and Executive Director of Data Carpentry, “before now the conversation has been around bringing compute to data or data to compute, but with ML we are bringing people and understanding to the data and that becomes knowledge.” (1) As great and empowering as this sounds, the people part of this quote is both the best part of ML and its Achilles’ heel. This is the reason there is so much hype as well as so much confusion around ML, and why it seems so daunting. Occupying a space within the field of data science and as a subset of artificial intelligence, ML lies at the intersection of mathematics, statistics and computer science. It requires a combination of skill sets that have traditionally been found in separate job descriptions and have therefore typically required separate professionals.

Subject Matter Expertise: It’s all about the data. You need to first understand your data. You need to understand the implications of poor data quality on your ML models, and understand the data well enough to know how changes in its various attributes or features affect your organization’s mission or business model. Essentially, you must be a Subject Matter Expert (SME) on your data. This role was traditionally held by the data expert in the business’s marketing, inventory, supply chain or financial departments, but that person is usually not a data analyst by vocation.

Data Analysis: Data analyst skills are needed as well. You must be comfortable working with the software tools and techniques necessary to acquire, load, cleanse and finally analyze the applicable datasets.

Mathematical/Statistical Analysis: Mathematical skills, specifically in statistical analysis, are also needed to choose the most appropriate algorithm for the given dataset and for the type of analysis the problem you’re trying to solve calls for.

IT Infrastructure: You need to understand your IT environment and infrastructure well enough to configure and deploy the necessary server resources and software, so that you can quickly stand up a new ML “laboratory” for testing and improving your ML model with live or real data.

Computer Science/Programming: You need to be comfortable with statistical analysis programming languages such as R, and with leveraging analysis libraries in languages such as C, C# and Python.
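To give a feel for the Data Analysis role described above, here is a small sketch of the load, cleanse and analyze steps using only the Python standard library. The raw export, its column names, and the helpers `load` and `cleanse` are hypothetical; real data would come from a file or database and would need far more careful cleansing rules, ideally agreed with the subject matter expert.

```python
import csv
import io
from statistics import mean

# Hypothetical raw export with typical quality problems:
# a missing value, a non-numeric value, and inconsistent region labels.
RAW = """customer,region,monthly_spend
alice,West,120.50
bob,west ,
carol,East,n/a
dave,WEST,80.00
"""


def load(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Drop rows with unusable spend values and normalize region labels."""
    clean = []
    for row in rows:
        try:
            spend = float(row["monthly_spend"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric spend: skip the row
        clean.append(
            {
                "customer": row["customer"],
                "region": row["region"].strip().title(),
                "monthly_spend": spend,
            }
        )
    return clean


rows = cleanse(load(RAW))
west_average = mean(r["monthly_spend"] for r in rows if r["region"] == "West")
print(len(rows), west_average)
```

Note how much judgment hides in `cleanse`: dropping a row rather than estimating the missing value is a decision about the business, not about the code.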

ML Cartoon

I just described the roles of no fewer than five separate professionals, and perhaps as many as ten, yet a data scientist or ML expert is expected not only to possess all these skills but to be a master of all of them. This combination of skill sets at a mastery level rarely exists in one person, nor is it realistic for a “data scientist wannabe” to acquire mastery of these skills while maintaining their ‘9 to 5’. Now you can start to see why the barrier to entry for data science and ML is perceived to be so high. The good news is that both employers and software vendors are starting to realize that finding all these skill sets in one person is nearly impossible, so a better strategy is to leverage the existing strengths of these separate roles and to offer tools and training that bridge the gaps in the requisite skills of existing professionals. Grow a data scientist rather than hire one. The software vendors’ approach to bridging the skill-set gaps via training and software tools will be discussed more closely in my next blog entry.

Sources:
(1) https://www.youtube.com/watch?v=xMmpMXlSzW0

Images:
http://analyticscube.com/2017/07/
http://bigdata-madesimple.com/dilberts-20-funniest-cartoons-on-big-data/
