Text Analytics Spells “Big Savings”

Text analytics and natural language processing are extremely powerful concepts that are increasingly within organizations’ grasp. Many of the concepts for mining text to extract new information have existed since the mid-1980s, but with the rise of the data scientist the barrier to entry has been dramatically lowered. Before we talk about how text analytics might be useful to your organization, let’s establish a quick baseline of understanding.

What is Text Analytics?

Text analytics is roughly synonymous with text mining and text data mining. Technically it is not related to biblio-wizardry or vocabu-sorcery, but I’d still like to think there’s some magic left in the world. The whole idea behind text analytics is taking a body of text and extracting valuable, discrete, or new information. Think about your business, then think about how much of a paper trail there is: emails, contracts, invoices, industry publications, etc. Most organizations have an absolute mountain of text information that is likely providing little value right now beyond its original intended purpose.

(See more about turning data into insights: /data-activation-when-your-data-hands-you-lemons/)

What about Natural Language Processing?

Natural Language Processing is a subset of text analytics that deals with aspects of language such as identifying the parts of speech, disambiguation, sentiment analysis, and the other vagaries of human language that computers will soon be better at understanding than we are. That said, I’m afraid no amount of context clues can help me understand modern slang (https://thoughtcatalog.com/january-nelson/2018/09/millennial-slang/ ). I used to be cool, but now I’m just a data geek.

Text Analytics and Machine Learning

As you’d expect in the new frontier of data jiggery, there are quite a few different approaches to text analytics. Some of the more interesting approaches utilize machine learning to train a model on an existing corpus of text and apply that model to related text. Perhaps we’re looking to extract entities by identifying law firm names in a body of legal documents. Maybe we’re trying to measure a customer’s sentiment during a customer service call by identifying speech patterns and word choice. Maybe we’re trying to determine if two historical works were actually written by the same author, or if they’ve just been attributed to the same person. These are exciting use cases, and I doubt you have to think hard before you come up with something applicable to your own organization.

A Real World Example

UDig is working with an association that publishes scholarly articles. Their ask is to improve their ability to use the abstracts of the works to automatically match new content with specific peer reviewers. A high-level explanation of our approach to tackling the challenge follows.

First, we take the massive corpus of abstracts and do some simple pre-processing. We do things like remove stop words (“the”, “and”, etc.) and stem words (e.g., change “monitoring” to “monitor”). Next, we calculate a metric called TF-IDF. TF-IDF (which stands for “term frequency–inverse document frequency”) essentially counts how often a particular word appears in a document and then penalizes the word’s “score” if it appears in many different documents. For example, the word “the” (if it weren’t already removed by our stop word elimination) would appear quite frequently in a single document, but because it appears in every document, it gets penalized to count for nothing. Conversely, if one article happens to be about “biblio-wizardry”, and only two other documents contain the term “biblio-wizardry”, we can start to assume those texts might be related, particularly as we assess other common terms across the documents.
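The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the production pipeline: the stop-word list is deliberately tiny, `crude_stem` is a hypothetical stand-in for a real stemmer, the three sample abstracts are made up, and the scoring uses one common TF-IDF variant (relative term frequency times the log of total documents over documents containing the term).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "and", "a", "an", "of", "in", "to", "is", "for", "on"}


def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: score} dict per document."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        scores.append({
            # Term frequency, scaled down for terms found in many documents.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up abstracts, for illustration only.
abstracts = [
    "Monitoring the spread of biblio-wizardry in libraries",
    "The libraries and the monitoring of budgets",
    "Budgets for libraries",
]
scores = tf_idf(abstracts)
```

In this toy corpus, the stem of “libraries” appears in every abstract, so its score drops to zero, while “biblio-wizardry” appears in only one abstract and keeps a positive score there.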

In this case, ranking scholarly articles utilizing TF-IDF gives us a pretty good idea of when two documents are related and when they have little to do with each other. From there, we can take these terms and marry them up with peer reviewers. If we discover that one person has a penchant for reviewing articles about “biblio-wizardry” but never touches the (frankly more profane) “vocabu-sorcery”, we know how to route new abstracts as they come in by applying the same technique.
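One way to picture the routing step is sketched below. It assumes each abstract has already been turned into a `{term: score}` dictionary of TF-IDF weights, builds a profile per reviewer by summing the vectors of abstracts they reviewed before, and ranks reviewers by cosine similarity to a new abstract. Cosine similarity and the summed profile are assumptions chosen for illustration, and the reviewer names and weights are invented.

```python
import math
from collections import defaultdict


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(review_history):
    """Sum the term weights of every abstract each reviewer has handled."""
    profiles = defaultdict(lambda: defaultdict(float))
    for reviewer, vector in review_history:
        for term, weight in vector.items():
            profiles[reviewer][term] += weight
    return profiles


def rank_reviewers(new_vector, profiles):
    """Return (reviewer, similarity) pairs, best match first."""
    ranked = [
        (name, cosine_similarity(new_vector, profile))
        for name, profile in profiles.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Invented history: (reviewer, TF-IDF vector of an abstract they reviewed).
history = [
    ("alice", {"biblio-wizardry": 0.8, "spell": 0.3}),
    ("alice", {"biblio-wizardry": 0.5, "library": 0.2}),
    ("bob", {"vocabu-sorcery": 0.9, "spell": 0.1}),
]
new_abstract = {"biblio-wizardry": 0.7, "library": 0.1}

ranking = rank_reviewers(new_abstract, build_profiles(history))
```

Here the new abstract shares terms only with the first reviewer’s history, so that reviewer ranks first and the second scores zero. A real system would likely also weigh workload, conflicts of interest, and recency, none of which this sketch attempts.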

How achievable is this?

The possibilities for text analytics are endless. While it can be challenging to extract the information and no two text analytics projects look the same, I believe there is an absolute treasure trove of value to be discovered. From automating discrete data identification to gaining a more holistic view of your customers, text analytics is worth investigating.
