Balancing AI Power with Human Insight: A Humorous Dive into Text Analytics

Author:

Evan Wimpey

Date Published:
July 29, 2024

I had the recent pleasure of being a guest on the INFORMS Resoundingly Human podcast—a wonderful experience and a lively discussion. I enjoyed talking about my work in analytics, my comedy, and the fun blend between the two.

To inject a bit more humor, I played a cheeky little game: sneaking five chosen words into the conversation. These ranged from quirky and bizarre (“Salmon City”) to analytics-focused (“target shuffling”). The mission? To later analyze their weirdness in the dialogue using text analytics. Mission accomplished: all five words made it naturally into the conversation!

This exercise, albeit fun, also shed light on an essential truth about the current state of AI and analytic tools. They’re powerful; they’re plentiful; they democratize access to sophisticated analysis techniques. But when it comes to meaningful insights, it’s more important than ever to have a thoughtful human driving the analysis.

I’ve got access to LLMs and pre-trained embeddings, so I can just do text analytics on my podcast “easter eggs.” But… what, exactly, should I actually do? The specifics of how one analyzes data are vitally important.

I tried asking a few LLMs and received unhelpfully vague answers. When I prompted specifically for concrete steps to take, the suggestions were flawed, didn’t acknowledge appropriate tradeoffs, and often misstated how I could interpret the results. So… I had to actually think about those things myself!

The Setup

I wanted some quantifiable way to measure how “weird” it was for me to use my five random words.

At their core, LLMs predict what text will come next given a sequence of text inputs. Finding the most predictable (or normal) thing to say should let me quantify how “weird” what I actually said was, by measuring the difference between the two.

I used GPT-2 for this since it is much faster (and freer) than state-of-the-art models. After extracting the full transcript of the podcast, I used a context window of 5 sentences and asked GPT-2 to predict the next sentence. I did this several times, producing a handful of sentences that would plausibly come next in the transcript. For instance, after I said, “In this space, we don’t like to solve the cookie cutter problems, the plug and play problems, we’d like to solve the complex and creative types of problems,” GPT-2 came up with a few plausible next sentences, like “And we have a lot of very good data scientists.”
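For the curious, that generation step looks roughly like the sketch below. It assumes the Hugging Face transformers library; the sampling settings, token limit, and first-sentence trim are illustrative choices rather than a faithful record of the exact setup.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# A rolling context window from the transcript (one sentence shown here;
# the real window held the previous 5 sentences).
context = ("In this space, we don't like to solve the cookie cutter "
           "problems, the plug and play problems, we'd like to solve "
           "the complex and creative types of problems.")
inputs = tokenizer(context, return_tensors="pt")

# Sample three plausible continuations of the transcript.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id,
)
new_tokens = [seq[inputs["input_ids"].shape[1]:] for seq in outputs]
continuations = [tokenizer.decode(t, skip_special_tokens=True)
                 for t in new_tokens]

# Keep only the first full sentence of each continuation.
candidates = [c.split(".")[0].strip() + "." for c in continuations]
print(candidates)
```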

Naturally, that’s not what I said next. My actual next sentence was “And I’d love to get a chance to do that.” I needed to quantify how “weird” that was, that is, how different it was from what GPT-2 predicted would come next. For this, I used SBERT, a popular sentence-level embedding tool. SBERT maps an entire sentence to a high-dimensional vector, and with each sentence represented as a vector of numbers, calculations become straightforward. To quantify the “weirdness,” I had GPT-2 generate 3 candidate sentences after each sentence in the podcast, then compared the midpoint of those 3 candidate vectors to the vector of the sentence that actually came next. Since the absolute distance isn’t meaningful on its own, I used relative distances to rank each sentence and find the “weirdest” – the furthest from the GPT-2 suggestions.
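In code, the scoring step looks roughly like this minimal sketch. It assumes the sentence-transformers package with the all-MiniLM-L6-v2 model and uses cosine distance; the actual model and distance measure could reasonably differ.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# GPT-2 candidate continuations for one transcript sentence
# (the first is from the episode; the other two are illustrative).
candidates = [
    "And we have a lot of very good data scientists.",
    "And that's what makes this work so interesting.",
    "And we get to do that every single day.",
]
actual = "And I'd love to get a chance to do that."

cand_vecs = embedder.encode(candidates)   # 3 x d matrix of sentence vectors
centroid = cand_vecs.mean(axis=0)         # midpoint of the candidates
actual_vec = embedder.encode([actual])[0]

# Cosine distance between the actual sentence and the candidate midpoint;
# larger distance = "weirder". Repeating this for every sentence in the
# transcript and ranking the scores gives each sentence a weirdness percentile.
cos = np.dot(actual_vec, centroid) / (
    np.linalg.norm(actual_vec) * np.linalg.norm(centroid)
)
print("weirdness score:", 1.0 - cos)
```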

The Punchline

Turns out, the sentences containing my sneaked-in words landed comfortably in the middle of the weirdness distribution—neither sticking out awkwardly nor blending in too seamlessly. Just right to slip by!

The challenge term that scored as the “weirdest” was “target shuffling,” which is a bit surprising, since it is a real analytics concept and was the most natural to actually appear in conversation. The preceding context was “It’s a way to validate models and validate statistical findings, and it’s super powerful.” GPT-2 generated a few sentences likely to come next, and it judged my actual next sentence, “Target shuffling seems like it should be ripe for a joke,” to be out of place: it scored in the 87th percentile of weirdness across the entire transcript.

The least weird challenge word that I snuck in was “bootstrap,” another important analytics technique. Talking about “bootstrapping an audience” for my data jokes was only in the 21st percentile of strange things I said during the episode.

I successfully snuck my random words past the bot, but this exploration was only a quick proof of concept. There were some issues with my methodology:

Short sentences are penalized.

The “weirdest” thing that I said in the podcast was about the joke book that I wrote: the sentence “It’s 101.” It reveals how many jokes are in the book, and it was very different from GPT-2’s wordier candidate sentences.

Sentence embeddings define the context.

Perhaps it’d work better to focus on the level of a word instead of a sentence. In my experiment, I could’ve appended “Salmon City” to a perfectly reasonable sentence and the encoding vector would have changed very little.
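To see that effect concretely (same assumed model as in the earlier sketch; the sentences here are illustrative):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

embedder = SentenceTransformer("all-MiniLM-L6-v2")

base = "We like to solve the complex and creative types of problems."
spiked = ("We like to solve the complex and creative types of problems, "
          "like in Salmon City.")

# The two embeddings tend to stay very similar: one odd phrase barely
# moves a whole-sentence vector, so an injected word can hide in plain sight.
print(float(cos_sim(embedder.encode(base), embedder.encode(spiked))))
```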

I didn’t experiment to select hyperparameters.

What context length would’ve worked best? Temperature? Alternative embeddings?

I did no bounds testing.

I should try some pure nonsense sentences to ensure the methodology marks them as “weird”.
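Even a crude version of that check is easy to run with the same pieces. This sketch reuses the assumed model and illustrative candidate sentences from above:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Candidate continuations for one spot in the transcript (illustrative).
candidates = [
    "And we have a lot of very good data scientists.",
    "And that's what makes this work so interesting.",
    "And we get to do that every single day.",
]
centroid = embedder.encode(candidates).mean(axis=0)

def weirdness(sentence: str) -> float:
    """Cosine distance from the candidate midpoint; larger = weirder."""
    vec = embedder.encode([sentence])[0]
    return 1.0 - float(
        np.dot(vec, centroid)
        / (np.linalg.norm(vec) * np.linalg.norm(centroid))
    )

# A sound method should score pure nonsense well above a plausible reply.
print(weirdness("And I'd love to get a chance to do that."))
print(weirdness("Trombone yogurt elects the fastest igloo on Thursdays."))
```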

Final Thoughts

Fear not, data scientists: our craft is more needed than ever. AI tools are growing increasingly capable, but that only means that nuanced decisions, critical thinking, and human judgement are becoming more important.

Data science isn’t just about powerful tools; it’s about using tools wisely. It’s about knowing their limitations, crafting thoughtful questions, and understanding the context. So, while there may be a continued urge to “throw a problem at an LLM,” know that the data science community is more important than ever in shaping what analysis actually gets performed.

The tools will continue to improve, and the datasets will grow ever larger, but the essence of data science will always be resoundingly human.