Fran Britschgi
Solution Architect, AI & Data Science
April 4, 2024

Unlocking the treasure trove: harnessing unstructured data with Sigma's text processing tools



Unstructured data makes up a massive proportion of the information collected in our data-hungry world. For every structured data point such as a click, purchase, or sensor reading, you can count on that data point to exist within a much larger context of qualitative information. 

Consider a single purchase on an online commerce site. The structured information that we may be able to collect can be quite comprehensive, including details such as the price paid, the shipping details, and the product info. But the unstructured text information—such as the customer reviews the shopper was exposed to, or the advertising copy that originally hooked our shopper, or the descriptions of the purchased product itself—represents the bulk of an iceberg that is almost by definition being neglected in a traditional analytic setting. 

Because those data sources have not yet been parsed and categorized into clean and clear variables for exploratory analysis, they remain unstructured and most likely untapped. 

Today’s data platforms provide us with the ability to store a virtually limitless amount of these unstructured fields, but exploring them as an end user can be overly technical and inaccessible. 

Thankfully, Sigma’s out-of-the-box text processing capabilities make it a breeze to connect directly to these unstructured data sources and start mining them for the gold within!

An overview of unstructured data

Sigma can connect to a variety of data. For example, here is a table in a fictitious dataset that has a set of three customer reviews.

A review of a product that was purchased.

These three reviews contain a rich set of details, not just about the product but also about the customers’ experience on the site and with customer service.

Working with all unstructured data comes with one giant challenge—there is no consistency in how information is conveyed. This makes it difficult to flag or analyze any patterns. 

Sigma makes working with unstructured data simpler: you can connect natively to unstructured text and analyze it with our generative AI and machine learning capabilities.

How to connect to unstructured data in Sigma

Unstructured text is often stored in a popular file format called JSON. JSON, which stands for JavaScript Object Notation, is a lightweight data interchange format.

With JSON, data is structured into key-value pairs. 

If this is your first time hearing about JSON, it’s structured a bit like this:

{
 "reviews": [
   {
     "id": "1",
     "review": "You’ve got to try the chicken",
     "stars": "5"
   },
   {
     "id": "2",
     "review": "Service was okay",
     "stars": "3"
   },
   {
     "id": "3",
     "review": "It was raining when we went there",
     "stars": "1"
   }
 ]
}

The JSON file starts and ends with curly braces {}, indicating an object. Inside this object, there's a key named "reviews" that maps to an array, or group, of review records. Each element within the reviews array is an object representing a single review, with key/value pairs describing the review's id, the review text, and the star rating of the experience.

Sigma easily connects to JSON and parses it using the “Extract Columns” feature, and lets us select the fields we care about. This feature saves us from having to programmatically extract these values, giving us more time to explore the data itself.
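For comparison, here is a minimal Python sketch of the kind of programmatic extraction that this feature replaces. It is not how Sigma works internally; the function name extract_columns and the sample values are assumptions made for illustration only.

```python
import json

# Sample document mirroring the structure shown earlier (illustrative values).
raw = """
{
  "reviews": [
    {"id": "1", "review": "You've got to try the chicken", "stars": "5"},
    {"id": "2", "review": "Service was okay", "stars": "3"},
    {"id": "3", "review": "It was raining when we went there", "stars": "1"}
  ]
}
"""


def extract_columns(document: str, fields: list[str]) -> list[dict]:
    """Flatten the "reviews" array into one row per review, keeping only `fields`.

    Hypothetical helper: the name echoes the feature described above,
    but this is plain Python, not a Sigma API.
    """
    records = json.loads(document)["reviews"]
    # .get() returns None for a missing key instead of raising KeyError.
    return [{field: record.get(field) for field in fields} for record in records]


rows = extract_columns(raw, ["id", "stars"])
for row in rows:
    print(row)
```

Note that the star ratings come back as strings, exactly as they are stored in the sample, so they would still need to be converted to numbers before any arithmetic.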


After fields are extracted from the data, your unstructured data can be analyzed just like all other data in Sigma.

But sometimes there’s more to unlock, such as the free-form text within each cell of the table.

How to analyze unstructured data in Sigma

Let’s look at three different ways Sigma can extract value from your unstructured text:

  1. Building your own dynamic text analyzer to find key information within the review using a Sigma feature called an input table, which provides you with write-back capabilities to your data platform.
  2. Running a Machine Learning model that your Data Science team can maintain to analyze the sentiment of the review.
  3. Running Generative AI against your review to answer further questions about the nature of the review.

These methods are all designed with the end user in mind and should be accessible to people with even the smallest amount of Excel experience! 

Dynamic text analysis

First, let’s analyze the actual subject of the review itself. With Sigma, you can use custom functions that are built and maintained behind the scenes by your data teams. This allows anyone to run an analysis without worrying about the syntax.


For this example, a custom function was written to identify the products in a review. Because custom functions are saved, nobody has to rewrite the complex logic: anyone can call a simple function and, in our example, identify the products mentioned in the review.
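To make the idea concrete, here is a rough Python sketch of what such a product-finding function might do behind the scenes. The product list, the name find_products, and the whole-word matching rule are all assumptions for illustration; in the setup described above, the list would live in an editable input table rather than in code.

```python
import re
from collections.abc import Sequence

# Hypothetical product list; stands in for rows maintained in an input table.
PRODUCTS = ("chicken", "fries", "lemonade")


def find_products(review: str, products: Sequence[str] = PRODUCTS) -> list[str]:
    """Return the products mentioned in a review.

    Matching is case-insensitive and on whole words only,
    so "chickens" would not count as a mention of "chicken".
    """
    found = []
    for product in products:
        pattern = r"\b" + re.escape(product) + r"\b"
        if re.search(pattern, review, flags=re.IGNORECASE):
            found.append(product)
    return found


print(find_products("You've got to try the chicken"))
print(find_products("Service was okay"))
```

Keeping the product list separate from the matching logic is the point of the design: an end user can add or remove products without touching the function itself.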


ML sentiment analysis

Using Machine Learning to analyze the sentiment of unstructured text is a very popular method for deriving an actionable insight from customer feedback. 

To do this, natural language processing (NLP) algorithms first extract relevant features from the text, such as words, phrases, or context. An ML model that has been trained to classify subjective information then assigns one of three sentiment categories: positive, negative, or neutral.

In Sigma, you can have your Data Science team deploy such an algorithm within your data platform and then provide you with a simple, Excel-style formula that can make the hard stuff simple.

In this case, the function is called AnalyzeSentiment(). It takes the text as input and returns one of the three sentiment levels.
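The sketch below shows only the shape of such a function: text in, one of three labels out. It is deliberately not a trained ML model. It swaps in a simple word-list score as a stand-in, and the word lists and the Python name analyze_sentiment are assumptions for illustration.

```python
import re

# Tiny illustrative word lists. A real deployment would call a trained
# model maintained by the Data Science team instead of counting words.
POSITIVE = {"great", "love", "excellent", "good", "delicious"}
NEGATIVE = {"bad", "terrible", "slow", "rude", "awful"}


def analyze_sentiment(text: str) -> str:
    """Classify text as "positive", "negative", or "neutral".

    Stand-in logic: count positive words minus negative words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["The food was great and delicious", "Service was okay"]:
    print(review, "->", analyze_sentiment(review))
```

Because the caller only ever sees the three labels, the scoring logic can later be replaced with a real model without changing how the function is used.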


Generative AI to parse the text

In Sigma, anyone can use governed, trusted generative AI models to ask questions of the data directly. The benefit of this method is that you won’t be confined to static output categories, as in the sentiment analysis; you’ll be able to iterate on the precise information you are looking for.


In the example, the prompt requests an action that anyone could take to improve in the future. The generative AI model of your choice will then analyze the request and return output in a format you can control. 
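As a loose illustration of this pattern, the Python sketch below combines a review, a question, and a required answer format into a single prompt. Everything here is hypothetical: build_prompt, ask_about_review, and the placeholder model are invented for this example and are not part of Sigma or of any particular AI provider's API.

```python
from typing import Callable


def build_prompt(review: str, question: str, output_format: str) -> str:
    """Combine the review, the question, and the desired answer format."""
    return (
        "You are analyzing a customer review.\n"
        f"Review: {review}\n"
        f"Question: {question}\n"
        f"Answer format: {output_format}"
    )


def ask_about_review(
    review: str,
    question: str,
    output_format: str,
    model: Callable[[str], str],
) -> str:
    """Send the prompt to whichever model callable is supplied; trim the reply."""
    return model(build_prompt(review, question, output_format)).strip()


def placeholder_model(prompt: str) -> str:
    """Stand-in for a real generative AI call; returns a fixed string."""
    return "  (model answer would appear here)  "


answer = ask_about_review(
    "Service was okay",
    "What is one action we could take to improve?",
    "One short sentence.",
    placeholder_model,
)
print(answer)
```

Passing the model in as a plain callable keeps the prompt logic independent of the provider, which matches the idea of using the generative AI model of your choice.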

Summary

Businesses are increasingly turning their attention to unstructured text as a way to deepen the analytics they can run on their operational data.

Working with unstructured data in Sigma is easy. You can connect to unstructured data, flatten it, and begin traditional analyses.

Or if you have more complex text, you can apply more advanced analyses through custom functions, machine learning models hosted in your data platform, or through the trusted generative AI platform of your choice.

Get in touch with us to see a demo or get started on a free trial.
