Using Graph Data to Detect Fraud

In this video Data Scientist Garrett Pedersen shares how graph data can transform your approach to fraud detection.

He delves into the advantages of graph analytics over traditional relational tables, showing how visualizing connections can make identifying fraud more intuitive and efficient. You’ll learn the basics of graph structures and see practical examples of how they simplify data analysis, making it easier to uncover complex fraud schemes and hidden patterns.

Find out how graph analytics can enhance your fraud detection toolkit and keep you one step ahead of fraudsters.

Curbing Fraud by Leveraging Analytics

With the power of analytics, organizations can prevent future fraud, and uncover past fraud that would have otherwise gone unnoticed. Analytic models are the best protection against financial fraud and are quickly becoming the industry standard.


Video Transcript

Importance of Adapting to Graph Thinking

“Defenders think in lists. Attackers think in graphs. So long as this is true, attackers will win.” This quote from John Lambert of the Microsoft Threat Intelligence Center highlights why we need to change the way we solve problems in order to stay ahead of fraud and abuse. In this video we’re going to be talking about how we can use graph data to identify potential fraud within an employer’s childcare subsidy benefit.

Whiteboard image with the words: “Using graph data to detect fraud.”

Limitations of Relational Tables

So before we talk about graph data, let’s first talk about the way that we’ve seen data before, and that means going over relational tables. This is the way that we have seen data for a long time, and we’re all probably used to it. We see tables here that have different rows and different columns that identify different features, and we might have multiple tables that are all connected to one another through some kind of common key or similar feature.

We might see these as Excel tables or SQL tables, and we can join them to combine what we have in each table and answer some simple questions. For example, what address might the child of some employee live at? To answer that question, we’d have to join the children’s table with the addresses table on some feature that both have in common, like employee ID.
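As a rough illustration of the join described above (it is not shown in the video), here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names, the employee IDs, and the second employee's row are assumptions made up for the example; only John, Charlie, and 123 Main Street come from the whiteboard.

```python
import sqlite3

# In-memory database with two small tables that share an employee_id key.
# Employee 1 stands in for John; employee 2 is made-up filler data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE children  (employee_id INTEGER, child_name TEXT);
    CREATE TABLE addresses (employee_id INTEGER, address TEXT);
    INSERT INTO children  VALUES (1, 'Charlie');
    INSERT INTO children  VALUES (2, 'Example Child');
    INSERT INTO addresses VALUES (1, '123 Main Street');
    INSERT INTO addresses VALUES (2, '456 Example Avenue');
""")


def addresses_for_child(connection, child_name):
    """Join children to addresses on employee_id and return matching addresses."""
    query = """
        SELECT a.address
        FROM children AS c
        JOIN addresses AS a ON a.employee_id = c.employee_id
        WHERE c.child_name = ?
    """
    return [row[0] for row in connection.execute(query, (child_name,))]


charlie_addresses = addresses_for_child(conn, "Charlie")
print(charlie_addresses)
```

Even this simple question needs a join across two tables, which is the cost the transcript goes on to describe.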

The problem is that this can get quite complicated and expensive if the tables are very large. So to make it simpler to see how pieces of data are connected to one another, we can use what’s known as graph data, where the data are stored as nodes and edges that highlight how they’re connected. And what are nodes and edges?

Whiteboard image with comparison of relational charts and graph representations of data

The Basics of Graph Data

In this graph right here, the graph is just all of this information that we have stored here. The nodes of the graph are the individual entities, so you might think of them as being like the rows that we see in the tables. We have the employees, like John and Sally and Tom. We have their associated children and the addresses that they live at as well.

The nodes are these circles, and that’s usually how we see them represented in a graph. How are they connected? They’re connected through these lines or these links known as edges. Edges can have strength and direction. We can see that John is the parent of Charlie. John lives at 123 Main Street, and John and Sally both work with one another.
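One possible way to hold this structure in code is a set of nodes plus a list of directed, labeled edges. The sketch below is only an assumption about how the whiteboard graph might be represented; the relation labels (`PARENT_OF`, `LIVES_AT`, `WORKS_WITH`) and the `neighbors` helper are names invented for this example.

```python
from collections import defaultdict

# Nodes are the entities drawn as circles on the whiteboard.
nodes = {"John", "Sally", "Charlie", "123 Main Street"}

# Each edge is (source, relation, target), so it carries a direction.
# "Work with one another" goes both ways, so it is stored as two edges.
edges = [
    ("John", "PARENT_OF", "Charlie"),
    ("John", "LIVES_AT", "123 Main Street"),
    ("John", "WORKS_WITH", "Sally"),
    ("Sally", "WORKS_WITH", "John"),
]

# Index edges by source node so connections can be followed directly,
# without joining tables.
outgoing = defaultdict(list)
for source, relation, target in edges:
    outgoing[source].append((relation, target))


def neighbors(node, relation):
    """Return the nodes reached from `node` over edges with this relation."""
    return [target for rel, target in outgoing[node] if rel == relation]


print(neighbors("John", "LIVES_AT"))
```

Edge strength is left out here; it could be added as a fourth element on each edge if the data had a weight to record.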

Nodes and Edges: Building Blocks of Graphs

And so this shows how all of these entities are related to one another much more clearly than the relational tables do. One way that I like to remember how a graph is structured is that it’s similar to a sentence. Nodes are equivalent to nouns, and edges are equivalent to verbs.

So just like a sentence, nodes are connected to one another through a verb. So for example, John lives at 123 Main Street, or John is the parent of Susan. And so in all of these particular cases, you can think of when you’re trying to develop a graph or translate this data over to this type of data that your nodes are going to be your nouns and your edges are going to be verbs to describe the way that those are related.
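The noun-and-verb idea can be sketched as a small translation step from table rows to edges. This is an assumed illustration rather than anything shown in the video: the row layout, the `employees` lookup, and the function names `rows_to_edges` and `as_sentence` are all hypothetical.

```python
# Hypothetical relational rows, keyed by an assumed employee_id.
employees = {1: "John"}
children_rows = [{"employee_id": 1, "child": "Charlie"}]
address_rows = [{"employee_id": 1, "address": "123 Main Street"}]


def rows_to_edges(employees, children_rows, address_rows):
    """Turn each row into a (noun, verb, noun) edge."""
    edges = []
    for row in children_rows:
        parent = employees[row["employee_id"]]
        edges.append((parent, "is the parent of", row["child"]))
    for row in address_rows:
        resident = employees[row["employee_id"]]
        edges.append((resident, "lives at", row["address"]))
    return edges


def as_sentence(edge):
    """Read an edge back as a sentence: noun, verb, noun."""
    subject, verb, obj = edge
    return f"{subject} {verb} {obj}."


sentences = [as_sentence(e) for e in rows_to_edges(employees, children_rows, address_rows)]
for sentence in sentences:
    print(sentence)
```

If an edge cannot be read back as a sensible sentence, that is usually a hint that the node or the relation has been modeled the wrong way round.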

Applying Graph Data to Fraud Detection

Now that we have a basic understanding of kind of what a graph is, let’s return to our earlier example right over here. And I’ve just done a subset of this data so that we can more easily see how these entities are related. And so here’s John, who’s the parent of Charlie. John lives at 123 Main Street.

Also, Jane is someone who lives at 123 Main Street and is the parent of Charlie. So these are two parents of the same child, and they want to use this childcare subsidy benefit because childcare is very expensive. There might be an individual named Joe who lives at 1 Center Street. John might pay Joe, and Joe is then related to Charlie because Joe watches Charlie, or babysits, or, you know, is present with Charlie during the day while the parents are at work.

So this is what the relationship of all of these different entities should look like. This is what the graph structure should look like. Now, what might fraud look like in this particular case? Let’s say that instead of John paying Joe, let’s say John pays Jane and Jane watches Charlie.

Now, just looking at the structure of the graph, we see that the individual who’s watching the child also lives at the same address and is also the parent. So this might be an abuse of the benefit, where instead of subsidizing expensive childcare, they might actually just be paying their spouse with that money from the employer. It’s basically a thousand dollar raise a month to their own household just for watching the child. So this might be an example of what fraud would look like in this particular case.
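The structural check described above can be sketched as a small pattern match over the edges. This is a minimal sketch under assumptions, not the method used in the video: the relation labels and the function name `find_self_dealing` are invented here, and a match is only a flag for human review, not proof of fraud.

```python
def find_self_dealing(edges):
    """Flag payer -> caregiver -> child chains where the caregiver is also
    a parent of the child or lives at the payer's address."""
    parent_of = {(s, t) for s, r, t in edges if r == "PARENT_OF"}
    watches = {(s, t) for s, r, t in edges if r == "WATCHES"}
    pays = {(s, t) for s, r, t in edges if r == "PAYS"}
    address_of = {s: t for s, r, t in edges if r == "LIVES_AT"}

    flagged = []
    for payer, caregiver in sorted(pays):
        for watcher, child in sorted(watches):
            # Only consider payments from a parent to the person watching their child.
            if watcher != caregiver or (payer, child) not in parent_of:
                continue
            reasons = []
            if (caregiver, child) in parent_of:
                reasons.append("caregiver is a parent of the child")
            if payer in address_of and address_of.get(caregiver) == address_of[payer]:
                reasons.append("caregiver lives at the payer's address")
            if reasons:
                flagged.append((payer, caregiver, child, reasons))
    return flagged


# Edges shared by both scenarios on the whiteboard.
household = [
    ("John", "PARENT_OF", "Charlie"),
    ("Jane", "PARENT_OF", "Charlie"),
    ("John", "LIVES_AT", "123 Main Street"),
    ("Jane", "LIVES_AT", "123 Main Street"),
    ("Joe", "LIVES_AT", "1 Center Street"),
]

# Expected shape: John pays Joe, and Joe watches Charlie.
expected_graph = household + [("John", "PAYS", "Joe"), ("Joe", "WATCHES", "Charlie")]

# Suspicious shape: John pays Jane, and Jane watches Charlie.
suspicious_graph = household + [("John", "PAYS", "Jane"), ("Jane", "WATCHES", "Charlie")]

expected_flags = find_self_dealing(expected_graph)
suspicious_flags = find_self_dealing(suspicious_graph)
print(expected_flags)
print(suspicious_flags)
```

The check never looks at payment amounts or dates. It relies only on how the nodes are connected, which is the point the transcript makes about reading fraud off the shape of the graph.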

Conclusion

The lesson I want you all to take away from this video is that graphs are a very powerful way of storing data that can help us understand how entities are related to one another. If you want to learn more about graphs, there’s a lot of information available about Neo4j, and you can learn a lot about Cypher, the language it uses to query graphs. There’s so much you can learn. So I hope that this got you a little bit excited about how to catch fraud.