Mining Your Own Business Podcast

Season 2 | Episode 7 - The Art of Data Storytelling at PayPal with Gulrez Khan

Quote: “So you become a change agent—not just do what has been asked of you but understand the business and say, ‘Hey, you know what, there is another way we can do it.’”

Tune in as host Evan Wimpey chats with PayPal Data Science Lead Gulrez Khan about the art of data storytelling. With over two decades of experience, Gulrez has discovered how data storytelling can be an effective tool to bring clarity to the complex.

In this episode he shares lessons he’s learned and creative ways he’s helping children and adults build data literacy.

In this episode you will learn:

  • The importance of truly understanding a business problem before starting a data science project
  • How data storytelling can help bridge the gap between business stakeholders and data science teams
  • The creative approach Gulrez Khan is using to help both children and adults grow in data literacy
  • How data scientists can become agents of change by not only fulfilling requests but also identifying alternative solutions based on their domain knowledge

Learn more about why we created the Mining Your Own Business podcast.

About This Episode's Participants

Gulrez Khan | Guest

With almost two decades of experience under his belt, Gulrez Khan is a data-wrangling extraordinaire, who has a knack for turning boring numbers into captivating stories. When he’s not busy working, you can find him digging through open datasets like they’re buried treasure or passing on his data visualization skills to the next generation in the hopes of creating a world of data literate children.

A strong believer in the power of data literacy, Gulrez is on a mission to improve the way people make sense of data. He’s known for delivering corporate workshops that are equal parts informative and entertaining and has recently written “Drawing Data with Kids,” a book to cultivate data literacy in children and adults alike.

Follow Gulrez on LinkedIn

Evan Wimpey | Host

Evan Wimpey is the Director of Analytics Strategy at Elder Research where he works with organizations to transform deficient data into tangible business value that advances their mission.

He is uniquely suited for this challenge by pairing his professional experience in management and economics at high-functioning organizations like the Marine Corps and Goldman Sachs with his technical prowess in data science. His analytics skillset was strengthened while earning his MS in Analytics from the Institute for Advanced Analytics at NC State University.

Evan almost always has a smile on his face, which is at its widest when he is helping organizations use data in innovative ways to solve complex problems. He is also, in a strictly technical sense, a “professional” comedian.

Follow Evan on LinkedIn

Key Moments from This Episode

00:00 Introduction
01:09 Working as a data science leader at PayPal
02:17 How and why PayPal started a data science community
03:41 Creating sparks in data science projects
09:08 What it means to become a change agent
09:55 Discussing the principles of effective data storytelling
11:27 How data storytelling facilitates better communication
15:02 Data storytelling as a marketable job skill
18:09 Gulrez’s book, “Drawing Data with Kids,” and data literacy
24:45 Using data for good
25:59 Wrapping up the show

Show Transcript

Evan Wimpey: Hello and welcome to the Mining Your Own Business podcast. I am your host, Evan Wimpey. And today I’m very excited to introduce Gulrez Khan to the show. Gulrez is a data science lead at PayPal. He’s previously been at Microsoft. And he is also the author of a very interesting book, “Drawing Data with Kids,” which we will certainly get into in today’s chat.

Gulrez, thanks so much for joining us on the show today.

Gulrez Khan: Thank you for having me. It’s totally a pleasure of mine.

Evan Wimpey: Oh, fantastic. Gulrez, to get started, can you just tell us a little bit about your background in data science and the role that you’re in now?

Gulrez Khan: Yeah. So when giving an intro, people usually talk about the professional stuff, but I think we are more than our titles.

So I start my intros like this: I’m a father of three beautiful kids, and I love spending time with them. I’ve been very fortunate to be blessed with lots of good people that I’ve met in my life who impacted me in some way or form so that I’m in this place right now. For my job, I work as a data science leader at PayPal.

I’ve been with PayPal for around four years now. I work in the developer productivity group at PayPal right now. And I also founded this data science community at PayPal. So we are doing several talks with all this AI explosion going on, like how do we keep up with that and learn from each other?

So that’s something that I’m doing as well. And you mentioned the book. So I’m very pumped up about the book that I wrote recently, “Drawing Data with Kids.” And yeah, so that’s about me.

Evan Wimpey: Oh, very cool. Yeah, I’m fortunate enough, I work for a consultancy that is all data and analytics. So everybody I work with here is data science, data analytics, but a lot of folks listening on the call probably don’t have many close data science peers or they’re spread across the company or they don’t get to work together so often.

So it’s really interesting the sort of group or organization that you started or joined up at PayPal. Can you talk about maybe the impetus for that and sort of what your goals are?

Gulrez Khan: Yeah. So data science is still new. Like when I talk about different disciplines, when I started my career, data science was not a discipline.

So I was working with data, still doing the same sort of thing, like doing some kind of analysis, natural language processing, but my title was not that. Now data science when it came, and my manager, she said, what you’re doing looks similar to this new discipline. Do you want to have this new title? And I said, scientist, that sounds cool. So maybe let’s go with that.

So that’s how my discipline changed to data scientist. And now having worked in various teams in Microsoft and PayPal, what I’ve seen is not everyone understands data science, right? So, like you mentioned about the audience and other people, even there is a big gap in the business world and the data science community, right?

So, the business team, they don’t know how to leverage the data scientists. And the data science team, they lack the domain knowledge, like most of us, right? So my strategy has been to create some sparks, right? So when I join a team, I try to identify some kind of low-hanging fruit, right?

So often with my background in natural language processing, it comes in the form of customer feedback or some kind of a text, right? So any text that you can come up with, you would say, okay, what’s going on there? And customer feedback, that’s an amazing thing. You go to any company, you grab the customer feedback, run the same scripts that you’ve been running for the last five years, and voila, right?

So you will see that there are results and people are listening. And when you create those sparks, right, so when you have done some kind of, let’s say, part-of-speech tagging or things like that with topic modeling or whatnot, you show that to the business. And whenever I show that to the business, I tell them, think of this as a spark. This is not the final solution. What I want you to look into, from your domain knowledge, is: what are the places where we can apply this technique and come up with those outcomes? So that’s been my strategy, and that works well, right?
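For readers who want to try the kind of quick text “spark” described above, here is a minimal Python sketch. It is not Gulrez’s actual script, which the episode does not show. It swaps in simple word counting as a stand-in for part-of-speech tagging or topic modeling, and the feedback comments, the stop-word list, and the function name top_terms are all hypothetical.

```python
from collections import Counter
import re

# Hypothetical feedback comments, invented purely for illustration.
FEEDBACK = [
    "The checkout page keeps timing out on mobile",
    "Love the app but checkout is slow",
    "Refund took too long and support was slow",
    "Mobile login fails after the latest update",
]

# A tiny hand-picked stop-word list (an assumption; a real project
# would use a fuller list from an NLP library).
STOP_WORDS = {"the", "is", "on", "and", "but", "was", "too", "after", "out", "a", "to"}


def top_terms(comments, n=3):
    """Return the n most frequent non-stop-word terms across all comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase the comment and split it into alphabetic tokens.
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    for term, count in top_terms(FEEDBACK):
        print(f"{term}: {count}")
```

Even a crude count like this can surface recurring themes to show stakeholders as a starting point, in the spirit of “think of this as a spark,” before anyone invests in a full model.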

So even, let’s say, if I don’t have any data for the specific team that I’m working for, I’ll go and pull some public data set, right? So at the conference where we were together, the ML Week in Vegas, one of the demos I gave was pulling data from the seattle.gov website for the parking violation tickets, right? So now, this is a public data set, and I go and then there is a human. I’m talking about how my wife was giving me a hard time that, hey, you get so many parking violation tickets.

And I’ll pull that, and I’ll see the story. Now change that. We use this same thing, and a different data set. This time the data set would be for the place where you are working, right? And what are the different places where, instead of parking violation tickets, this is where your users are having a bad experience. Those would be big red dots. And then you can analyze. So kind of creating some kind of sparks is what I go for.

Evan Wimpey: Yeah, I love that. I’m jotting this down. Think of it like a spark. And even using that term with your business stakeholders. And I think for some of the folks probably listening to this, you know, your background is at PayPal and Microsoft. We hear those names and think that those are the tech-savvy companies.

Those are the digital natives. So surely data science is just flowing through their veins, and everybody is on board and ready to implement. Whereas, you know, maybe the logistics company, the supply company, the retailer, maybe it’s harder—or at least the perception is that it’s harder to generate that spark or to get folks excited.

So I think it really is encouraging maybe to hear even a place like PayPal you’re being very deliberate about making this spark. It’s not—the spark isn’t always there and just ready to consume whatever data science you’re doing.

Gulrez Khan: Yes. So we often talk about the maturity of the organization, right?

So when we say, okay, does this company have a good maturity model? Are they good in terms of getting the data and then working with the data science team? Not all organizations are the same, right? In most of the teams which I have been part of, I’ve been writing my own job description.

So, what it means is the business leaders, they know there is value in it, or maybe they hired me for a different reason, right? So here in this particular org where I’m working right now, they hired me thinking that I’d do a lot of data engineering work, since I have some data background. Then a reorg happened. So I said, okay, all right, a reorg happened. It happens all the time. And you look into, okay, what is the strategy of the business? What is it that they want to do? And at the same time, I’ll have a pet project, which I will bring in. And I say, okay, you know what? We can do these kinds of things with the data set that we have.

And again, spark, right? So now what happened after that is, they tabled the thing that they were thinking of doing with me and the team. Now this becomes your main project. Because they don’t know what they don’t know.

Evan Wimpey: Sure.

Gulrez Khan: So that’s where you bring in. And again, like storytelling is very important, like often as a data scientist, we often focus on creating those models.

And then, okay, you get the request from the business stakeholders: get me data from this data set, right? You can write your select query and send the data over, but you can change the narrative as you go up the ladder in the data science team or any team, right? So you try to change the narrative, right?

So you become a change agent. Not just do what has been asked of you, but understand the business and say, hey, you know what, there is another way we can do it, and then maybe we should change the strategy from this to that. If you’re able to do that, that means you’re doing a good job.

Evan Wimpey: Yeah, that’s a very powerful thing. That’s a much more attractive power than just however many gigs of RAM you can throw at some deep learning model. The term that you used there, data storytelling, I feel like has become more popular, but maybe is still in that sort of nice idea that’s not well fleshed out by a lot of folks.

And Gulrez, as you mentioned also, the Machine Learning Week conference earlier this year is where we met, and I saw you talk about data storytelling through stories. Can you give maybe just an approach for somebody who’s interested in that, who says, yes, this is too bland, the way I talk about this? What are good principles? How should folks be thinking about data storytelling?

Gulrez Khan: Yeah, so I’ve got a few principles that I apply in terms of data storytelling. And often like when we do this kind of analysis or this kind of work, we say, okay, you’ve got different steps to a data science project, right?

So you start with the problem formulation. What is the question being asked, right? So that’s the number one thing. Then the second thing is getting access to the data. The third thing is data exploration. Fourth is, if you’re creating the models, that’s where you would do that. And the fifth one is the communication, right?

So when you—when we think of the time that we spend on these different pillars, you would see that the last pillar doesn’t get enough love, right? Just five minutes before the presentation, we will dump everything that we have done into our deck. And that’s where we fail. We say, okay, I’m not a marketing person.

My job was to create these beautiful structures. We got all this data. I’ve done this fancy LLM model and then a recurrent neural network. I’m happy about that. Now you leave it to the business stakeholders. Business stakeholders, they don’t understand the models. They will not appreciate it if you increase the accuracy of the model by 0.2 or 0.5. What does that mean? So that’s where you have to wear their shoes. And often I start with stories, right? Stories to gather the attention of the audience. I’ve got three kids, and every time I talk about a story, right? So the routine that I have is, in the evening, I go for my evening prayer at the mosque, I come back, and my son is sleeping. He is on the bed, and he’s there.

Abu, tell me a story. Tell me a story of your childhood. And if I don’t tell it, then he’ll not sleep. So I spend five minutes with him telling some story, and it works with adults as well. So, if you see the number of meetings that we have been in, right? And we keep talking about, okay, what is the p-value? What were those experimentation results? And this person has already attended so many meetings, with so much jargon in their head. Now you are another person adding to that. So when you go and talk about, hey, you know what happened to my cat last night? When you start with that, then they’ll start paying attention, because this is different. They can relate to that in their daily life. So I start with some kind of story to gather the attention of the audience.

Then there are principles that I use, which would be, for example, if you look into these tools that we have right now, you’ve got Tableau and then Power BI, Qlik. It’s very easy to drag, drop, and create a visualization.

Some people say that we have this term called data vomit, right? So you have so much data with different colors and everything you put there.

But again, like when you put that in front of your audience, like there’s so much cognitive load on them, right? And then they have got this magical device called cell phone, right? So much distraction. I know, like I was giving such a good presentation in the conference, and you were looking at your phone.

So it’s hard to gather the attention of the audience, right? So that’s where we need more of those stories, and then use colors, use text animation. And when you combine all of that, that’s where it becomes a powerful story. But again, that’s just the start. Don’t think that you are done. That’s the start. And you have planted the seed. Now people understand the pain of the users and how you are communicating. Now do that again and again and again in different forums. In the creative space, we often talk about reusing our content, right? So you created a story, you gave a presentation. Write a blog about it. And then maybe create a video. And then, do this PR stuff not only on LinkedIn, do it inside your company and see what happens.

Evan Wimpey: Yeah, I think that’s great that the principles really stand. And I think we can all probably think back to memories of a presentation of data vomit and how, if it had been wrapped in a nice story, how much more appealing it would have been.

Gulrez, as a data science lead, you know, we see job descriptions, job postings. I don’t remember seeing data storytelling on a job description. Maybe it’s out there. Maybe there are some jobs that have that now, but is that something that you can try to hire for? Is that something that you try to identify in candidates as you build your team out?

Is it, you know, I need three years of Python and I need, you know, two years of data storytelling?

Gulrez Khan: Yeah. So it would be like two years of storytelling or something, but when you interview someone, right, that’s where you would see where this person would fit in. So, to give an example, when I was at Microsoft, we were interviewing this lady for a data science position.

So our hiring manager gave us instructions that, hey, you focus on stats, you focus on the programming skills, and you focus on something else. And while we were interviewing, there was one characteristic that came out, which everyone observed. That was her presentation skills, like how cool she was when she was talking about her prior project.

And what you do is, as an interviewer, you think, okay, we do this monthly project review with the VP. And our data scientists who are there, they stumble, they are afraid, it’s very hard for them to communicate. And if this person comes in, she will shine. Because she has that personality.

She didn’t do a good job in writing a query or something, maybe not as much as we would have liked, but then we were able to fit her in, in our imagination, in that conference room, and she was hired. So that’s the power of communication and storytelling. And then you also see it outside. Now what we see is you have people who have created a brand outside their work.

So often I get, even at PayPal, when I was hired, it was like people knew me before they hired me, right? And that’s where the interviews become different. Now you are interviewing the team members rather than the other way around. So I think being social and talking about things can be very useful.

Evan Wimpey: Awesome. Yeah, I think that’s great, and I think it’s great to hear. Coming from you and your perspective, it’s maybe easy for me to think like that at a consulting firm. We explicitly have clients that we need to be able to deliver results to. But it’s no different for an internal data science team who has some stakeholder.

You’re trying to influence some decision. You have to be able to communicate that in a positive way. Now Gulrez, you mentioned some of your work outside of PayPal. I do want to be able to talk about your book. So maybe can you take a minute and just tell us the motivation behind the book and who the book is for?

Gulrez Khan: Yeah, so someone said that if you are looking for a book and you don’t find it, write that book. So that’s where I started. And I think there is a story for that. So when COVID happened, our work life and our home life, it’s kind of merged together. Yeah. Right now my kids are out for their swimming class, so you’re not hearing them. But otherwise they keep running around.

And one fine day, I was creating this tree graph or something. And that was, again, me playing with an open dataset. That was the Reddit data which I was playing around with. And the tree graph is something like this: there is a node over here, and then you will have the questions, like what, where, how, when.

And then you’ve got another node, which comes out, which will have different questions. What is Reddit? What is stablecoin? What is data visualization? All those different things. So you see the tree graph. I was creating that, and my son comes up behind me, watching my screen, and he says, Abu (Abu is like dad), Abu, this looks like a mosque.

And I was just focusing. I was thinking maybe he has a LEGO thing in his hand, and he has created something which he’s trying to show. And I looked behind and I said, what? And he pointed towards my screen, and I said, mosque, how is it a mosque? And then I tilted my head and then I saw that, hey, yes, it does look like a mosque.

And that was a refreshing thing because for us, as data people or tech people, we are used to these charts and other things, but this was a four-year-old looking into it with a different imagination. And I really enjoyed what he was showing. And I used to do these exercises like drawing with my kids.

So this will be like, we just sit together and draw anything, right? So my kids, they will bring colors and then they like me spending time with them. And I see like, oh, there’s so much value which I get. So they bring paper, and I’m not teaching them. I’m just having a good time. I love drawing and then I’m doing certain things.

And so I changed that. I said, okay, what if we draw some data visualization? And that’s where like, I started doing those things. And I started sharing those things on my office Slack channel. And on LinkedIn, and then a lot of people were interested, and a few people reached out, do you want—can you teach us, our kids?

I did a few workshops, and then I synthesized that into a book. And that’s how “Drawing Data with Kids” is out there.

Evan Wimpey: Awesome. Awesome. Yes, as we’ve mentioned, we met at the conference. I’ve got young kids as well. My oldest is eight. We’re three chapters into the book so far, but it has sparked some really fun conversations here.

And it has actually sparked me, working at home as well while he’s out on summer break, to call him in and show him some. Non-proprietary data, don’t worry, clients, but the visualizations. You know, we practice some from the book and then I show him, look, this is how it is at Daddy’s work. It’s just been great, and I would certainly advocate it for folks with elementary or middle school age kids.

You know, the book is great. I also, I’m thinking back to—there’s a book called, “Bayesian Probability [Statistics] for Babies.” And there’s a huge—I think there’s probably a lot of something for babies books, but this is the one that I have found super useful. And if it weren’t for the title sounding so demeaning, like I want to get this for clients when we like teach an intro to Bayesian statistics, I think it’s so good.

From just the first few chapters, your book seems like it really is drawing data with kids, but is this also a useful book for an adult who wants to learn more about data visualization or be able to interpret data visualizations better?

Gulrez Khan: Yeah, so thank you for your kind words. What I’ve heard from my audience is like, they said—and this is like parents, right? So when they are teaching it to their kids, like some of them reached out and they said, hey, you know what? I know you have created this for the kids, but you are helping even the parents learn.

So there is a big audience out there who get intimidated when they see some charts and some data. And in this world of AI, when everyone is talking about AI, they feel left out. And they are saying that, hey, we are learning with that as well. So that’s motivating. And I know at the conference when we were doing the book signing, someone reached out and they said, this is like a data for dummies, which is written in a form for kids.

So this is encouraging. I know like there are a lot of people who need data literacy, and it’s good that I’m able to play some part in that.

Evan Wimpey: Yeah, that’s fantastic. You can pretend you have kids, even if you don’t, and just boost up your data literacy here. Fantastic. Gulrez, I do want to ask you one more question.

You’re the data science lead at PayPal. PayPal does a whole lot in payment processing, a lot of things that I probably don’t even know about. You’ve been there for, I think, about four years now. This is a question I like to ask everybody. If you can just exercise all of the creativity that you want, you’ve got all the data science and engineering resources handed to you, and Gulrez, it’s your show.

Whatever you want to work on, you’ve got full buy-in from everybody who needs to. What kind of problem would you want to tackle? Where would you point your efforts?

Gulrez Khan: So you mean at PayPal?

Evan Wimpey: At PayPal, yeah, or otherwise. You say, PayPal, thanks. We’re going to point our efforts somewhere else.

Gulrez Khan: Yeah, so I think there is a lot of value out there in the customer data, right? So the feedback, right? So you look into the feedback, and there are a lot of underserved populations, right? And that feedback data doesn’t have to come from within the system, but even from outside, right? So let’s say you go and you look into the social media data, Twitter and Facebook and whatnot, right?

That’s where, like, you can understand, like, what’s the population, who are impacted, who are not getting loans, right? So what are their sentiments? How can you help them? So, ideally, if I have the resources, I would go into this data for a good thing—like that’s where my heart is and see like how we can help these underserved populations, and see how we can be of benefit to them because data is very powerful and with a good heart you can make changes.

Evan Wimpey: Wow. That’s great. That’s a very powerful answer. Thank you so much for that. And thank you so much for joining us on the show today. We’ll make sure to link to your book in the show notes. Wherever you’re watching this, you can click down and see Gulrez’s book there. Our guest today, Gulrez Khan, data science lead at PayPal and author of “Drawing Data with Kids.”

Gulrez, thanks for coming on the show.

Gulrez Khan: Thank you.