How a Datathon Saved Christmas

Team Building and Santa Saving
Author:

Evan Wimpey

Date Published:
December 22, 2021

Good news everyone:
Christmas is back on the table this year!

Santa’s lead navigation elf made a lateral move into the jack-in-the-box department, and they’ve so far not found anyone to fill her place. Santa has a lot of children to visit, but how will that be done in time without his lead navigator? He’s got to be incredibly efficient and take the shortest route possible. Luckily for Santa, the Commercial team at Elder Research stepped in to help out during their latest Datathon.

Why a Datathon?

Continued learning is one of the most important facets of a data science shop. There are several places (e.g., statistics.com) where data scientists can brush up on skills from statistics to computer science, and there are an endless number of analytics topics to explore. How should analytic leaders employ resources to ensure their team stays up to date?

We at Elder Research like to be efficient, and recently staged a learning opportunity which was also a team building activity.  In fact, every month the Commercial team develops a new topic and data and invites technical staff to spend 1-2 hours working on a solution. Two hours isn’t much time for an individual to learn how a new algorithm works or develop best practices in a new domain, but it is enough time to scratch the surface and become aware of key issues. At the end of the month, all participants come together to share learnings, solutions to the problems, and award a (nominal) prize to the Datathon winner. It’s proven to be a fun and risk-free opportunity to try new techniques, facilitate sharing across teams, and to learn.

Datathon Topics

Perhaps the best part of a typical Datathon is developing the idea! Few rules exist, and creativity is key. Building a model to identify the MNIST digits, or to classify irises, or predicting who will survive the titanic are all well and good…but a Datathon should be fun and interesting! Some of the recent Elder Research Commercial Team Datathons include:

1.

Using the candy power rankings to explore interpretable techniques and discover what makes a good Halloween candy. Candy Power Rankings
step image

2.

Conduct some descriptive analysis on a small data set. Then, using the same code, ‘race’ on a very large data set of the same structure.
step image

3.

Model the prices of used cars but focus on having the best-calibrated confidence intervals.
step image

4.

Help Santa navigate to all the big cities in the world (population > 3 million) as efficiently as possible.
step image

Traveling Santa Problem

Santa’s navigation problem is of course a version of the traveling salesman problem. Due to the insanely large number of possible routes, it is infeasible to calculate the optimal path, but there are several strategies for finding a relatively good route. Elder Research data scientists certainly aren’t the first to tackle this problem in a Christmas context.

Here were the ground rules:

1.

Santa must visit each of the 268 cities (a strategic roadmap; Santa can plan for the individual households on his own)
step image

2.

Santa can start and end at any city
step image

3.

The ‘elf’ that makes the route with the lowest overall geodesic distance traveled will win Geodesic Distance
step image

4.

Bonus points awarded for a festive map that makes it easy for Santa to read
step image

Results

The elves approached the problem in several different ways, ranging from complex to naively simple. This is because some participants spent a few hours working on an optimal solution, while other participants didn’t have as much time to contribute or focused their time on making a festive map. Several entries employed Mixed Integer-Linear Programming using the CBC solver. Of course, the problem is too large to solve as an optimization problem directly while considering all possible routes.

One method was to optimize subsets of the cities, and then visually decide where to link one subset to the next. Another method was to solve the optimization while relaxing the “no sub-tours” constraint, and then progressively add constraints until all sub-tours are eliminated (similar to this post).

Another method employed the mlrose python package and genetic algorithms as the solver. Here’s the best route discovered by the elves, which used the CBC solver and progressively added constraints to eliminate sub-tours.

 

Some elves tried to implement solutions from scratch. One such elf surveyed greedy routes:  Start with a random city, send Santa to the next-closest city not already visited, and continue until all cities are visited. Try that with each candidate city as a starting point, and then use the best solution. While this resulted in a slightly longer trip overall, it ran in a fraction of the time that it takes to execute the optimizers.

Some elves took an even more naïve approach, and we thank them for establishing a baseline of sorts. If Santa were to average the latitude and longitude of each city, and then rank the cities by that average, he would have to travel over 3 times as far as the best solution. Here’s the rather comical path that he would take:

 

There were even some elves that decided to add additional factors to make the problem more interesting.

What if Santa has to slow down to take sharp angles?

What if he has to consider population density?

What if he can only deliver when its dark outside at the specific location?

This is the beauty of a Datathon: if there is something that seems interesting to a participant, this is the time to pursue it, learn about it, and share it with peers.  (In real-world problems, great value often comes from brainstorming beyond the presented constraints, but the team has to be careful to budget time appropriately.)

Here are the interesting results of the traveling Santa problem
scored by total distance traveled:

Conclusion

Good news all:  Analysis has saved Santa’s routing plan.  (And, the team at Elder Research got to take a little bit of time away from day-to-day work to explore a new problem.)  With no risk for a ‘failing’, there is every incentive to try creative, novel approaches. After sharing ideas, we now have more collective knowledge about approaches to the traveling salesman problem, Mixed Integer-Linear Programming, the PuLP and mlrose python packages, and animations using python. And, the fun bonding experience results in a more cohesive team and better trained data scientists.