Practical Data Science Projects

One of the biggest challenges in Data Science is how conceptual it sometimes gets. Because it is in constant contact with statistics, it is very easy to fall into the theory and math behind models and skip over the practical applications. Add to that the fact that a lot of people outside the field (and many inside!) are not well-versed in data, and it becomes hard to come up with applications for all the models and approaches we learn.

The upside is that there are a few core projects that almost every company needs in one way or another. They are usually not very complicated to build as a minimum viable product (MVP), but they generate a disproportionate amount of value and insight. In this post, we will go over some of these projects, and I will give very broad rules of thumb for each so you can apply them as quickly as possible wherever you wish to use them!

Forecasting… Everything

No matter if we are a single person or a big conglomerate, we all want the same thing: to know what the future holds. Having an idea of where you will be in 1 year, or where your product will be in 5 years, is invaluable for us all.

Generally, when you do one forecast, you will have to do a lot more, simply because of how much hype it generates. A forecast is a piece of information everyone can understand and connect to their work. “We are going to generate £50 million in the next 12 months” is a sentence that the executives on the board of directors, finance teams and developers would all love to hear.

One caveat of this very valuable tool is that there are many different things to predict, and each generally requires its own approach. Predicting next month’s revenue might just involve looking at what we generated this month, or maybe building a simple regression model. But forecasting how many players we will have next month involves a lot more variables: the patch dates, retention, past trends, or even the season of the year! Pro tip: in most cases, the metric is affected by seasonality to some extent. This makes forecasting a project area where you can get creative and build your skills with different models, which is a big advantage.
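To make the seasonality point concrete, here is a minimal sketch of a seasonal-naive baseline: each future day is forecast with the value from the same weekday in the most recent week. The function name and the player counts are made up for illustration, and a weekly season is an assumption; your data may follow a monthly or yearly pattern instead.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future period with the value from the same
    position in the most recent full season."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Repeat the last season as many times as the horizon requires.
    return [last_season[i % season_length] for i in range(horizon)]


# Hypothetical daily player counts for two weeks (Mon..Sun); weekends are busier.
daily_players = [100, 98, 102, 105, 120, 160, 150,
                 104, 101, 106, 108, 125, 166, 155]

next_week = seasonal_naive_forecast(daily_players, season_length=7, horizon=7)
print(next_week)
```

A baseline like this is useful mainly as a yardstick: if a more elaborate model cannot beat "same as last week", it is probably not worth the extra complexity yet.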

As there are many types of models, it is tricky to give specific tips on forecasting. However, the one tip I can give is this: all data is dirty, and you need to be very diligent in your data cleaning. I can guarantee that no matter which company or product you are working on, the charts and data you have are missing some days, have giant spikes in volume on others, and have many other issues that may not be visible but that the one person who has worked there for 12 years knows about. If you do not check for these before modelling, you will need to adjust your model results afterwards, and that is a big mistake that leads to confusion, frustration, and tears. My approach is generally to plot every variable that goes into the model at daily, weekly, and monthly intervals and look for issues. Clean up those issues either by filling the gaps or removing the bad records, and then show the result to the experienced people in the company to make sure no data issues are missed. Then enjoy the oracle treatment you will get after the forecasts.
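Plotting is the main tool here, but two of the issues mentioned above, missing days and giant spikes, can also be checked programmatically. The sketch below is one rough way to do it. The function names, the revenue figures, and the "three times the median" spike threshold are all assumptions for illustration, not a standard rule.

```python
from datetime import date, timedelta
from statistics import median


def find_missing_days(daily_values):
    """Return the dates between the first and last observation that have no entry."""
    days = sorted(daily_values)
    missing = []
    current = days[0]
    while current <= days[-1]:
        if current not in daily_values:
            missing.append(current)
        current += timedelta(days=1)
    return missing


def find_spikes(daily_values, ratio=3.0):
    """Flag days whose value exceeds `ratio` times the median (a crude, assumed threshold)."""
    typical = median(daily_values.values())
    return sorted(day for day, value in daily_values.items() if value > ratio * typical)


# Hypothetical daily revenue: 4 March has no entry, 6 March looks suspiciously high.
revenue = {
    date(2024, 3, 1): 1000,
    date(2024, 3, 2): 1100,
    date(2024, 3, 3): 950,
    date(2024, 3, 5): 1050,
    date(2024, 3, 6): 9000,
    date(2024, 3, 7): 1020,
}

missing_days = find_missing_days(revenue)
spike_days = find_spikes(revenue)
print("missing:", missing_days)
print("spikes:", spike_days)
```

Whatever the checks flag is a list of questions for the experienced people in the company, not a list of rows to delete automatically. A spike might be a tracking bug, or it might be a real sale day.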

User Segmentation

Every app has different types of users. A gym tracking app will have people who download it and rarely open it, and others who log their exercises daily. A mobile game will have people who enjoy spending hours on their favourite level, while others play for 15 minutes every day to finish a few things here and there. This is a fascinating result of our product’s core loops: although everyone is using the same thing, they have vastly different ways of using it!

As every behaviour is different, there is a lot of potential upside to treating these groups differently as well. A warning: treating users differently does not mean giving one group preferential treatment over another. For example, there is nothing more frustrating to a casual player than being beaten by someone purely because the opponent purchased something from the in-game store. Treating users differently just means leaning into what each group likes about the product and making their experience better, the way they want.

The tricky point in building a user segmentation is deciding on the segments themselves. I am sure everybody in the company has some idea about the types of users they have, and if asked, they can give rough rules to define a segment: “People who log in 5 days or more per week are our core audience”. While this can give you a head start on where to look for behavioural differences, it also limits what the models you build can come up with.

There are two approaches to defining segments. The first is hard-coding some rules and manually splitting up the users based on your experience and knowledge of the product. This is tedious and hard to maintain but can give quick and relatively accurate results. The second, and in my view the better approach, is using machine learning models.
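For comparison, the hard-coded approach can be as small as the sketch below. Only the "5 days or more per week" rule comes from the example quoted earlier; the other threshold, the segment names, and the users are hypothetical.

```python
def rule_based_segment(active_days_per_week):
    """Assign a segment from hand-written thresholds (cut-offs are illustrative)."""
    if active_days_per_week >= 5:
        return "core"
    if active_days_per_week >= 2:  # assumed cut-off, not from any real product
        return "regular"
    return "casual"


# Hypothetical users mapped to the number of days they logged in last week.
logins = {"ana": 6, "ben": 3, "chen": 0, "dara": 5}
segments = {user: rule_based_segment(days) for user, days in logins.items()}
print(segments)
```

The maintenance problem is visible even at this size: every new rule or threshold change has to be argued over and edited by hand.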

The best type of model to apply here is clustering. It discovers groups from the data provided, without having to rely on any prior knowledge. Because of this, however, the input data becomes critical. The best tip about collecting the data is: “think about the mechanics of the product itself”. A great group of metrics is the time spent on each feature. For example, you can measure how many minutes a person spends playing against other players, playing against the computer, and navigating menus. If the product is well designed, this should give a clear indication of which feature each person is most interested in. As the product evolves and new ideas are implemented, this segmentation is going to change, so revisit the model every 3-6 months or so to see if it is still applicable.
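As an illustration of clustering on time-per-feature metrics, here is a minimal sketch of k-means written in plain Python. K-means is just one clustering algorithm, chosen here because it is simple. The usage numbers, the choice of two clusters, and the starting centroids are assumptions for the example. In a real project you would more likely use a library implementation such as scikit-learn’s KMeans and scale the features first.

```python
def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical weekly minutes per user: (vs. other players, vs. computer, in menus).
usage_minutes = [
    (120, 10, 5),
    (110, 15, 8),
    (5, 90, 10),
    (10, 100, 12),
    (130, 5, 6),
    (8, 95, 9),
]

# Starting centroids are picked by hand so the example is deterministic.
labels, centroids = kmeans(
    usage_minutes,
    initial_centroids=[usage_minutes[0], usage_minutes[2]],
)
print(labels)
print(centroids)
```

With this toy data the two groups correspond to people who mostly play against other players and people who mostly play against the computer. On real data the clusters will be less tidy, and naming them is where product knowledge comes back in.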

Churn Prediction

All users will eventually stop using our product; this is just the natural way of things. Just like in any relationship or commitment, there are indicators and behaviour changes along the way. Understanding what patterns a user exhibits shortly before they churn is incredibly useful for the long-term retention and life span of our product. The next step is preventing or delaying those churn moments, either through product changes or by contacting the user at the right time with a great offer they can’t refuse.

Understanding the product is the main requirement for good churn prediction. To figure out what makes someone stop using the app, for example, you need to understand what makes them use it in the first place. There are a lot of metrics that help with identifying these points too, and maybe the best one is usage frequency. If someone is engaging with your product fewer and fewer times each week, they are likely to churn completely soon.
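The usage frequency signal can be turned into a very simple early-warning flag. The sketch below marks a user as at risk when their weekly session count has dropped for several weeks in a row. The function name, the three-week window, and the session counts are assumptions for illustration. This is a heuristic starting point, not a trained churn model.

```python
def is_usage_declining(weekly_sessions, weeks=3):
    """True if the session count fell in each of the last `weeks`
    week-over-week comparisons (oldest week first)."""
    if len(weekly_sessions) < weeks + 1:
        return False  # not enough history to judge
    recent = weekly_sessions[-(weeks + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical sessions per week, oldest first.
sessions = {
    "ana": [7, 7, 6, 4, 2],   # steadily dropping
    "ben": [3, 4, 3, 4, 3],   # noisy but stable
    "chen": [5, 2],           # too little history
}

at_risk = [user for user, history in sessions.items() if is_usage_declining(history)]
print(at_risk)
```

A flag like this is also a natural input feature if you later move on to a proper classification model.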

This area is so crucial for both analysts and companies to understand that there will be future posts investigating churn in more depth.

There you have it: three different projects using Data Science tools for you to go and apply! If you build one forecasting model, apply a solid user segmentation, or get an accurate churn prediction, you will have a solid working-level knowledge of an area that you may want to specialize in. These projects will give you experience, add value to your work, and make everybody eager to see the next big data project that you and your team will come up with.