Success Factors for Data Science Projects
In more than 10 years of data science consulting, we have experienced the entire spectrum: from proofs of concept that end up in a drawer forever to long-running projects that are still being developed years later and are used intensively by our customers. Below, we examine what the prerequisites are for a data science project to succeed.
There is no universal definition of a successful project, but for us it means:
- The project goal (usually defined in the offer) has been achieved, e.g., a model forecasting a target variable X has been developed and deployed.
- Our customer is satisfied with the result
- We are satisfied with the quality of our work
Below, we present the most important factors that contribute to the success of a data science project.
Projects can have different priorities. In high-priority projects, communication is often intensive, and our customers are promptly available for questions and decisions. Such projects are often intrinsically motivated because they are central to the organization and its business model. If a project has lower priority, customers often have fewer resources for close collaboration. This is typically the case when the motivation for the project is extrinsic, for example because something has to be proven or a requirement has to be met. The contact persons on the customer side may find the project exciting, but their day-to-day work is dominated by other topics. Of course, a project does not have to be priority no. 1 to be completed successfully. Above all, expectations about availability and the intensity of communication must be clear and compatible on both sides. If communication will only be sporadic, the project plan must take this into account.
A good use case is an indispensable prerequisite for a successful data science project. It should therefore be clear from the outset exactly what the results of the project will later be used for, not just roughly. A good use case is essentially characterized by four properties.
First, the use case is concrete. A use case such as "a customer lifetime value (CLV) for use in online marketing" is of little help, because online marketing is a huge field covering a wide variety of activities. Much more helpful is the following: "a customer lifetime value to identify the most valuable customers (approx. 1%) and secure their loyalty with coupon promotions." Here it is clear exactly what the calculated CLV values will be used for.
Second, the use case does not compete with other use cases. Often several stakeholders, such as different departments, have an interest in the results of a project. Sometimes, however, it turns out that they need different things in detail. In the CLV example above, department A wants to use vouchers to increase the loyalty of particularly valuable customers. Department B, on the other hand, may want to exclude unprofitable customers from voucher promotions. For department A, the model must therefore differentiate well in the highest segment, while for department B the lowest segment is what matters. It becomes even more difficult when the target variables differ, for example because departments A and B are interested in margin but department C is interested in revenue. To maximize the benefit for all stakeholders and avoid disappointment, it is important to delineate interests and expectations precisely, even if this leads to confrontation.
Third, the use case is thought through to the end. This is the prerequisite for the results of the project to be used productively in everyday work, and it is particularly important to talk to the later users.
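A concrete use case like this also makes the model testable: one can check how many of the customers a CLV model ranks in the top ~1% really belong there. The following is a minimal Python sketch, assuming NumPy and historical CLV values to compare against. The function names (`top_share`, `bottom_share`, `segment_hit_rate`) are hypothetical, and the synthetic data and the 10% bottom share in the demo are arbitrary illustrations.

```python
import numpy as np


def top_share(values, share):
    """Return the indices of the `share` fraction of customers with the highest values."""
    values = np.asarray(values, dtype=float)
    k = max(1, int(round(len(values) * share)))
    return set(np.argsort(values)[::-1][:k].tolist())


def bottom_share(values, share):
    """Return the indices of the `share` fraction of customers with the lowest values."""
    values = np.asarray(values, dtype=float)
    k = max(1, int(round(len(values) * share)))
    return set(np.argsort(values)[:k].tolist())


def segment_hit_rate(predicted, actual, select, share):
    """Share of the customers the model puts into a segment that truly belong there."""
    chosen = select(predicted, share)
    truth = select(actual, share)
    return len(chosen & truth) / len(chosen)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data (assumption): skewed "actual" CLV values and a noisy model prediction.
    actual = rng.lognormal(mean=5.0, sigma=1.0, size=10_000)
    predicted = actual * rng.lognormal(mean=0.0, sigma=0.3, size=actual.size)
    print("hit rate, top 1%:", segment_hit_rate(predicted, actual, top_share, 0.01))
    print("hit rate, bottom 10%:", segment_hit_rate(predicted, actual, bottom_share, 0.10))
```

A model can score well in one segment and poorly in another, so which segment to evaluate follows directly from how the results will be used.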
Possible challenges include:
- The data is not available as promptly as required.
- The approach is only applicable to a few special cases.
- The results still have to be prepared in a certain way before they can be used (e.g., classifying customer lifetime values into 5 categories).
- There is no frontend (e.g., a dashboard) through which end users can access the model results.
- The results do not translate into concrete actions that users can take.
- Key contacts do not have the capacity to collaborate on the project.
- There is a lack of support from individual stakeholders (management level or specialist department) for the subsequent use of the results.
- There is only one contact person, who later becomes a bottleneck.
- There is no one on the customer side who could later take over operations permanently.
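For the point about preparing results, one common way to turn continuous CLV values into 5 categories is to cut them at quantiles, so that each category holds roughly the same number of customers. Below is a minimal sketch in Python with NumPy. `clv_categories` is a hypothetical helper, and quantile-based cut points are an assumption: fixed thresholds agreed with the specialist department may suit the users better.

```python
import numpy as np


def clv_categories(clv, n_categories=5):
    """Assign each customer a category from 1 (lowest CLV) to n_categories (highest).

    Cut points are the inner quantiles of the CLV distribution (e.g. 20%, 40%,
    60%, 80% for five categories), so categories are roughly equal in size.
    """
    clv = np.asarray(clv, dtype=float)
    inner_quantiles = np.linspace(0.0, 1.0, n_categories + 1)[1:-1]
    cut_points = np.quantile(clv, inner_quantiles)
    # right=True: a value equal to a cut point falls into the lower category.
    return np.digitize(clv, cut_points, right=True) + 1
```

For example, `clv_categories(np.arange(1, 11))` yields the categories 1, 1, 2, 2, 3, 3, 4, 4, 5, 5.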
As data science consultants, our job is not only to provide the customer with a suitable model and ensure productive operation. It also includes informing customers about what they can realistically expect from the project.
Typical topics of a good consultation are:
- Possibilities: What is and is not possible with data science?
- Prerequisites: What are the prerequisites for a data science project (e.g. required data)?
- Effect size: How big can the improvement (in revenue, efficiency, ...) realistically be?
- Time to effect: How quickly will the project benefits be noticeable?
- Development effort: What are the differences between "proof of concept", "MVP" and "product"? Why is the transition from proof of concept to production so much more costly than the proof of concept itself?
- Operating effort: What effort can be expected for operating and maintaining the data product once development is complete?
Good consulting contributes to the success of a data science project by creating a common knowledge base and realistic expectations for all parties involved.
A data science project should deliver measurable results within a foreseeable time horizon. To this end, the prerequisites for implementing the project must be met (see the discussion above of a use case that is thought through to the end). Basic structures ("data maturity") must also already be in place. If they are not, there are three options:
- Consulting in the field of data strategy to lay the groundwork for more difficult data projects later on
- A narrowly defined lighthouse project in which the benefits of data science become visible within a short period of time
- Division of the originally planned project into individual phases, so that intermediate successes become visible and usable partial results exist even if the project is terminated early
A project also needs sufficient leverage. To assess whether the leverage is big enough, costs and benefits have to be weighed against each other.
Important questions to ask when assessing the benefits are:
- How many people will use the results of the project?
- How much time will users save?
- How large are the cost savings or financial gains for the client?
- Is the result broadly applicable or only to individual cases? (For example, can the customer lifetime value be calculated for the entire customer base, or does it rely on data available for only a few customers?)
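The benefit questions above can be turned into a rough back-of-the-envelope comparison with costs. A minimal sketch, assuming all benefits can be expressed in money per year. The class `LeverageEstimate`, its fields, and the simplification that limited applicability scales the whole benefit by a coverage share are hypothetical choices for illustration, not a standard method.

```python
from dataclasses import dataclass


@dataclass
class LeverageEstimate:
    """Rough cost-benefit comparison for a planned data science project (all inputs are estimates)."""

    users: int                   # how many people use the results
    hours_saved_per_user: float  # hours saved per user and year
    hourly_cost: float           # cost of one working hour
    direct_gain: float           # cost savings / additional margin per year
    development_cost: float      # one-off development effort
    operations_cost: float       # operations and maintenance per year
    coverage: float = 1.0        # share of cases the result applies to (0..1); a simplification

    def annual_benefit(self) -> float:
        time_value = self.users * self.hours_saved_per_user * self.hourly_cost
        return self.coverage * (time_value + self.direct_gain)

    def net_benefit(self, years: int) -> float:
        """Benefit minus all costs over the given number of years (no discounting)."""
        return years * (self.annual_benefit() - self.operations_cost) - self.development_cost

    def payback_years(self) -> float:
        """Years until the development cost is recovered; infinite if never."""
        yearly_surplus = self.annual_benefit() - self.operations_cost
        if yearly_surplus <= 0:
            return float("inf")
        return self.development_cost / yearly_surplus


if __name__ == "__main__":
    # Purely illustrative numbers, not taken from any real project.
    estimate = LeverageEstimate(users=10, hours_saved_per_user=50, hourly_cost=100,
                                direct_gain=20_000, development_cost=60_000,
                                operations_cost=10_000, coverage=0.5)
    print(estimate.annual_benefit(), estimate.net_benefit(years=3), estimate.payback_years())
```

With the illustrative numbers in the demo, the annual benefit is 35,000, the net benefit over three years is 15,000, and the development cost is recovered after 2.4 years. Halving `coverage` halves the benefit, which shows why broad applicability matters so much for leverage.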
Often, the use case must first be sufficiently concrete and thought through to the end in order to answer the question of leverage.
The key to identifying potential hurdles is communication. Once potential problems have been identified, the consultant and the customer can discuss together whether it still makes sense to implement the project and how to adapt the planned procedure so that they can look back on a successful project in the end.
For us, the most important factors for successful data projects are:
- The use case is concrete and thought through to the end and does not compete with other use cases.
- Expectations for the project are communicated and aligned.
- The project is designed to deliver measurable results in the foreseeable future and fits the customer's maturity level with respect to data projects.