Technical debt is the ongoing cost of expedient decisions made when implementing code. It is the accumulated shortcuts and workarounds that buy short-term benefits, such as earlier software releases and faster time-to-market. The phrase was coined by Ward Cunningham in 1992 to explain to non-technical product stakeholders the need for what we call 'refactoring'.
Artificial Intelligence (AI) and Machine Learning (ML) systems have a special ability to increase technical debt. Specifically, they have all the problems of regular code, plus ML-specific issues at the system level. This article discusses three kinds of technical debt that you may encounter on your journey to production.
1. Hidden Feedback Loops
In the grand scheme of things, AI systems are usually part of a bigger data machine. This means that their inputs and outputs depend on other components within the big machine (see Fig. 1). As a result, real-world systems often end up influencing their own training data.
This may happen in surprising ways. For example, you run a video streaming platform and your backend team has come up with a recommendation engine that recommends new channels to viewers based on their past viewing history and profile. The engine also includes those channels that viewers have downvoted or ignored. You launch this feature and celebrate as the recommended videos get a growing number of clicks week-on-week. Things look good, but too good to be true.
What you missed is that the front-end team implemented a fix to hide recommended videos with low confidence (e.g. less than 50%), on the grounds that potentially bad videos should not be recommended. As time goes by, recommendations that previously had 50-60% confidence are now inferred with less than 50% confidence and get hidden too. This forms a strong feedback loop that inflates your metric (e.g. viewership), and that is dangerous. You have unknowingly fallen into the trap of always recommending users the same kind of content, eroding your system's ability to recommend new content that users might be interested in.
Eventually, this turns into a problem where your metric grows, but not the quality of the system. Worse, you may not know that this is happening. Finding and fixing the loop is a much harder problem and would require cross-team effort.
Step back and study the problem you are trying to solve in a more holistic manner. Avoid tunnel vision and the comfort zone of an AI engineer/scientist who only wants to improve their own metric.
Moral: You should not only exploit AI but also allow it to explore. Also, periodically review your metrics and rethink how you measure success - look beyond the numbers.
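To make "exploit and explore" concrete, here is a minimal sketch of an epsilon-greedy pick, assuming each candidate channel comes with a confidence score. The function name `recommend`, the parameters `epsilon` and `threshold`, and the channel IDs are all hypothetical and do not come from any real recommendation engine; treat it as an illustration rather than a recipe.

```python
import random


def recommend(scored_channels, k=3, epsilon=0.1, threshold=0.5, rng=None):
    """Pick up to k channels, mostly confident ones, sometimes an uncertain one.

    scored_channels: list of (channel_id, confidence) pairs (hypothetical format).
    epsilon: probability that a slot is filled from the low-confidence pool.
    """
    rng = rng or random.Random()
    ranked = sorted(scored_channels, key=lambda pair: pair[1], reverse=True)
    confident = [cid for cid, score in ranked if score >= threshold]
    uncertain = [cid for cid, score in ranked if score < threshold]

    picks = []
    while len(picks) < k and (confident or uncertain):
        # Explore if there is something to explore and either the coin flip
        # says so or the confident pool has run dry.
        explore = bool(uncertain) and (not confident or rng.random() < epsilon)
        if explore:
            picks.append(uncertain.pop(rng.randrange(len(uncertain))))
        else:
            picks.append(confident.pop(0))  # exploit: best remaining channel
    return picks
```

With `epsilon=0.1`, roughly one slot in ten goes to a channel that a hard 50% cutoff would have hidden, so the system keeps collecting feedback on content it is unsure about instead of training only on what it already favours.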
2. Pipeline Jungles
AI systems are typically made up of a series of workflow pipelines. These pipelines are responsible for a complicated series of jobs that run in special sequences. They may be built by different people with different levels of visibility to the bigger picture and this is where disaster happens.
Oftentimes, due to time pressure, a pipeline is hastily put together just to fulfil a certain requirement. It can consist of glue code written in different languages, plus special languages for managing the pipelines. The jobs can range from data preparation, scraping, cleaning, joining, verifying and extracting features to splitting data into train/test sets, checkpointing, monitoring performance and pushing to production. This means that it is all too easy to end up with a spaghetti system and get entangled in the mess!
Within the pipeline jungles, you may also be surprised to find undeclared consumers. Without access control, some of these consumers can be downstream systems that silently use the output of your model as an input to another system. Undeclared consumers are expensive and dangerous, especially when they are tightly coupled to the output of your AI system.
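One lightweight way to surface undeclared consumers is to make every reader of the model output register itself first. The sketch below assumes a single in-process gatekeeper; the class `ModelOutputRegistry`, its methods and the consumer names are hypothetical, and a real setup would more likely rely on the access controls of your storage or messaging layer.

```python
class ModelOutputRegistry:
    """Tracks which downstream systems have declared that they read the output."""

    def __init__(self):
        self._consumers = {}  # consumer name -> owning team or contact

    def declare(self, consumer, owner):
        """Register a downstream system together with who owns it."""
        self._consumers[consumer] = owner

    def read(self, consumer, predictions):
        """Hand out predictions only to consumers that have declared themselves."""
        if consumer not in self._consumers:
            raise PermissionError(f"undeclared consumer: {consumer!r}")
        return list(predictions)

    def consumers(self):
        """Return a copy of the declared consumers, e.g. for an impact review."""
        return dict(self._consumers)
```

Before changing the model, `registry.consumers()` then tells you who needs to be warned, rather than finding out when something breaks downstream.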
Pipeline jungles can be avoided by looking at the system as a whole and taking tasks like data collection and feature extraction seriously. One approach is to invest in cleaning up the pipeline, which dramatically reduces the drag moving forward. For a complex system, it might be worth breaking this effort into smaller parts and having an external consultancy see it through, since it can take months or years to refactor.
Moral: It takes extra effort to keep all components of the subsystem aligned. Pipeline jungles tend to snowball into a bigger problem, so do tidy up before it is too late. Oh, did I mention managing dependencies?
3. Data Dependency
Dependencies in a typical software engineering project refer to all the modules, packages and libraries you import to build certain functionality without reinventing the wheel. In an AI system, there are also data dependencies, and they carry a similar capacity for building debt. Worse, they may be more difficult to detect.
Inconsistency in Data Dependencies
Inconsistency in data quality can make the input signals unstable, as the data may change qualitatively or quantitatively over time. The input data may be generated by another ML system that is itself updated over time, for example one that produces embeddings or semantic mappings (e.g. image embeddings, TF-IDF). A silent update to the upstream system can break the downstream prediction system.
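One way to turn a silent upstream update into a loud one is to pin a fingerprint of the mapping you last vetted and compare against it at load time. This is only a sketch under the assumption that the mapping is small enough to serialise as JSON; the helper names `fingerprint` and `check_upstream` are made up for illustration.

```python
import hashlib
import json


def fingerprint(mapping):
    """Return a stable SHA-256 digest of a JSON-serialisable mapping."""
    payload = json.dumps(mapping, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_upstream(mapping, pinned_fingerprint):
    """Raise if the upstream mapping differs from the version that was vetted."""
    current = fingerprint(mapping)
    if current != pinned_fingerprint:
        raise RuntimeError(
            "upstream mapping changed: "
            f"expected {pinned_fingerprint[:12]}, got {current[:12]}"
        )
    return mapping
```

The downstream system stores the pinned fingerprint next to its model, so a changed embedding fails fast at load time instead of quietly degrading predictions.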
Unused Features in Data Dependencies
Over time, old features are made redundant by new features and this shift goes undetected. For example, legacy, correlated or bundled features that were put together hastily get stuck in the model and stay there forever. Eventually, these features contribute to (multi)collinearity and weaken the statistical power of your model.
Use a frozen copy of your embedding/semantic mapping and allow the original copy to change over time. Only deploy the new mapping after vetting and verifying it. Also, unused or under-utilised features can be detected via exhaustive leave-one-feature-out evaluation; run this check regularly to remove unnecessary features.
Moral: Code dependencies can be identified via static analysis using compilers and linkers. Identifying data dependencies requires examining how data translates into signal, removing noise and extracting clear signals from the data.
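The leave-one-feature-out check can be sketched in a few lines. The version below assumes you can supply an `evaluate` function that retrains and scores the model on a given list of features (higher is better); `leave_one_feature_out`, `toy_evaluate`, the feature names and the numbers are all made up for illustration.

```python
def leave_one_feature_out(features, evaluate, tolerance=0.01):
    """Return the baseline score and the features that are cheap to drop.

    evaluate: callable taking a list of feature names and returning a score,
    e.g. validation accuracy after retraining (supplied by the caller).
    tolerance: largest score drop that still counts as negligible.
    """
    baseline = evaluate(list(features))
    removable = []
    for feature in features:
        reduced = [f for f in features if f != feature]
        drop = baseline - evaluate(reduced)
        if drop < tolerance:
            removable.append(feature)
    return baseline, removable


def toy_evaluate(selected):
    """Stand-in for "retrain and validate", using fixed, invented contributions."""
    contribution = {"watch_time": 0.25, "genre": 0.125, "legacy_flag": 0.0}
    return 0.5 + sum(contribution[f] for f in selected)


baseline, removable = leave_one_feature_out(
    ["watch_time", "genre", "legacy_flag"], toy_evaluate
)
```

Here only `legacy_flag` ends up in `removable`. Note that the check drops one feature at a time, so two strongly correlated features may each look removable on their own; re-run it after every removal rather than deleting everything it flags in one go.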
Conclusion - Paying off Technical Debt
Moving quickly and breaking things might be the new development mantra in this fast-moving world. However, technical debt can be kept to a minimum by asking a few useful questions during development. We should ask ourselves:
- Do we know precisely how a new or updated model would affect the entire system? Where and how can we measure the impact of this change?
- What does our metric of success look like? Does it gel with the business objective?
- Are we keeping track of all the dependencies, producers and consumers of the entire system? Do we have a systematic way of identifying faults when we need to?
Note: This article also appeared in https://towardsdatascience.com/3-common-technical-debts-in-machine-learning-and-how-to-avoid-them-17f1d7e8a428. Follow me on twitter at DerekChia for more updates!
- Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
- What’s your ML test score? A rubric for ML production systems https://ai.google/research/pubs/pub45742
- Machine Learning: The High Interest Credit Card of Technical Debt https://ai.google/research/pubs/pub43146