“Can Agile be used for data science research projects?” An architect at a local research-heavy technology company posted that question in our community Slack channel. My immediate reaction was “Of course!”, but I didn’t post an answer. While I have led R&D teams, the work we did was more development than research. I was only halfway through a Master of Data Science program, so I couldn’t speak from experience with Data Science projects either. One lecture touched on Agile, but it was from a software development perspective. The final project at the end of our program was the perfect opportunity to prove to myself that it could be done.
Applying Agile to a “Short” Academic/Government Project
For our final project, we had nine weeks to visualize and analyze inconsistent, infrequent GPS data. In some contexts, nine weeks may be considered a long time, but it is relatively short for a data science project. Our project sponsor and client was the local city government, and the university was grading the work toward our degrees. Our team was three people with very different backgrounds: academic, petroleum engineering, and video game development. I was the only one with Agile experience. Due to personal circumstances, we weren’t colocated. For the last half of the project, we were in three different cities, one being eight time zones away. These challenges reflect the reality that many research projects face in industry and academia.
Planning the Unknown
At the start of the project, our team had no experience with the domain. Other than simple visualizations of coordinate data, we had no experience with GPS data, mapping software, traffic analysis or urban planning. We needed to figure out how to tackle the project, what packages/algorithms to use, and how to apply them to a large amount of data. We also had no idea what we were going to discover along the way. Yet, we needed to produce a plan for the project proposal. Fortunately, we had a fixed deadline. So, I created a high-level strategy with goals for every sprint:

[Figure: High-level strategy and schedule in project proposal]
This gave us a roadmap of what goals we were trying to achieve in order of priority. It was also flexible enough for us to make changes as we made discoveries.
During the project proposal sprint, we did our literature search and preliminary prototyping. As a result, we had some ideas on how to tackle the project and which experiments we needed to run. In our “Count” sprint, we finished the prototyping. Our initial approaches failed, but our goal remained the same. If we couldn’t achieve the goal for “Count”, we couldn’t complete the project. As we moved into the third sprint, we decided to “Scale” first and added “Visualize”, which turned out to be an important step for determining the right model. The “Model” sprint turned out to be more complex than expected and included the scaling factor.
Our high-level strategy provided guidance and we updated it as things changed. Without that strategy, it would have been easy to lose our way and pull the project in different directions.
Communication and Transparency
Since our team was not colocated, our daily video conference “stand-ups” were critical for staying in sync and adjusting to challenges and discoveries. We discussed where we were, took on work that aligned with our individual strengths and interests, and paired up when needed. Throughout the day, we were in almost continuous communication over our team Slack channel. We met with our university supervisor weekly and provided written progress reports. Those reports also served as a weekly retrospective and check-in on our planning.
We met with our clients weekly to be completely transparent and engage them on the project progress. Although it wasn’t an explicit goal, we could actually show progress each week. We showed the prototypes with small samples of the data and our first visualization at scale. We showed each of the models we experimented with and had a discussion about pros/cons of each. They were the domain experts and would ultimately use what we delivered. We needed their collaboration.
The Final Analysis
At the end of the project, we achieved what we needed to achieve in the time we had. If we had more time, we would have spent it looking at the data to extract more insights. If this were an industry project, we would have added an extra sprint or two with a list of investigations.
With respect to applying Agile, we covered all the essentials using the minimum amount of process:
- Had a high-level strategy to guide our work and updated it as things changed
- Prioritized and planned our work each sprint
- Kept track of our work and what was left to be done
- Had a regular communication cadence with each other and our clients
- Regularly showed our work
The one thing we did not do: estimate. This was our first data science project. We did not have enough experience to estimate specific tasks. Plus, when you’re working in the unknown, it’s impossible to estimate effort. However, we had goals with time boxes. This provided enough structure to ensure we could meet our goals and give attention to anything that lagged behind.
How Others Apply Agile to Data Science Research
After completing the project and before writing this article, I compiled some articles on how others apply Agile to Data Science and other research projects.
Academic research:
- Agile in a chemical engineering lab: How agile project management can work for your research
- A paper on SCORE (Scrum for Research): Adapting Scrum to Managing a Research Group
- Application of Agile in genomic research at the Broad Institute of MIT and Harvard: Reinventing Research: Agile in the Academic Laboratory
Research in Industry:
- How the 3M Health Information Systems research team uses Agile:
- R&D at HP: How I led 6 R&D groups through an agile transition
Data Science in Industry:
- Applying Agile IT Methodology to Data Science Projects
- Data-science? Agile? Cycles? My method for managing data-science projects in the Hi-tech industry
- Data Science and Agile: What works, and what doesn’t (Part 1)
- Data Science and Agile: What works, and what doesn’t (Part 2)
July 31, 2019
Agile, Data Science, Project Management