Revolutionizing Data Science Workflow

A Drag-and-Drop Machine Learning Pipeline Builder

Kepler’s pipeline builder was an internal tool developed for our data science team. Its node-based, drag-and-drop GUI simplified the creation, training, and deployment of machine learning models, which enabled the rapid delivery of client projects.

My Role

Lead Product Designer & Motion Designer

Problem

Before becoming a product company, Stradigi AI was an agency focused on client work, with a department dedicated strictly to AI-related projects. That department noticed that a lot of code was discarded and that much of the remaining code overlapped across multiple client projects.

Impact

The idea emerged that we could productize our codebase to facilitate quicker turnaround times for client projects. As a result, we were able to trim down projects that used to take months into just a few weeks.

Research and Discovery

We began by interviewing our data science team to understand their process, the tools they use, and some of the challenges they face when handling client projects.

We noticed a need for better organization in their experimental process:

  • Observation: A significant portion of their process involved experimentation.
  • Tool Used: They used Jupyter Notebook, which involved opening dozens of tabs, each representing a different experiment.
  • Issue: This approach was clearly disorganized and needed to be addressed.
  • Aim: We aimed to design a solution that would facilitate easy iterations and include a clear labeling system to help users distinguish one experiment from the next.

Design Process

In close collaboration with both the data science and development teams, we researched open-source tools that we could leverage and customize to develop this new tool. We chose a node-based framework that would facilitate the machine learning model creation process.

The following points outline the process of how we developed our product:

  • Idea conception: We conceived the idea of housing snippets of code in reusable nodes, which we later named “blocks”. Users could connect these blocks together, simulating the code-writing process that data scientists use in their notebooks.
  • Product development: We began to shape this product through various wireframes, prototypes, and small proof-of-concept projects.
  • Usability tests and feedback: We conducted usability tests and gathered feedback from our target users.
  • Iterative improvement: We iteratively improved the product, keeping the data scientists – our end users – heavily involved in the process.
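
The block idea above can be illustrated with a minimal sketch. It assumes a block is simply a named wrapper around one function and that a pipeline is an ordered chain of blocks. The names `Block`, `run_pipeline`, `drop_missing`, and `scale` are hypothetical and are not taken from the actual tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Block:
    """A named, reusable step that wraps one snippet of code (hypothetical)."""
    name: str
    run: Callable[[Any], Any]


def run_pipeline(blocks: List[Block], data: Any) -> Any:
    """Pass the data through each connected block, in order."""
    for block in blocks:
        data = block.run(data)
    return data


# Two illustrative blocks standing in for notebook cells.
drop_missing = Block("drop_missing", lambda rows: [r for r in rows if r is not None])
scale = Block("scale", lambda rows: [r / max(rows) for r in rows])

# Connecting blocks is modeled here as listing them in execution order.
result = run_pipeline([drop_missing, scale], [2, None, 4, 8])
```

Because each block only depends on the data it receives, the same block can be reused in another pipeline without copying its code, which is the overlap problem the tool set out to address.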

Challenges

Importing and cleaning datasets

One of the first and most crucial steps in running an experiment is importing the data. However, incoming data can come from various sources and is not always clean or structured.

Here are the key points about the feature we developed:

  • Visualization: We built a feature that would allow users to visualize their data.
  • Tools: We provided tools to clean and structure the data.
  • Table View: Through a table view, we offered users an easy way to view their data.
  • Data Cleaning: We utilized various techniques to clean their data, facilitating a more accurate model.
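
As a rough illustration of the kind of cleaning step described above, the sketch below trims whitespace, drops rows with empty fields, and removes exact duplicates. The specific rules, the `clean_rows` function, and the sample data are assumptions made for this example; the techniques the tool actually applied are not documented here.

```python
import csv
import io
from typing import Dict, List


def clean_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Trim whitespace, drop rows with empty fields, remove exact duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        trimmed = {key: (value or "").strip() for key, value in row.items()}
        if any(value == "" for value in trimmed.values()):
            continue  # skip rows with a missing value
        fingerprint = tuple(sorted(trimmed.items()))
        if fingerprint in seen:
            continue  # skip rows already kept
        seen.add(fingerprint)
        cleaned.append(trimmed)
    return cleaned


# Illustrative input: padded values, a duplicate row, and a missing age.
raw_csv = "name,age\n Ada ,36\nAda,36\nGrace,\nLinus,54\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))
cleaned = clean_rows(rows)
```

The cleaned rows are plain dictionaries, so they map directly onto the rows and columns of a table view.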

Block organization

Here are the key points about how we improved our users’ experience:

  • Users enjoyed connecting blocks: Users enjoyed connecting these blocks together, much like building with Lego.
  • Debugging frustrations: However, users experienced frustration when trying to debug issues related to improperly connected blocks.
  • Idea of implementing logic: This problem led to the idea of implementing logic within the blocks.
  • Added visual cues: We incorporated visual cues to indicate which blocks could be connected together.
  • Improved user experience: This resulted in an enhanced user experience, enabling users to speed up their workflow.
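
One way such connection logic could work is sketched below, assuming each block declares the type of data it accepts and produces. The `BlockSpec` class, the type labels, and both functions are hypothetical. They show the general technique of type-checked ports, not the tool's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BlockSpec:
    """Declares what a block consumes and produces (hypothetical)."""
    name: str
    input_type: str
    output_type: str


def can_connect(source: BlockSpec, target: BlockSpec) -> bool:
    """A connection is valid when the source output matches the target input."""
    return source.output_type == target.input_type


def compatible_targets(source: BlockSpec, candidates: List[BlockSpec]) -> List[str]:
    """Names of the blocks a GUI could highlight as valid next steps."""
    return [c.name for c in candidates if c is not source and can_connect(source, c)]


load_csv = BlockSpec("load_csv", "file", "table")
clean = BlockSpec("clean", "table", "table")
train = BlockSpec("train", "table", "model")
deploy = BlockSpec("deploy", "model", "endpoint")

# Blocks that could follow load_csv, i.e. the ones to mark with a visual cue.
highlighted = compatible_targets(load_csv, [load_csv, clean, train, deploy])
```

Checking compatibility before a connection is made moves the error from debugging time to design time, which matches the frustration described above.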

The Solution

After six months of work, we achieved a somewhat polished version of this internal tool. We understand that all internal tools have limitations in terms of budget and headcount. Nevertheless, we are very proud of this pipeline builder tool.

Our process for handling customer problems includes several key steps:

  • Swift examination of our codebase: We can quickly look into our codebase when a customer presents a problem.
  • Assessment of previous similar problems: We check if we’ve encountered similar problems before.
  • Fast delivery of client projects: We can deliver client projects at an exceptionally fast rate.
  • Incorporation of new code for unique problems: If we encounter a new problem that requires unique development, we can incorporate the new code into our pipeline builder, enhancing our tool with each iteration.

In addition to the pipeline builder, we also developed a dashboard to visualize the results of experiments and the performance of our models.

Results
  • We reduced client project timelines from months to weeks.
  • We successfully imported an open-source code library to decrease the amount of custom code required.

Conclusion

Kepler’s Pipeline Builder revolutionized Stradigi AI’s workflow, enabling rapid delivery of projects by streamlining machine learning model creation, training, and deployment. It addressed code overlap across projects, facilitated an organized approach to experimentation, and grew more efficient through user feedback. By leveraging open-source code libraries, it reduced project timelines and custom code requirements, demonstrating the impact of user-centered development in data science.