Code migration is critical to maintaining software applications: it improves performance, strengthens resilience, keeps systems up to date, and removes stale or irrelevant code. The process can be complex and time-consuming, however, because the code is often spread across many environments. Artificial intelligence (AI) already assists with various lower-level programming tasks, but it has struggled to handle code migration effectively.
Google reports progress on this problem using a step-by-step process and a common toolkit in which large language models (LLMs) identify the files that need changes. According to Google, the process has accelerated code migrations by 50%. In a recent experience report, a team from Google Core and Google Ads described the approach and suggested it could change how code is maintained in large enterprises. This article walks through the steps Google took and highlights a few practical use cases.
1. Locate Code Spots Where Flags (Experiments) Are Mentioned
Google’s primary objective was to find where LLMs could add value and support scalability without relying on difficult-to-maintain abstract syntax tree (AST) tooling. ASTs have traditionally been used to represent the structure of a program or code snippet, but AST-based transformations are deterministic: every outcome has to be defined in advance. Code migrations often involve complex constructs that are hard to capture this way. At the same time, Google’s teams noted that success with LLM-based migration is not straightforward, and that simple prompting alone is not sufficient for complex migrations. Instead, a combination of AST-based techniques, heuristics, and LLMs is needed to succeed and to roll changes out safely without costly regressions.
Google defined success as at least a 50% reduction in the time required for end-to-end work, including identifying migration locations, rewriting code, conducting reviews, and performing the final rollout. Engineers reported that this milestone was achieved, with 80% of code modifications fully AI-authored. Anecdotally, developers said that even when the changes were not perfect, there was significant value in having an initial version of the changelist already created, because it gave them a starting point for further refinement.
2. Remove Code Mentions of the Flag
Google Ads, one of the largest business units within Google, operates on a codebase of over 500 million lines of code. The system uses dozens of unique numerical ID types that refer to resources such as users, merchants, and campaigns. These IDs were usually defined as 32-bit integers in C++ and Java and had to be converted to 64-bit IDs to prevent ID values from overflowing. The report noted that the move from 32-bit to 64-bit was difficult for two reasons. First, the IDs are sparsely defined and hard to locate, so static tools cannot reliably search for and identify them. Second, the migration touches tens of thousands of code locations, which makes manual tracking impractical.
Here Google’s LLM-powered migration process proved its value. An engineer first identified the IDs to migrate, the superset of files that might be affected, and the locations that needed changes. The LLM then generated the required changes within a feedback loop of testing and iteration. The engineer reviewed the LLM-generated code as they would any other code, making corrections where necessary. Once this step was complete, the changes were split up and sent for final review to the owners of each part of the code.
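To make the overflow risk concrete, here is a minimal sketch of what happens when an ID grows past the 32-bit range. It is written in Python for brevity, using `ctypes` to mimic fixed-width C++/Java integers; the helper names `as_int32` and `as_int64` are illustrative and do not come from Google's report.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit ID can hold


def as_int32(value: int) -> int:
    """Return value as a signed 32-bit field would store it (wraps on overflow)."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """Return value as a signed 64-bit field would store it."""
    return ctypes.c_int64(value).value


next_id = INT32_MAX + 1    # the first ID that no longer fits in 32 bits
print(as_int32(next_id))   # -2147483648: the ID silently wraps to a negative number
print(as_int64(next_id))   # 2147483648: the same ID is preserved in 64 bits
```

Every variable, field, and function parameter that carries such an ID has to be widened, which is why the number of affected code locations grows so quickly.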
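The workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Google's toolkit: `generate` stands in for the LLM call, `validate` stands in for building and running tests, and the `owners` mapping is a hypothetical way of recording which team owns which directory.

```python
from collections import defaultdict


def migrate_with_feedback(source, generate, validate, max_attempts=3):
    """Ask for a rewrite, validate it, and retry with the failure as feedback."""
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(source, feedback)   # hypothetical LLM call
        ok, feedback = validate(candidate)       # hypothetical build/test step
        if ok:
            return candidate
    return None  # give up and leave the file for a human


def owner_of(path, owners):
    """Pick the owner whose directory is the longest prefix of path."""
    best = None
    for prefix in owners:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return owners[best] if best is not None else "unowned"


def split_by_owner(changed_files, owners):
    """Group changed files into one review batch per owner."""
    batches = defaultdict(list)
    for path in changed_files:
        batches[owner_of(path, owners)].append(path)
    return dict(batches)


# Stand-ins so the sketch runs without a model or a build system.
def fake_generate(source, feedback):
    # First attempt fixes only one declaration; the retry fixes all of them.
    count = 1 if feedback is None else -1
    return source.replace("int32", "int64", count)


def fake_validate(candidate):
    if "int32" in candidate:
        return False, "int32 still present"
    return True, None


result = migrate_with_feedback("int32 user_id; int32 campaign_id;",
                               fake_generate, fake_validate)
print(result)
```

Splitting by owner keeps each review small and routes it to the people who know that part of the code, which matches the final review step the report describes.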
3. Streamline Any Conditional Statements That Rely on the Flag
Another example involves a large set of test files that still used the now-outdated JUnit3 library, an open-source unit testing framework for Java. Updating these files manually would have been a considerable undertaking, yet leaving them in place adds technical debt to the codebase. Technical debt tends to replicate itself, because developers may inadvertently copy outdated code when writing new code. To tackle this, Google’s developers used LLMs to update a critical mass of JUnit3 tests to the newer JUnit4 library. The automated update migrated 5,359 files and modified more than 149,000 lines of code over three months.
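For readers unfamiliar with the difference, JUnit3 tests extend `junit.framework.TestCase` and rely on method names starting with `test`, while JUnit4 tests mark methods with the `@Test` annotation. The sketch below uses that difference as a simple heuristic for finding files that still need migration. The function names and the sample sources are illustrative assumptions; the report does not describe how Google located these files.

```python
import re

# Markers that suggest a Java file is still written in JUnit3 style.
JUNIT3_MARKERS = (
    re.compile(r"^\s*import\s+junit\.framework\.TestCase\s*;", re.MULTILINE),
    re.compile(r"\bextends\s+(?:junit\.framework\.)?TestCase\b"),
)

JUNIT3_SOURCE = """\
import junit.framework.TestCase;

public class CartTest extends TestCase {
    public void testTotal() {
        assertEquals(3, 1 + 2);
    }
}
"""

JUNIT4_SOURCE = """\
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CartTest {
    @Test
    public void total() {
        assertEquals(3, 1 + 2);
    }
}
"""


def looks_like_junit3(java_source: str) -> bool:
    """Return True if the source shows any JUnit3-style marker."""
    return any(pattern.search(java_source) for pattern in JUNIT3_MARKERS)


def find_junit3_files(files: dict) -> list:
    """Given a mapping of path to source text, list paths that look like JUnit3."""
    return sorted(path for path, src in files.items() if looks_like_junit3(src))


candidates = find_junit3_files({
    "old/CartTest.java": JUNIT3_SOURCE,
    "new/CartTest.java": JUNIT4_SOURCE,
})
print(candidates)
```

A text heuristic like this only narrows down candidates; the actual rewrite from the first form to the second is the part the article attributes to the LLM.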
4. Eliminate Any Redundant Code
In another use case, Google needed to clean up experimental code that had become stale. Obsolete experimental code leads to inefficiencies and maintenance headaches. Using AI, Google cleaned it up in several steps. First, they located the places in the code where flags or experiments were mentioned and removed the code references to the flag. Next, they simplified any conditional expressions that depended on the flag and eliminated the redundant or dead code that resulted. Finally, they updated the existing tests and discarded any that had become unnecessary or obsolete. This kept the codebase clean and significantly reduced the time and resources that manual cleanup would have required.
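The middle steps, removing flag references and simplifying the conditionals that depend on them, are the kind of deterministic rewrite that can be sketched with Python's standard `ast` module (Python 3.9 or later for `ast.unparse`). This is an AST-only illustration of the idea, not the LLM-based process the article describes. The flag name `USE_NEW_PRICING` and the `remove_flag` helper are hypothetical, and the sketch only handles `if` statements whose test is the bare flag name.

```python
import ast


class FlagRemover(ast.NodeTransformer):
    """Replace `if FLAG: ... else: ...` with the branch that the flag selects."""

    def __init__(self, flag_name: str, enabled: bool):
        self.flag_name = flag_name
        self.enabled = enabled

    def visit_If(self, node: ast.If):
        self.generic_visit(node)  # clean up nested statements first
        test = node.test
        if isinstance(test, ast.Name) and test.id == self.flag_name:
            kept = node.body if self.enabled else node.orelse
            # Keep the block syntactically valid if nothing is left.
            return kept or ast.Pass()
        return node


def remove_flag(source: str, flag_name: str, enabled: bool) -> str:
    """Return source with the flag's conditionals resolved to one branch."""
    tree = FlagRemover(flag_name, enabled).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


STALE = '''
def price(base):
    if USE_NEW_PRICING:
        return base * 2
    else:
        return base + 1
'''

cleaned = remove_flag(STALE, "USE_NEW_PRICING", enabled=True)
print(cleaned)  # the function now contains only the `return base * 2` branch
```

Real code references flags in many more ways (negations, helper functions, configuration lookups), which is where hand-written rules like this become hard to maintain.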
5. Revise Tests and Discard Unnecessary Tests
In summary, code migration remains essential for keeping software performant, resilient, and current, and it remains difficult because code is scattered across many environments. Google reports that its step-by-step process and standardized toolkit, in which LLMs pinpoint the files that need changes, has sped up code migrations by 50%. The team from Google Core and Google Ads suggests the method could change how large enterprises maintain their code.