How Is Google Using AI to Transform Code Migration Processes?

Code migration is a critical process when it comes to maintaining software applications, as it helps improve performance, enhance resilience, keep systems up to date, and eliminate stale or irrelevant code. However, the process can be exceedingly complex and time-consuming because the code is often distributed across a multitude of environments. While artificial intelligence (AI) has already begun assisting with various lower-level programming tasks, it has struggled to handle the convoluted task of code migration effectively.

However, Google has made strides in overcoming this challenge by employing a step-by-step process and a common toolkit in which large language models (LLMs) help identify the files that need changes. According to Google, this process has cut the time needed for code migrations by an estimated 50%. In a recent experience report, a team from Google Core and Google Ads described their approach, noting that it has the potential to change how code is maintained in large enterprises. Here, we explore the steps Google has taken and highlight a few practical use cases.

1. Locate Code Spots Where Flags (Experiments) Are Mentioned

Google’s primary objective was to identify where LLMs could add value and help migrations scale without relying solely on abstract syntax tree (AST) tooling, which is difficult to maintain. ASTs have traditionally been used to represent the structure of a program or code snippet, and AST-based rewrites are deterministic: every transformation has to be spelled out in advance as an explicit rule. Code migrations often involve complex constructs that are hard to capture in such rules. At the same time, Google’s teams noted that success in LLM-based code migration isn’t straightforward, and that simple prompting alone isn’t sufficient for complex migrations. Instead, a combination of AST-based techniques, heuristics, and LLMs is needed to succeed and to roll changes out safely without costly regressions.

Success for Google was measured by achieving at least a 50% reduction in the time required for end-to-end work, including code rewrites, identifying migration locations, conducting reviews, and performing the final rollout. In the end, engineers reported that this milestone was indeed achieved, with 80% of code modifications being fully AI-authored. Anecdotal evidence from developers indicated that even if changes weren’t perfect, significant value was found in having an initial version of the changelist already created. This initial effort often paved the way for further refinements and optimizations.

2. Remove Code Mentions of the Flag

One of the largest business units within Google, Google Ads, operates on a code base of over 500 million lines of code. The system employs dozens of unique numeric ID types that refer to various resources, such as users, merchants, and campaigns. These IDs were usually defined as 32-bit integers in C++ and Java and had to be converted to 64-bit IDs to prevent ID values from overflowing. The report noted that moving from 32-bit to 64-bit was fraught with difficulties. Within Google’s ecosystem, IDs are defined very generically (as plain integer types), which makes them difficult to search for and identify with static tools. Compounding the challenge, Google Ads has tens of thousands of affected code locations, which makes manual tracking impractical.
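To make the overflow risk concrete, here is a minimal Python sketch, not taken from Google's report, that mimics storing an ID in a signed 32-bit field versus a signed 64-bit field. The helper names and the example ID are hypothetical; `ctypes` is used only because its fixed-width integer types wrap silently, much like the C++ and Java types in question.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def store_as_int32(value: int) -> int:
    """Mimic storing a value in a signed 32-bit field (wraps on overflow)."""
    return ctypes.c_int32(value).value


def store_as_int64(value: int) -> int:
    """Mimic storing a value in a signed 64-bit field."""
    return ctypes.c_int64(value).value


# Hypothetical ID that is one past the 32-bit limit.
next_id = INT32_MAX + 1

print(store_as_int32(next_id))  # wraps around to a negative number
print(store_as_int64(next_id))  # value is preserved
```

An ID that silently turns negative can point at the wrong resource or fail lookups, which is why every location that stores or passes such an ID has to be widened together.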

In this scenario, Google’s LLM-powered code migration process was a game-changer. First, an engineer identified the IDs to migrate, the superset of files that might be affected, and the locations that needed changes. The required changes were then generated by the LLM and validated in a feedback loop of testing and iteration. The engineer reviewed the LLM-generated code as they would any other code change, making corrections where necessary. Once this step was complete, the changes were split up and sent for final review to the owners of each part of the code base, so that migrations were carried out efficiently and accurately.
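The generate, validate, and split workflow described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not Google's toolkit: `migrate_file`, `split_by_owner`, `fake_llm`, and `no_int32_left` are hypothetical names, the "LLM" is a stub that does a string replacement, and the file paths and owner names are made up.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple


def migrate_file(
    source: str,
    generate: Callable[[str, str], str],
    validate: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the generator for a change, validate it, and retry with feedback."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = generate(source, feedback)
        ok, feedback = validate(candidate)
        if ok:
            return candidate
    return None  # give up and leave the file for a human


def split_by_owner(
    changes: Dict[str, Optional[str]], owners: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group successfully migrated files by the team that must review them."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for path, new_source in changes.items():
        if new_source is not None:
            batches[owners[path]].append(path)
    return dict(batches)


def fake_llm(source: str, feedback: str) -> str:
    # Stand-in for a real model call: it only "fixes" the code after feedback.
    return source.replace("int32", "int64") if feedback else source


def no_int32_left(candidate: str) -> Tuple[bool, str]:
    # Stand-in for building and running tests on the candidate change.
    if "int32" in candidate:
        return False, "int32 still present"
    return True, ""


files = {"ads/campaign.h": "int32 campaign_id;", "billing/user.h": "int32 user_id;"}
owners = {"ads/campaign.h": "ads-team", "billing/user.h": "billing-team"}

migrated = {path: migrate_file(src, fake_llm, no_int32_left) for path, src in files.items()}
print(split_by_owner(migrated, owners))
```

In a real setting the validation step would compile the code and run its tests, and a human would still review each batch before it lands.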

3. Streamline Any Conditional Statements That Rely on the Flag

Another pertinent example involves a large set of test files that still used the now-outdated JUnit3 library, an open-source unit-testing framework for Java. Updating these files manually posed a considerable challenge, and leaving them in place added technical debt to the codebase. Technical debt tends to replicate itself, as developers might inadvertently copy outdated code when writing new code. To tackle this issue, Google’s developers used LLMs to update a critical mass of JUnit3 tests to the newer JUnit4 library. This automated update enabled the migration of 5,359 files, modifying more than 149,000 lines of code over three months. The effort illustrates how Google moves to up-to-date technologies, which is crucial for maintaining the health and performance of its vast codebase.
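For readers unfamiliar with the two libraries, the sketch below shows the typical shape of a JUnit3-to-JUnit4 change: the test class stops extending `TestCase`, and each test method gains an `@Test` annotation. It is a toy, regex-based rewrite written in Python purely for illustration, not the LLM-based method Google used; the `CartTest` class and the `to_junit4` helper are hypothetical, and the rewrite handles only this simple pattern.

```python
import re

# A tiny JUnit3-style test class (hypothetical example input).
JUNIT3_TEST = """\
import junit.framework.TestCase;

public class CartTest extends TestCase {
    public void testTotal() {
        assertEquals(3, 1 + 2);
    }
}
"""


def to_junit4(source: str) -> str:
    """Rewrite the simplest JUnit3 idioms into their JUnit4 equivalents."""
    # Swap the JUnit3 base-class import for JUnit4 assertions and @Test.
    source = source.replace(
        "import junit.framework.TestCase;",
        "import static org.junit.Assert.*;\n\nimport org.junit.Test;",
    )
    # JUnit4 test classes no longer need to extend TestCase.
    source = re.sub(r"\s+extends\s+TestCase\b", "", source)
    # JUnit4 finds tests via @Test rather than the "test" name prefix.
    source = re.sub(
        r"^([ \t]*)(public void test\w*\(\))",
        r"\1@Test\n\1\2",
        source,
        flags=re.MULTILINE,
    )
    return source


print(to_junit4(JUNIT3_TEST))
```

Real test files also contain setup and teardown methods, suites, and custom base classes, which is part of why fixed rules like these fall short at scale.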

4. Eliminate Any Redundant Code

In another use case, Google faced the challenge of cleaning up experimental code that had become stale. Obsolete experimental code can lead to inefficiencies and maintenance headaches. Using AI, Google performed several crucial steps to clean up such code. Initially, they located areas in the code where flags or experiments were mentioned, subsequently removing any code references to the flag. Next, they simplified any conditional expressions that depended on the flag and eliminated any redundant or dead code. This meticulous cleanup also involved updating existing tests while discarding any unnecessary or obsolete tests. This comprehensive approach ensured that the codebase remained clean, efficient, and scalable, significantly reducing the time and resources required for manual cleanup.
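The cleanup steps listed above (find the flag, remove its mentions, simplify conditionals, drop dead code) can be illustrated on a small scale with Python's standard `ast` module. This is a minimal sketch under stated assumptions, not Google's implementation: the `FlagRemover` class, the `remove_flag` helper, and the `NEW_PRICING` flag are hypothetical, only bare `if FLAG:` conditions are simplified, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


class FlagRemover(ast.NodeTransformer):
    """Replace a flag with a constant and drop the branch that can no longer run."""

    def __init__(self, flag_name: str, value: bool):
        self.flag_name = flag_name
        self.value = value

    def visit_Name(self, node: ast.Name):
        # Locate reads of the flag and replace them with its final value.
        if node.id == self.flag_name and isinstance(node.ctx, ast.Load):
            return ast.copy_location(ast.Constant(value=self.value), node)
        return node

    def visit_If(self, node: ast.If):
        # Visit children first so the flag in the condition is already replaced.
        self.generic_visit(node)
        test = node.test
        if isinstance(test, ast.Constant) and isinstance(test.value, bool):
            # Keep only the live branch; the other branch is dead code.
            live = node.body if test.value else node.orelse
            return live if live else ast.Pass()
        return node


def remove_flag(source: str, flag_name: str, value: bool) -> str:
    """Return the source with the flag resolved to a fixed value."""
    tree = FlagRemover(flag_name, value).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


SOURCE = """
def price(amount):
    if NEW_PRICING:
        return amount * 2
    else:
        return amount
"""

# Resolve the hypothetical flag to True: only the first branch should remain.
print(remove_flag(SOURCE, "NEW_PRICING", True))
```

Conditions that combine the flag with other expressions, and tests that exist only to exercise the removed branch, are not handled by this sketch and would still need additional rules or review.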

5. Revise Tests and Discard Unnecessary Tests

Code migration is essential for maintaining software applications, as it boosts performance and resilience, keeps systems current, and removes outdated code. However, it can be highly complex and time-consuming since code is often scattered across numerous environments. While artificial intelligence (AI) has begun to assist with various basic programming tasks, it has faced challenges in effectively managing the intricate process of code migration.

Google has made significant progress in addressing this issue by implementing a new step-by-step process and a standardized toolkit where large language models (LLMs) pinpoint the necessary file changes. Google reports that this innovative approach has sped up code migrations by 50%, setting a new benchmark in the industry. In a recent experience report, a team from Google Core and Google Ads detailed their method, suggesting it could transform code maintenance in large enterprises. Here, we delve into the detailed steps Google has undertaken and showcase a few practical applications of their pioneering method.
