Snorkel AI Introduces New Capabilities for Generative AI Data Curation

Data labeling has been critical in preparing data for machine learning and artificial intelligence. However, traditional labeling methods can only go so far. That is why Snorkel AI is introducing capabilities that go beyond labeling: new tools that help organizations curate and prepare data specifically for generative AI.

Snorkel AI’s new capabilities are called GenFlow and Snorkel Foundry. Together, they help organizations build generative AI applications and provide the infrastructure to create custom large language models (LLMs). This matters because generative AI requires different data preparation than other AI subfields. While labeling remains important for predictive AI, generative AI poses new challenges that call for more advanced data curation.

Why Snorkel AI’s New Capabilities Are Game-Changing

The new capabilities offered by Snorkel AI are designed to overcome the limitations of traditional data labeling methods. Historically, data labeling has been done manually or through crowdsourcing, which can be time-consuming and expensive.

GenFlow promises to automate some of the more tedious aspects of preparing data for generative AI. It does this by streamlining the pipeline used to ingest, preprocess, and curate training data for AI models. Instead of relying on manual labeling, GenFlow works across multiple inputs and outputs to produce new training data faster, with the aim of delivering both more data and higher-quality data.
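To make the ingest, preprocess, and curate stages concrete, here is a minimal, generic sketch in Python. It is not GenFlow’s API: the function names (`ingest`, `preprocess`, `curate`, `build_training_set`) and the `MIN_WORDS` threshold are hypothetical, and the curation rules shown (whitespace normalization, a minimum length, and case-insensitive deduplication) are only simple stand-ins for the kinds of filtering a real curation pipeline might apply.

```python
import re
from typing import Iterable

# Hypothetical threshold: records with fewer words than this are dropped.
MIN_WORDS = 3


def ingest(raw_records: Iterable[str]) -> list[str]:
    """Collect raw text records, skipping empty or non-string entries."""
    return [r for r in raw_records if isinstance(r, str) and r.strip()]


def preprocess(records: list[str]) -> list[str]:
    """Collapse runs of whitespace so near-identical records compare equal."""
    return [re.sub(r"\s+", " ", r).strip() for r in records]


def curate(records: list[str], min_words: int = MIN_WORDS) -> list[str]:
    """Drop records that are too short or are case-insensitive duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for r in records:
        key = r.lower()
        if len(r.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept


def build_training_set(raw_records: Iterable[str]) -> list[str]:
    """Hypothetical end-to-end pipeline: ingest -> preprocess -> curate."""
    return curate(preprocess(ingest(raw_records)))


if __name__ == "__main__":
    raw = [
        "How do I reset my   password?",
        "how do I reset my password?",
        "Thanks!",
        "",
        "Where can I download my invoice?",
    ]
    print(build_training_set(raw))
```

In this sketch, the second record is removed as a duplicate once whitespace and case are normalized, "Thanks!" falls below the length threshold, and the empty string never makes it past ingestion. Keeping each stage as a separate function makes it easy to swap in stronger filters (for example, quality scoring or near-duplicate detection) without changing the rest of the pipeline.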

Snorkel Foundry, meanwhile, focuses on data curation, which is crucial for accurate results. The platform is designed to help organizations build their own custom LLMs rather than relying on pre-built ones, so they can tailor models to their specific needs instead of being limited by what is currently available on the market.

The Importance of Good Data in Generative AI

Snorkel AI emphasizes the importance of good data in generative AI. The current hype around generative AI cannot mask the fact that these models need high-quality data to function properly. GenFlow and Snorkel Foundry help organizations prepare data suited to generative AI, with the goal of making the resulting models more accurate and dependable.

Why Is Additional Instruction Tuning Necessary for LLMs?

Even with good data, pre-trained LLMs still require additional instruction tuning. A common approach is reinforcement learning from human feedback (RLHF), which fine-tunes models to follow instructions and produce responses that people judge to be better. Additional tuning helps LLMs learn and generalize across a broader range of tasks than they saw during pre-training. This is essential for unlocking a model’s full potential while keeping its outputs accurate and useful.

Predictive AI vs. Generative AI: Finding the Enterprise Value

Despite the hype around generative AI, Snorkel AI co-founder and CEO Alex Ratner thinks that predictive AI will remain essential for generating enterprise value in the long run. Predictive AI is better suited to tasks such as fraud detection, which require accurate and consistent classification. Generative AI is still valuable, but it remains more challenging to deploy and optimize than predictive AI.

The Importance of Data Labeling in Predictive AI

While data curation for generative AI involves a different process than it does for predictive AI, data labeling remains essential for both. Labeling is crucial for predictive AI tasks such as fraud detection and image recognition. Without well-labeled data, predictive models become less accurate, which can lead to costly business mistakes. Labeling therefore remains a critical part of data preparation and curation.

Feedback also remains important in generative AI, albeit in a different form than in predictive AI. Feedback on generative AI outputs helps organizations continuously optimize their models and adjust their inputs based on what the models produce. Over time, this yields data that supports more accurate and efficient models, which is essential for organizations that need their models to stay up to date.

Snorkel AI’s GenFlow and Snorkel Foundry advance the state of the art in data curation and preparation for generative AI. With GenFlow automating parts of the data preparation pipeline and Snorkel Foundry supporting custom LLMs, organizations can build more accurate and dependable AI models. Data curation remains essential for predictive AI as well, underscoring the continued need for labeling and careful data preparation. In the coming years, feedback is likely to play a central role in pushing generative AI further.
