The recent announcement of a partnership between Lightricks, a pioneering visual AI software firm, and Shutterstock, a leading stock media provider, signals a transformative shift in the relationship between AI developers and media powerhouses. This collaboration has the potential to significantly impact both the respective companies and their operating ecosystems. At the core of this partnership is Lightricks’ access to Shutterstock’s extensive video content libraries for training its new open-source generative video model, LTXV. The model is available for free experimentation by developers and is also being integrated into Lightricks’ AI filmmaking web app, LTX Studio. This deal provides substantial benefits to both companies: Lightricks gains vital training data from Shutterstock’s enormous media archives, while Shutterstock diversifies its business model by exploring new AI-driven revenue streams and staying relevant in the increasingly AI-embedded creative landscape.
Addressing the Need for High-Quality Training Data
Beyond the immediate benefits to the two companies, this partnership holds broader significance for both the media and AI industries. The current surge of advancements in AI video generation has highlighted a pressing need for vast amounts of high-quality training data. Recent AI video models, such as OpenAI’s Sora, Google DeepMind’s Veo 2, and Meta’s upcoming Movie Gen, have all demonstrated impressive new capabilities. However, they continue to face persistent challenges: the “uncanny valley” effect, problems with object permanence, coherence, and consistency, and slow generation times, especially for longer video clips. These common issues underscore the need for more representative and comprehensive training datasets.
A critical challenge for AI video companies is the scarcity of legally and ethically sourced training data. An audit by the Data Provenance Initiative revealed that over 70% of publicly available training datasets may be unlicensed, and that more than half of the licenses that do exist are inaccurate. This exposes AI developers to potential lawsuits and ethical dilemmas. Some companies, like Runway, have faced criticism for using unlicensed data sources such as YouTube videos and pirated content, a practice that runs afoul of the platforms’ terms of use.
Navigating the Legal and Ethical Landscape
The legal landscape is becoming increasingly complex as numerous lawsuits arise over the unauthorized use of copyrighted material for AI training. High-profile cases include allegations from artists against Stability AI, Midjourney, and DeviantArt, and a class-action lawsuit against Google for using social media content to train its models. Additionally, the controversy surrounding the Books3 dataset led to its shutdown. As ethical and legal scrutiny intensifies, AI companies find it ever more challenging to access sufficient and legitimate training data.
In response to these challenges, some initiatives aim to protect creators’ rights more robustly. For instance, the newly formed Dataset Providers Alliance (DPA) supports an opt-in system that emphasizes stronger safeguards for creators, allowing them to explicitly authorize the use of their content. Nevertheless, given the high volume of data AI firms require, licensing on this basis would be expensive and potentially prohibitive, particularly for smaller companies. Consequently, Big Tech’s dominance in the AI video sector could be reinforced, exacerbating data access disparities.
The Significance of the Lightricks-Shutterstock Partnership
It is against this backdrop that the partnership between Lightricks and Shutterstock emerges as particularly significant. The collaboration presents a model for overcoming the barriers to affordable, accessible, and ethically sourced training data. By leveraging Shutterstock’s research license, which is more economical than a commercial license, Lightricks gains access to Shutterstock’s HD and 4K video library for testing and experimentation. This license operates on a revenue-sharing basis, directing 20% of revenue back to content creators, which helps ensure that their consent is respected and that copyright issues are minimized.
This partnership marks a pivotal moment as it aims to foster a new ecosystem for media providers. Shutterstock’s approach, which allows contributors to opt out of AI training datasets, handles creative consent transparently and responsibly. This strategic approach contrasts with earlier partnerships, such as the one between the film studio Lionsgate and applied AI company Runway, where the focus was on augmenting Lionsgate’s media output through proprietary data without clear benefits for other users of the improved AI model.
Democratizing Access to AI Video Technology
The legal environment is becoming more complicated as a surge of lawsuits emerges over the unauthorized use of copyrighted material in AI training. Noteworthy cases involve Stability AI, Midjourney, and DeviantArt, which face accusations from artists, while Google is dealing with a class-action lawsuit for using social media content to train its AI models. Moreover, the controversy tied to the Books3 dataset resulted in its shutdown. As ethical and legal scrutiny increases, AI companies face growing difficulties in accessing sufficient and legitimate training data.
Efforts are being made to strengthen creators’ rights. For example, the newly established Dataset Providers Alliance (DPA) advocates for an opt-in system that enhances protections for creators, allowing them to explicitly authorize the use of their content. However, the vast amount of data required by AI firms would make this approach expensive and potentially infeasible, particularly for smaller companies. This situation may reinforce Big Tech’s dominance in the AI domain, worsening data access disparities and making it difficult for smaller entities to compete fairly.