AI data sourcing is likely to be the critical technological question of the dawning era of AI. Human Native AI, a company whose approach to data procurement I hope will become the path forward for training AI systems, is developing an AI ‘marketplace’ that could resolve some of the harsh inequities in how training data is used. We might hope that AI development follows the trajectory of music production and sharing, where a lawless early phase gave way to a more equitable digital ecosystem.
The sophistication of AI systems, and of LLMs in particular, has grown so great that the need for data is even larger than when the current systems were developed. Herein lies the dilemma: how do we train these systems on data that has been acquired ethically and legally? Recent licensing deals between OpenAI and publications such as The Atlantic and Vox point to an intensifying industry-wide effort to license AI training data in a legal and socially responsible way.
Human Native AI, a London-based startup, is leading the way in organising such ethical agreements. The company is building a marketplace where companies developing LLM projects can connect with rights holders willing to license their data. Rights holders upload their content, while AI companies, in turn, license that data and pay for its use. The content owner maintains control over the work and is fairly compensated for sharing it, all through a self-service marketplace.
Licensing deals of this kind – including, in this case, with the high-profile publisher Vox – go some way towards setting the agenda for a multitude of AI news stories: harvesting ethically sourced data so that mainstream AI development helps, rather than hinders, the spirit of other content creators, and ensuring that both sides profit from this digital synergy.
These flagship agreements are not just contracts but a codification of the principle that content, and the people who create it, should be considered and respected when AI programs are trained. They acknowledge the need for ethics and remuneration in the internet age, and will hopefully be followed by others.
And the crowning achievement of the platform Human Native AI has designed is that it could also make it easier for smaller AI projects – the non-OpenAIs – to get hold of good, verified data. That would level the playing field, giving small projects access to training data that has hitherto been cost-prohibitive. Small AI startups could thrive without being handicapped by natural language processing ‘bootstrapping’ problems – again, giving the little guy a shot. Whether these new entrants come to bedevil us or prove the salvation of humanity remains to be seen.
In this way, the Vox deal offers hope to smaller AI developers that they too can source data ethically without prohibitive financial cost, shifting the AI development landscape towards a more inclusive dynamic.
This is more than an idea: Human Native AI is gaining momentum and funding. The premium on ‘good’, ethically sourced data grows ever more real with each passing day, especially as the European Union’s AI Act and other regulations and initiatives loom on the horizon. What Human Native AI offers is no longer just a compliance ‘nice to have’ but a foundation for a tech future in which AI technologies can flourish, respectfully and responsibly, using data obtained legally and fairly with the active permission of the creators of the content.
Of all the roles that content can play in the machine learning landscape, an article like this one for Vox adds crucial diversity to the data used to train AI systems, and its licensed existence sets an ethical standard for the use of content in such systems. The model of partnership between AI companies and the creators whose content trains their systems, pioneered and exemplified by these deals with Vox, is a sustainable and ethical way forward for AI development.
And as we continue on our march into this AI age, I will be rooting (and paying) for companies such as Human Native AI – and supportive media companies such as Vox, which insist, through their very name, that we pay attention to the humans who create and use data in a way that respects equity and fosters mutual benefit. Vox et vota, indeed.
© 2024 UC Technology Inc. All Rights Reserved.