The rapid, ever-increasing expansion of artificial intelligence (AI) depends on the ingenuity of developers, who often operate beyond the necessary constraints of ethical consideration. In a case brought to light by Human Rights Watch, more than 170 photos of Brazilian children, with personal details including names, ages and locations, were harvested without consent to train AI models. The presence of these images in LAION-5B, a dataset widely used for AI training, exposes a vital area for reconsideration in how we approach AI development: openness.
The fact that these images were scraped from locations such as personal blogs and low-visibility YouTube videos, spaces often assumed to be safe from prying eyes, makes this particularly disorienting. People have a reasonable expectation of privacy when they post photos meant to be seen only by family and a circle of close friends. The whole point of having a personal space online is the freedom to be foolish, to decide for yourself how much of your life to put ‘out there’; anything beyond that should happen only with everyone’s consent. The profusion of image-training databases is a clear example of why we need a fuller discussion about how we handle AI training, and evidence that regulatory reform is needed.
The problem is not only the violation of privacy, troubling as that is, but that innocuous images can be adulterated and put to ill use. That this is possible precisely because AI tools are trained on such data drives home how badly things can go wrong. It suggests we need a much harder look at openly available datasets such as LAION-5B, which was scraped from the web and now holds more than 5.85 billion image-caption pairs.
In the face of these revelations, industry voices, including LAION’s spokesperson, have spoken out about rectifying such oversights and have proposed partnerships with organisations such as the Internet Watch Foundation to purge these databases of inappropriate content. The deeper question remains, however: is open access antithetical to ethical AI development?
This puts a spotlight on the urgency of AI governance. As legislatures around the world grapple with the details of digital ethics, laws such as the draft DEFIANCE Act in the US are emerging as possible protections for digital identity, and perhaps as mechanisms for ‘pre-emptive consent’ to combat the misuse of our digital likenesses.
Finding an ethically sound path for AI will be no walk in the park, but the story of these misused images of children can guide us along it, fostering the kind of deliberation, regulation and technical stewardship that respects our common humanity.
‘Open’ is a word often applied to datasets, software and platforms in AI, as part of the broader discussion of technology’s possibilities and responsibilities. Openness implies free-flowing innovation and collaboration, but it also carries ethical obligations. Real openness would not only enable access to information but also prevent that access from exposing the vulnerable, for instance through sensitive personal data, including images of children. As AI and other digital technologies carry us into the unknown, let us strive for a time when technology respects and protects, when digital openness is truly safe and sound.
Yet this exploration of the ethical core of AI raises a fundamental question: how do we reconcile the openness that is the very seed of innovation with the protections our ethical obligations demand? Whatever answer we as a society arrive at will help determine the legacy of AI as we move deeper into the digital future: a legacy that must be safe and protective of all people, particularly the youngest among us.