Machine learning has been around for a long time. But in late 2022, advances in deep learning and large language models changed the game and brought the technology into the public eye. And people started thinking, “We love Open Source software, so let’s have Open Source AI, too.”
But what is Open Source AI? And the answer is: we don’t know yet.
Machine learning models are not software. Software is written by humans, like me. Machine learning models are trained; they learn automatically from the input data humans provide. When programmers want to fix a computer program, they know what they need: the source code. But if you want to fix a model, you need a lot more: the software used to train it, the data it was trained on, the plan for training it, and so forth. It is much more complex. And reproducing it exactly ranges from difficult to nearly impossible.
The Open Source Definition, which was made for software, is now in its third decade, and it has been a stunning success. There are standard Open Source licenses that everyone uses. Access to source code is a living, working concept that people use every day. But when we try to apply Open Source concepts to AI, we first need to go back to first principles.
For something to be “Open Source,” it needs one overarching quality: transparency. What if an AI is screening you for a job, or for a medical treatment, or deciding a prison sentence? You want to know how it works. But today’s deep learning models are a black box: look at a model’s output and it’s impossible to tell how or why the model came up with it. All you can do is examine the inputs to see whether its training was correct. And that’s not nearly as straightforward as reading source code.
AI has the potential to greatly benefit our world. For the first time in history, we have the information and technology to tackle our biggest problems, like climate change, poverty and war. Some people say AI will destroy the world, but I think it offers hope of saving it.
But first, we need to trust it. And to trust it, it needs to be open and transparent.
As a consumer, you should demand that the AI you use is open. As a developer, you should know what rights you have to study and improve AI. As a voter, you should have the right to demand that AI used by the government is open and transparent.
Without transparency, AI is doomed. AI is potentially so powerful and capable that people are already frightened of it. Without transparency, AI risks going the way of crypto: a technology with great potential that gets shut down by distrust. I hope that we will figure out how to guarantee transparency before that happens, because the problems AI can help us solve are urgent, and I believe we can solve them if we work together.