This morning, an article on MAD, or Model Autophagy Disorder, caught my eye. Apparently, this is what happens when a large language model is trained on data generated by itself and other large language models. Within surprisingly few iterations, around ten, a model tasked with writing articles about historical English architecture was churning out rubbish about jackrabbits, and one tasked with copying handwritten numerals was reduced to incoherent scribble. The end state is “model collapse.”
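For a sense of how quickly that loop degrades, here is a toy sketch, in Python with NumPy, of the same feedback pattern: fit a simple Gaussian “model” to data, sample fresh data from the fit, and repeat. The Gaussian, the sample size of 200 and the ten generations are my own illustrative assumptions, not the setup used in the research; real language-model collapse is far messier, but the shape of the loop is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: the "real" training data, drawn from a standard normal.
data = rng.normal(loc=0.0, scale=1.0, size=200)

for generation in range(1, 11):
    # "Train": fit the only two parameters this toy model has.
    mu, sigma = data.mean(), data.std()
    # "Publish" synthetic data, then treat it as the next training set.
    data = rng.normal(loc=mu, scale=sigma, size=200)
    print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
```

Run it a few times and the estimated spread tends to drift downward: each generation re-learns a slightly narrower version of the one before it, which is roughly what happens, at vastly greater scale, to the tails of a language model's output.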
I wonder why we are surprised. We’ve long known what happens when cousins keep marrying cousins; this is just the digital equivalent. Whilst this makes for great headlines, I have little doubt progress will be made in filtering synthetic data out, though probably in much the same way as progress was made with the Post Office Horizon systems: slowly, quietly and out of sight, sheltering behind carefully and expensively crafted legal “nothing to see here, move along please” statements.
This will not stop AI from having huge positive potential, with one big proviso: that managers start thinking for themselves about how they task AI and how they supervise it. For a while at least, developing our use of AI will require many of the attributes of craft and the disposition of the Artisan.
https://www.scientificamerican.com/article/ai-generated-data-can-poison-future-ai-models/#