These days, the internet is flooded with AI-generated images and videos. The generative AI that creates them is trained on real photos and video footage. Since that data is, by definition, perfect (nothing can be more real than reality), the material generated by AI must be of inferior quality. If such material is published in large quantities, the training data for future AI degrades, and it is known that models trained on such data suffer a corresponding drop in performance. This phenomenon is known as "poisoning the well".

But this might not be the case for other types of data. If the utility of the output for humans is not bounded, things might play out differently. Let's look at an example based on AI-generated program code.

Assume humankind continues to generate large amounts of computer code using AI ("vibe coding"). Two effects might then limit how much of this code ends up in the training data for the next generation of AI systems: If the generated code does…
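The degradation loop described above can be illustrated with a toy simulation (a minimal sketch, not code from this article; all function names and parameters are illustrative assumptions): a "model" is fitted to data, the next generation is trained only on samples drawn from the previous fit, and detail is gradually lost.

```python
import random
import statistics

def train_on_own_output(n_samples=200, n_gens=1000, seed=0):
    """Toy 'poisoning the well' loop: fit a Gaussian, then repeatedly
    refit it using only samples drawn from the previous generation's fit.
    Parameters are illustrative, not taken from the article."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # generation 0: the "real" distribution
    variances = [sigma ** 2]
    for _ in range(n_gens):
        # synthetic data produced by the current model
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # next generation is trained only on that synthetic data
        mu = statistics.fmean(data)
        sigma = statistics.pvariance(data, mu) ** 0.5
        variances.append(sigma ** 2)
    return variances

v = train_on_own_output()
print(f"variance: generation 0 = {v[0]:.3f}, generation 1000 = {v[-1]:.3f}")
```

In this sketch the estimated variance shrinks over the generations, i.e. the distribution's tails (the rare, informative cases) disappear first, which mirrors the quality loss the post attributes to models trained on their own output.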