Generative AI training data sets are now trackable – and often legally complicated
Computerworld October 26, 2023
A new tool, Data Provenance Explorer, lets users pick through the questionable provenance of many large data sets used for AI training.
A new online tool allows users to identify, track and learn about the legal status of training data sets for generative AI, and a quick glance shows that many of those data sets may have licensing issues.
The tool, dubbed the Data Provenance Explorer, is the result of a joint effort between machine learning and legal experts from MIT, generative AI API provider Cohere, and 11 other organizations. Harvard Law School, Carnegie Mellon University and Apple are all among the contributors. The Data Provenance Explorer lets researchers, journalists and anyone else search through thousands of AI training databases and trace the...