The AI Data Wars Are Brewing: Do Users Own Their Data?

Large language models (LLMs) like OpenAI's GPT-4 have attracted a lot of attention. In the background, conflicts are brewing over who gets access to the best training data. The parties maneuvering against each other are Microsoft (+OpenAI) vs. Google vs. Elon Musk vs. Amazon vs. Everyone Else. Meanwhile, governments are stepping in to enforce GDPR privacy laws.

Microsoft and Elon Musk have already traded some blows. API integrations between Xbox and Twitter, and between Twitter and OpenAI, are being shut down. That makes sense, right? Elon wants his own AI model, and he owns Twitter's data, so he is blocking competitors from using it.

Reddit is rumored to be in talks with OpenAI, negotiating pricing for access to the Reddit database. Will Reddit users be compensated for the value of their data?

This is a great video that sums up the current state of these conflicts. One of the big questions it raises is whether the users who created the content will be compensated, now that their data is being used to train these models. Do users own their data? This question is particularly relevant to Hive.

P.S. Uncannily, as I started to write this post, Grammarly prompted me to try its new AI-assisted writing feature. I guess I should have seen this coming. I did not use this feature while writing this post.