While the hype surrounding OpenAI’s ChatGPT has died down considerably since its initial launch, the company isn’t resting on its laurels, and it managed to “wow” the internet a couple of months back with the introduction of its text-to-video model, dubbed Sora. From what we’ve seen so far, the technology is incredible, capable of producing lifelike videos that could fool someone into believing they are real.
Although the chats, images and movies that can be produced by prompts are astounding, there is a dark side that has surfaced with this recent movement, as these systems don’t just create these out of thin air and instead rely on tons of data for their training. This can come in the form of images, videos, and even articles. And while some sources are appropriate for this type of use, others are not. And that’s where some companies are having issues, as it’s not entirely clear where the training data is coming from.

OpenAI CTO is unclear if Sora is trained using YouTube and other social platforms
Perhaps more alarming is that OpenAI’s CTO, Mira Murati, when interviewed by The Wall Street Journal just last month, wasn’t clear about where the training data for Sora was coming from either (via Engadget). And while it’s unclear whether YouTube videos were or are being used for training, YouTube’s CEO Neal Mohan has now fired what may be a shot across the bow, warning OpenAI that using videos from its platform is not allowed.
The proclamation comes from an interview with Emily Chang on Bloomberg Originals, where Mohan stated, “It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.” YouTube’s parent company, Google, has also been working on its own multimodal AI called Gemini, which also relies on training data, but Mohan stated that “Google adheres to YouTube’s individual contracts with creators before deciding whether to use videos from the platform.”
It will be interesting to see whether OpenAI ever responds and clarifies exactly how it’s training Sora, which it may eventually need to do if it intends to open its tools to the public. Of course, things will continue to evolve from here, and while there are a lot of uncertainties involved, they come with a lot of exciting possibilities as well.