Google has created a new artificial intelligence (AI) system called MusicLM that can produce music in any genre from a text description.
“We introduce MusicLM, a model generating high-fidelity music from text descriptions such as ‘a calming violin melody backed by a distorted guitar riff’,” said Google in a research paper.
MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes.
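A hierarchical approach helps because the sequences involved are very long. The arithmetic below counts how many values a model would have to produce for a few minutes of audio. Only the 24 kHz sample rate comes from the text. The token rates for the coarse and fine stages (`SEMANTIC_TOKENS_PER_S`, `ACOUSTIC_FRAMES_PER_S`, `ACOUSTIC_LEVELS`) are illustrative assumptions, so treat the results as a rough sketch rather than MusicLM's actual figures.

```python
# Back-of-the-envelope sequence lengths for minutes-long audio.
# SAMPLE_RATE_HZ comes from the text; the three token rates below are
# illustrative assumptions, not figures taken from the article.
SAMPLE_RATE_HZ = 24_000        # MusicLM output sample rate (from the text)
SEMANTIC_TOKENS_PER_S = 25     # assumed rate of coarse, long-range tokens
ACOUSTIC_FRAMES_PER_S = 50     # assumed frame rate of the fine acoustic stage
ACOUSTIC_LEVELS = 12           # assumed number of tokens per acoustic frame


def sequence_lengths(seconds: float) -> dict[str, int]:
    """Return how many items each representation needs for `seconds` of audio."""
    return {
        "raw_samples": int(SAMPLE_RATE_HZ * seconds),
        "semantic_tokens": int(SEMANTIC_TOKENS_PER_S * seconds),
        "acoustic_tokens": int(ACOUSTIC_FRAMES_PER_S * ACOUSTIC_LEVELS * seconds),
    }


if __name__ == "__main__":
    for minutes in (1, 5):
        print(f"{minutes} min:", sequence_lengths(minutes * 60))
```

Even under these assumed rates, the coarse sequence for five minutes of music is orders of magnitude shorter than the raw waveform. That gap is the intuition behind modeling long-range structure first and filling in acoustic detail afterwards.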
“Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description,” said Google.
“Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption.
“To support future research, we publicly release MusicCaps, a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.”
Google has published a paper explaining how MusicLM turns text into music. The paper includes a figure that summarizes the process.
MusicLM
Implementation of MusicLM, Google’s new state-of-the-art (SOTA) model for music generation using attention networks, in PyTorch.
In essence, Google is using a text-conditioned AudioLM, but, surprisingly, with the embeddings from MuLan, a text-audio model trained with contrastive learning. MuLan is what will be built out in this repository, with AudioLM adapted from a separate AudioLM repository to support the music-generation needs here.
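MuLan is described above as a text-audio contrastive model. The sketch below shows what "contrastive" means in practice. It assumes a CLIP-style symmetric InfoNCE objective; the exact loss, temperature and embedding sizes MuLan uses are not given here. Matching text/audio pairs are rewarded for high similarity and mismatched pairs for low similarity. The embeddings are hand-made toy vectors, and all function names are hypothetical.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(text_emb, audio_emb, temperature=0.1):
    """Entry [i][j] is the temperature-scaled cosine similarity of text i and audio j."""
    t = [normalize(v) for v in text_emb]
    a = [normalize(v) for v in audio_emb]
    return [[sum(x * y for x, y in zip(ti, aj)) / temperature for aj in a] for ti in t]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(text_emb, audio_emb, temperature=0.1):
    """Symmetric InfoNCE: match each text to its audio and each audio to its text."""
    logits = similarity_matrix(text_emb, audio_emb, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    audio_aligned = [[0.9, 0.1], [0.1, 0.9]]               # pair i matches text i
    audio_swapped = [audio_aligned[1], audio_aligned[0]]   # pairs deliberately mismatched
    print(contrastive_loss(texts, audio_aligned), contrastive_loss(texts, audio_swapped))
```

Training lowers this loss, so a caption's embedding ends up near the embeddings of audio that fits it. That shared space is what makes the text embedding usable as a conditioning signal for the music generator.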
We’ve all had some kind of interaction with a chatbot. It’s usually a little pop-up in the corner of a website, offering customer support – often clunky to navigate – and almost always frustratingly non-specific.
But imagine a chatbot, enhanced by artificial intelligence (AI), that can not only expertly answer your questions, but also write stories, give life advice, even compose poems and code computer programs.
ChatGPT, a chatbot released by OpenAI in late November 2022, appears to deliver on these outcomes. It has generated much excitement, and some have gone as far as to suggest it could signal a future in which AI has dominion over human content producers.
What has ChatGPT done to herald such claims? And how might it (and its future iterations) become indispensable in our daily lives?
What can ChatGPT do?
ChatGPT builds on OpenAI’s previous text generator, GPT-3. OpenAI builds its text-generating models by using machine-learning algorithms to process vast amounts of text data, including books, news articles, Wikipedia pages and millions of websites.
By ingesting such large volumes of data, the models learn the complex patterns and structure of language and acquire the ability to interpret the desired outcome of a user’s request.
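The idea of learning patterns of language from text can be illustrated with a deliberately tiny stand-in: a word bigram model that counts which word tends to follow which, then generates by repeatedly picking the most frequent continuation. This is not how GPT-3 or ChatGPT work internally; they use large neural networks trained to predict the next token. The function names and corpus here are hypothetical and only show the learn-from-data-then-generate loop.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(model: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily extend `start` with the most frequent next word seen in training."""
    out = [start]
    for _ in range(length):
        options = model.get(out[-1])
        if not options:
            break  # no continuation was ever observed for this word
        out.append(options.most_common(1)[0][0])
    return out


if __name__ == "__main__":
    model = train_bigram("the cat sat on the mat . the cat ate the fish .")
    print(" ".join(generate(model, "the", 5)))
```

Scaling this up from counting word pairs to a neural network that conditions on long stretches of preceding text is, very roughly, what lets large models produce fluent, on-topic output.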
ChatGPT can build a sophisticated and abstract representation of the knowledge in the training data, which it draws on to produce outputs. This is why it writes relevant content, and doesn’t just spout grammatically correct nonsense.