The Low-Down: Piano Man: AI Generates Melodies From Lyrics

Lennon and McCartney, Bernie Taupin and Elton John, Rodgers and Hammerstein.

Many of the greatest recent musical collaborations involved the skillful meshing of lyrics and melody. Now AI has demonstrated it can perform both roles, suggesting further opportunities outside music for creative inspiration. JL

Kyle Wiggers reports in Venture Beat:

Notes have two musical attributes: pitch and duration. Pitches are properties of sounds that organize music on a scale, while duration represents the length of time a tone is sounded. Syllables align with melodies in music tracks. AI made use of the alignment data with a neural network learning long-term dependencies. (It) was trained to learn a mathematical representation at syllable and word levels to capture the syntactic structures of lyrics, learning to predict melody while accounting for the relationship between lyrics and melody. The AI outperformed a baseline model “in every respect,” approximating the distribution of human-composed music.
Generating sequences of musical notes from lyrics might sound like the stuff of science fiction, but thanks to AI, it might someday become as commonplace as internet radio. In a paper published on the preprint server arXiv.org (“Conditional LSTM-GAN for Melody Generation from Lyrics”), researchers from the National Institute of Informatics in Tokyo describe a machine learning system that’s able to generate “lyrics-conditioned” melodies from learned relationships between syllables and notes.
“Melody generation from lyrics has been a challenging research issue in the field of artificial intelligence and music, which enables to learn and discover latent relationship between interesting lyrics and accompanying melody,” wrote the paper’s coauthors. “With the development of available lyrics and melody dataset and [AI], musical knowledge mining between lyrics and melody has gradually become possible.”

As the researchers explain, notes have two musical attributes: pitch and duration. Pitches are perceptual properties of sounds that organize music by highness or lowness on a frequency-related scale, while duration represents the length of time that a pitch or tone is sounded. Syllables align with melodies in the MIDI files of music tracks; the columns within said files represent one syllable with its corresponding note, note duration, and rest.
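To make the alignment concrete, here is a minimal Python sketch of one syllable-note row as described above. The class name `AlignedNote`, its field names, and the sample values are illustrative assumptions, not the researchers' data format. The pitch-to-frequency conversion uses the standard MIDI convention in which note number 69 is A4 at 440 Hz.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedNote:
    """One syllable paired with its note (hypothetical layout)."""
    syllable: str
    midi_number: int   # pitch as a MIDI note number (0-127)
    duration: float    # note length in beats
    rest: float        # silence before the next note, in beats


def midi_to_hz(midi_number: int) -> float:
    """Convert a MIDI note number to frequency (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_number - 69) / 12)


# Made-up example line: three syllables, each aligned with one note.
phrase = [
    AlignedNote("pi", 60, 1.0, 0.0),
    AlignedNote("a", 62, 0.5, 0.0),
    AlignedNote("no", 64, 1.5, 1.0),
]

for note in phrase:
    print(f"{note.syllable:>3}  {midi_to_hz(note.midi_number):7.2f} Hz  "
          f"{note.duration} beats  rest {note.rest}")
```

Higher MIDI numbers map to higher frequencies, which is the "highness or lowness" the article refers to; each step of 12 doubles the frequency.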
The researchers’ AI system combined the alignment data with two components: a long short-term memory (LSTM) network, a type of recurrent neural network capable of learning long-term dependencies, and a generative adversarial network (GAN), a two-part neural network consisting of a generator that produces samples and a discriminator that attempts to distinguish generated samples from real-world ones. The LSTM was trained to learn a joint embedding (a mathematical representation) at the syllable and word levels to capture the syntactic structures of lyrics, while the GAN learned over time to predict a melody from given lyrics while accounting for the relationship between the two.
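For readers who want the adversarial setup in symbols, the generic conditional GAN objective can be written as below. This is the textbook formulation, offered as an assumption about the general shape of the training signal; the article does not state the exact loss the researchers used. Here G is the generator, D the discriminator, x a real melody, y the lyrics used as the condition, and z random noise.

```latex
\min_G \max_D \;
\mathbb{E}_{(x,\,y) \sim p_{\text{data}}}\bigl[\log D(x \mid y)\bigr]
+ \mathbb{E}_{z \sim p_z,\; y \sim p_{\text{data}}}\bigl[\log\bigl(1 - D(G(z \mid y) \mid y)\bigr)\bigr]
```

The discriminator is rewarded for telling real melodies from generated ones given the same lyrics, and the generator is rewarded for fooling it.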
To train it, the team compiled a data set consisting of 12,197 MIDI files, each paired with lyrics and melody alignment — 7,998 files from the open source LMD-full MIDI Dataset and 4,199 from a Reddit MIDI dataset — which they cut down to 20-note sequences. They took 20,934 unique syllables and 20,268 unique words from the LMD-full MIDI, and extracted the beats-per-minute (BPM) value for each MIDI file, after which they calculated note durations and rest durations.
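The preprocessing arithmetic can be sketched in a few lines of Python. The function names `beats_to_seconds` and `chunk_notes` are hypothetical, and the choice to use non-overlapping windows and drop a short trailing remainder is an assumption; the article says only that the files were cut down to 20-note sequences and that durations were calculated after extracting each file's BPM.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def beats_to_seconds(beats: float, bpm: float) -> float:
    """Length in seconds of a note or rest lasting `beats` beats at `bpm`."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return beats * 60.0 / bpm


def chunk_notes(notes: Sequence[T], size: int = 20) -> List[List[T]]:
    """Split a melody into consecutive, non-overlapping `size`-note sequences.

    Assumption: a trailing remainder shorter than `size` is dropped.
    """
    full = len(notes) - len(notes) % size
    return [list(notes[i:i + size]) for i in range(0, full, size)]


# At 120 BPM one beat lasts half a second.
print(beats_to_seconds(1.0, 120))         # 0.5
# A 45-note melody yields two full 20-note sequences.
print(len(chunk_notes(list(range(45)))))  # 2
```

Fixed-length sequences keep every training example the same shape, which is a common convenience when feeding recurrent networks.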
After splitting the data into sets and feeding them into the model, the coauthors conducted a series of tests to determine how well it predicted melodies sequentially aligned with the lyrics, MIDI numbers, note duration, and rest duration. They report that their AI system not only outperformed a baseline model “in every respect,” but also closely approximated the distribution of human-composed music. In a subjective evaluation in which volunteers were asked to rate the quality of 12 20-second melodies generated using the baseline method, the AI model, and ground truth, the scores given to melodies generated by the proposed model were closer to those given to human-composed melodies than the baseline’s scores were.

The researchers leave to future work synthesizing melodies with sketches of uncompleted lyrics and predicting lyrics when given melodies as a condition.
“Melody generation from lyrics in music and AI is still unexplored well [sic],” wrote the researchers. “Making use of deep learning techniques for melody generation is a very interesting research area, with the aim of understanding music creative activities of human.”
AI might soon become an invaluable tool in musicians’ compositional arsenals, if recent developments are any indication. In July, Montreal-based startup Landr raised $26 million for a product that analyzes musical styles to create bespoke sets of audio processors, while OpenAI and Google earlier this year debuted online creation tools that tap music-generating algorithms. More recently, researchers at Sony investigated a machine learning model for conditional kick-drum track generation.

A Blog by Jonathan Low

Aug 30, 2019
