
Renaldi for AWS Community Builders

Implementation of Melodistiq: Generating Lyrics and Melodies with AI

Introduction

Hello, fellow cloud enthusiasts and builders!

In today's blog post, we will dive into an intriguing project that combines artificial intelligence with music creation. This post will detail the implementation of a Python code designed to generate music lyrics and melodies using AI technologies. This follows the successful presentation of the "Unlocking Musical Creativity with AI: Generating Lyrics and Melodies" talk.


The Scenario

The goal of this project is to automate the creation of music lyrics and melodies. This can be particularly useful for musicians seeking inspiration or developers exploring the intersection of AI and creative arts. The process involves generating lyrics based on existing song data, creating a melody that fits the generated lyrics, and presenting the final composition in both audio and written formats.

Libraries and Tools Used

To accomplish this task, we utilize several Python libraries, each serving a distinct purpose:

  • Pandas (pandas): A powerful data manipulation library used here to handle and preprocess lyrical data.
  • Natural Language Toolkit (NLTK): This library is used for text processing to analyze, manipulate, and generate text data.
  • OpenAI's GPT-3.5: Leveraged to enhance the quality of the generated lyrics.
  • Mingus: An advanced music theory and notation package used to handle music data and generate melodies.
  • FastText: A library developed by Facebook for efficient learning of word representations and sentence classification.
  • FPDF: A library to generate PDF files, useful for presenting the lyrics in a readable format.

Let's walk through implementing our code now.

Implementation

Setting Up and Downloading Models
First, we load necessary models and corpora:

import fasttext
import fasttext.util
import nltk
nltk.download('cmudict')
nltk.download('punkt')
from nltk.corpus import cmudict
d = cmudict.dict()

fasttext.util.download_model('en', if_exists='ignore')
ft = fasttext.load_model('cc.en.300.bin')


Here, fastText and NLTK's CMU Pronouncing Dictionary are initialized. fastText is used later to find words similar to those not found in the CMU dictionary.
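The melody generator later calls a `find_similar_word` helper for out-of-dictionary words. One possible implementation, assuming the `ft` model and CMU dict `d` loaded above, is to scan fastText's nearest neighbors for the first word that has a pronunciation entry:

```python
# Sketch of the find_similar_word helper used later by generate_melody.
# Assumes `ft` (fastText model) and `d` (CMU dict) from the setup above;
# both can be passed in explicitly, which also makes the function testable.
def find_similar_word(word, neighbors=None, dictionary=None):
    if neighbors is None:
        # fastText returns (similarity, word) pairs
        neighbors = [w for _, w in ft.get_nearest_neighbors(word, k=50)]
    if dictionary is None:
        dictionary = d
    for candidate in neighbors:
        candidate = candidate.lower()
        if candidate in dictionary:
            return candidate
    return None
```

This is a sketch rather than the exact helper from the repo, but it captures the idea: fall back to the closest embedding neighbor that the CMU dictionary can pronounce.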

Data Collection
Data is loaded and preprocessed from a CSV file containing song lyrics:

import pandas as pd
df = pd.read_csv('EdSheeran.csv')
df = df.dropna()
lyrics = df['Lyric'].str.replace('\n', ' ').str.replace('\r', ' ').tolist()

This section reads a CSV file, cleans the data by removing missing values, and formats the lyrics into a list.

Data Preprocessing
The lyrics are tokenized and cleaned:

import re
from nltk.tokenize import word_tokenize
words = [word_tokenize(re.sub(r'\W+', ' ', lyric).lower()) for lyric in lyrics]
words = [word for sublist in words for word in sublist]

Here, special characters are removed, and the text is converted to lowercase to standardize the data.
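A quick toy example shows exactly what this cleaning step does to a single line (note that `\W+` treats the apostrophe as a separator, so contractions get split apart):

```python
import re

# Toy input illustrating the cleaning step above.
# \W+ replaces every run of non-word characters with a space,
# so punctuation disappears and "Don't" splits into "don" / "t".
lyric = "Don't you worry, child!"
cleaned = re.sub(r'\W+', ' ', lyric).lower()
print(cleaned.split())  # ['don', 't', 'you', 'worry', 'child']
```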

N-gram Model and Lyrics Generation
An N-gram model is used to generate new lyrics:

from nltk.util import ngrams
from nltk.probability import FreqDist
import random

n_values = [2, 5, 7]
generated_lyrics = ""

for n in n_values:
    ngrams_list = list(ngrams(words, n))  # no padding, so n-grams never contain None
    freq_dist = FreqDist(ngrams_list)

    def generate_lyrics(starting_ngram, freq_dist, num_words):
        generated_words = list(starting_ngram)
        for _ in range(num_words):
            next_word_candidates = [ngram[-1] for ngram in freq_dist.keys() if ngram[:n-1] == tuple(generated_words[-(n-1):])]
            if next_word_candidates:
                next_word = random.choice(next_word_candidates)
                generated_words.append(next_word)
            else:
                break
        return ' '.join(generated_words).replace(' ,', ',').replace(' .', '.').replace(' ;', ';')

    starting_ngram = random.choice(list(freq_dist.keys()))
    generated_lyrics += generate_lyrics(starting_ngram, freq_dist, 200)

This segment builds N-grams from the cleaned words and creates a frequency distribution to model the probabilities of word sequences. New lyrics are generated based on these probabilities.
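The core lookup, stripped to its essentials, is easy to see on a toy corpus (here using `collections.Counter` in place of NLTK's `FreqDist` so the example is dependency-free; the corpus is made up):

```python
import random
from collections import Counter

# Toy illustration of the next-word lookup used above.
words = "the night we met the night we danced the night we sang".split()
n = 3
trigrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
freq_dist = Counter(trigrams)

# All words observed after the context ("the", "night"):
context = ("the", "night")
candidates = [g[-1] for g in freq_dist if g[:n - 1] == context]
print(random.choice(candidates))  # always 'we' in this toy corpus
```

In the real script the context is the last `n-1` generated words, and each `random.choice` step extends the lyric by one word.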

Enhancing Lyrics with GPT-3.5
We then look to enhance our lyrics with the help of GPT-3.5.

import openai
import os
openai.api_key = os.getenv('openai_api_key')
conversations = {}
session_id = 0
conversations[session_id] = []

conversations[session_id].append({"role": "system", "content": "You are a helpful assistant who will transform the lyrics below into a song."})
conversations[session_id].append({"role": "user", "content": generated_lyrics})

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=conversations[session_id],
  max_tokens=200
)

gpt_lyrics = response.choices[0]["message"]["content"].strip()

Here, the OpenAI GPT-3.5 API is used to refine and enhance the generated lyrics, adding a layer of complexity and polish that might be lacking from the simple N-gram model. The environment variable openai_api_key is used to authenticate the API request.

Generating and Exporting the Melody
The code then generates a melody using the Mingus library based on the stress patterns of the lyrics:

import contractions
import pronouncing
from mingus.containers import Note, Bar, Track
from mingus.midi import midi_file_out
import mingus.extra.lilypond as lilypond

def generate_melody(lyrics):
    # Collect the stress pattern of each word (the CMU dict was loaded earlier)
    stress_pattern = []

    tokens = nltk.word_tokenize(lyrics)

    # Fix contractions
    fixed_tokens = [contractions.fix(token) for token in tokens]

    # Function to get the stress pattern of a word
    def get_stress(word):
        word = word.lower()
        phones = pronouncing.phones_for_word(word)
        if phones:
            stress_pattern = [int(s) for s in pronouncing.stresses(phones[0])]
            return [stress_pattern]
        else:
            # handle contractions
            if "'" in word:
                parts = word.split("'")
                stress_pattern = []
                for part in parts:
                    stress_pattern += get_stress(part)
                return stress_pattern
            # handle hyphenated words
            elif '-' in word:
                parts = word.split('-')
                stress_pattern = []
                for part in parts:
                    stress_pattern += get_stress(part)
                return stress_pattern
            else:
                print(f'Word not found in dictionary: {word}')
                # Find a similar word via the fastText model loaded earlier
                # and use its stress pattern
                similar_word = find_similar_word(word)
                if similar_word:
                    return get_stress(similar_word)
                else:
                    # Use default pattern if no similar word is found
                    return [[0, 1, 2]]

    # Get the stress pattern of the lyrics
    for word in fixed_tokens:
        # remove punctuation
        word = re.sub(r'[^\w\s]', '', word)
        stress_pattern += get_stress(word)

    # Flatten the stress_pattern list
    stress_pattern = [item for sublist in stress_pattern for item in sublist]

    print(lyrics)
    print(tokens)
    print(["Here are the stress patterns:"] + stress_pattern)

    # Generate a melody based on the stress pattern
    track = Track()
    b = Bar()
    b.set_meter((4, 4))
    beats_in_current_bar = 0
    for stress in stress_pattern:
        if stress == 0:
            note = Note('C', 4)
        elif stress == 1:
            note = Note('E', 4)
        elif stress == 2:
            note = Note('G', 4)
        b + note
        beats_in_current_bar += 1
        if beats_in_current_bar == 4:
            track.add_bar(b)
            b = Bar()
            b.set_meter((4, 4))
            beats_in_current_bar = 0
    track.add_bar(b)


    return track

Here, we define a function generate_melody that takes a string of lyrics as input and generates a melody based on the phonetic stress pattern of the words. The function uses the nltk library to tokenize the lyrics, handles contractions, and determines the stress pattern of each word using the CMU Pronouncing Dictionary. We account for special cases like contractions, hyphenated words, and words not found in the dictionary, attempting to find similar words or applying a default stress pattern when necessary.

After extracting and flattening the stress pattern of the entire lyrics, the function uses this pattern to create a melody where different stress levels are mapped to specific musical notes (C, E, G) in a 4/4 time signature, adding these notes to a musical track using the mingus library. Each stress level in the pattern corresponds to a different note, and the function organizes these notes into bars, with each bar containing up to four beats. The track, comprising a series of bars filled with notes based on the lyrical stress pattern, is returned at the end of the function. This allows for the conversion of lyrical content into a basic musical representation, integrating elements of natural language processing and music composition.
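Stripped of the bar bookkeeping, the stress-to-pitch rule reduces to a simple lookup (sketch only; the pitch names follow the C/E/G convention above):

```python
# Minimal sketch of the stress-to-pitch rule used in generate_melody:
# 0 (unstressed) -> C-4, 1 (primary stress) -> E-4, 2 (secondary) -> G-4.
STRESS_TO_NOTE = {0: "C-4", 1: "E-4", 2: "G-4"}

def stresses_to_notes(stress_pattern):
    return [STRESS_TO_NOTE[s] for s in stress_pattern]

# e.g. "melody" has CMU stresses [1, 0, 0]:
print(stresses_to_notes([1, 0, 0]))  # ['E-4', 'C-4', 'C-4']
```

Mapping stresses to a fixed C major triad keeps the result consonant by construction; a more ambitious version could vary the octave or rhythm instead.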

We can then look into creating the MIDI file and sheet music:

# format_lyrics and add_newlines are formatting helpers (see the repo linked below)
formatted_lyrics = format_lyrics(gpt_lyrics)
formatted_lyrics_with_newlines = add_newlines(formatted_lyrics)
print("Here are the lyrics:" + formatted_lyrics_with_newlines)

melody = generate_melody(formatted_lyrics_with_newlines)

midi_file_out.write_Track('melody.mid', melody)

lilypond_string = lilypond.from_Track(melody)
with open('melody.ly', 'w') as f:
    f.write(lilypond_string)

print(melody)

This prints the final lyrics and exports the melody as a MIDI file and a LilyPond (.ly) file for engraving sheet music.

Outputting Lyrics as PDF
Finally, we generate a PDF document containing the formatted lyrics, useful for singers or for archival purposes.

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font('Arial', 'B', 16)
title = formatted_lyrics_with_newlines.split('\n')[0]
pdf.cell(0, 10, title, 0, 1, 'C')
pdf.set_font('Arial', '', 12)
pdf.multi_cell(0, 10, formatted_lyrics_with_newlines)
pdf.output('lyrics.pdf')

Now we have fully functioning Python code for making music! Next, we need to decide where to deploy it. Leveraging AWS, I am going to deploy it on an EC2 instance.

Setting Up the EC2 Instance

  1. Go to the EC2 dashboard in AWS Management Console.
  2. Click on "Launch Instance".
  3. Choose an Amazon Machine Image (AMI), such as Amazon Linux 2 AMI or Ubuntu Server.
  4. Select an instance type (e.g., t2.micro for testing purposes).
  5. Configure instance details as required.
  6. Add storage if the default isn’t enough.
  7. Configure Security Group to allow SSH (port 22) and any other necessary ports (e.g., port 80 for web server).
  8. Review and launch the instance by selecting or creating a new key pair.

Now that we have the EC2 instance ready, we can access it.

Accessing the EC2 Instance
Connect to your instance using SSH. For Windows, you can use PuTTY or any SSH client:
ssh -i /path/to/your-key.pem ec2-user@your-instance-ip

Environment Setup
Install Python and other necessary tools.

sudo yum update -y
sudo yum install python3 git -y

Now move the code over to the EC2 instance with SCP or SFTP and install the relevant Python libraries (re is part of Python's standard library, so it doesn't need installing). You can also clone the repo provided at the end of this post.
pip3 install pandas nltk openai pronouncing contractions fpdf fasttext mingus

You will also need to install LilyPond, which compiles the generated .ly file into printable sheet music:
sudo yum install lilypond -y

Set environment variables (for example, for the OpenAI API key):
export openai_api_key='your-api-key'

With the environment now set up, we can run and automate the script.

Running and Automating the Script
Run the script with the command below.
python3 main.py

To keep the script running after your SSH session disconnects, we will use nohup to run it in the background.
nohup python3 main.py &

Set up a systemd service if you want it to start at boot and restart on failures.
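A minimal unit file for that could look like the following (the path, user, and service name are assumptions; adjust them to your layout):

```ini
# /etc/systemd/system/melodistiq.service  (hypothetical path and user)
[Unit]
Description=Melodistiq lyrics and melody generator
After=network.target

[Service]
User=ec2-user
WorkingDirectory=/home/ec2-user/melodistiq
Environment=openai_api_key=your-api-key
ExecStart=/usr/bin/python3 main.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now melodistiq.service.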

We can then monitor the script's execution by checking the logs.
tail -f nohup.out

Optionally, configure CloudWatch to monitor the EC2 performance or to set up more advanced logging and alerting.

And with that, we've set up our deployed solution! It takes a bit of learning to wrap your head around the NLP and music theory bits, but it really is quite a straightforward approach to getting started on making music with AI.

Conclusion
This project showcases the power of AI in creative processes like songwriting. By integrating various technologies and libraries, we've created a system that not only generates lyrics but also composes a melody, presenting it in both audio and visual forms. This illustrates the potential for AI to assist in artistic expressions, providing tools that can inspire and enhance the creative capabilities of its users. Whether you're a developer, a musician, or an enthusiast in the realms of AI or music, this project offers fascinating insights and possibilities.

The repo to the project can be accessed at: https://github.com/renaldig/melodistiq-music-generator
