
Renaldi for AWS Community Builders

Implementation of Melodistiq: Generating Lyrics and Melodies with AI

Introduction

Hello, fellow cloud enthusiasts and builders!

In today's blog post, we will dive into an intriguing project that combines artificial intelligence with music creation. This post will detail the implementation of a Python code designed to generate music lyrics and melodies using AI technologies. This follows the successful presentation of the "Unlocking Musical Creativity with AI: Generating Lyrics and Melodies" talk.


The Scenario

The goal of this project is to automate the creation of music lyrics and melodies. This can be particularly useful for musicians seeking inspiration or developers exploring the intersection of AI and creative arts. The process involves generating lyrics based on existing song data, creating a melody that fits the generated lyrics, and presenting the final composition in both audio and written formats.

Libraries and Tools Used

To accomplish this task, we utilize several Python libraries, each serving a distinct purpose:

  • Pandas (pandas): A powerful data manipulation library used here to handle and preprocess lyrical data.
  • Natural Language Toolkit (NLTK): This library is used for text processing to analyze, manipulate, and generate text data.
  • OpenAI's GPT-3.5: Leveraged to enhance the quality of the generated lyrics.
  • Mingus: An advanced music theory and notation package used to handle music data and generate melodies.
  • FastText: A library developed by Facebook for efficient learning of word representations and sentence classification.
  • FPDF: A library to generate PDF files, useful for presenting the lyrics in a readable format.

Let's walk through implementing our code now.

Implementation

Setting Up and Downloading Models
First, we load necessary models and corpora:

import fasttext
import fasttext.util
import nltk
nltk.download('cmudict')
nltk.download('punkt')
from nltk.corpus import cmudict
d = cmudict.dict()

fasttext.util.download_model('en', if_exists='ignore')
ft = fasttext.load_model('cc.en.300.bin')


Here, fastText and NLTK's CMU Pronouncing Dictionary are initialized. fastText is used later to find words similar to those not found in the CMU dictionary.
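The melody generator later calls a `find_similar_word` helper for out-of-dictionary words. One possible implementation, assuming the `ft` model and CMU dict `d` loaded above, is to scan fastText's nearest neighbors for the first word that has a pronunciation entry:

```python
# Sketch of the find_similar_word helper used later by generate_melody.
# Assumes `ft` (fastText model) and `d` (CMU dict) from the setup above;
# both can be passed in explicitly, which also makes the function testable.
def find_similar_word(word, neighbors=None, dictionary=None):
    if neighbors is None:
        # fastText returns (similarity, word) pairs
        neighbors = [w for _, w in ft.get_nearest_neighbors(word, k=50)]
    if dictionary is None:
        dictionary = d
    for candidate in neighbors:
        candidate = candidate.lower()
        if candidate in dictionary:
            return candidate
    return None
```

This is a sketch rather than the exact helper from the repo, but it captures the idea: fall back to the closest embedding neighbor that the CMU dictionary can pronounce.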

Data Collection
Data is loaded and preprocessed from a CSV file containing song lyrics:

import pandas as pd
df = pd.read_csv('EdSheeran.csv')
df = df.dropna()
lyrics = df['Lyric'].str.replace('\n', ' ').str.replace('\r', ' ').tolist()

This section reads a CSV file, cleans the data by removing missing values, and formats the lyrics into a list.

Data Preprocessing
The lyrics are tokenized and cleaned:

import re
from nltk.tokenize import word_tokenize
words = [word_tokenize(re.sub(r'\W+', ' ', lyric).lower()) for lyric in lyrics]
words = [word for sublist in words for word in sublist]

Here, special characters are removed, and the text is converted to lowercase to standardize the data.
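A quick toy example shows exactly what this cleaning step does to a single line (note that `\W+` treats the apostrophe as a separator, so contractions get split apart):

```python
import re

# Toy input illustrating the cleaning step above.
# \W+ replaces every run of non-word characters with a space,
# so punctuation disappears and "Don't" splits into "don" / "t".
lyric = "Don't you worry, child!"
cleaned = re.sub(r'\W+', ' ', lyric).lower()
print(cleaned.split())  # ['don', 't', 'you', 'worry', 'child']
```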

N-gram Model and Lyrics Generation
An N-gram model is used to generate new lyrics:

from nltk.util import ngrams
from nltk.probability import FreqDist
import random

n_values = [2, 5, 7]
generated_lyrics = ""

for n in n_values:
    ngrams_list = list(ngrams(words, n))  # no padding, so n-grams never contain None
    freq_dist = FreqDist(ngrams_list)

    def generate_lyrics(starting_ngram, freq_dist, num_words):
        generated_words = list(starting_ngram)
        for _ in range(num_words):
            next_word_candidates = [ngram[-1] for ngram in freq_dist.keys() if ngram[:n-1] == tuple(generated_words[-(n-1):])]
            if next_word_candidates:
                next_word = random.choice(next_word_candidates)
                generated_words.append(next_word)
            else:
                break
        return ' '.join(generated_words).replace(' ,', ',').replace(' .', '.').replace(' ;', ';')

    starting_ngram = random.choice(list(freq_dist.keys()))
    generated_lyrics += generate_lyrics(starting_ngram, freq_dist, 200)

This segment builds N-grams from the cleaned words and creates a frequency distribution to model the probabilities of word sequences. New lyrics are generated based on these probabilities.
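The core lookup, stripped to its essentials, is easy to see on a toy corpus (here using `collections.Counter` in place of NLTK's `FreqDist` so the example is dependency-free; the corpus is made up):

```python
import random
from collections import Counter

# Toy illustration of the next-word lookup used above.
words = "the night we met the night we danced the night we sang".split()
n = 3
trigrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
freq_dist = Counter(trigrams)

# All words observed after the context ("the", "night"):
context = ("the", "night")
candidates = [g[-1] for g in freq_dist if g[:n - 1] == context]
print(random.choice(candidates))  # always 'we' in this toy corpus
```

In the real script the context is the last `n-1` generated words, and each `random.choice` step extends the lyric by one word.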

Enhancing Lyrics with GPT-3.5
We then look to enhance our lyrics with the help of GPT-3.5.

import openai
import os
openai.api_key = os.getenv('openai_api_key')
conversations = {}
session_id = 0
conversations[session_id] = []

conversations[session_id].append({"role": "system", "content": "You are a helpful assistant who will transform the lyrics below into a song."})
conversations[session_id].append({"role": "user", "content": generated_lyrics})

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=conversations[session_id],
  max_tokens=200
)

gpt_lyrics = response.choices[0]["message"]["content"].strip()

Here, the OpenAI GPT-3.5 API is used to refine and enhance the generated lyrics, adding a layer of complexity and polish that might be lacking from the simple N-gram model. The environment variable openai_api_key is used to authenticate the API request.

Generating and Exporting the Melody
The code then generates a melody using the Mingus library based on the stress patterns of the lyrics:

import contractions
import pronouncing
from mingus.containers import Note, Bar, Track
from mingus.midi import midi_file_out
import mingus.extra.lilypond as lilypond

def generate_melody(lyrics):
    # Collect the stress pattern of each word (the CMU dict was loaded earlier)
    stress_pattern = []

    tokens = nltk.word_tokenize(lyrics)

    # Fix contractions
    fixed_tokens = [contractions.fix(token) for token in tokens]

    # Function to get the stress pattern of a word
    def get_stress(word):
        word = word.lower()
        phones = pronouncing.phones_for_word(word)
        if phones:
            stress_pattern = [int(s) for s in pronouncing.stresses(phones[0])]
            return [stress_pattern]
        else:
            # handle contractions
            if "'" in word:
                parts = word.split("'")
                stress_pattern = []
                for part in parts:
                    stress_pattern += get_stress(part)
                return stress_pattern
            # handle hyphenated words
            elif '-' in word:
                parts = word.split('-')
                stress_pattern = []
                for part in parts:
                    stress_pattern += get_stress(part)
                return stress_pattern
            else:
                print(f'Word not found in dictionary: {word}')
                # Find a similar word via the fastText model loaded earlier
                # and use its stress pattern
                similar_word = find_similar_word(word)
                if similar_word:
                    return get_stress(similar_word)
                else:
                    # Use default pattern if no similar word is found
                    return [[0, 1, 2]]

    # Get the stress pattern of the lyrics
    for word in fixed_tokens:
        # remove punctuation
        word = re.sub(r'[^\w\s]', '', word)
        stress_pattern += get_stress(word)

    # Flatten the stress_pattern list
    stress_pattern = [item for sublist in stress_pattern for item in sublist]

    print(lyrics)
    print(tokens)
    print(["Here are the stress patterns:"] + stress_pattern)

    # Generate a melody based on the stress pattern
    track = Track()
    b = Bar()
    b.set_meter((4, 4))
    beats_in_current_bar = 0
    for stress in stress_pattern:
        if stress == 0:
            note = Note('C', 4)
        elif stress == 1:
            note = Note('E', 4)
        elif stress == 2:
            note = Note('G', 4)
        b + note
        beats_in_current_bar += 1
        if beats_in_current_bar == 4:
            track.add_bar(b)
            b = Bar()
            b.set_meter((4, 4))
            beats_in_current_bar = 0
    track.add_bar(b)


    return track

Here, we define a function generate_melody that takes a string of lyrics as input and generates a melody based on the phonetic stress pattern of the words. The function uses the nltk library to tokenize the lyrics, handles contractions, and determines the stress pattern of each word using the CMU Pronouncing Dictionary. We account for special cases like contractions, hyphenated words, and words not found in the dictionary, attempting to find similar words or applying a default stress pattern when necessary.

After extracting and flattening the stress pattern of the entire lyrics, the function uses this pattern to create a melody where different stress levels are mapped to specific musical notes (C, E, G) in a 4/4 time signature, adding these notes to a musical track using the mingus library. Each stress level in the pattern corresponds to a different note, and the function organizes these notes into bars, with each bar containing up to four beats. The track, comprising a series of bars filled with notes based on the lyrical stress pattern, is returned at the end of the function. This allows for the conversion of lyrical content into a basic musical representation, integrating elements of natural language processing and music composition.
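Stripped of the bar bookkeeping, the stress-to-pitch rule reduces to a simple lookup (sketch only; the pitch names follow the C/E/G convention above):

```python
# Minimal sketch of the stress-to-pitch rule used in generate_melody:
# 0 (unstressed) -> C-4, 1 (primary stress) -> E-4, 2 (secondary) -> G-4.
STRESS_TO_NOTE = {0: "C-4", 1: "E-4", 2: "G-4"}

def stresses_to_notes(stress_pattern):
    return [STRESS_TO_NOTE[s] for s in stress_pattern]

# e.g. "melody" has CMU stresses [1, 0, 0]:
print(stresses_to_notes([1, 0, 0]))  # ['E-4', 'C-4', 'C-4']
```

Mapping stresses to a fixed C major triad keeps the result consonant by construction; a more ambitious version could vary the octave or rhythm instead.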

We can then look into creating the MIDI file and sheet music:

# format_lyrics and add_newlines are formatting helpers (see the repo linked below)
formatted_lyrics = format_lyrics(gpt_lyrics)
formatted_lyrics_with_newlines = add_newlines(formatted_lyrics)
print("Here are the lyrics:" + formatted_lyrics_with_newlines)

melody = generate_melody(formatted_lyrics_with_newlines)

midi_file_out.write_Track('melody.mid', melody)

lilypond_string = lilypond.from_Track(melody)
with open('melody.ly', 'w') as f:
    f.write(lilypond_string)

print(melody)

This prints the final lyrics and exports the melody as a MIDI file and a LilyPond (.ly) file for engraving sheet music.

Outputting Lyrics as PDF
Finally, we generate a PDF document containing the formatted lyrics, useful for singers or for archival purposes.

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font('Arial', 'B', 16)
title = formatted_lyrics_with_newlines.split('\n')[0]
pdf.cell(0, 10, title, 0, 1, 'C')
pdf.set_font('Arial', '', 12)
pdf.multi_cell(0, 10, formatted_lyrics_with_newlines)
pdf.output('lyrics.pdf')

Now we have fully functioning Python code for making music! Next, we need to decide where to deploy it. Leveraging AWS, I am going to deploy it on an EC2 instance.

Setting Up the EC2 Instance

  1. Go to the EC2 dashboard in AWS Management Console.
  2. Click on "Launch Instance".
  3. Choose an Amazon Machine Image (AMI), such as Amazon Linux 2 AMI or Ubuntu Server.
  4. Select an instance type (e.g., t2.micro for testing purposes).
  5. Configure instance details as required.
  6. Add storage if the default isn’t enough.
  7. Configure Security Group to allow SSH (port 22) and any other necessary ports (e.g., port 80 for web server).
  8. Review and launch the instance by selecting or creating a new key pair.

Now that we have the EC2 instance ready, we can access it.

Accessing the EC2 Instance
Connect to your instance using SSH. For Windows, you can use PuTTY or any SSH client:
ssh -i /path/to/your-key.pem ec2-user@your-instance-ip

Environment Setup
Install Python and other necessary tools.

sudo yum update -y
sudo yum install python3 git -y

Now move the code over to the EC2 instance with SCP or SFTP and install the relevant Python libraries (re is part of Python's standard library, so it doesn't need installing). You can also clone the repo provided at the end of this post.
pip3 install pandas nltk openai pronouncing contractions fpdf fasttext mingus

You will also need to install LilyPond, which compiles the generated .ly file into printable sheet music:
sudo yum install lilypond -y

Set environment variables (for example, for the OpenAI API key):
export openai_api_key='your-api-key'

With the environment now set up, we can run and automate the script.

Running and Automating the Script
Run the script with the command below.
python3 main.py

To keep the script running after your SSH session disconnects, we will use nohup to run it in the background.
nohup python3 main.py &

Set up a systemd service if you want it to start at boot and restart on failures.
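A minimal unit file for that could look like the following (the path, user, and service name are assumptions; adjust them to your layout):

```ini
# /etc/systemd/system/melodistiq.service  (hypothetical path and user)
[Unit]
Description=Melodistiq lyrics and melody generator
After=network.target

[Service]
User=ec2-user
WorkingDirectory=/home/ec2-user/melodistiq
Environment=openai_api_key=your-api-key
ExecStart=/usr/bin/python3 main.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now melodistiq.service.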

We can then monitor the script's execution by checking the logs.
tail -f nohup.out

Optionally, configure CloudWatch to monitor the EC2 performance or to set up more advanced logging and alerting.

And with that, we've set up our deployed solution! It takes a bit of learning to wrap your head around the NLP and music theory bits, but it really is quite a straightforward approach to getting started on making music with AI.

Conclusion
This project showcases the power of AI in creative processes like songwriting. By integrating various technologies and libraries, we've created a system that not only generates lyrics but also composes a melody, presenting it in both audio and visual forms. This illustrates the potential for AI to assist in artistic expressions, providing tools that can inspire and enhance the creative capabilities of its users. Whether you're a developer, a musician, or an enthusiast in the realms of AI or music, this project offers fascinating insights and possibilities.

The repo to the project can be accessed at: https://github.com/renaldig/melodistiq-music-generator
