How to create custom nodes in ComfyUI

Dhanush Reddy

What is ComfyUI?

ComfyUI is a powerful and flexible user interface for Stable Diffusion, allowing users to create complex image generation workflows through a node-based system. While ComfyUI comes with a variety of built-in nodes, its true strength lies in its extensibility. Custom nodes enable users to add new functionality, integrate external services, and tailor it to their specific needs.

[Image: the ComfyUI node-based interface in action]

In this blog post, we will walk through the process of creating a custom node for image captioning using ComfyUI. This node will take an image as input and return a generated caption using an external API.

We will use the Google Gemini API to generate the image captions.

Here is the complete code for the image captioning node, which uses the Gemini API.

You can copy the following code into any file under the custom_nodes folder in ComfyUI; I have named mine gemini-caption.py.
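The layout is simply this (assuming a standard ComfyUI install; the filename is my own choice):

ComfyUI/
└── custom_nodes/
    └── gemini-caption.py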

Complete code for Generating Image Captions



import numpy as np
from PIL import Image
import requests
import io
import base64

class ImageCaptioningNode:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {"image": ("IMAGE",), "api_key": ("STRING", {"default": ""})}
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "caption_image"
    CATEGORY = "image"
    OUTPUT_NODE = True

    def caption_image(self, image, api_key):
        # Convert the image tensor to a PIL Image
        image = Image.fromarray(
            np.clip(255.0 * image.cpu().numpy().squeeze(), 0, 255).astype(np.uint8)
        )

        # Convert the image to base64
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        api_url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key={api_key}"

        # Prepare the request payload
        payload = {
            "contents": [
                {
                    "parts": [
                        {
                            "text": "Generate a caption for this image in as detail as possible. Don't send anything else apart from the caption."
                        },
                        {"inline_data": {"mime_type": "image/png", "data": img_str}},
                    ]
                }
            ]
        }

        # Send the request to the Gemini API
        try:
            response = requests.post(api_url, json=payload, timeout=60)
            response.raise_for_status()
            caption = response.json()["candidates"][0]["content"]["parts"][0]["text"]
        except (requests.exceptions.RequestException, KeyError, IndexError) as e:
            caption = f"Error: Unable to generate caption. {str(e)}"

        print(caption)
        return (caption,)


NODE_CLASS_MAPPINGS = {"ImageCaptioningNode": ImageCaptioningNode}



Here is how the node looks in the UI:

[Image: the custom Image Captioning node in the ComfyUI graph]

Let's go over the code section by section to understand how you would create a similar node for your own use case. Whatever functionality you want your node to provide, implement it as a method on the node class, just as I did with the caption_image method here; ComfyUI will call it when the node runs.

Import the necessary libraries



import numpy as np
from PIL import Image
import requests
import io
import base64



These lines import the necessary libraries for my Image Captioning node:

  • numpy for numerical operations
  • PIL (Python Imaging Library) for image processing
  • requests for making HTTP requests to the Gemini API
  • io for handling byte streams
  • base64 for encoding the image

Defining the class for your ComfyUI node



class ImageCaptioningNode:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {"image": ("IMAGE",), "api_key": ("STRING", {"default": ""})}
        }



In my case, I have named it ImageCaptioningNode, since the name says exactly what it does.

The INPUT_TYPES class method defines the inputs for our node:

  • An "image" input of type "IMAGE"
  • An "api_key" input of type "STRING" with a default empty value, needed for sending API requests to Gemini API.


    RETURN_TYPES = ("STRING",)
    FUNCTION = "caption_image"
    CATEGORY = "image"
    OUTPUT_NODE = True



These class variables define:

  • The return type (a string)
  • The main function to be called ("caption_image")
  • The category in which the node will appear in ComfyUI
  • That this is an output node, so ComfyUI will execute it even when nothing is connected to its output (see the sketch after this list for returning multiple values)
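If your node produces more than one value, you can list several return types and give the output sockets readable labels with RETURN_NAMES. A small hypothetical variant of the class variables above:

    # Hypothetical: return both the caption and its length
    RETURN_TYPES = ("STRING", "INT")
    RETURN_NAMES = ("caption", "length")
    FUNCTION = "caption_image"
    CATEGORY = "image"
    OUTPUT_NODE = True

The function would then end with a matching tuple, for example return (caption, len(caption)).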


    def caption_image(self, image, api_key):
        # Convert the image tensor to a PIL Image
        image = Image.fromarray(
            np.clip(255.0 * image.cpu().numpy().squeeze(), 0, 255).astype(np.uint8)
        )

        # Convert the image to base64
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        api_url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key={api_key}"

        # Prepare the request payload
        payload = {
            "contents": [
                {
                    "parts": [
                        {
                            "text": "Generate a caption for this image in as detail as possible. Don't send anything else apart from the caption."
                        },
                        {"inline_data": {"mime_type": "image/png", "data": img_str}},
                    ]
                }
            ]
        }
        # Send the request to the Gemini API
        try:
            response = requests.post(api_url, json=payload, timeout=60)
            response.raise_for_status()
            caption = response.json()["candidates"][0]["content"]["parts"][0]["text"]
        except (requests.exceptions.RequestException, KeyError, IndexError) as e:
            caption = f"Error: Unable to generate caption. {str(e)}"

        print(caption)
        return (caption,)




This is the method that does the actual work. It takes an image as input and sends it to the Gemini API using the supplied API key. ComfyUI passes the image in as a tensor with values in the range [0, 1], so we first convert it to a PIL Image, then base64-encode it so it can travel inside the JSON request body. The prompt instructs Gemini to caption the image in as much detail as possible. Finally, the caption is parsed out of the API response, printed to the console, and returned as a tuple (ComfyUI requires node outputs to be tuples).
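When building a node like this, it can save time to exercise the API call as a standalone script before wiring it into ComfyUI. A minimal sketch, assuming a local test.png and a GEMINI_API_KEY environment variable (both of which are my own choices, not part of the node):

import base64
import os

import requests

api_key = os.environ["GEMINI_API_KEY"]
with open("test.png", "rb") as f:
    img_str = base64.b64encode(f.read()).decode()

api_url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key={api_key}"
payload = {
    "contents": [
        {
            "parts": [
                {"text": "Generate a caption for this image."},
                {"inline_data": {"mime_type": "image/png", "data": img_str}},
            ]
        }
    ]
}

response = requests.post(api_url, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["candidates"][0]["content"]["parts"][0]["text"])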



NODE_CLASS_MAPPINGS = {"ImageCaptioningNode": ImageCaptioningNode}



This dictionary maps the class name to the class itself, which is used by ComfyUI to register the custom node.
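ComfyUI also picks up an optional NODE_DISPLAY_NAME_MAPPINGS dictionary from the same file, which gives the node a friendlier label in the menu. For example (the display name here is just my suggestion):

NODE_CLASS_MAPPINGS = {"ImageCaptioningNode": ImageCaptioningNode}

# Optional: a human-friendly name shown in the ComfyUI node menu
NODE_DISPLAY_NAME_MAPPINGS = {"ImageCaptioningNode": "Image Captioning (Gemini)"}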


Conclusion

Creating custom nodes for ComfyUI opens up a world of possibilities for extending and enhancing your image generation workflows. In this article, we've walked through the process of building a custom image captioning node, demonstrating how to:

  1. Define input and output types
  2. Integrate with external APIs (in this case, the Gemini API for image captioning)
  3. Register the node with ComfyUI via NODE_CLASS_MAPPINGS

By following these steps, you can create your own custom nodes to add virtually any functionality you need to ComfyUI. Whether you're integrating new LLMs, adding specialized image processing techniques, or creating shortcuts for common tasks, custom nodes let you tailor ComfyUI to your specific requirements.

Remember that while we've focused on image captioning in this example, the same principles can be applied to create nodes for a wide variety of tasks. The key is to understand the structure of a ComfyUI node and how to interface with the expected inputs and outputs.

If you still have any questions about this post or want to discuss something with me, feel free to connect on LinkedIn or Twitter.

If you run an organization and want me to write for you, please connect with me on my Socials 🙃

Top comments (6)

sc0v0ne

Congratulations on the post; you were very clear in the explanation. I have a question: what could ComfyUI add for creating workflow structures compared to Airflow?

One observation, if I may offer a tip: put a link to the tool for the reader, to help them find the reference.

GitHub: comfyanonymous / ComfyUI — the most powerful and modular stable diffusion GUI, API, and backend with a graph/nodes interface.
Dhanush Reddy

Thanks @sc0v0ne, I have added the repo link now.
I am not really sure about Apache Airflow, as I haven't worked with it in the past.

Axorax

great article!

Dhanush Reddy

Thanks @axorax

Axorax

no problem!

Sourabh Gupta

This is a fantastic guide to extending ComfyUI’s functionality with custom nodes! The detailed explanation of setting up inputs and outputs and working with external APIs like the Gemini API really clarifies the process. Custom nodes seem like a powerful way to adapt ComfyUI to specific needs, especially for unique workflows in image generation. Are there any other APIs you’d recommend for different types of image processing tasks, or would you suggest experimenting with any particular techniques when building custom nodes? Thanks for the detailed guide!
