Aryan Kargwal for Tune AI

Transform UI Screenshots into HTML & CSS with Qwen Coder and Qwen VL

🎥 YouTube Video Link: Click Me

Imagine this: you’re working on a website redesign, and you’ve just captured a UI screenshot that embodies the look you want. Wouldn’t it be incredible if you could automatically turn that image into HTML and CSS? This tutorial will show you exactly how to make that happen, transforming visual designs into code using cutting-edge vision-language models (VLMs) and Qwen Coder.

In this setup, we’ll build a pipeline where an AI model analyzes your UI design image, understands its layout, colors, typography, and structure, and then generates clean, organized HTML and CSS code. This process opens up a world of possibilities for UI prototyping, automated design-to-code workflows, and quick mockup generation.

Some cool points we'll cover:

  • Upload and Describe UI Designs: How we upload a UI screenshot and get a detailed breakdown of the design elements.
  • Generate HTML & CSS with AI: Transforming these descriptions into fully functional HTML and CSS code for quick web design prototyping.

Let’s get started!

Step 1: Setting Up API Details and Image Encoding

First, let’s configure the API endpoint, headers, and a helper function to encode images into Base64. This encoding step allows us to send the image data to the model.

import json
import requests
import base64
from PIL import Image
from io import BytesIO

# Set API details
url = "https://proxy.tune.app/chat/completions"
headers = {
    "Authorization": "YOUR_API_KEY",  # Replace with your actual API key
    "Content-Type": "application/json",
}

# Encode image in Base64 format
def encode_image(image):
    if image.mode == 'RGBA':
        image = image.convert('RGB')  # Convert RGBA to RGB
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    return base64.b64encode(buffered.getvalue()).decode('utf-8')
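As a quick sanity check, here is a minimal sketch of calling encode_image on a local file (the filename "screenshot.png" is just a placeholder):

# Example usage: encode a local screenshot (the filename is hypothetical)
example = Image.open("screenshot.png")
b64 = encode_image(example)
print(f"Encoded image is {len(b64)} Base64 characters long")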

Step 2: Querying the Vision-Language Model for Description

In this step, we’ll create a function that queries the VLM to analyze the UI image and provide a detailed description. A good description covers the key aspects of the UI, including the color scheme, typography, layout structure, and icons, all of which are essential for generating accurate HTML and CSS.

# Query the model for a description of the image
def query_model(base64_image, prompt, model_id, max_tokens=1000, temperature=0.9):
    image_content = {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
        }
    }

    data = {
        "model": model_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    image_content
                ]
            }
        ],
        "max_tokens": max_tokens,
    }

    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        answer = response.json().get('choices', [{}])[0].get('message', {}).get('content', "")
        return answer.strip()
    else:
        return f"Error: {response.status_code} - {response.text}"
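To try query_model on its own before wiring up the UI, a minimal sketch could look like this (the filename and model choice are assumptions; the model ID is one of the options offered later in the app):

# Standalone test of query_model (filename and model choice are assumptions)
image = Image.open("screenshot.png")
base64_image = encode_image(image)
description = query_model(
    base64_image,
    prompt="Please analyze this software interface image and provide a detailed description of its layout, color scheme, typography, and components.",
    model_id="qwen/qwen-2-vl-72b",
)
print(description)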

Step 3: Extracting HTML and CSS Code from Model Response

Once we have the description, we prompt Qwen Coder to generate HTML and CSS based on the UI layout, asking it to return the code under "### HTML" and "### CSS" headings. Our code then parses the response, extracting each block for easy file output.

import re

# Extract HTML and CSS from model response
def extract_html_css(response_text):
    html_match = re.search(r"### HTML\n```

html\n(.*?)

```", response_text, re.DOTALL)
    css_match = re.search(r"### CSS.*\n```

css\n(.*?)

```", response_text, re.DOTALL)

    html_code = html_match.group(1).strip() if html_match else ""
    css_code = css_match.group(1).strip() if css_match else ""

    return html_code, css_code

# Save HTML and CSS to files
def write_files(html_code, css_code):
    with open("index.html", "w") as html_file:
        html_file.write(html_code)
    with open("styles.css", "w") as css_file:
        css_file.write(css_code)
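To see the response format these regexes expect, here is a small self-contained sketch with a hand-written model response:

# Demonstrates the "### HTML" / "### CSS" layout extract_html_css looks for
sample_response = """### HTML
```html
<div class="card">Hello</div>
```

### CSS
```css
.card { padding: 1rem; }
```"""

html_code, css_code = extract_html_css(sample_response)
print(html_code)  # <div class="card">Hello</div>
print(css_code)   # .card { padding: 1rem; }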

Step 4: Building the Streamlit App for User Interaction

Our final step is setting up the Streamlit interface. This UI allows users to upload images, choose a model, generate descriptions, and output HTML/CSS.

import streamlit as st

# Streamlit UI setup
st.title("Image Description and HTML/CSS Generation")
model_choice = st.selectbox("Select Model for Image Understanding", 
                            options=["qwen/qwen-2-vl-72b", "openai/gpt-4o", "mistral/pixtral-12B-2409", "meta/llama-3.2-90b-vision"],
                            index=0)
uploaded_image = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

if st.button("Generate Description"):
    if uploaded_image:
        image = Image.open(uploaded_image)
        base64_image = encode_image(image)
        st.image(image)

        # Generate the UI description
        description_prompt = "Please analyze this software interface image and provide a detailed description."
        description = query_model(base64_image, description_prompt, model_id=model_choice)
        st.subheader("Generated Description:")
        st.markdown(description)

        if description:
            # Generate HTML and CSS
            html_css_data = {
                "temperature": 0.9,
                "messages": [
                    {"role": "system", "content": "You are TuneStudio, a coding assistant that generates HTML and CSS based on descriptions."},
                    {"role": "user", "content": f"Please create HTML and CSS based on the following detailed description: {description}"}
                ],
                "model": "qwen/qwen-2.5-coder-32b",
                "max_tokens": 3000
            }

            response = requests.post(url, headers=headers, json=html_css_data)
            if response.status_code == 200:
                html_css_code = response.json().get('choices', [{}])[0].get('message', {}).get('content', '')
                html_code, css_code = extract_html_css(html_css_code)

                if html_code and css_code:
                    write_files(html_code, css_code)
                    st.success("HTML and CSS files have been generated.")
                else:
                    st.error("HTML/CSS extraction failed.")

                st.subheader("Generated HTML and CSS:")
                st.code(html_css_code, language="html")
            else:
                st.error("Error generating HTML/CSS.")
    else:
        st.warning("Please upload an image.")
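To run the full app, save everything above in a single script (say app.py, an assumed name) and launch it with streamlit run app.py; Streamlit will open the upload interface in your browser.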

Conclusion

With this setup, you’ve created a pipeline that not only automates the analysis of UI images but also translates them into HTML and CSS. This workflow is a major time-saver for developers, designers, and anyone involved in UI design. Now, you can turn visual ideas into functional code with the power of AI!

Let me know in the comments below if you have any questions or run into issues.
