DEV Community

Cover image for 🦙 📹 PinataShot: Multimodal LLaMA 3.2 Screenshot Categorization on Pinata IPFS
Jad Tounsi
Jad Tounsi

Posted on • Edited on

🦙 📹 PinataShot: Multimodal LLaMA 3.2 Screenshot Categorization on Pinata IPFS

PinataShot

Image description

What I Built

I built a SaaS Screenshot Organizer that helps users upload, categorize, and search through their screenshots with ease. The app leverages Pinata’s Files API for decentralized storage of images and integrates GROQ API's LLaMA 3.2 11B for AI-powered analysis of screenshots. With features like OCR (optical character recognition) for text extraction, automatic categorization, and a searchable screenshot gallery, this app streamlines organizing large collections of images and screenshots. It is deployed using Next.js on Vercel to ensure scalability and speed.

Limitations

One limitation in this version of the app is the restriction of processing one image at a time when using the GROQ API. This limitation stems from the current API constraints of GROQ's LLaMA 3.2 11B model, which can handle a single image per request for analysis. While this allows precise categorization and naming for each screenshot, it does limit bulk processing capabilities.

However, Pinata shines in this setup, as it seamlessly handles the decentralized storage of multiple images. Thanks to Pinata's robust and reliable IPFS-backed storage, users can upload several screenshots at once, which are securely stored and easily retrievable, even when waiting for their turn in the AI analysis queue.

Demo

Check out the Demo.
Below are a few key features of the app:

Image description

  • Upload Interface:

    Drag-and-drop feature for uploading screenshots with instant AI analysis.

  • Speak to images:

    You can ask questions about your screenshots, or images.

  • Screenshot Gallery:

    Screenshots are automatically named with what the AI will describe."

  • Text Search:

    Use the OCR feature to search through the text found in screenshots (e.g., receipts, documents).

My Code

Find the source code for the project on GitHub.

Tech Stack

  • Next.js: Frontend and backend (serverless API routes).
  • Pinata’s Files API: For decentralized file storage and retrieval on IPFS.
  • GROQ API’s LLaMA 3.2 11B: For vision capabilities and text extraction.
  • Vercel: Deployment platform ensuring scalability and speed.
  • Tailwind CSS: For styling and responsive UI.
  • Shadcn/ui & Aceternity UI: UI components library.

More Details

  • Pinata’s Files API is used to securely store screenshots and retrieve them from IPFS, ensuring decentralized storage and reliability. Pinata excels at handling multiple files, enabling users to store and access their screenshots quickly, even when dealing with large collections.

  • The AI analysis uses GROQ’s LLaMA 3.2 11B model to automatically categorize screenshots into appropriate names based on its' content, and extract text via OCR for easy search functionality. Although each image needs to be processed one at a time, Pinata’s decentralized storage makes this manageable by allowing users to upload many images at once, which can then be queued for AI processing.

This powerful combination of Pinata’s decentralized storage and GROQ’s AI capabilities makes this tool incredibly useful for a wide range of users—whether it’s for work, personal organization, or creative projects.

Future Improvements

  1. Bulk image processing: Overcoming the single image limitation by exploring options for batch image analysis.
  2. Advanced categorization algorithms.
  3. Enhanced search functionality using more refined OCR text extraction.
  4. User authentication and personal galleries.
  5. Real-time collaboration for sharing and organizing screenshots.

Running the Repository

To run this project locally, follow these steps:

# 1. Clone the repository
git clone https://github.com/yourusername/screenshot-organizer.git
cd screenshot-organizer

# 2. Install dependencies
npm install

# 3. Set up environment variables
# Create a .env.local file in the root directory and add:
PINATA_API_KEY=your_pinata_api_key
PINATA_SECRET_API_KEY=your_pinata_secret_key
GROQ_API_KEY=your_groq_api_key

# 4. Run the development server
npm run dev

# 5. Open your browser and navigate to
http://localhost:3000
Enter fullscreen mode Exit fullscreen mode

devchallenge #pinatachallenge #webdev #api #decentralizedstorage #AIanalysis #moroccoaisolutions

Top comments (0)