Yet another AI script, is it useful?
If you're like me, you probably have, or have had, an RSS feed reader to at least try to keep up with news and blogs on the latest in tech, among other things. This project started as a way for me to get more comprehensive summaries and have them sent to my Slack chat as they were published. After trying several on-device text-to-speech (TTS) engines I was frustrated: they all produced incomplete sentences, missing punctuation, or both, and weren't usable at all. Since OpenAI had released custom GPTs and the Assistants API, I decided to try that instead. The prompt for this is simple: "Please summarize the tech articles to give a complete, and brief, summary". That's it. Here's the script I put together and have been tweaking over the last couple of weeks, broken down into chunks.
Main Script Chunks
Imports
import json
import openai
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError
import feedparser # type: ignore
from newspaper import Article # type: ignore
import sqlite3
from datetime import datetime
import time
from typing import Dict, Any, NamedTuple, cast
Config load function
def load_config() -> Dict[str, Any]:
with open('config.json', 'r') as file:
return json.load(file)
SQLite database creation
You could use any SQL database; for simplicity's sake we are using SQLite (via sqlite3) here.
def create_database() -> None:
conn = sqlite3.connect('articles.db')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS articles
(link TEXT PRIMARY KEY, title TEXT, summary TEXT)''')
conn.commit()
conn.close()
Checking to see if the article has already been summarized
This behaves similarly to caching, in that we don't want to pay to summarize the same article over and over, so we check the URL to see if it's already been summarized.
def is_article_summarized(link: str) -> bool:
conn = sqlite3.connect('articles.db')
c = conn.cursor()
c.execute("SELECT * FROM articles WHERE link = ?", (link,))
result = c.fetchone()
conn.close()
    return result is not None
Save summary, link, and title to db
def save_summary(link: str, title: str, summary: str) -> None:
    conn = sqlite3.connect('articles.db')
    c = conn.cursor()
    # Store the article so it won't be summarized again on the next pass
    c.execute("INSERT OR REPLACE INTO articles (link, title, summary) VALUES (?, ?, ?)",
              (link, title, summary))
    conn.commit()
    conn.close()
Create an OpenAI Thread for each summary
Following along with OpenAI's documentation, we are creating a new thread for each article.
def create_thread(ass_id: str, prompt: str) -> tuple[str, str]:
thread = openai.beta.threads.create()
my_thread_id = thread.id
openai.beta.threads.messages.create(
thread_id=my_thread_id,
role="user",
content=prompt
)
run = openai.beta.threads.runs.create(
thread_id=my_thread_id,
assistant_id=ass_id,
)
return run.id, my_thread_id
Check thread status
Here we periodically check in to see whether the summary is finished, using a 2-second delay to avoid spamming our assistant.
def check_status(run_id: str, thread_id: str) -> str:
run = openai.beta.threads.runs.retrieve(
thread_id=thread_id,
run_id=run_id,
)
return run.status
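If a run ever ends up failed or expired, the polling loop later in the script would spin forever, since it only checks for "completed". As a minimal sketch (not part of the script above), a hypothetical wait_for_run helper built on the same check_status call could poll with a timeout and stop on any terminal status:
def wait_for_run(run_id: str, thread_id: str, timeout: int = 300) -> str:
    # Poll every 2 seconds until the run reaches a terminal state or the timeout expires
    deadline = time.time() + timeout
    status = check_status(run_id, thread_id)
    while status not in ("completed", "failed", "cancelled", "expired"):
        if time.time() > deadline:
            raise TimeoutError(f"Run {run_id} did not finish within {timeout} seconds")
        time.sleep(2)
        status = check_status(run_id, thread_id)
    return status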
Send to Slack
def send_message_to_slack(title: str, link: str, summary: str) -> None:
try:
message = f"New Article: *<{link}|{title}>*\nSummary: {summary}"
client.chat_postMessage(channel='#news', text=message)
except SlackApiError as e:
print(f"Error sending message: {e.response['error']}")
Heavy lifting function to coordinate most of the rest
This is really where most of the processing takes place, rather than in the main function, since each step depends on the one before it. This may change in the future.
def fetch_articles_from_rss(rss_url: str) -> None:
feed = feedparser.parse(rss_url)
for entry in feed.entries:
if not is_article_summarized(entry.link):
article = Article(entry.link)
article.download()
article.parse()
# Truncate the article text if it exceeds the limit
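            # (32,768 characters roughly matches the Assistants API per-message content limit at the time of writing, minus room for the prompt wrapper below)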
max_length = 32768 - len(entry.title) - len("Please summarize this article:\n\nTitle: \n\n")
article_text = article.text[:max_length] if len(article.text) > max_length else article.text
prompt = f"Please summarize this article:\n\nTitle: {entry.title}\n\n{article_text}"
run_id, thread_id = create_thread(assistant_id, prompt)
status = check_status(run_id, thread_id)
while status != "completed":
status = check_status(run_id, thread_id)
time.sleep(2)
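            # messages.list returns messages newest-first by default, so data[0] should be the assistant's latest reply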
response = openai.beta.threads.messages.list(thread_id=thread_id)
if response.data:
content = cast(Any, response.data[0].content[0])
summary = content.text.value
# summary = response.data[0].content[0].text.value
# Send the article details to Slack
send_message_to_slack(entry.title, entry.link, summary)
save_summary(entry.link, entry.title, summary)
time.sleep(20)
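One caveat with the function above: article.download() and article.parse() can raise on paywalled, malformed, or unreachable pages, which would crash the whole loop. A minimal guard, assuming newspaper3k's ArticleException and a hypothetical safe_extract helper, might look like this:
from typing import Optional
from newspaper.article import ArticleException

def safe_extract(link: str) -> Optional[str]:
    # Download and parse an article, returning None instead of raising on failure
    try:
        article = Article(link)
        article.download()
        article.parse()
        return article.text
    except ArticleException as e:
        print(f"Failed to fetch {link}: {e}")
        return None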
Main Function
Really, all we want to happen here is to create (or check for) the database and set up the main loop, with some debugging print statements that will be changed to logging in the future.
def main() -> None:
create_database()
while True:
now = datetime.now()
print(f'Punch in at {now}')
for rss_url in config['rss_urls']:
fetch_articles_from_rss(rss_url)
now = datetime.now()
print(f'Punch out at {now}')
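        # Sleep 15 minutes (900 seconds) before polling the feeds again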
time.sleep(900)
Script body launcher
if __name__ == "__main__":
config = load_config()
# Set the API keys from the configuration
openai.api_key = config['openai_key']
assistant_id = config['assistant_id']
client = WebClient(token=config['slack_token'])
main()
config.json
You'll need three things to make the config work: the assistant ID (found on the Assistants page), the OpenAI API key, and a Slack bot/app token, plus the list of RSS feeds you want to follow.
{
"openai_key": "sk-open-ai-key-here",
"slack_token": "xoxb-slack-app-token",
"assistant_id": "asst_assistatn-id",
"rss_urls": [
"https://www.bleepingcomputer.com/feed/",
"https://feeds.arstechnica.com/arstechnica/index",
"https://www.wired.com/feed/tag/ai/latest/rss",
"https://www.wired.com/feed/category/ideas/latest/rss",
"https://www.wired.com/feed/category/science/latest/rss",
"https://www.wired.com/feed/category/security/latest/rss",
"https://www.wired.com/feed/category/backchannel/latest/rss",
"https://www.wired.com/feed/tag/wired-guide/latest/rss",
"https://www.cisa.gov/news.xml",
"https://www.cisa.gov/cisa/blog.xml",
"https://www.cisa.gov/cybersecurity-advisories/all.xml",
"https://googleonlinesecurity.blogspot.com/atom.xml"
]
}
Trying it out
If you'd like to try this out, follow the commands below (Linux and Mac); be sure to edit the config.json file.
git clone https://github.com/Blacknight318/openai_rss_summarizer
cd openai_rss_summarizer
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
cp sample_config.json config.json
nano config.json # Edit and press ctrl+x to save
nohup python main.py &
Todo
- Create a link transformer for things like Cloudflare's blog
- Create a Streamlit web UI for recall and search of old articles (separate file)
- Add functionality to search with an @botname command
- Independent backend DB schema
- Python file to create the OpenAI assistant from scratch
Closing the loop
This is still an ongoing project; if you'd like to keep up with the latest, check out the GitHub repo. Till next time, fair winds and following seas.