adrian

Haunted data pipeline

Suppose, in the spirit of Halloween, you want to add some haunting to your data warehouse (DWH). How do you do that?

  1. Create a function that returns some strange messages
  2. Call it from your pipelines to log the strangeness for lols
  3. Watch the logs get haunted :)

Sample code below. Happy haunting!

import random

import dlt
import requests
from bs4 import BeautifulSoup


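# Fetch spooky quotes from the Goodreads "spooky" tag page
def fetch_spooky_quotes_goodreads():
    url = "https://www.goodreads.com/quotes/tag/spooky"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly if the page cannot be fetched
    soup = BeautifulSoup(response.text, 'html.parser')

    quotes = soup.find_all('div', class_='quoteText')

    for quote in quotes:
        # Collapse the markup into clean, single-spaced text
        yield quote.get_text(" ", strip=True)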



def generate_dummy_data(num_rows=1000000):
    # Fetch the quotes once up front instead of re-requesting the page for every haunting
    quotes = list(fetch_spooky_quotes_goodreads())
    for i in range(1, num_rows + 1):
        data = {
            'ID': i,
            'Name': f'Name_{i}',
            'Age': random.randint(18, 70),
            'Gender': random.choice(['Male', 'Female']),
            'City': random.choice(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']),
            'Score': round(random.uniform(0, 100), 2)
        }
        # 1% chance to print a random quote (skipped if none were scraped)
        if quotes and random.random() < 0.01:
            print(random.choice(quotes))
        yield data



# view a small sample of the data (the default is 1,000,000 rows)
for row in generate_dummy_data(num_rows=1000):
    print(row)

# open connection
pipeline = dlt.pipeline(
    destination='duckdb',
    dataset_name='raw_data'
)

# Upsert/merge: update old records, insert new ones
load_info = pipeline.run(
    generate_dummy_data(),  # pass the generator itself; `data` was never defined
    write_disposition="merge",
    primary_key="id",
    table_name="users"
)
print(load_info)
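
Step 2 above talks about logging the strangeness, while the sample prints it. Here is a minimal sketch of the logging variant, assuming you would rather wrap any row generator than edit it. The names `haunt`, `FALLBACK_MESSAGES`, and the `haunted_pipeline` logger are hypothetical and not part of dlt or the code above; the hardcoded messages are there only so the sketch runs without network access.

```python
import logging
import random

# Hypothetical logger name; use whatever your pipeline already logs to.
logger = logging.getLogger("haunted_pipeline")

# Hypothetical offline messages so the sketch works without scraping anything.
FALLBACK_MESSAGES = (
    "The rows are watching you.",
    "Something moved in the staging schema.",
    "This NULL was not here yesterday.",
)


def haunt(rows, messages=FALLBACK_MESSAGES, probability=0.01, rng=None):
    """Yield rows unchanged, occasionally logging a spooky message."""
    rng = rng if rng is not None else random.Random()
    for row in rows:
        # Roll the dice per row; the row itself is never modified.
        if messages and rng.random() < probability:
            logger.warning("BOO: %s", rng.choice(messages))
        yield row
```

To try it, wrap the data generator before handing it to the pipeline, for example `pipeline.run(haunt(generate_dummy_data()), ...)`. Passing a seeded `random.Random` as `rng` makes the hauntings reproducible, which is handy if you want to test them.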
