DEV Community

ryo-ma

🚀 How to manage the Open Data in your project / Release package manager for open data📦

※ Open data is data that anyone can openly access, exploit, edit, and share for any purpose. Open data is licensed under an open license. (https://en.wikipedia.org/wiki/Open_data)

For example


We have released dim (Open Data Package Manager), a CLI tool for managing open data.

GitHub logo c-3lab / dim

📦 dim: Manage the open data in your project like a package manager.

dim


Data Installation Manager: Manage the open data in your project like a package manager.


Join community

We are looking for members to develop together as an open source community.

Slack

Features

Document

For more information about how to use it, please refer to this document.

Quick Start

Install dim

Install dim from binary files or run dim using Deno

Install dim from binary files

Download the dim binary.

aarch64-apple-darwin

curl -L https://github.com/c-3lab/dim/releases/latest/download/aarch64-apple-darwin-dim -o /usr/local/bin/dim

x86_64-apple-darwin

curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-apple-darwin-dim -o /usr/local/bin/dim

x86_64-pc-windows-msvc

curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-pc-windows-msvc-dim.exe -o C:\Users\user-name\dim.exe


I thought open data should be managed by a package manager, just as software is (e.g., npm, apt, pip, gem).

When fetching the open data, it would be convenient for users to be able to fetch them with commands like:

npm install xxxxx

After the data is installed, it is recorded in dim.json, much as packages are recorded in package.json.

Stop chaotic open data management

Package managers (npm, gem, apt, and so on) have established a systematic way to manage software and libraries. However, open data users have no comparable systematic approach.

If you were given the assignment to visualize a map using some kind of open data, how would you prepare the data?

The following flow is a common example.

  1. Search Google for the open data you want
  2. When you find the open data you want, download it from your browser
  3. Check the open data and return to 1 if it is incomplete or not what you wanted
  4. Process the open data for use (character encoding conversion, file format conversion...)
  5. Save the open data in the project directory or database

This process is sufficient for simple projects.
However, you may want to record the specs (name, URL, last-updated date, etc.) of the open data in projects such as the following.

  • Projects developed by multiple people
  • Projects to be maintained in the medium to long term
  • Public projects (published on GitHub as OSS, etc.)

List of required open data specifications

If you download the open data from various sites and process datasets, you may forget where you downloaded the open data from or how you processed the data. Therefore, it is useful to record the following specifications.

  • URL
  • Last-updated
  • Version
  • Post-processing
  • Hash value, etc.
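
As a rough illustration, the specifications listed above could be captured in a small record type. The sketch below is written in TypeScript (the language dim itself is written in). The DataSpec type, its field names, and the hasChanged helper are all hypothetical and do not represent dim's actual file format.

```typescript
// Hypothetical record of one downloaded dataset (not dim's real schema).
interface DataSpec {
  name: string;
  url: string; // where the data was downloaded from
  lastUpdated: string; // timestamp reported by the source
  version: string;
  postProcessing: string[]; // processing applied after download, in order
  hash: string; // digest of the downloaded file
}

// Returns true when freshly observed metadata no longer matches the record,
// which suggests the local copy is out of date.
function hasChanged(
  recorded: DataSpec,
  observed: Pick<DataSpec, "lastUpdated" | "hash">,
): boolean {
  return recorded.hash !== observed.hash ||
    recorded.lastUpdated !== observed.lastUpdated;
}

const recorded: DataSpec = {
  name: "example",
  url: "https://example.com",
  lastUpdated: "2022-01-01T00:00:00Z",
  version: "1.0",
  postProcessing: ["convert character encoding to UTF-8"],
  hash: "abc123", // placeholder value for illustration only
};

const unchanged = hasChanged(recorded, {
  lastUpdated: "2022-01-01T00:00:00Z",
  hash: "abc123",
});
const changed = hasChanged(recorded, {
  lastUpdated: "2022-02-01T00:00:00Z",
  hash: "def456",
});
console.log(unchanged, changed);
```

Keeping a record like this next to the project makes it possible to tell later where a file came from and whether the published data has been updated since it was downloaded.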

Approach

We have released v1.0 of dim (Open Data Package Manager), a CLI tool.

https://github.com/c-3lab/dim


Features

(1) Support for search/download/processing/recording processes

dim supports the search, download, processing, and recording steps. It can also run this series of steps through interactive commands.


(2) Support for post-processing commonly used in data processing

dim includes several post-processes commonly used in data processing. The post-processes are recorded together with the data URL. You can also use your own scripts as post-processes.
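
To make the idea of recorded post-processes concrete, here is a minimal TypeScript sketch of a pipeline that applies named steps in order. The registry, the step names ("strip-bom", "crlf-to-lf"), and the applyPostProcesses function are hypothetical examples, not dim's built-in post-processes or its internal implementation.

```typescript
// A post-process takes file contents and returns the processed contents.
type PostProcess = (input: string) => string;

// Hypothetical registry of named post-processes (illustrative names only).
const registry: Record<string, PostProcess> = {
  "strip-bom": (s) => s.replace(/^\uFEFF/, ""),
  "crlf-to-lf": (s) => s.replace(/\r\n/g, "\n"),
};

// Applies the named post-processes to the input, in the order given.
function applyPostProcesses(input: string, names: string[]): string {
  return names.reduce((data, name) => {
    const step = registry[name];
    if (step === undefined) {
      throw new Error(`Unknown post-process: ${name}`);
    }
    return step(data);
  }, input);
}

const raw = "\uFEFFid,name\r\n1,Tokyo\r\n";
const steps = ["strip-bom", "crlf-to-lf"];
const cleaned = applyPostProcesses(raw, steps);

// Storing the step names next to the URL is what makes the result reproducible.
const processedRecord = { url: "https://example.com", postProcesses: steps };
console.log(JSON.stringify(cleaned), processedRecord);
```

Because only the step names are stored, anyone holding the record can re-run the same processing on a fresh download.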


(3) Prepare data in one step using the existing data specification file

You can fetch and process all the open data in one step by using a data specification file (dim.json) that has already been recorded.
As a user, you only need to publish the data specification file (dim.json) on GitHub; the open data itself does not have to be included in the repository.
(This is the same as publishing package.json, etc. to GitHub.)
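
The following TypeScript sketch shows, under stated assumptions, how a shared specification file can be turned into a list of download steps. The SpecFile shape, the planInstall function, the "unzip" step name, and the data_files/<name>/<file> layout are hypothetical and chosen only for illustration; dim's real dim.json schema and directory layout may differ.

```typescript
// Hypothetical shape of a shared specification file (not dim's real schema).
interface SpecEntry {
  name: string;
  url: string;
  postProcesses: string[];
}

interface SpecFile {
  contents: SpecEntry[];
}

interface InstallStep extends SpecEntry {
  savePath: string; // where the downloaded file would be stored
}

// Turns the text of a specification file into an ordered list of download steps.
function planInstall(specJson: string, dataDir = "data_files"): InstallStep[] {
  const spec = JSON.parse(specJson) as SpecFile;
  return spec.contents.map((entry) => {
    // Use the last non-empty URL segment as the file name, else the entry name.
    const fileName = entry.url.split("/").filter((s) => s !== "").pop() ??
      entry.name;
    return { ...entry, savePath: `${dataDir}/${entry.name}/${fileName}` };
  });
}

// Example specification file content, as a teammate might have committed it.
const sharedSpec = JSON.stringify({
  contents: [
    {
      name: "population",
      url: "https://example.com/data/population.csv",
      postProcesses: [],
    },
    {
      name: "stations",
      url: "https://example.com/data/stations.zip",
      postProcesses: ["unzip"], // illustrative step name
    },
  ],
});

const plan = planInstall(sharedSpec);
for (const step of plan) {
  console.log(`${step.url} -> ${step.savePath}`);
}
```

The repository only needs to contain the small specification file; each member regenerates the data locally from it.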


About the development environment

  • Language: TypeScript
  • Execution environment: Deno
  • CI/CD: GitHub Actions
    • CI: Test/Lint/Type Check/Coverage
    • CD: Tagging automatically builds the dim binaries, uploads them, and publishes a release

We are using Deno, which some expect to replace Node.js. We chose Deno for the following reasons.

  • Simple to set up and easy to start projects
  • Lint and formatter are provided as standard functions
  • TypeScript can be executed as is, etc.

Using dim

Install dim

Download the dim binary.

aarch64-apple-darwin

curl -L https://github.com/c-3lab/dim/releases/latest/download/aarch64-apple-darwin-dim -o /usr/local/bin/dim

x86_64-apple-darwin

curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-apple-darwin-dim -o /usr/local/bin/dim

x86_64-pc-windows-msvc

curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-pc-windows-msvc-dim.exe -o C:\Users\user-name\dim.exe

x86_64-unknown-linux-gnu

curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-unknown-linux-gnu-dim -o /usr/local/bin/dim

Grant user execution permission

chmod u+x /usr/local/bin/dim

New Project

Initialize the project

The init command generates dim.json, dim-lock.json, and data_files/.

$ dim init

Install data

This command stores information about the installed data in dim.json and dim-lock.json.

$ dim install https://example.com -n "example"

Installed data is saved in data_files/.

$ ls ./data_files

Install all data listed in a dim.json shared by members

You can install all the data listed in a dim.json file shared by other members.

Make sure dim.json exists in the current directory.

$ ls ./

dim.json  ....

Install all the data listed in dim.json.

$ dim install

Installed data is saved in data_files/.

$ ls ./data_files

In addition to these functions, dim has many other features expected of a package manager.
https://github.com/c-3lab/dim#command-usage

For an example of functions

etc.


We have released v1.0 of dim, a tool that manages open data like a package manager.

There are still many features we want to add. If you share our concerns and would like to work on them with us, you are very welcome.
