Why relational DAG
- Checks on data inegrity means you can pass your project to the maintainers
- Lower barrier to try out ideas on old data
Ability to try new features without breaking existing workflow
No-code parallization with linear scaling (up to n channels sppedup for preprocessing)
Workflow graphics
No-extra-work unit testing (in theory)
No extra work for usage documentation. This frees up devs to document features/science
Engineering process pointers
- Write two schemas at a time, one you're thinking about now and its child
- First, populate a few recording sessions, not all datasets
- Once the current schema and its child seem to work fine, get them populating and move on the the next node.
- Work on whole analysis simultaneously - no need to wait for all step to finish.
- Can start next as soon as a few keys have populated.
- My data entry stage is during recording when I make the filename
- Keep Datetime as primary key for restrictions on populate calls (Not so sure anymore)
- If you work on children and parents simultaneously, parents may not show up for a while. Wait until mysql processes them first.
Organizational principles
- Separate phases of analysis: ingestion/condensing, chunking, computation, organization, plotting
baby steps at the top, cram many at the bottom - bottom is high overhead due to multiplication of keys (eg events), whereas dropping top nodes means recalculate all children
For populate calls, restrict by list of sessions if you suspect that some are not populating due to errors
if not all items are showing up at the end of populate, repeat with reserve=False
-
start with reserve=False for schemas that require little cpu
Gotchas
Fractional primary keys must be stored as decimal (10,30)
Top comments (0)