Insight Lighthouse

Posted on Nov 4

Conceptual Framework for Token Stream Analysis using Dual-level Exponential Rolling Average Centroids

#machinelearning #ai #algorithms #datascience

Disclaimer: This article was created using Aider and refined iteratively with Git diffing in VSCode. While I worked to shape my ideas, the article may have lost some structure, since it was written late on the night before the workweek. My goal is to share raw thoughts, even if imperfect, with the hope of improving next time.

Overview

This conceptual framework proposes an approach to token stream analysis that uses both global and type-specific Exponential Rolling Average Centroids (ERACs) to track interactions between pursuer and evader points.

Exponential Rolling Average Centroids (ERACs)

An ERAC is the time-weighted average position of a group of points, where recent points have more influence than older points. This moving center point provides a smooth, continuous representation of where a group of points tends to cluster over time. The system tracks multiple ERACs at both global and type-specific levels to enable pattern detection and guide point movement.
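
As a rough illustration of the update described above, here is a minimal Python sketch of one ERAC step written as a standard exponential moving average. The function name `update_erac` and the smoothing weight `alpha` are my own assumptions; the article does not fix a formula or a decay rate. The centroid is left unprojected here, since the text does not say whether ERACs themselves live on the sphere.

```python
from typing import List, Sequence


def update_erac(centroid: Sequence[float], point: Sequence[float], alpha: float = 0.1) -> List[float]:
    """Blend one new point into the centroid.

    alpha (hypothetical parameter) is the weight of the newest point;
    every older point's weight shrinks by (1 - alpha) per update.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return [(1.0 - alpha) * c + alpha * p for c, p in zip(centroid, point)]


# Three identical observations pull the centroid most of the way toward them.
erac = [0.0, 0.0, 1.0]
for observed in ([1.0, 0.0, 0.0],) * 3:
    erac = update_erac(erac, observed, alpha=0.5)
print(erac)  # [0.875, 0.0, 0.125]
```

A larger `alpha` makes the centroid react faster to new points, while a smaller one gives a smoother, slower-moving center.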

ERAC Behavior

Each new token updates both global and type-specific ERACs simultaneously
ERACs provide smoothed, time-weighted views of point distributions
Recent points have more influence than older points in determining ERAC positions
Separate ERACs enable both global pattern detection and type-specific behavior tracking

Point Types

Each token type in the stream maintains two kinds of points that participate in pattern detection, and every instance of that type references them:

Evader Points - Points that move away from pursuer ERACs
Pursuer Points - Points that move toward evader ERACs

Spatial Framework

All points exist on the surface of a unit sphere (radius = 1)
Each point's movement is calculated independently based on its distance to relevant ERACs
Points can be processed independently with no dependencies between points, supporting full parallelization
After each linear shift toward/away from ERACs, points are projected back onto the sphere's surface
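
The shift-then-project step can be sketched as follows. This is only an illustration under my own assumptions: the helper names and the `step` parameter are hypothetical, and points are treated as plain 3D coordinate lists.

```python
import math


def project_to_unit_sphere(v):
    """Rescale a vector so it lies on the unit sphere (radius = 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot project the zero vector")
    return [x / norm for x in v]


def shift_and_project(point, target, step):
    """Move linearly toward target (step > 0) or away from it (step < 0),
    then project the result back onto the sphere's surface."""
    moved = [p + step * (t - p) for p, t in zip(point, target)]
    return project_to_unit_sphere(moved)


toward = shift_and_project([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.5)
away = shift_and_project([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], -0.5)
```

Because each call reads only one point and one target, many points can be updated concurrently without coordination.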

Core Concepts

Signal Processing and Scoring

The system maintains two independent global source signals:
- Global desirability: The primary source of positive feedback
- Global undesirability: The primary source of negative feedback
- These global signals accumulate additively over time as feedback occurs
Each token type maintains exactly two independent cumulative scores:
- A desirability score that accumulates positive signals
- An undesirability score that accumulates negative signals
- These scores are completely separate and never interact
Signal Propagation Flow (shown for desirability; undesirability follows the same path separately):
- Propagation occurs every time a token instance is processed:
  - The token type's cumulative desirability score is updated additively based on the current global score
  - Points are used to bridge to the next type:
    - From the token type's pursuer point
    - To the nearest evader point of a different type
    - That evader point's type receives the propagated score
  - The score propagates to the next type multiplied by a factor between 0 and 1
  - This scaled score is added to the receiving type's cumulative score
- On fixed time intervals only:
  - The global source signal is halved before propagation
  - Propagation then proceeds exactly as normal
  - This periodic halving creates a half-life decay effect, while the propagation mechanism itself remains purely additive
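
To make the propagation flow more concrete, here is a small Python sketch of a single signal channel. It rests on several assumptions of mine: the class name `SignalChannel`, the `attenuation` value, and the reading that the amount passed on to the bridged type is the current global signal scaled by the factor. The nearest-evader lookup is left to the caller, who passes in `bridged_type`.

```python
from collections import defaultdict


class SignalChannel:
    """One channel (desirability or undesirability).

    Two separate instances would be used so the scores never interact.
    """

    def __init__(self, attenuation=0.5):
        if not 0.0 < attenuation < 1.0:
            raise ValueError("attenuation must be between 0 and 1")
        self.attenuation = attenuation          # hypothetical propagation factor
        self.global_signal = 0.0                # global source signal
        self.type_scores = defaultdict(float)   # cumulative score per token type

    def add_feedback(self, amount):
        """Feedback accumulates additively in the global signal."""
        self.global_signal += amount

    def process_token(self, token_type, bridged_type=None):
        """Update the token's type, then pass a scaled amount to the bridged type."""
        increment = self.global_signal
        self.type_scores[token_type] += increment
        if bridged_type is not None and bridged_type != token_type:
            self.type_scores[bridged_type] += self.attenuation * increment

    def on_interval(self):
        """Called on a fixed schedule: halve the global signal (half-life decay)."""
        self.global_signal *= 0.5


channel = SignalChannel(attenuation=0.5)
channel.add_feedback(4.0)
channel.process_token("a", bridged_type="b")   # a += 4.0, b += 2.0
channel.on_interval()                           # global signal becomes 2.0
channel.process_token("a", bridged_type="b")   # a += 2.0, b += 1.0
```

Note that only the global source decays; the per-type scores only ever grow, which matches the purely additive propagation described above.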

Token Categories

This framework defines two distinct categories of tokens. Both categories participate in core system dynamics like pattern detection and signal propagation, though primitive tokens have additional capabilities for direct interaction with external sources:

Primitive Tokens
- Basic units of input streamed from external data sources
- Tokens with identical data are automatically recognized as instances of the same type
- Can be ingested independently, supporting full parallelization
- Each token type maintains its own evader and pursuer points for pattern processing
- Represent the most basic units that can participate in patterns within the input stream
- Examples include individual characters or basic events
- Each primitive token type has two boolean properties:
  - Suppressible: Whether the source can be instructed to prevent the token from occurring
  - Triggerable: Whether the source can be instructed to generate the token on demand
- These properties enable real-time feedback to token sources:
  - When processing a token instance, the system immediately finds the pursuer point of a different type that is nearest to the token type's evader point
  - If that neighbor type has significant undesirability, the system rapidly sends a suppression signal to prevent its likely occurrence
  - Conversely, if the neighbor type has significant desirability, the system can send a trigger signal to directly cause its occurrence
  - This feedback mechanism allows active shaping of the token stream based on learned desirability/undesirability patterns
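
The feedback decision for primitive tokens might look roughly like the sketch below. The `threshold` standing in for "significant", the `TypeState` container, and the choice to let suppression win when both scores are high are all assumptions of mine, not something the framework specifies.

```python
from dataclasses import dataclass


@dataclass
class TypeState:
    """Per-type scores and source capabilities (hypothetical container)."""
    desirability: float = 0.0
    undesirability: float = 0.0
    suppressible: bool = False
    triggerable: bool = False


def choose_feedback(neighbor: TypeState, threshold: float = 1.0) -> str:
    """Decide which signal, if any, to send about the neighbor type."""
    # Assumption: suppression takes priority when both scores are significant.
    if neighbor.suppressible and neighbor.undesirability >= threshold:
        return "suppress"
    if neighbor.triggerable and neighbor.desirability >= threshold:
        return "trigger"
    return "none"


risky = TypeState(undesirability=3.0, suppressible=True)
helpful = TypeState(desirability=2.0, triggerable=True)
print(choose_feedback(risky), choose_feedback(helpful))  # suppress trigger
```
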
Higher-Order Tokens
- Generated automatically and in parallel when patterns are detected between tokens
- Can be processed independently through a dedicated ingestion interface, supporting full parallelization
- Formation process has no inherent sequential dependencies, enabling parallel execution:
  - When processing a primitive token type's evader point, the system immediately looks up the nearest neighbor pursuer point
  - If the nearest pursuer belongs to a different token type, a higher-order token instance is automatically created
  - The new instance's timing corresponds to when the evader point's token occurs
- Each higher-order token is processed identically to primitive tokens:
  - Has its own evader and pursuer points at the type level
  - Points follow same movement rules relative to all ERACs
  - Participates in same parallel pattern detection process
- This enables recursive pattern building while maintaining full parallelism
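
The formation rule above can be sketched as a nearest-neighbor lookup. In this illustration the dictionaries of points, the function names, and the choice to represent a higher-order type as an `(evader_type, pursuer_type)` tuple are all hypothetical; straight-line (chord) distance is used as a stand-in for whatever distance the framework intends.

```python
import math


def nearest_pursuer_type(evader_type, evader_points, pursuer_points):
    """Return the type whose pursuer point is closest to evader_type's evader point."""
    origin = evader_points[evader_type]
    return min(pursuer_points, key=lambda t: math.dist(origin, pursuer_points[t]))


def maybe_form_higher_order(evader_type, evader_points, pursuer_points):
    """Create a higher-order type only if the nearest pursuer is a different type."""
    nearest = nearest_pursuer_type(evader_type, evader_points, pursuer_points)
    if nearest == evader_type:
        return None
    return (evader_type, nearest)  # assumed representation of the new type


pursuer_points = {"a": (0.0, 1.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (0.0, 0.0, 1.0)}
evader_points = {"a": (1.0, 0.0, 0.0), "c": (0.0, 0.0, 1.0)}

print(maybe_form_higher_order("a", evader_points, pursuer_points))  # ('a', 'b')
print(maybe_form_higher_order("c", evader_points, pursuer_points))  # None
```

Since the returned tuple is itself hashable, it could be used as a token type in the same dictionaries, which is one simple way to get the recursive pattern building described above.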

Dual-level ERAC System

The system maintains two distinct levels of ERACs that update continuously:

Global ERACs:
- Global Evader ERAC: Aggregates all evader points across all types
- Global Pursuer ERAC: Aggregates all pursuer points across all types
- Provides overall pattern guidance
Type-specific ERACs:
- Per-type Evader ERAC: Tracks evader points for each token type
- Per-type Pursuer ERAC: Tracks pursuer points for each token type
- Enables type-specific behavior

This dual-level ERAC system enables complex pattern detection while preventing unwanted point clustering through continuous real-time updates of both global and type-specific centroids.
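
A minimal sketch of the bookkeeping for both levels is shown below. The class name `DualLevelERACs`, the `alpha` weight, and the choice to initialize a centroid from the first point it sees are my assumptions.

```python
class DualLevelERACs:
    """Keeps global and per-type centroids for evader and pursuer points."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # hypothetical smoothing weight for new points
        self.global_erac = {"evader": None, "pursuer": None}
        self.type_erac = {}  # (token_type, role) -> centroid

    def _blend(self, old, point):
        if old is None:
            return list(point)  # assumption: the first point seeds the centroid
        a = self.alpha
        return [(1.0 - a) * o + a * p for o, p in zip(old, point)]

    def update(self, token_type, role, point):
        """One new point updates both levels in the same call."""
        self.global_erac[role] = self._blend(self.global_erac[role], point)
        key = (token_type, role)
        self.type_erac[key] = self._blend(self.type_erac.get(key), point)


eracs = DualLevelERACs(alpha=0.5)
eracs.update("a", "evader", [1.0, 0.0, 0.0])
eracs.update("b", "evader", [0.0, 1.0, 0.0])
print(eracs.global_erac["evader"])       # [0.5, 0.5, 0.0]
print(eracs.type_erac[("a", "evader")])  # [1.0, 0.0, 0.0]
```

The global evader centroid blends points from both types, while each per-type centroid only ever sees its own type's points.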

Dynamic Movement Patterns

Combined Point Movement

Each point is processed independently and in parallel, experiencing two distinct shifts in every update:

Type-specific Movement
- Each point's movement is calculated independently
- Both evader and pursuer points shift away from their type's ERAC
- Shift magnitudes are inversely proportional to the distance to the type's ERAC, so closer points experience larger shifts
- This inverse distance relationship naturally causes points of the same type to disperse from each other
- This emergent dispersal behavior arises automatically, without requiring explicit coordination
- Creates natural spacing between points of the same type
Global Movement
- Pursuer points shift toward the global evader ERAC
- Evader points shift away from the global pursuer ERAC
- Shift magnitude depends on the point's direct distance to the opposing global ERAC (the global evader ERAC for pursuers, the global pursuer ERAC for evaders)
- Creates pursuit-evasion dynamics between different token types

The final movement of each point combines both shifts vectorially:

The two shift vectors are added to determine total movement
Points maintain both type-specific spacing and global pursuit-evasion
After combined movement, points are projected back onto the unit sphere
When pursuers closely co-locate with evaders of different types, it indicates a pattern

Conceptual Processing Flow

The framework envisions fully parallel token processing with no sequential dependencies:

Token Processing:
- Token instances arrive from external data sources
- Each token instance can be processed independently, supporting full parallelization
- Token instances of the same type share identical data characteristics and reference the same type-level points
- No strict ordering is required - operations support full parallelization
Point Movement and ERAC Updates:
- Points move and ERACs update independently, supporting full parallelization
- No strict ordering between ERAC updates and point movements
- ERACs and point positions are expected to converge through repeated updates rather than through any enforced ordering
- Points respond to current ERAC values at time of processing
Pattern Formation:
- Within each token type, points naturally disperse based on proximity
- Stronger dispersion occurs when same-type points are closely co-located
- Once sufficiently separated, global pursuit-evasion dynamics dominate

Pattern Detection and Signal Propagation

Fundamental Pattern Detection Principles

Co-location of pursuer and evader points indicates a directional pattern between token types
Pattern detection occurs when one token type consistently precedes another
The strength of co-location indicates the strength of directional relationships
These detected patterns form the building blocks for recognizing more complex token relationships
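
One way to turn co-location into a number is sketched below. The strength measure (dot product of unit vectors rescaled to the range 0 to 1) and the `threshold` are hypothetical choices of mine; the framework itself does not define how co-location strength is computed.

```python
def colocation_strength(pursuer, evader):
    """Map the dot product of two unit vectors from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(pursuer, evader))
    return (1.0 + dot) / 2.0


def detect_patterns(pursuers, evaders, threshold=0.9):
    """List (pursuer_type, evader_type, strength) for co-located points
    of different types, strongest first."""
    found = []
    for p_type, p_point in pursuers.items():
        for e_type, e_point in evaders.items():
            if p_type == e_type:
                continue  # only cross-type co-location counts as a pattern
            strength = colocation_strength(p_point, e_point)
            if strength >= threshold:
                found.append((p_type, e_type, strength))
    return sorted(found, key=lambda item: item[2], reverse=True)


pursuers = {"a": (1.0, 0.0, 0.0), "b": (0.0, 1.0, 0.0)}
evaders = {"a": (0.0, 0.0, 1.0), "b": (1.0, 0.0, 0.0)}
print(detect_patterns(pursuers, evaders))  # [('a', 'b', 1.0)]
```

This brute-force scan compares every pair and is only meant to show the idea; a spatial index would be the natural choice for many token types.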

While the basic desirability/undesirability signal propagation serves as the primary learning mechanism, the system's ability to trigger and suppress specific token instances provides a foundation for scientific-style exploration:

Controlled Experimentation with Token Instances:
- System can deliberately suppress individual token instances even when their type is not highly undesirable
- System can trigger individual token instances even when their type is not highly desirable
- These instance-level interventions enable active experimentation with token patterns
Advanced Signal Tracking:
- For scientific exploration, each token type maintains four additional independent scores:
  - Suppressed Desirability: Accumulates positive feedback during suppression experiments
  - Suppressed Undesirability: Accumulates negative feedback during suppression experiments
  - Triggered Desirability: Accumulates positive feedback during triggering experiments
  - Triggered Undesirability: Accumulates negative feedback during triggering experiments
- These experimental scores use the same additive signal propagation algorithm as the main scores, but they only accumulate during their respective experimental conditions
- They are kept completely separate from the main desirability/undesirability scores
- This separation enables isolated analysis of intervention effects
Outcome Analysis:
- Effects of instance suppression/triggering can be measured through the separate experimental scores
- System can learn which specific interventions lead to more desirable outcomes
- Provides isolated feedback loop for refining experimental strategies
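
The six per-type scores could be kept in a small record like the one sketched here. The `TypeScores` class and the `condition` labels are hypothetical. One open question the text leaves unanswered is whether the main scores should also update during an experiment; this sketch assumes they do not.

```python
from dataclasses import dataclass

_PREFIX = {"normal": "", "suppressed": "suppressed_", "triggered": "triggered_"}


@dataclass
class TypeScores:
    """Main scores plus the four experimental scores for one token type."""
    desirability: float = 0.0
    undesirability: float = 0.0
    suppressed_desirability: float = 0.0
    suppressed_undesirability: float = 0.0
    triggered_desirability: float = 0.0
    triggered_undesirability: float = 0.0

    def accumulate(self, positive, negative, condition="normal"):
        """Add feedback only to the score pair that matches the condition."""
        prefix = _PREFIX[condition]  # raises KeyError for unknown conditions
        for name, amount in (("desirability", positive), ("undesirability", negative)):
            attr = prefix + name
            setattr(self, attr, getattr(self, attr) + amount)


scores = TypeScores()
scores.accumulate(2.0, 0.5)                          # normal operation
scores.accumulate(1.0, 3.0, condition="suppressed")  # during a suppression experiment
scores.accumulate(4.0, 0.0, condition="triggered")   # during a triggering experiment
```

Comparing, say, `triggered_desirability` against `triggered_undesirability` then gives a rough view of whether triggering that type tended to help or hurt.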

This framework lays groundwork for developing more sophisticated experimental approaches:

Systematic testing of token pattern hypotheses through instance-level control
Discovery of beneficial token combinations via targeted triggering
Learning optimal intervention timing for specific instances
The dual-signal system serves as the primary training mechanism:
- Provides feedback about beneficial and harmful state transitions
- Enables learning through reinforcement of desirable patterns
- Maintains separate positive and negative feedback channels

The experimental capabilities described above provide essential building blocks for advanced pattern learning and optimization while maintaining separation from the core learning mechanism.
