How GatiFlow Intelligence Works
Understanding our data pipeline, scoring, and what reports mean for your decisions.
1. Data Sources
GatiFlow collects publicly available data from 12 sources every 6 hours:
GitHub (trending repos, developer profiles) · StackOverflow (technology tag trends) · HackerNews (top discussions, Who's Hiring) · Reddit (community sentiment) · Dev.to (developer articles) · arXiv (research papers in CS/AI) · PyPI & npm (package download trends) · Adzuna & Remotive (job market signals) · HuggingFace & Kaggle (AI/ML community activity)
All collection uses official APIs with rate limiting, exponential backoff, and robots.txt compliance. Sources that are temporarily unavailable are skipped gracefully — reports are generated from whatever sources responded.
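The retry-and-skip behavior can be sketched as follows. This is an illustration only, not GatiFlow's actual collector code: the function names, the retry count, and the delay values are all assumptions.

```python
import random
import time
from typing import Callable, Dict, List, Optional

Fetcher = Callable[[], List[dict]]


def fetch_with_backoff(
    fetch: Fetcher,
    max_attempts: int = 4,          # assumed retry budget
    base_delay: float = 1.0,        # assumed first delay, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[dict]]:
    """Call fetch(); after each failure wait base_delay * 2**attempt plus jitter.

    Returns None when every attempt fails so the caller can skip the source.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                return None
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return None


def collect_all(fetchers: Dict[str, Fetcher], **kwargs) -> Dict[str, List[dict]]:
    """Run every fetcher and keep only the sources that responded."""
    results: Dict[str, List[dict]] = {}
    for name, fetch in fetchers.items():
        items = fetch_with_backoff(fetch, **kwargs)
        if items is not None:
            results[name] = items
    return results
```

A report built from the return value of `collect_all` simply contains fewer sources when one of them is down; nothing else in the cycle fails.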
2. Topics
What they are: Topics are technology keywords extracted from collected data. When a HackerNews post mentions "Kubernetes", or an arXiv paper is tagged "cs.AI", or an npm package relates to "React" — these become topic signals.
How extraction works: Each title, tag, and description is matched against a curated whitelist of 60+ technology keywords spanning AI/ML, infrastructure, programming languages, frameworks, and developer tools. Only recognized topics are included — unrecognized content is dropped rather than guessed.
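A minimal sketch of whitelist matching, for illustration. The whitelist excerpt, the alias table, and the tokenizing rule are hypothetical; the real list has 60+ entries and may match differently.

```python
import re
from typing import Iterable, Set

# Hypothetical excerpt of the curated whitelist.
WHITELIST = {"kubernetes", "react", "rust", "python", "ai"}

# Hypothetical aliases that map source-specific tags to a whitelisted topic.
ALIASES = {"cs.ai": "ai"}


def extract_topics(texts: Iterable[str]) -> Set[str]:
    """Return the recognized topics found in titles, tags, and descriptions."""
    topics: Set[str] = set()
    for text in texts:
        for token in re.findall(r"[a-z0-9+#.]+", text.lower()):
            token = token.strip(".")  # drop sentence-ending periods
            if token in WHITELIST:
                topics.add(token)
            elif token in ALIASES:
                topics.add(ALIASES[token])
            # Anything else is dropped rather than guessed.
    return topics
```

Under these assumptions, a title mentioning "Kubernetes" and a tag "cs.AI" yield the topics `kubernetes` and `ai`, while text with no whitelisted keyword yields nothing.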
Your custom topics: In Settings, you can configure Topics of Interest for your organization (e.g., "python, kubernetes, ai"). This filters your reports to show only signals matching your interests. Setting no topics shows everything.
More topics ≠ more data collected. Topics only filter the global collection: the same data is collected regardless of your topic settings. Adding topics widens the filter, so more signals pass through to your report. Fewer topics produce a more focused report.
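The filtering step can be pictured like this. It is a sketch that assumes each signal carries a `topic` field and that Topics of Interest is stored as a comma-separated string; both are illustrative choices, not the actual schema.

```python
from typing import Iterable, List


def filter_signals(signals: Iterable[dict], topics_of_interest: str = "") -> List[dict]:
    """Keep only signals whose topic is in the configured list.

    An empty setting means no filter, so every collected signal is shown.
    """
    wanted = {t.strip().lower() for t in topics_of_interest.split(",") if t.strip()}
    signals = list(signals)
    if not wanted:
        return signals
    return [s for s in signals if s["topic"].lower() in wanted]
```

The input list is the same in every case; only the size of the output changes with the topic setting.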
3. Signals
What they are: A signal is a single data point from a source that indicates market activity. Examples:
• A trending GitHub repository with 500 stars this week → market trend signal
• A developer with 200 followers and 40 active repos → talent signal
• A job posting for "Senior Rust Engineer" on Adzuna → hiring signal
Signal categories: Market Trends (technology adoption), Talent Intelligence (emerging developers), and Hiring Signals (job market demand). Each category has separate limits per plan.
Cross-source correlation: When a topic appears in multiple sources simultaneously (e.g., "AI" trending on HackerNews AND arXiv AND PyPI), the signal is stronger. Our scoring system rewards cross-source convergence.
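One simple way to measure cross-source convergence is to count the distinct sources in which each topic appears. The sketch below assumes signals with `topic` and `source` fields; the field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def sources_per_topic(signals: Iterable[dict]) -> Dict[str, int]:
    """Count how many distinct sources mention each topic."""
    seen = defaultdict(set)
    for signal in signals:
        seen[signal["topic"]].add(signal["source"])
    return {topic: len(sources) for topic, sources in seen.items()}


signals = [
    {"topic": "ai", "source": "hackernews"},
    {"topic": "ai", "source": "arxiv"},
    {"topic": "ai", "source": "pypi"},
    {"topic": "ai", "source": "arxiv"},   # repeat mentions in one source count once
    {"topic": "rust", "source": "adzuna"},
]
counts = sources_per_topic(signals)  # {"ai": 3, "rust": 1}
```

Counting distinct sources rather than total mentions keeps one very active source from looking like broad agreement.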
4. Confidence Score
What it means: Confidence is a number between 0 and 1 (displayed as 0–100%) that indicates how strong and reliable a signal is. It is NOT a prediction of future success — it measures current signal strength.
How it is calculated:
• Base score (0.45–0.55): Every detected signal starts here
• Engagement boost (+0.00–0.20): Based on actual metrics from the source — HackerNews points, Reddit upvotes, Dev.to reactions, GitHub stars, package downloads
• Source diversity boost (+0.00–0.15): Signals appearing in multiple independent sources score higher
• Logarithmic scaling: We use log-scale normalization so that going from 10 to 100 mentions matters more than going from 10,000 to 10,100. This prevents popular topics from drowning out emerging ones.
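The components above can be combined as in the following sketch. Only the ranges (0.45–0.55, up to +0.20, up to +0.15) come from this page. The fixed base of 0.50, the saturation points, and the linear diversity rule are assumptions made for illustration, so real scores will differ.

```python
import math

BASE = 0.50                 # assumed midpoint of the 0.45-0.55 base range
MAX_ENGAGEMENT = 0.20       # upper bound of the engagement boost
MAX_DIVERSITY = 0.15        # upper bound of the source diversity boost
ENGAGEMENT_CAP = 1_000_000  # hypothetical metric value at which the boost saturates
SOURCE_CAP = 4              # hypothetical source count at which the boost saturates


def engagement_boost(metric: float) -> float:
    """Log-scaled boost from a raw metric such as stars, points, or downloads."""
    if metric <= 0:
        return 0.0
    scaled = math.log10(1 + metric) / math.log10(1 + ENGAGEMENT_CAP)
    return MAX_ENGAGEMENT * min(scaled, 1.0)


def diversity_boost(n_sources: int) -> float:
    """Boost that grows with each additional independent source."""
    extra = max(n_sources - 1, 0)
    return MAX_DIVERSITY * min(extra / (SOURCE_CAP - 1), 1.0)


def confidence(metric: float, n_sources: int) -> float:
    """Base score plus both boosts, rounded to four decimals."""
    return round(BASE + engagement_boost(metric) + diversity_boost(n_sources), 4)
```

Because of the logarithm, `engagement_boost(100) - engagement_boost(10)` is far larger than `engagement_boost(10_100) - engagement_boost(10_000)`, which is the effect described in the last bullet.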
Interpreting confidence:
• 0.85+ — Strong signal: multiple sources, high engagement. Worth immediate attention.
• 0.65–0.84 — Moderate signal: clear presence but limited cross-source validation. Monitor closely.
• 0.45–0.64 — Weak signal: early or single-source. Could be noise or could be an emerging trend worth watching.
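For readers who consume scores programmatically, the bands above translate into a small helper. The function names and the label for scores under 0.45 are assumptions; this page does not define a band below 0.45.

```python
def interpret(confidence: float) -> str:
    """Map a 0-1 confidence score to the bands listed above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.85:
        return "strong"
    if confidence >= 0.65:
        return "moderate"
    if confidence >= 0.45:
        return "weak"
    return "below documented range"  # assumed label, not defined on this page


def as_percent(confidence: float) -> str:
    """Format a score the way it is displayed, as 0-100%."""
    return f"{confidence:.0%}"
```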
Important: Confidence scores are recalculated every 6 hours with fresh data. A topic that scores 0.70 today may score 0.85 next week if momentum continues, or drop to 0.50 if interest fades.
5. AI-Generated Narratives
Pro and Business plans include AI-generated executive narratives that synthesize signals into actionable prose. Each narrative is:
• Generated by Anthropic Claude using data from our collectors, plus targeted web search at publication time for verification
• Post-processed by a fact-checking pipeline that removes unverifiable numbers
• Stripped of market size claims and fabricated statistics, which are automatically replaced with qualitative language
• Labeled with the number of sources that were active in that collection cycle
Narratives should be read as analytical commentary, not as definitive market research. Always verify specific claims independently before making business decisions.
6. Weekly Products
Weekly Digest (Monday, 9:00 AM UTC): An email summarizing the week's key trends, compiled from the 7 daily collection narratives.
Deep Dive (Saturday morning): An investigative article (about 1,800 words) on the week's most impactful topic, generated with web search verification and fact-checking. It is published permanently on the site and emailed to subscribers.
Both weekly products build on the accumulated daily data, so the quality of weekly output directly reflects the quality of daily collection.
Questions?
If anything about our methodology is unclear, contact us at support@gatiflow.io. Transparency is a core principle — we are happy to explain any aspect of our data pipeline in detail.