How GatiFlow Intelligence Works

Understanding our data pipeline, scoring, and what reports mean for your decisions.

1. Data Sources

GatiFlow collects publicly available data from 14 sources every 6 hours:

GitHub (trending repos, developer profiles) · StackOverflow (technology tag trends) · HackerNews (top discussions) · HackerNews Who's Hiring (monthly job thread) · Dev.to (developer articles) · arXiv (research papers in CS/AI) · OpenReview (peer-review records) · PyPI & npm (package download trends) · Adzuna & Remotive (job market signals) · Greenhouse & Lever (public job boards) · HuggingFace (AI/ML model and dataset metadata) · SEC EDGAR (entity-level regulatory filings)

All collection uses official APIs with rate limiting, exponential backoff, and robots.txt compliance. Sources that are temporarily unavailable are skipped gracefully — reports are generated from whatever sources responded.
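As an illustration of the retry-and-skip behaviour described above, here is a minimal sketch. It covers only exponential backoff and graceful skipping (not rate limiting or robots.txt handling), and every name in it (`fetch_with_backoff`, `collect_cycle`, the delay values) is hypothetical rather than GatiFlow's actual implementation.

```python
import random
import time
from typing import Callable, Dict, Optional


def fetch_with_backoff(
    fetch: Callable[[], dict],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(); after a failure wait base_delay * 2**attempt (plus jitter) and retry.

    Returns None when every attempt fails, so the caller can skip the source.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:  # broad on purpose: this is a sketch, not production code
            if attempt == max_attempts - 1:
                return None  # give up: the source is skipped for this cycle
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
    return None


def collect_cycle(
    collectors: Dict[str, Callable[[], dict]], **backoff_options
) -> Dict[str, dict]:
    """Run every collector and keep only the sources that responded."""
    results = {}
    for name, fetch in collectors.items():
        payload = fetch_with_backoff(fetch, **backoff_options)
        if payload is not None:
            results[name] = payload
    return results
```

A report built from `collect_cycle(...)` would then simply use whichever keys are present in the returned dictionary.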

2. Topics

What they are: Topics are technology keywords extracted from collected data. When a HackerNews post mentions "Kubernetes", or an arXiv paper is tagged "cs.AI", or an npm package relates to "React" — these become topic signals.

How extraction works: Each title, tag, and description is matched against a curated whitelist of 60+ technology keywords spanning AI/ML, infrastructure, programming languages, frameworks, and developer tools. Only recognized topics are included — unrecognized content is dropped rather than guessed.
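A rough sketch of whitelist-based extraction, assuming simple token matching. The `TOPIC_WHITELIST` below is a made-up five-entry excerpt (the real curated list has 60+ entries), and the tokenizer is an assumption, not the production matcher.

```python
import re
from typing import Iterable, Set

# Hypothetical excerpt of the curated whitelist, for illustration only.
TOPIC_WHITELIST = {"kubernetes", "react", "rust", "python", "ai"}


def extract_topics(texts: Iterable[str], whitelist: Set[str] = TOPIC_WHITELIST) -> Set[str]:
    """Return the whitelisted topics found in titles, tags, or descriptions.

    Text is lowercased and split into alphanumeric tokens; anything that is
    not on the whitelist is dropped rather than guessed.
    """
    found = set()
    for text in texts:
        for token in re.findall(r"[a-z0-9+#]+", text.lower()):
            if token in whitelist:
                found.add(token)
    return found
```

With this tokenizer, an arXiv tag such as "cs.AI" splits into "cs" and "ai", so it contributes the topic "ai", while a title with no recognized keyword contributes nothing.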

Your custom topics: In Settings, you can configure Topics of Interest for your organization (e.g., "python, kubernetes, ai"). This filters your reports to show only signals matching your interests. Setting no topics shows everything.

More topics do not mean more signals are detected. Topics only filter the global collection: the same data is collected regardless of your topic settings. Adding topics widens the filter, so more of the already-collected signals pass through to your report; fewer topics produce a more focused report.
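The filtering behaviour can be pictured with a short sketch. The signal shape (a dict with a `topics` set) and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Set


def parse_topics(setting: str) -> Set[str]:
    """Turn a setting such as "python, kubernetes, ai" into a set of topics."""
    return {part.strip().lower() for part in setting.split(",") if part.strip()}


def filter_signals(signals: List[Dict], topics_of_interest: Set[str]) -> List[Dict]:
    """Keep signals that share at least one topic with the organization's list.

    An empty topic list means no filtering: every collected signal is shown.
    """
    if not topics_of_interest:
        return list(signals)
    return [s for s in signals if topics_of_interest & set(s["topics"])]
```

Note that `signals` is the same input in every case; only the size of the output changes with the topic list.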

3. Signals

What they are: A signal is a single data point from a source that indicates market activity. Examples:

• A trending GitHub repository with 500 stars this week → market trend signal

• A developer with 200 followers and 40 active repos → talent signal

• A job posting for "Senior Rust Engineer" on Adzuna → hiring signal

Signal categories: Market Trends (technology adoption), Talent Intelligence (emerging developers), and Hiring Signals (job market demand). Each category has separate limits per plan.

Cross-source correlation: When a topic appears in multiple sources simultaneously (e.g., "AI" trending on HackerNews AND arXiv AND PyPI), the signal is stronger. Our scoring system rewards cross-source convergence.
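One simple way to picture cross-source convergence is to count the distinct sources in which each topic appears. This is a sketch under assumed field names (`source`, `topics`) and an assumed threshold of two sources; it is not the actual correlation logic.

```python
from collections import defaultdict
from typing import Dict, List, Set


def sources_per_topic(signals: List[Dict]) -> Dict[str, Set[str]]:
    """Map each topic to the set of distinct sources that mentioned it."""
    by_topic = defaultdict(set)
    for signal in signals:
        for topic in signal["topics"]:
            by_topic[topic].add(signal["source"])
    return dict(by_topic)


def convergent_topics(signals: List[Dict], min_sources: int = 2) -> Dict[str, List[str]]:
    """Return topics seen in at least min_sources distinct sources."""
    return {
        topic: sorted(sources)
        for topic, sources in sources_per_topic(signals).items()
        if len(sources) >= min_sources
    }
```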

4. Confidence Score

What it means: Confidence is a number between 0 and 1 (displayed as 0–100%) that indicates how strong and reliable a signal is. It is NOT a prediction of future success — it measures current signal strength.

How it is calculated:

• Base score: Every detected signal starts at a calibrated baseline.

• Engagement boost: Based on actual metrics from the source — HackerNews points, Dev.to reactions, GitHub stars, package downloads.

• Source diversity boost: Signals appearing in multiple independent sources score higher.

• Cross-source corroboration: When the same signal title appears from two or more sources, it gets an additional boost.

• Logarithmic scaling: We use log-scale normalization so that going from 10 to 100 mentions matters more than going from 10,000 to 10,100. This prevents popular topics from drowning out emerging ones.
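To make the components above concrete, here is a hedged sketch of how such a score could be assembled. The baseline, weights, and engagement cap are invented for illustration; the real calibration is not published here, and this function is not GatiFlow's formula.

```python
import math


def confidence(
    engagement: int,
    n_sources: int,
    corroborated: bool,
    base: float = 0.40,                 # hypothetical calibrated baseline
    engagement_weight: float = 0.30,    # hypothetical weight
    diversity_weight: float = 0.05,     # hypothetical boost per extra source
    corroboration_bonus: float = 0.10,  # hypothetical same-title bonus
    engagement_cap: int = 100_000,      # hypothetical normalization ceiling
) -> float:
    """Combine baseline, log-scaled engagement, and cross-source boosts into [0, 1]."""
    # Log-scale normalization: each tenfold increase adds roughly the same amount.
    scaled = math.log10(1 + max(engagement, 0)) / math.log10(1 + engagement_cap)
    engagement_boost = engagement_weight * min(scaled, 1.0)
    diversity_boost = diversity_weight * max(n_sources - 1, 0)
    bonus = corroboration_bonus if corroborated else 0.0
    return round(min(base + engagement_boost + diversity_boost + bonus, 1.0), 3)
```

Under these assumed weights, moving from 10 to 100 mentions raises the score noticeably, while moving from 10,000 to 10,100 leaves it essentially unchanged, which is the effect log scaling is meant to produce.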

Interpreting confidence:

• 0.85+ — Strong signal: multiple sources, high engagement. Worth immediate attention.

• 0.65–0.84 — Moderate signal: clear presence but limited cross-source validation. Monitor closely.

• 0.45–0.64 — Weak signal: early or single-source. Could be noise or could be an emerging trend worth watching.

Important: Confidence scores are recalculated every 6 hours with fresh data. A topic that scores 0.70 today may score 0.85 next week if momentum continues, or drop to 0.50 if interest fades.
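The interpretation bands can be expressed as a small lookup. The labels mirror the list above; the "unclassified" label for scores below 0.45 is an assumption, since no band is defined for that range.

```python
def interpret(confidence: float) -> str:
    """Map a 0-1 confidence value to the band names used in this guide."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.85:
        return "strong"
    if confidence >= 0.65:
        return "moderate"
    if confidence >= 0.45:
        return "weak"
    return "unclassified"  # assumption: no band is documented below 0.45


def as_percent(confidence: float) -> str:
    """Format a 0-1 confidence value the way it is displayed (0-100%)."""
    return f"{confidence:.0%}"
```

Because scores are recalculated each cycle, the same topic can move between bands from one report to the next.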

5. AI-Generated Narratives

Starter, Pro and Business plans include AI-generated executive narratives that synthesize signals into actionable prose. Each narrative is:

• Generated by Anthropic Claude using data from our collectors plus targeted web search at publication time for verification

• Post-processed by a fact-checking pipeline that removes unverifiable numbers

• Stripped of market size claims and fabricated statistics, which are automatically replaced with qualitative language

• Labeled with the number of sources that were active in that collection cycle

Narratives should be read as analytical commentary, not as definitive market research. Always verify specific claims independently before making business decisions.

6. Weekly Products

Weekly Digest (Monday morning, delivered in your organization's local timezone): Email summarizing the week's key trends, compiled from the 7 daily collection narratives.

Deep Dive (Saturday morning): An investigative article (2,000+ words) on the week's most impactful topic, generated with web search verification and fact-checking. Published permanently on the site and emailed to subscribers.

Both weekly products build on the accumulated daily data — the quality of weekly output directly reflects the quality of daily collection.

7. Reading your report

The report has four signal sections plus a talent analytics summary. The header shows your data freshness (e.g., "collected 6h ago") and source coverage for the cycle (how complete the collection was). A history dropdown lets you compare the current report to past snapshots, which are retained for 7 days on Starter, 90 days on Pro, and 365 days on Business.

Executive summary: opens with active source count (e.g., "14/14 sources"), signal count, and data freshness for the current cycle, plus an AI narrative synthesizing the day's signals. Treat the narrative as analytical commentary, not as definitive market research.

Strategic Briefing (Business): one paragraph reading the current cycle through your own watchlist — topics and companies you configured — generated on your first report visit of the day.

Relevance ranking: with a profile configured in Settings (topics + companies to watch), every section is re-ordered by relevance to your profile and watched-company items surface first. No profile = neutral global ordering.
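A minimal sketch of profile-based re-ordering, assuming each item carries a `topics` set and an optional `company` name. The scoring rule (watched companies first, then topic overlap) is an illustrative guess, not the actual ranking function.

```python
from typing import Dict, List, Set


def rank_by_relevance(items: List[Dict], topics: Set[str], companies: Set[str]) -> List[Dict]:
    """Order items so watched-company items come first, then by topic overlap.

    With no profile configured, the original (neutral) order is returned.
    """
    if not topics and not companies:
        return list(items)

    def sort_key(item: Dict):
        watched = (item.get("company") or "").lower() in companies
        overlap = len(topics & set(item.get("topics", ())))
        # False sorts before True, so watched items lead; larger overlap comes first.
        return (not watched, -overlap)

    # sorted() is stable, so ties keep their global ordering.
    return sorted(items, key=sort_key)
```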

Technology & Market Trends: emerging technologies, frameworks, and tools showing engagement. Each entry includes a confidence score, raw strength and mention count, the evidence chain (which sources contributed), and a velocity badge (week-over-week delta in mentions).
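The velocity badge is described as a week-over-week delta in mentions. A sketch of that arithmetic follows; the returned field names and the percentage figure are assumptions added for illustration.

```python
from typing import Dict, Optional


def velocity(current_week: int, previous_week: int) -> Dict[str, Optional[object]]:
    """Compute the week-over-week change in mentions for one topic."""
    delta = current_week - previous_week
    # A percentage is undefined when there were no mentions the week before.
    pct = None if previous_week == 0 else round(100 * delta / previous_week, 1)
    direction = "up" if delta > 0 else "down" if delta < 0 else "flat"
    return {"delta": delta, "pct": pct, "direction": direction}
```

For example, 150 mentions this week against 100 last week gives a delta of +50, or 50.0% growth.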

Talent Intelligence Signals: public developer profiles showing emerging signal strength. Each profile is scored 0–100 based on activity, cross-source presence, and topic relevance:

• 90–100 — Leadership-level: high-impact contributors with sustained presence

• 80–89 — Senior: established, multi-platform activity

• 70–79 — Mid: growing presence, single primary platform

• Below 70 — Junior or early-career

The score is a signal strength indicator, not a personnel ranking. Use it as a starting point for qualified outreach, never as the sole basis for a hiring decision.
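The score bands translate directly into a lookup. In this sketch, every score of 90 or above maps to the leadership-level bucket; treating the very top of the scale as part of that band is an assumption.

```python
def seniority_bucket(score: float) -> str:
    """Map a 0-100 talent score to one of the four seniority buckets."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "leadership"  # assumption: includes the top of the scale
    if score >= 80:
        return "senior"
    if score >= 70:
        return "mid"
    return "junior"
```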

Hiring & Opportunity Signals: job postings extracted from public boards (HackerNews "Who's Hiring", Adzuna, Remotive). This section reflects current market demand and is useful as competitive intelligence: a cluster of Rust openings at one company may signal a shift toward systems work. Each posting carries a hiring status, either new (the company was first seen hiring within the last 14 days) or recurring, which also powers watched-company alerts on Pro and Business.
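The new/recurring distinction comes down to a date comparison. A sketch, assuming the system records the date a company was first seen hiring; whether day 14 itself counts as "new" is also an assumption.

```python
from datetime import date, timedelta


def hiring_status(first_seen_hiring: date, today: date, window_days: int = 14) -> str:
    """Label a company "new" if it was first seen hiring within the window."""
    age = today - first_seen_hiring
    # Assumption: the boundary day (exactly 14 days ago) still counts as new.
    return "new" if age <= timedelta(days=window_days) else "recurring"
```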

Talent Analytics: aggregates the talent profiles in a separate panel — total profiles analyzed, average and top scores, seniority distribution across the four buckets, dominant tech stacks in the pool, and strategic insights flagged automatically (e.g., "Low concentration of technical leaders, indicating potential scarcity"). Use this to characterize the talent pool as a whole, not just individual profiles.
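The aggregation in this panel can be sketched as follows. The profile shape (`score`, `stack`), the 10% threshold that triggers the scarcity insight, and the top-three stack cutoff are all hypothetical; only the insight wording is taken from the example above.

```python
from collections import Counter
from typing import Dict, List


def bucket(score: float) -> str:
    """Seniority bucket for a 0-100 score (90 and above treated as leadership)."""
    if score >= 90:
        return "leadership"
    if score >= 80:
        return "senior"
    if score >= 70:
        return "mid"
    return "junior"


def talent_summary(profiles: List[Dict], leader_share_floor: float = 0.10) -> Dict:
    """Aggregate individual talent profiles into pool-level statistics."""
    total = len(profiles)
    if total == 0:
        return {"total": 0, "average": None, "top": None,
                "distribution": {}, "top_stacks": [], "insights": []}
    scores = [p["score"] for p in profiles]
    distribution = Counter(bucket(s) for s in scores)
    stacks = Counter(tech for p in profiles for tech in p.get("stack", []))
    insights = []
    # Hypothetical rule: flag scarcity when leaders are under 10% of the pool.
    if distribution["leadership"] / total < leader_share_floor:
        insights.append("Low concentration of technical leaders, indicating potential scarcity")
    return {
        "total": total,
        "average": round(sum(scores) / total, 1),
        "top": max(scores),
        "distribution": dict(distribution),
        "top_stacks": [name for name, _ in stacks.most_common(3)],
        "insights": insights,
    }
```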

How to act on signals: a strong market trend (0.85+) with cross-source convergence is the cue for a technology adoption review. Repeated talent signals in one topic indicate a pool worth qualified outreach. Hiring clusters around a topic validate the underlying trend with market demand.

Every paid plan links signals to their evidence chain — drill in to validate before acting. The "Show raw JSON" button surfaces the full report payload for programmatic use or audit.

Questions?

If anything about our methodology is unclear, contact us at support@gatiflow.io. Transparency is a core principle — we are happy to explain any aspect of our data pipeline in detail.