
How AI Reads Your 10-K: A Visual Guide to Algorithmic Filing Analysis

The AI systems that read SEC filings operate very differently from human analysts. Rather than evaluating narrative quality or management credibility, they run statistical analysis on the filing text, and their outputs can feed directly into institutional trading models within 60 seconds of an EDGAR submission.

The Pipeline: From EDGAR to Algo Signal in 60 Seconds

The process unfolds through four sequential steps:

  1. Text extraction: isolates the prose sections (MD&A, Risk Factors, Business description, footnotes) and processes each independently. MD&A and Risk Factors receive the highest signal weight.
  2. Tokenization and normalization: breaks text into words and phrases, strips punctuation, and reduces words to lowercase root forms. This discards formatting and emphasis while preserving word choice.
  3. Dictionary scoring: applies the Loughran-McDonald Master Dictionary, which was developed specifically for financial disclosures and whose word categories include the six empirically validated ones described below.
  4. Signal generation: combines dictionary scores with structural features (filing length, sentence complexity) to produce composite signals that feed quant screens and risk algorithms.
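Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular vendor implements it: `MINI_DICTIONARY` is a hypothetical stand-in built only from example words quoted in this article (the real Loughran-McDonald lists are far larger and would be loaded from the published file), and root reduction is skipped for brevity, so tokens are only lowercased.

```python
import re
from collections import Counter

# Hypothetical mini word lists built from example words quoted in this article.
# The real Loughran-McDonald categories contain hundreds or thousands of entries.
MINI_DICTIONARY = {
    "negative": {"adverse", "volatility", "uncertainty"},
    "uncertainty": {"approximately", "uncertain", "contingent", "possibility"},
    "weak_modal": {"could", "may", "might", "possible"},
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic runs, dropping punctuation and digits."""
    return re.findall(r"[a-z]+", text.lower())

def category_densities(text: str, dictionary: dict[str, set[str]]) -> dict[str, float]:
    """Return each category's share of all tokens (dictionary hits / total words)."""
    tokens = tokenize(text)
    if not tokens:
        return {cat: 0.0 for cat in dictionary}
    counts = Counter(tokens)
    return {
        cat: sum(counts[w] for w in words) / len(tokens)
        for cat, words in dictionary.items()
    }

sample = "Revenue may decline. Adverse currency volatility could reduce margins."
print(category_densities(sample, MINI_DICTIONARY))
```

On this deliberately gloomy two-sentence sample, negative words and weak modals each account for 2 of the 9 tokens; density is reported as a share of total words so that long and short filings can be compared.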

The Six Loughran-McDonald Categories

Each category was tested against actual market outcomes, specifically its ability to predict abnormal returns 3–12 months after filing:

1. Negative (2,355 words) HIGH IMPACT

Words including "adverse," "volatility," and "uncertainty." Research indicates that increased negative word density correlates with lower abnormal returns in the following 12 months. This is the most critical category to monitor.

2. Positive (354 words) MODERATE

Terms like "achieve," "strength," and "improvement." The notably shorter positive list reflects that financial disclosures are inherently cautious. Excessive positivity can trigger credibility discounts — the system is calibrated to expect measured language.

3. Uncertainty (297 words) HIGH IMPACT

"Approximately," "uncertain," "contingent," "possibility." High uncertainty scores correlate strongly with analyst downgrades and increased short interest within 60 days post-filing. This is the top predictor of follow-on scrutiny.

4. Litigious (903 words) SECTOR-RELATIVE

Legal terminology like "litigation," "plaintiff," "defendant." The relevant metric is whether your litigious word density exceeds your sector peer average — context matters here.
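A sector-relative check can be expressed as the gap between your litigious density and the peer mean, optionally scaled by the peer standard deviation. This sketch assumes you already have litigious densities for a peer group; the peer values below are made-up placeholders rather than real benchmarks, and the z-score framing is one reasonable choice, not a documented standard.

```python
from statistics import mean, stdev

def sector_relative(own_density: float, peer_densities: list[float]) -> tuple[float, float]:
    """Return (excess over peer mean, z-score versus peers).

    A positive excess means the filing uses more litigious language than the
    peer average; the z-score expresses that gap in peer standard deviations.
    """
    peer_mean = mean(peer_densities)
    spread = stdev(peer_densities)  # sample std dev; needs at least two peers
    excess = own_density - peer_mean
    z = excess / spread if spread > 0 else 0.0
    return excess, z

# Placeholder litigious densities (fraction of words) for a hypothetical peer group.
peers = [0.010, 0.012, 0.014]
excess, z = sector_relative(0.016, peers)
print(f"excess={excess:.4f}, z={z:.2f}")
```

Comparing against peers rather than an absolute threshold matters because some sectors carry heavier legal disclosure by default.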

5. Strong Modal (68 words) POSITIVE

"Will," "must," "require," "should." High density signals commitment and generally produces positive algorithmic outcomes.

6. Weak Modal (27 words) MOST AVOIDABLE

"Could," "may," "might," "would," "possible." This is the most common avoidable signal problem in small-cap filings. Excessive hedging signals a lack of management conviction.
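Because weak modals are the most avoidable signal, one simple pre-filing check is to list every sentence that contains one, so drafters can decide whether each hedge is necessary. This is a hypothetical helper, not a standard tool: `WEAK_MODALS` holds only the five examples quoted above as a stand-in for the full category, and sentences are split with a naive punctuation rule.

```python
import re

# The five weak-modal examples quoted above; a stand-in for the full category.
WEAK_MODALS = {"could", "may", "might", "would", "possible"}

def flag_hedged_sentences(text: str) -> list[tuple[str, list[str]]]:
    """Return (sentence, weak modals found) for each sentence that hedges."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        hits = [w for w in words if w in WEAK_MODALS]
        if hits:
            flagged.append((sentence, hits))
    return flagged

guidance = (
    "We may be able to achieve modest growth. "
    "We expect revenue growth of 8% next year. "
    "Margins could possibly improve."
)
for sentence, hits in flag_hedged_sentences(guidance):
    print(hits, "->", sentence)
```

The second sentence, which states a specific expectation, is not flagged; the other two are candidates for rewriting.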

Benchmarks: Where Small-Caps Stand

Metric                    Industry Avg   Top Quartile   Bottom Quartile
Negative word density     3.2%           2.1%           4.8%
Weak modal density        1.8%           0.9%           2.9%
Uncertainty score         1.4%           0.8%           2.3%
Fog Index (MD&A)          17.2           13.8           21.4
YoY cosine similarity     0.82           0.91           0.71

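Bottom-quartile companies are not necessarily reporting worse financial results than top-quartile companies. Often they are communicating the same results in algorithmically penalized language, which is a correctable problem.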
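To see where a filing lands, each metric can be compared against the quartile figures in the benchmark table. This sketch hard-codes those figures and treats them as cut-offs, which is an assumption (they may instead be group averages). The direction of "better" is inferred from the table: top-quartile values are lower for every metric except YoY cosine similarity, where higher is better. The metric keys and sample filing values are hypothetical.

```python
# Quartile figures from the benchmark table: (top quartile, bottom quartile).
# Densities are stored as percentages to match the table.
BENCHMARKS = {
    "negative_density_pct": (2.1, 4.8),
    "weak_modal_density_pct": (0.9, 2.9),
    "uncertainty_score_pct": (0.8, 2.3),
    "fog_index_mdna": (13.8, 21.4),
    "yoy_cosine_similarity": (0.91, 0.71),
}

def placement(metric: str, value: float) -> str:
    """Classify a value as top quartile, bottom quartile, or middle."""
    top, bottom = BENCHMARKS[metric]
    higher_is_better = top > bottom  # true only for cosine similarity here
    if higher_is_better:
        if value >= top:
            return "top quartile"
        if value <= bottom:
            return "bottom quartile"
    else:
        if value <= top:
            return "top quartile"
        if value >= bottom:
            return "bottom quartile"
    return "middle"

# Hypothetical filing metrics, for illustration only.
filing = {"negative_density_pct": 3.5, "fog_index_mdna": 22.0, "yoy_cosine_similarity": 0.93}
for name, value in filing.items():
    print(f"{name}: {value} -> {placement(name, value)}")
```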

The Delta Problem: How Change Gets Detected

Algorithms weight textual changes between consecutive filings more heavily than absolute content. Large period-over-period changes correlate with lower post-filing returns; large textual shifts signal instability or disclosure anxiety.

Key sections receiving delta analysis: Risk Factors (new vs. removed risks), MD&A (language drift on core business description), Liquidity sections (going-concern language changes), Legal Proceedings (new additions).
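Both kinds of delta are straightforward to compute. This sketch measures year-over-year similarity as the cosine similarity of simple word-count vectors, and finds new versus removed risks as a set difference of risk-factor headings. Raw counts are an assumption made for brevity (production systems may use TF-IDF weighting or other representations), and the sample text and headings are invented.

```python
import math
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Bag-of-words counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two word-count vectors (1.0 = same word mix)."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def risk_factor_delta(prior: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (newly added risks, removed risks) between two filings."""
    return current - prior, prior - current

last_year = "We sell industrial sensors to manufacturers in North America."
this_year = "We sell industrial sensors and software to manufacturers in North America."
print(round(cosine_similarity(last_year, this_year), 3))

added, removed = risk_factor_delta(
    {"Supplier concentration", "Currency exposure"},
    {"Currency exposure", "Cybersecurity incidents"},
)
print("added:", added, "removed:", removed)
```

Adding two words to an otherwise unchanged sentence keeps similarity near 0.9; a rewritten business description would score much lower, which is the kind of drop the YoY cosine similarity benchmark tracks.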


Five Filing Changes That Improve Your Algo Score

  1. Replace weak modals in forward guidance. Transform "may be able to achieve" into "expect revenue growth of X%."
  2. Trim resolved Risk Factors. A Risk Factors section that keeps accumulating risks, including ones already mitigated, sends a direct negative signal. Remove mitigated risks and briefly document the rationale.
  3. Shorten Liquidity section sentences. This section typically generates high Fog scores. Adopt conversational CFO language — shorter sentences, specific numbers.
  4. Use parallel structure in MD&A. "Revenue increased 12%. Gross margin expanded 80 basis points." outperforms discursive paragraphs that convey the same content at twice the word count.
  5. Anchor the Management Assessment section. Specific numbers, timelines, and named business drivers outperform vague industry commentary. This section receives the highest NLP weighting.
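The Fog Index in the benchmark table is presumably the Gunning Fog readability formula, 0.4 × (average words per sentence + percentage of words with three or more syllables), which is why shorter sentences and plainer words (items 3 and 4) lower it. This sketch applies it to a discursive sentence and to the parallel-structure example from item 4. Syllables are estimated with a rough vowel-group heuristic, and the standard formula's exclusions (proper nouns, certain suffixes) are ignored, so treat the scores as approximate.

```python
import re

def count_syllables(word: str) -> int:
    """Rough estimate: count vowel groups, minus common silent endings."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Silent endings: a final "e" (but not "-le") and "-ed" not after t/d.
    if word.endswith("e") and not word.endswith("le"):
        count -= 1
    elif word.endswith("ed") and not word.endswith(("ted", "ded")):
        count -= 1
    return max(count, 1)

def fog_index(text: str) -> float:
    """Gunning Fog: 0.4 * (words per sentence + 100 * complex words / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)  # numbers such as "12%" are not counted
    if not sentences or not words:
        return 0.0
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words))

discursive = (
    "During the fiscal year, revenue, which benefited from favorable pricing dynamics "
    "across substantially all of our operating segments, increased by approximately 12%, "
    "while gross margin, reflecting manufacturing efficiencies, expanded by approximately "
    "80 basis points."
)
parallel = "Revenue increased 12%. Gross margin expanded 80 basis points."
print(f"discursive: {fog_index(discursive):.1f}  parallel: {fog_index(parallel):.1f}")
```

The discursive version packs 32 words into one sentence and uses many long words, so it scores far above the parallel rewrite, which splits the same two facts into short sentences.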


This article is informational and not investment or legal advice. See our Disclaimer.