oops!

English analysis pipeline stress-tested with Project Gutenberg texts from Austen, Carroll, Shelley, and Melville. That run exposed four bugs, all now fixed: TDI

The English analysis pipeline was stress-tested with Project Gutenberg texts by Austen, Carroll, Shelley, and Melville.
That run exposed four bugs, all of which are now fixed:
TDI scoring was inverted
The three core TDI lanes (Reading Ease, Grade, and Fog) were producing flipped results. This is now fixed.
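One common way to end up with flipped lanes is that the standard formulas point in opposite directions: a higher Flesch Reading Ease means easier text, while a higher Flesch-Kincaid Grade or Gunning Fog value means harder text. The post does not say how TDI combines its lanes, so the sketch below is only an illustration. The three formulas are the standard published ones, but `TextCounts`, `LANES`, and the shared "higher = harder" difficulty axis are hypothetical names and conventions chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class TextCounts:
    words: int
    sentences: int
    syllables: int
    complex_words: int  # words with three or more syllables (Gunning Fog definition)


def flesch_reading_ease(c: TextCounts) -> float:
    # Higher = easier.
    return 206.835 - 1.015 * (c.words / c.sentences) - 84.6 * (c.syllables / c.words)


def flesch_kincaid_grade(c: TextCounts) -> float:
    # Higher = harder (approximate US school grade).
    return 0.39 * (c.words / c.sentences) + 11.8 * (c.syllables / c.words) - 15.59


def gunning_fog(c: TextCounts) -> float:
    # Higher = harder.
    return 0.4 * ((c.words / c.sentences) + 100 * (c.complex_words / c.words))


# Hypothetical convention: put every lane on one "difficulty" axis where higher = harder.
# Reading Ease runs the other way, so its sign is flipped; skipping this step
# (or applying it to the wrong lanes) yields inverted results.
LANES = {
    "reading_ease": (flesch_reading_ease, -1.0),
    "grade": (flesch_kincaid_grade, 1.0),
    "fog": (gunning_fog, 1.0),
}


def lane_difficulties(c: TextCounts) -> dict[str, float]:
    return {name: sign * fn(c) for name, (fn, sign) in LANES.items()}


if __name__ == "__main__":
    simple = TextCounts(words=100, sentences=10, syllables=130, complex_words=5)
    dense = TextCounts(words=100, sentences=3, syllables=170, complex_words=20)
    print(lane_difficulties(simple))
    print(lane_difficulties(dense))
```

A quick sanity check for this kind of bug: on a deliberately denser sample, every lane's difficulty should go up, never down.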
“kind of” was misread without context
Phrases like “a kind of ignorant carelessness” were being flagged as filler. The parser now distinguishes between noun phrases like “a kind of …” and actual hedge fillers like “kind of tired.”
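The post does not describe how the parser makes this distinction. One simple heuristic, sketched below under that assumption, is to look at the token right before "kind of": a determiner or possessive ("a", "the", "his", ...) suggests a noun phrase, while anything else suggests a hedge. The `DETERMINERS` set and `hedge_kind_of_positions` function are hypothetical names for illustration, and the heuristic misses cases with an adjective in between, such as "a strange kind of".

```python
import re

# Hypothetical list: "kind of" directly after one of these is read as a noun phrase.
DETERMINERS = {
    "a", "an", "the", "this", "that", "these", "those", "what", "some", "any",
    "every", "each", "one", "no", "another", "same",
    "my", "your", "his", "her", "its", "our", "their",
}

TOKEN = re.compile(r"[a-z']+")


def hedge_kind_of_positions(sentence: str) -> list[int]:
    """Return token indices where 'kind of' looks like a hedge filler."""
    tokens = TOKEN.findall(sentence.lower())
    hits = []
    for i in range(len(tokens) - 1):
        if tokens[i] == "kind" and tokens[i + 1] == "of":
            prev = tokens[i - 1] if i > 0 else None
            # Only the immediately preceding token is checked (see caveat above).
            if prev not in DETERMINERS:
                hits.append(i)
    return hits


if __name__ == "__main__":
    print(hedge_kind_of_positions("She felt a kind of ignorant carelessness."))
    print(hedge_kind_of_positions("I was kind of tired."))
```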
Passive detection was too aggressive in literary prose
Descriptions like “was wrapped in furs” or “was bathed in tears” were being marked as passive voice. A broader false-positive list for literary state descriptions has been added to reduce these incorrect flags.
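To show where such a false-positive list fits, here is a minimal sketch assuming a naive detector that matches a form of "to be" followed by a word ending in "-ed" or "-en". The real pipeline's detector and list are not shown in the post; `STATIVE_FALSE_POSITIVES` and `passive_candidates` are hypothetical, and the few participle/preposition pairs in the list are examples only.

```python
import re

BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
# Naive pattern: a "to be" form followed by a word ending in -ed or -en.
PASSIVE = re.compile(rf"\b{BE_FORMS}\s+(\w+(?:ed|en))\b", re.IGNORECASE)

# Hypothetical allowlist of participle + preposition pairs that usually describe
# a state in literary prose ("was wrapped in furs") rather than an action.
STATIVE_FALSE_POSITIVES = {
    ("wrapped", "in"),
    ("bathed", "in"),
    ("dressed", "in"),
    ("covered", "with"),
    ("filled", "with"),
}


def passive_candidates(sentence: str) -> list[str]:
    hits = []
    for m in PASSIVE.finditer(sentence):
        participle = m.group(1).lower()
        rest = sentence[m.end():].split()
        next_word = rest[0].lower().strip(".,;:!?") if rest else ""
        if (participle, next_word) in STATIVE_FALSE_POSITIVES:
            continue  # treated as a state description, not flagged
        hits.append(m.group(0))
    return hits


if __name__ == "__main__":
    print(passive_candidates("She was wrapped in furs."))
    print(passive_candidates("The letter was written by her brother."))
```

Keying on the following preposition, rather than on the participle alone, keeps genuine passives such as "was wrapped by the maid" flagged.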
LDI sentence-complexity scaling was too harsh
Longer literary sentences, especially those beyond 30 words, were being penalized far too heavily. The scaling curve has been adjusted so that long sentences are no longer over-penalized.
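The post does not give the old or new LDI curve, so the following is only an illustration of the general idea: past a threshold of about 30 words, a penalty that grows quadratically quickly dominates the score, while one that grows logarithmically still rises with length but much more gently. Both functions and their exact shapes are hypothetical.

```python
import math


def length_penalty_harsh(words: int, threshold: int = 30) -> float:
    # Hypothetical "too harsh" shape: quadratic growth past the threshold.
    excess = max(0, words - threshold)
    return words / threshold + (excess / 10) ** 2


def length_penalty_softened(words: int, threshold: int = 30) -> float:
    # Hypothetical softened shape: linear up to the threshold, logarithmic after it.
    # Both branches equal 1.0 at the threshold, so the curve has no jump there.
    if words <= threshold:
        return words / threshold
    return 1.0 + math.log1p((words - threshold) / threshold)


if __name__ == "__main__":
    for n in (15, 30, 45, 60, 90):
        print(n, round(length_penalty_harsh(n), 2), round(length_penalty_softened(n), 2))
```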
The German pipeline was checked against the same bug classes and did not show the same issues, so no changes were needed there.