What novelty tracking means in ViceWire
At a high level, novelty tracking answers three separate questions:
Is this a duplicate?
Is this article effectively the same story as one already seen?
Is this a related update?
Is this article part of the same broader event thread, even if the wording is different?
Is there new information?
Does this article introduce a materially new development, angle, or confirmation?
Current novelty layers
Today, ViceWire supports novelty tracking through two main layers:
1. Exact and near-exact title matching
This layer catches repeated postings, obvious republishes, and cases where the same article title appears more than once across the monitored corpus. It is the simplest novelty layer: fast and useful, but intentionally narrow. Title matching alone cannot reliably distinguish a new development from a lightly rewritten version of an older story, and it can misfire on recurring editorial formats such as weekly roundups or regular columns, where the title stays the same but the underlying content changes.
2. Structured related-story grouping
ViceWire also produces structured event metadata that can be used to group related stories together. At a practical level, articles with the same or very similar grouping signals may belong to the same underlying event thread:
- entity
- event_family
- event_sub_types
- event_stage_statuses
- timing
- affected_business_surface
- family-specific metadata such as counterparties, competitor references, partner names, geographies, dates, and other event-specific fields where supported
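The two current layers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ViceWire's code: the helper names `normalize_title`, `grouping_key`, and `group_articles` are hypothetical, the normalization steps (NFKC, lowercase, punctuation stripped) are one plausible reading of "near-exact", and keying event threads on `entity`, `event_family`, and `event_sub_types` alone is an assumed choice. The sample field values are invented.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_title(title: str) -> str:
    """Canonicalize a title so trivially different copies compare equal.

    Assumed normalization: Unicode NFKC, lowercase, punctuation replaced by
    spaces, and runs of whitespace collapsed.
    """
    text = unicodedata.normalize("NFKC", title).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def grouping_key(article: dict) -> Tuple:
    """Hypothetical event-thread key built from three structured fields.

    Stage status and timing are left out on the assumption that they change
    as a story develops while the underlying thread stays the same.
    """
    return (
        article.get("entity"),
        article.get("event_family"),
        tuple(sorted(article.get("event_sub_types", []))),
    )


def group_articles(articles: List[dict]) -> Tuple[Dict[str, List[str]], Dict[Tuple, List[str]]]:
    """Bucket article ids by normalized title and by grouping key."""
    by_title: Dict[str, List[str]] = defaultdict(list)
    by_event: Dict[Tuple, List[str]] = defaultdict(list)
    for article in articles:
        by_title[normalize_title(article["title"])].append(article["id"])
        by_event[grouping_key(article)].append(article["id"])
    return dict(by_title), dict(by_event)


# Invented sample data: two reposts of one headline plus a differently worded update.
meta = {"entity": "Acme Corp", "event_family": "earnings", "event_sub_types": ["quarterly_results"]}
articles = [
    {"id": "a1", "title": "Acme Corp Announces Q3 Results!", **meta},
    {"id": "a2", "title": "acme corp announces Q3 results", **meta},
    {"id": "a3", "title": "Acme beats estimates on strong cloud demand", **meta},
]
by_title, by_event = group_articles(articles)
print(by_title["acme corp announces q3 results"])  # ['a1', 'a2']
print(by_event[("Acme Corp", "earnings", ("quarterly_results",))])  # ['a1', 'a2', 'a3']
```

Title matching pairs only the two reposts, while the metadata key also pulls in the reworded update. Exact equality on the key is the simplest reading of "the same grouping signals"; matching "very similar" signals would need a fuzzier comparison, for example on overlapping sub-types or counterparties.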
Similarity-based near-duplicate detection
ViceWire is building a dedicated near-duplicate layer for syndicated and lightly rewritten coverage. It uses fingerprinting and overlap checks over normalized article text to identify cases where two articles are not exact copies but still reflect substantially the same underlying text, including cross-domain syndication and light editorial rewrites.
Why this works
Each layer addresses a different novelty problem:
- title matching catches exact or near-exact reposting
- structured metadata grouping connects related stories that belong to the same underlying event thread or broader theme
- text fingerprinting and overlap-based comparison help identify syndicated or lightly rewritten articles even when titles and formatting differ
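The fingerprinting and overlap idea can be illustrated with word shingling, a common near-duplicate technique: each text is reduced to the set of hashes of its overlapping five-word sequences, and two texts are compared by the Jaccard overlap of those sets. The text above does not say which fingerprinting scheme ViceWire uses, so the shingle length `k=5`, the BLAKE2b hashing, the 0.5 threshold, and the function names are all assumptions.

```python
import hashlib
import re
from typing import Set


def fingerprint(text: str, k: int = 5) -> Set[int]:
    """Return the set of 64-bit hashes of every k-word shingle in the text.

    Text is normalized by lowercasing and keeping only word characters, so
    punctuation and formatting differences do not change the fingerprint.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        shingles = [words] if words else []  # short text: one shingle, or none
    else:
        shingles = [words[i:i + k] for i in range(len(words) - k + 1)]
    return {
        int.from_bytes(hashlib.blake2b(" ".join(s).encode("utf-8"), digest_size=8).digest(), "big")
        for s in shingles
    }


def overlap(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two fingerprints: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0  # two empty texts give no evidence of duplication
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag pairs whose overlap meets the threshold."""
    return overlap(fingerprint(text_a), fingerprint(text_b)) >= threshold


original = ("Acme Corp said on Tuesday that it will acquire Beta Labs for 2 billion "
            "dollars in cash, pending regulatory approval.")
rewrite = ("Acme Corp said on Tuesday that it will acquire Beta Labs for 2 billion "
           "dollars in cash, subject to regulatory approval.")
unrelated = ("Gamma Inc reported lower quarterly revenue as demand for its consumer "
             "devices slowed in Europe.")

print(round(overlap(fingerprint(original), fingerprint(rewrite)), 2))  # 0.65
print(is_near_duplicate(original, rewrite))    # True
print(is_near_duplicate(original, unrelated))  # False
```

A one-phrase edit leaves most shingles intact, so the rewrite still scores well above the threshold even though the texts are not identical. Comparing every pair of full fingerprints becomes expensive across a large corpus; MinHash with locality-sensitive hashing is a common way to find candidate pairs first.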
Where this is going
ViceWire’s long-run direction is not simply to suppress repeated coverage, but to maintain structured continuity across reporting on the same underlying event. That means a mature novelty layer should be able to tell users:
- when an event was first seen
- whether the current article is a duplicate, a near-duplicate, or a follow-up
- what new information, if any, has been added
- how the story has evolved across related coverage
- when repeated coverage may still matter because it increases dissemination, attention, or market awareness
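Several of these goals can be combined in a single tracker. The sketch below is hypothetical throughout: the `NoveltyTracker` class, its labels, the 0.9 and 0.5 thresholds, and the idea of surfacing new information by diffing metadata fields are illustrative assumptions. It takes an event grouping key and a text-overlap score as inputs (for instance, the outputs of the earlier layers), records when each event thread was first seen, and reports which metadata fields changed. It assumes articles arrive in chronological order, and it does not attempt the last goal of judging when repeated coverage still matters.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class Observation:
    label: str             # "duplicate", "near-duplicate", "follow-up", or "new event"
    first_seen: datetime   # when this event thread was first observed
    new_fields: Dict[str, str] = field(default_factory=dict)  # metadata not seen before


class NoveltyTracker:
    """Hypothetical tracker combining text-overlap and event-thread signals."""

    def __init__(self, dup_threshold: float = 0.9, near_dup_threshold: float = 0.5):
        self.dup_threshold = dup_threshold
        self.near_dup_threshold = near_dup_threshold
        self.first_seen: Dict[Hashable, datetime] = {}
        self.known_fields: Dict[Hashable, Dict[str, str]] = {}

    def observe(self, event_key: Hashable, similarity: float,
                fields: Dict[str, str], seen_at: datetime) -> Observation:
        """Label one article and record it against its event thread.

        event_key: structured grouping key for the article's event thread.
        similarity: best text-overlap score against earlier articles (0..1).
        fields: the article's event metadata, e.g. stage status or deal terms.
        """
        seen_before = event_key in self.first_seen
        first = self.first_seen.setdefault(event_key, seen_at)
        known = self.known_fields.setdefault(event_key, {})
        # A field counts as new information if it is absent or has a new value.
        new_fields = {k: v for k, v in fields.items() if known.get(k) != v}
        known.update(fields)

        if similarity >= self.dup_threshold:
            label = "duplicate"
        elif similarity >= self.near_dup_threshold:
            label = "near-duplicate"
        elif seen_before:
            label = "follow-up"
        else:
            label = "new event"
        return Observation(label, first, new_fields)


# Invented example: a rumor, a lightly rewritten repost, then a confirmation.
key = ("Acme Corp", "acquisition")
tracker = NoveltyTracker()
r0 = tracker.observe(key, 0.0, {"stage": "rumored"}, datetime(2024, 5, 1, 9, 0))
r1 = tracker.observe(key, 0.65, {"stage": "rumored"}, datetime(2024, 5, 1, 10, 0))
r2 = tracker.observe(key, 0.1, {"stage": "confirmed", "price": "2B"}, datetime(2024, 5, 2, 8, 0))
print(r0.label, r1.label, r2.label)  # new event near-duplicate follow-up
print(r2.new_fields)                 # {'stage': 'confirmed', 'price': '2B'}
print(r2.first_seen)                 # 2024-05-01 09:00:00
```

Checking text overlap before thread membership means a repost inside a known thread is reported as a near-duplicate rather than a follow-up, while a low-overlap article in the same thread is treated as a follow-up whose changed fields describe what it adds.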