ViceWire uses novelty tracking to distinguish genuinely new developments from repeated reporting, syndicated copies, and related follow-up coverage. This matters because not every additional article adds new information. Some articles are exact repeats, some are light rewrites of the same underlying story, and some are meaningful follow-ups that change the interpretation of an earlier event.

What novelty tracking means in ViceWire

At a high level, novelty tracking answers three separate questions:

Is this a duplicate?

Is this article effectively the same story as one already seen?

Is this a related update?

Is this article part of the same broader event thread, even if the wording is different?

Is there new information?

Does this article introduce a materially new development, angle, or confirmation?
ViceWire does not treat novelty as a single binary flag. It is better understood as a layered process that separates exact duplication, near-duplication, and related-event continuity.

Current novelty layers

Today, ViceWire supports novelty tracking through two main layers:

1. Exact and near-exact title matching

This is the simplest novelty layer. It catches repeated postings, obvious republishes, and cases where the same article title appears across the monitored corpus. It is fast and useful but intentionally narrow: title matching alone cannot reliably distinguish a new development from a lightly rewritten version of an older story, and it can misfire on recurring editorial formats, such as weekly roundups or recurring columns, where the title stays the same but the underlying content changes.
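As an illustration only, title matching of this kind can be sketched in a few lines of Python. The normalization steps, the function names, and the 0.9 similarity threshold are assumptions made for this example, not ViceWire's actual implementation.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def title_match(a, b, near_threshold=0.9):
    """Return 'exact', 'near', or 'none' for a pair of titles.

    near_threshold is a hypothetical cutoff chosen for this sketch.
    """
    na, nb = normalize_title(a), normalize_title(b)
    if na == nb:
        return "exact"
    ratio = SequenceMatcher(None, na, nb).ratio()
    return "near" if ratio >= near_threshold else "none"
```

Two issues of a recurring column that share a title would come back as "exact" here regardless of their content, which is the misfire described above and the reason this layer is not used on its own.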

2. Structured related-story grouping

ViceWire also produces structured event metadata that can be used to group related stories. In practice, articles whose values for the following grouping signals are the same or very similar may belong to the same underlying event thread:
  • entity
  • event_family
  • event_sub_types
  • event_stage_statuses
  • timing
  • affected_business_surface
  • family-specific metadata such as counterparties, competitor references, partner names, geographies, dates, and other event-specific fields where supported
This matters because two articles can be textually different while still describing the same underlying event thread or broader theme. Structured grouping does not determine novelty by itself. What it does is create an event-aware comparison layer.

If two articles resolve to very similar structured metadata, ViceWire can treat them as related coverage and then ask a more precise question: does the newer article add a materially new development, confirmation, stage change, or factual detail, or is it mostly repeating what was already known? Even when the underlying fact is not new, broader dissemination can still matter because it affects attention, diffusion, and potentially market outcomes.
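The grouping idea can be sketched as follows. This is a minimal example under stated assumptions: the field names come from the list above, but the field types, the choice of entity plus event_family as the grouping key, and the sample values in the usage note are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventMeta:
    # Field names follow the list above; the types are assumptions.
    article_id: str
    entity: str
    event_family: str
    event_sub_types: frozenset = frozenset()
    event_stage_statuses: frozenset = frozenset()


def thread_key(meta):
    """Coarse grouping key (an assumption): same entity and event family."""
    return (meta.entity.strip().lower(), meta.event_family.strip().lower())


def group_threads(articles):
    """Bucket articles into candidate event threads, preserving input order."""
    threads = defaultdict(list)
    for meta in articles:
        threads[thread_key(meta)].append(meta)
    return dict(threads)


def new_signals(earlier, later):
    """Sub-types and stage statuses present in `later` but not in `earlier`."""
    return {
        "event_sub_types": later.event_sub_types - earlier.event_sub_types,
        "event_stage_statuses": later.event_stage_statuses - earlier.event_stage_statuses,
    }
```

For example, if an earlier article carries the stage status "rumored" and a later article in the same thread carries both "rumored" and "confirmed" (illustrative values), new_signals reports "confirmed" as the added stage. An empty result suggests related coverage that mostly repeats what was already known, although grouping alone does not settle that question.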

Similarity-based near-duplicate detection

ViceWire is building a dedicated near-duplicate layer for syndicated and lightly rewritten coverage. It uses fingerprinting and overlap checks over normalized article text to identify cases where two articles are not exact copies but still reflect substantially the same underlying text, including cross-domain syndication and light editorial rewrites.
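One common way to implement fingerprinting with overlap checks is to hash word shingles and compare the resulting sets with Jaccard similarity. The sketch below uses that approach purely as an illustration. The source does not say which fingerprinting scheme ViceWire uses, so the shingle size, the SHA-1 hashing, and the 0.6 threshold are all assumptions.

```python
import hashlib
import re


def normalize_text(text):
    """Lowercase and tokenize into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingle_fingerprints(text, k=3):
    """Hash every run of k consecutive tokens (k is a hypothetical choice)."""
    tokens = normalize_text(text)
    if not tokens:
        return set()
    if len(tokens) < k:
        # Very short texts get a single fingerprint over all tokens.
        return {hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()}
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(tokens) - k + 1)
    }


def jaccard(a, b):
    """Shared fingerprints divided by total distinct fingerprints."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.6):
    """threshold is an illustrative cutoff, not a tuned value."""
    fa, fb = shingle_fingerprints(text_a), shingle_fingerprints(text_b)
    return jaccard(fa, fb) >= threshold
```

Because the comparison runs on normalized body text rather than titles or markup, a syndicated copy with a different headline and a short appended attribution would still score a high overlap.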

Why this works

Each layer addresses a different novelty problem:
  • title matching catches exact or near-exact reposting
  • structured metadata grouping connects related stories that belong to the same underlying event thread or broader theme
  • text fingerprinting and overlap-based comparison help identify syndicated or lightly rewritten articles even when titles and formatting differ
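To show how the three layers might combine, the sketch below takes the output of each layer as an input and returns a single label. The label names, the decision order, and the overlap threshold are assumptions made for illustration; the source does not specify how ViceWire combines the layers.

```python
from enum import Enum


class Novelty(Enum):
    # Hypothetical labels for this sketch.
    DUPLICATE = "duplicate"
    NEAR_DUPLICATE = "near_duplicate"
    FOLLOW_UP = "follow_up"
    RELATED_REPEAT = "related_repeat"
    UNRELATED = "unrelated"


def classify(title_match, text_overlap, same_thread, adds_new_info,
             overlap_threshold=0.6):
    """Combine layer outputs into one label.

    title_match:   result of the title layer (bool)
    text_overlap:  score from the text-overlap layer, between 0.0 and 1.0
    same_thread:   result of structured metadata grouping (bool)
    adds_new_info: whether the newer article adds new structured signals (bool)
    """
    # A matching title counts as a duplicate only when the text agrees too,
    # which guards against recurring titles such as weekly roundups.
    if title_match and text_overlap >= overlap_threshold:
        return Novelty.DUPLICATE
    if text_overlap >= overlap_threshold:
        return Novelty.NEAR_DUPLICATE
    if same_thread:
        return Novelty.FOLLOW_UP if adds_new_info else Novelty.RELATED_REPEAT
    return Novelty.UNRELATED
```

In this arrangement a recurring column with an unchanged title but new content, such as classify(True, 0.1, False, False), is not marked as a duplicate, because the text layer disagrees with the title layer.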

Where this is going

ViceWire’s long-run direction is not simply to suppress repeated coverage, but to maintain structured continuity across reporting on the same underlying event. That means a mature novelty layer should be able to tell users:
  • when an event was first seen
  • whether the current article is a duplicate, a near-duplicate, or a follow-up
  • what new information, if any, has been added
  • how the story has evolved across related coverage
  • when repeated coverage may still matter because it increases dissemination, attention, or market awareness
This roadmap includes expanding ViceWire’s ability to distinguish duplicate and lightly rewritten coverage so that repeated reporting can be classified and weighted appropriately.