Human zero day

 · Jim Fuller

'Semantics, knowledge graphs, and triples allow for the automatic representation of knowledge, at scale.'

Let me step back a bit ... but not so far back that we are talking about Horn clauses or Prolog.

Over the past decade there has been significant adoption of semantic technologies ... mind you, not the wild 'everyone will do this' kind of adoption, but the slow burn by very large commercial entities, because it is so compelling (and they have the means, plus multiple data streams being generated by their customers). This adoption is not necessarily enriching a publicly available web but enlarging the information 'treasure chests' of a small group of very large entities.

Being able to encode data (usually as 'triples') enables a single index to answer powerful queries (SPARQL, Cypher, et al.) which transcend one-dimensional full-text searching. Triples are easily embedded with the various open linking technologies (ex. JSON-LD). Triples are truly everywhere ... Google harnessed the most powerful force mankind has ever created, i.e. SEO, to get the web tagged with entities defined at schema.org ... but that's not relevant to this article.
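To make the contrast with full-text search concrete, here is a minimal sketch of a triple index in plain Python. The entities and predicates (`worksFor`, `locatedIn`, etc.) are invented examples, and the query function only gestures at what a real SPARQL engine does, but it shows the key point: facts stored as (subject, predicate, object) tuples can be joined across hops in a single index.

```python
# Facts as (subject, predicate, object) tuples. All names here are
# made-up illustrations, not real data or a real vocabulary.
triples = {
    ("alice", "worksFor", "AcmeCorp"),
    ("bob", "worksFor", "AcmeCorp"),
    ("carol", "worksFor", "Initech"),
    ("AcmeCorp", "locatedIn", "Berlin"),
    ("Initech", "locatedIn", "Austin"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None is a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

def people_in_city(city):
    """A two-hop join, roughly the SPARQL pattern:
         ?person <worksFor> ?org . ?org <locatedIn> <city>
    Full-text search over documents cannot express this join directly."""
    orgs = {s for (s, _, _) in match(p="locatedIn", o=city)}
    return sorted(s for (s, _, o) in match(p="worksFor") if o in orgs)

print(people_in_city("Berlin"))  # -> ['alice', 'bob']
```

A production store adds indexes per triple position and a real query planner, but the join-across-facts capability is the same.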

It did not start when Google engineers wrote "Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion" ... Apple had bought Siri in 2010, and Microsoft, Amazon, Meta and others have been building significant triple stores for more than a decade. From a commercial point of view, a knowledge graph combined with AI represents the end goal of perfect intelligence on the consumer ... a kind of singularity, akin to solving the game of chess: once a critical line has been passed, all sorts of unknown unknowns make themselves known.

Creating a knowledge graph embodies noble goals.

We see all sorts of end-user activity in this domain (ex. Notion) as we try to extend the dream of the interwebs being a vast (and correct) knowledge library for all, as well as build up our own personal knowledge graphs ... but the reality emerging is far from that utopian ideal. Even as a planet-scale library, we have seen commercial forces dissipate the web into a chintzy application bazaar ... vendor software lock-in is bad, vendor data lock-in is worse ;) Perhaps most perplexing is that people are fine with generating valuable graphs for said commercial entities and do not seem to perceive how an aggregated graph could be used to alter reality/perception.

Let's hope the current lesson with the bird site and the mass migration to the elephant is an indicator of better intentions.

Back to the story ...

Google does enhance its search offerings with its knowledge graph, and it provides access to a subset of that graph ... a small sliver of what it is building internally. That is, Google likely generates triples off of all its services ... you can bet that wherever you sign off a copy of your data to Google, it is possible for said data to be parsed into triples.
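The mechanics of "parsing your data into triples" are mundane, which is part of the point. Here is a hedged sketch, assuming a made-up service record; the field names and predicates (`purchased`, `livesIn`) are illustrative inventions, not any vendor's actual schema.

```python
import json

# A hypothetical per-user activity record, as a service might log it.
record = json.loads("""
{"user": "u123",
 "purchased": ["camera", "tripod"],
 "city": "Berlin"}
""")

def to_triples(rec):
    """Flatten one structured record into (subject, predicate, object)
    triples ready to merge into a graph keyed by the same user id."""
    subject = rec["user"]
    out = [(subject, "purchased", item) for item in rec["purchased"]]
    out.append((subject, "livesIn", rec["city"]))
    return out

for t in to_triples(record):
    print(t)
```

Each service only needs a small mapper like this; the power (and the danger) comes from merging every service's triples under the same subject.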

In the past it has been a source of comfort to me that the ease with which we can collect data does not translate into immediately actionable 'intelligence' from that data. But now I am not so sure this asymmetry continues to exist ... and there is also the common problem that it is all too easy to misinterpret data (or to manipulate it toward the outcome you desire).

So what is the danger?

I may detail specifics in future articles, but for now: this activity is fairly technical in nature ... large software service companies are the ones in collection (though not necessarily exploitation) mode, and they have the raw material (data). I am provisionally giving this the cute name of 'Human zero days', i.e. overwhelming information about any human domain, from the small scale (e.g. specific individuals) up to whole industries ... the term 'zero day' is chosen to convey complete exploitation that potentially works both for dominating an information space and for completely disrupting it.

Large government entities are also joining in the fun, though I am currently more concerned about the technical excellence of a few well-known companies (Google, Apple, Microsoft, Amazon and Meta are at the top of the list) who are in the process of commoditising these internal knowledge graphs.

For some large entities this activity is akin to patent hoarding, which guarantees some outcome in the domain.

This level of information in the hands of governments could instigate a lot of 'good' ... though in the wrong hands I could equally imagine World War III starting with an unstable (albeit very rich) individual buying one of these commercial entities, or some government gaining unfettered access to such a knowledge graph. Not as obviously impactful as, say, nuclear technologies, though I would argue 'existentially adjacent'.

While we cannot be sure how the current leadership of these companies is exploiting these knowledge graphs, it is obvious that any change in ownership or leadership at any of these companies could have a strategic impact.

As engineers we need to speak up about the spectrum of threats knowledge graphs represent, as well as highlight the best (least exploitable) opportunities giving the most value to the most people.

Where to start?

  • transparent stewardship
  • public knowledge graphs > private knowledge graphs
  • decentralised by design
  • authoritative and citation-based
  • privacy and control for individuals to own their triples

Beyond that, this needs (much) more consideration!