Planting Seeds of Truth
Introducing a searchable decentralized repository of suppressed information.
By now you're probably aware that large volumes of important information have been disappearing from the internet at an increasing rate. Some of it is actively censored or suppressed, while the rest vanishes into the black hole of algorithmic obscurity. WantToKnow.info, where I work, is right in the middle of the struggle to keep getting the word out. At the moment, our parent organization, PEERS, is sponsoring a project called Seeds of Truth, which is designed to address this issue in a promising way.
We've been archiving data from a variety of reliable yet suppressed media sources and training an AI on it. Along the way, we created a neat decentralized dataset that's now publicly available and ready for anyone to use. We call it Beta Slim. It consists of 86,263 chunks of text divided into 133 lexical groups, plus the extra information needed to efficiently search the set using TF-IDF vectors. Here's where the information came from:
The WantToKnow summaries archive of 13k+ news articles on corruption, cover-ups, and inspiring topics is included in Beta Slim. Here's where most of the news reports included in the WantToKnow archive come from:
We stored this dataset and its vector information on IPFS and registered it on the Hive blockchain, so the data can't be altered without the change being provably detectable. It's stored in JSON format, making it easy for webpages or apps to plug in and make use of the suppressed information. You can view my notebook for dataset processing and archiving on GitHub here. You can also directly search Beta Slim here.
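For the curious, here is a rough idea of how an app might search a dataset organized like this. It is only a minimal sketch and not the actual Beta Slim code: the JSON layout, the sample chunks, and every function name below are hypothetical, and I'm assuming each lexical group is summarized by the average (centroid) of its TF-IDF vectors so a query can be sent to the closest group before ranking the chunks inside it.

```python
import json
import math
from collections import Counter, defaultdict

# Hypothetical miniature dataset. The real Beta Slim JSON schema may differ.
SAMPLE_JSON = """
{
  "groups": [
    {"id": 0, "chunks": ["officials hid the bank fraud report",
                         "bank regulators ignored fraud warnings"]},
    {"id": 1, "chunks": ["volunteers planted trees in the city park",
                         "a community garden feeds the neighborhood"]}
  ]
}
"""

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def build_index(raw_json):
    """Vectorize every chunk and compute one centroid per group."""
    groups = json.loads(raw_json)["groups"]
    idf = build_idf([c for g in groups for c in g["chunks"]])
    index = []
    for g in groups:
        vectors = [tfidf(c, idf) for c in g["chunks"]]
        index.append({"id": g["id"], "chunks": g["chunks"],
                      "vectors": vectors, "centroid": centroid(vectors)})
    return idf, index

def search(query, idf, index, top_k=3):
    """Route the query to the nearest centroid, then rank that group's chunks."""
    q = tfidf(query, idf)
    if not q:  # no query term appears anywhere in the corpus
        return None, []
    best = max(index, key=lambda g: cosine(q, g["centroid"]))
    scored = sorted(zip((cosine(q, v) for v in best["vectors"]), best["chunks"]),
                    reverse=True)
    return best["id"], scored[:top_k]

idf, index = build_index(SAMPLE_JSON)
group_id, hits = search("bank fraud", idf, index)
for score, chunk in hits:
    print(f"group {group_id}  {score:.3f}  {chunk}")
```

The appeal of routing by centroid is that a query only has to be compared against one small group of chunks instead of the whole set, which matters when the data is fetched piece by piece from a decentralized store.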
There are technical details that make this dataset interesting. But don't worry, I won't bore you with the specifics of k-means computation and centroid proximity query routing. The most important thing about Beta Slim is that it makes it easier for people to access some of the information that's being hidden from us by Google and social media on the modern web. That makes the dataset a valuable public good. If you want to support this work, please make a tax-deductible donation to PEERS.
For more of my work, visit Rstory.




