Using OCCRP’s Aleph for Dark Web Data Analysis

This publication was made possible through a Natural Sciences and Engineering Research Council of Canada (NSERC) Applied Research and Technology Partnership Grant undertaken by The Humber College Institute of Technology and Advanced Learning, and Toronto Star Newspapers Limited.

‘Using OCCRP’s Aleph for Dark Web Data Analysis’ is a step-by-step guide to safely investigating dark web data dumps, published by Humber College’s StoryLab.

Data breaches have flooded the Dark Web with data that is of immense public interest to enterprising newsrooms but that also poses significant security risks. This guide will show you how to navigate dark web data dumps safely and effectively using OCCRP’s Aleph.

Read the Report

About

The grant proposal that led to the creation of this guide, “Development of a Data Server Framework,” was undertaken with the purpose of creating a relatively secure, cost-effective, and collaborative solution for analyzing large troves of data situated on the un-indexed internet, colloquially referred to as the “Dark Web.”

While journalists are no strangers to working with leaked data (see the Iraq War Logs, the Panama Papers, and the Troika Laundromat, among others), dark web data dumps can pose unique ethical and security concerns. We encourage reporters and researchers interested in exploring leaked data to engage fully with the relevant ethical, legal, and editorial authorities within their organizations before beginning an investigation focused on the Dark Web.

About OCCRP’s Aleph

The Organized Crime and Corruption Reporting Project (OCCRP) is a non-profit investigative newsroom founded in 2007. Its large, decentralized newsroom focuses on reporting on and minimizing the threat of crime and corruption around the world. As part of this mission, the OCCRP develops novel technological solutions to aid investigations.

Aleph is an open-source data management tool created to manage and make sense of the enormous, varied caches of documents that are part and parcel of crime and corruption investigations. One of Aleph’s greatest strengths is serving as a relational database. The OCCRP maintains a massive store of data composed of open-source and leaked files that journalists can use to find leads and map out complex relationships. Journalists can also ingest their own files and cross-reference them against Aleph’s database.

Acknowledgements

Thanks to: Ariana Rydzkowski, Dr. Timothy Wong, Ali Owayid, Janice Saji, Emma Best, Lorax Horne, Milo Trujillo, Jan Strozyk, Alex Ștefănescu, Irene Gentle, David Bruser, Robert Cribb, Ginger Grant, Shyama Patel, Daniel Alvarado, Francis Syms, Daniel Schwartz

Meet the Team

  • David Weisz

    Director, StoryLab, Humber College

    David Weisz is a data journalist and educator passionate about storytelling, spreadsheets and pandas (both furry and Python varieties). He is the creator of Data Driven, Canada's premier data journalism symposium, and is currently exploring new ways to collaborate on data-driven storytelling as a co-founder and director of Humber College's StoryLab.

Project Partners

  • Toronto Star