Whether official secrecy is random or predictable is a matter of great public controversy. But the question has not been explored using Natural Language Processing and Machine Learning methods.
Join Columbia University History Professor Matt Connelly for a discussion about using data science to compare political narratives with the real historical record.
Admission is Free
Register Today
This event is co-hosted by Text.IQ
Artificial Intelligence to Identify State Secrets
Abstract:
Whether official secrecy is random or predictable is a matter of great public controversy. But the question has not been explored using Natural Language Processing and Machine Learning methods. We report the results of an experiment with nearly one million State Department cables from the 1970s, in which we sought to identify diplomatic communications that were originally classified as containing sensitive national security information. In analyzing the data, we found that cables classified as secret were disproportionately likely to have been lost or corrupted. But even with incomplete data, we were able to train algorithms to identify 90% of classified cables with fewer than 11% false positives.
Matthew Connelly is a professor of international and global history at Columbia. He is also the principal investigator of History Lab, a project to apply data science to the problem of preserving the public record and accelerating its release. He received his B.A. from Columbia and his Ph.D. from Yale. His publications include “A Diplomatic Revolution: Algeria’s Fight for Independence and the Origins of the Post-Cold War Era,” which won five prizes, and “Fatal Misconception: The Struggle to Control World Population,” an Economist and Financial Times book of the year. He has provided commentary on international affairs for The Atlantic Monthly, The New York Times, The Washington Post, and Le Monde, and has hosted radio documentaries for BBC Radio.