Reconciling Cyber-Security Research with Privacy Law: The Video Analytics and Medical Image Analysis Examples
Research and development (R&D) in video content analysis, medical image analysis, and other data-analysis techniques related to anomaly detection requires huge amounts of data for training and evaluation. This need is underscored by the groundbreaking deep-learning paradigm. The only practical source of relevant data is collections of real data acquired in the field; in the context of video content analysis, this means actual video-surveillance databases recorded in public areas. Such data is strictly protected by privacy regulations. Consequently, its use for R&D is effectively limited to large corporate entities that handle the data as part of their business, such as surveillance-system providers, cloud service providers, and social networks. Academic research on these topics is therefore crippled, and new industrial players are excluded as well. In the context of medical image analysis, the relevant data is the collection of medical images stored in Picture Archiving and Communication Systems (PACS) at hospitals. Access to this resource is usually available to hospital staff only, creating an effective data monopoly with respect to external academic and industrial players. The proposed research, at the interface of technology, law, and policy, will evaluate the problem and develop interdisciplinary solutions, facilitating academic R&D in video content analysis, medical image analysis, and similar anomaly-oriented cyber-security data-analysis challenges.
The overall objective of this research is to remove obstacles to research in video content analysis, medical image analysis, and additional fields that require training and evaluation data obtainable only from privacy-sensitive databases. If successful, the proposed research will open the door to work in academic institutions and small and medium-sized enterprises (SMEs) on major topics that have essentially been the exclusive playground of database holders. The vision of this proposal is the democratization of research. Leaving the enormous value of big data exclusively in the hands of whichever entities happen to hold it is inefficient; resolving the privacy concerns is the key to tremendous progress in crucial domains such as health and homeland security. The effect of deep learning and its future developments on artificial intelligence, and on society in general, might eventually be comparable to the computing and networking revolutions. The move from exclusive mainframes to personal computers, and the move from the exclusive Arpanet and Bitnet networks to the Internet, demonstrate the value of democratic, competitive research environments. This proposal aims to promote an analogous move with respect to the big data needed for research. Accomplishing this goal requires an interdisciplinary approach involving technology, law, and policy. The specific objectives are:
- Revealing the public benefit of research access to big data
- Quantifying the necessity of big data for deep learning
- Developing non-destructive anonymization and distributed deep-learning techniques
- Analyzing access to research data from a Hohfeldian perspective
- Proposing regulatory amendments that maximize the public benefit extractable from privacy-sensitive databases while providing appropriate privacy protection
- Developing a binding ethical code for researchers
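Among the objectives above, distributed deep learning is the most directly technical. A minimal sketch of the underlying idea, assuming a federated-averaging scheme over a toy least-squares model (the specific algorithm, the two simulated "hospitals", and all variable names are illustrative assumptions, not part of the proposal): each data holder computes a model update on its private data, and only the updates, never the raw records, are shared and averaged.

```python
# Sketch of distributed ("federated") training: raw data never leaves
# its holder; only model parameters are exchanged and averaged.
# Toy example with a linear least-squares model -- an assumption for
# illustration, not the proposal's prescribed method.
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1):
    """One gradient-descent step on a holder's private dataset."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Two simulated data holders with private data from the same model.
w_true = np.array([2.0, -1.0])
datasets = []
for _ in range(2):
    X = rng.normal(size=(100, 2))
    y = X @ w_true + rng.normal(scale=0.1, size=100)
    datasets.append((X, y))

# Federated averaging: local steps stay on-site; only weights travel.
w_global = np.zeros(2)
for _ in range(200):
    local_ws = [local_update(w_global, X, y) for X, y in datasets]
    w_global = np.mean(local_ws, axis=0)

print(np.round(w_global, 2))  # converges close to w_true
```

The privacy gain here is structural rather than absolute: shared parameters can still leak information, which is why the proposal pairs distributed learning with anonymization and regulatory safeguards rather than treating it as a complete solution.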