Task-specific information outperforms surveillance-style big data in predictive analytics

Publication: Contribution to journal, Journal article, Research, Peer-reviewed

Standard

Task-specific information outperforms surveillance-style big data in predictive analytics. / Bjerre-Nielsen, Andreas; Kassarnig, Valentin; Lassen, David Dreyer; Lehmann, Sune.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 118, No. 14, e2020258118, 6 April 2021.

Harvard

Bjerre-Nielsen, A, Kassarnig, V, Lassen, DD & Lehmann, S 2021, 'Task-specific information outperforms surveillance-style big data in predictive analytics', Proceedings of the National Academy of Sciences of the United States of America, vol. 118, no. 14, e2020258118. https://doi.org/10.1073/pnas.2020258118

APA

Bjerre-Nielsen, A., Kassarnig, V., Lassen, D. D., & Lehmann, S. (2021). Task-specific information outperforms surveillance-style big data in predictive analytics. Proceedings of the National Academy of Sciences of the United States of America, 118(14), Article e2020258118. https://doi.org/10.1073/pnas.2020258118

Vancouver

Bjerre-Nielsen A, Kassarnig V, Lassen DD, Lehmann S. Task-specific information outperforms surveillance-style big data in predictive analytics. Proceedings of the National Academy of Sciences of the United States of America. 2021 Apr 6;118(14):e2020258118. https://doi.org/10.1073/pnas.2020258118

Author

Bjerre-Nielsen, Andreas ; Kassarnig, Valentin ; Lassen, David Dreyer ; Lehmann, Sune. / Task-specific information outperforms surveillance-style big data in predictive analytics. In: Proceedings of the National Academy of Sciences of the United States of America. 2021 ; Vol. 118, No. 14.

Bibtex

@article{ca835a2928fc4916b43f5f5efc6c7fea,
title = "Task-specific information outperforms surveillance-style big data in predictive analytics",
abstract = "Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predicting behaviors and academic performance, and the COVID-19-induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students' privacy. We focus on academic performance and ask whether predictive performance for a given dataset can be achieved with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with {"}ground truth{"} administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, with better privacy and better prediction resulting.",
keywords = "Academic performance, Big data, Prediction, Privacy",
author = "Andreas Bjerre-Nielsen and Valentin Kassarnig and Lassen, {David Dreyer} and Sune Lehmann",
year = "2021",
month = apr,
day = "6",
doi = "10.1073/pnas.2020258118",
language = "English",
volume = "118",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
publisher = "The National Academy of Sciences of the United States of America",
number = "14",

}

RIS

TY - JOUR

T1 - Task-specific information outperforms surveillance-style big data in predictive analytics

AU - Bjerre-Nielsen, Andreas

AU - Kassarnig, Valentin

AU - Lassen, David Dreyer

AU - Lehmann, Sune

PY - 2021/4/6

Y1 - 2021/4/6

N2 - Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predicting behaviors and academic performance, and the COVID-19-induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students' privacy. We focus on academic performance and ask whether predictive performance for a given dataset can be achieved with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with "ground truth" administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, with better privacy and better prediction resulting.

AB - Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predicting behaviors and academic performance, and the COVID-19-induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students' privacy. We focus on academic performance and ask whether predictive performance for a given dataset can be achieved with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with "ground truth" administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, with better privacy and better prediction resulting.

KW - Academic performance

KW - Big data

KW - Prediction

KW - Privacy

U2 - 10.1073/pnas.2020258118

DO - 10.1073/pnas.2020258118

M3 - Journal article

C2 - 33790010

AN - SCOPUS:85103745351

VL - 118

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 14

M1 - e2020258118

ER -

ID: 260517561