How MOOCs Threaten Your Privacy

If you take a MOOC in statistics to demonstrate your mastery of regression analyses and forecasting, you might get promoted at work. You might also become a statistic yourself.

MOOC providers and their third-party consultants collect and mine the massive amounts of data their courses generate, and parents, teachers, and legislators are increasingly concerned about what that means for student privacy and independence. Students in “Preparing for the AP Statistics Exam” from EdX, for instance, are monitored for online habits such as how many attempts they make at online assignments, what they write in discussion threads, and whether they reach out to the professor for help. That data holds the key to identifying the traits that mark good students and good pedagogy. Does watching videos before reading the text aid comprehension? Are more quizzes better than fewer? Thanks to big data and the correlations it reveals, EdX knows. That knowledge, though, comes at a price.

Why MOOCs Are Ripe for Data Mining

Data mining is popular in many fields, and it is not new to education. Brick-and-mortar universities already observe how often students swipe their IDs at the library versus the student lounge, what time of day they study, how well they perform and in which classes, and whether their profiles match those of students who are at risk, high-performing, or likely to excel in a specific field.

MOOCs offer ideal conditions for such aggregation. The point of data mining is to collect large amounts of data at repeated intervals under similar parameters in order to identify and compare trends over time. Large courses yield thousands of users watching the same videos, taking similar quizzes, and having identical opportunities to contribute to comment threads. Online, every action passes through a standardized, recordable system: consulting an e-textbook, contributing to a discussion group, answering a classmate’s question, or trying a quiz question three times before finally getting it right.

MOOC developers hope data mining might solve two perennial MOOC woes: poor retention and comprehension on the one hand, and low profits on the other. On the first count, monitoring behavior yields aggregate patterns for different types of students. Multiple-choice quizzes at the end of each lecture may work well for student type A, but a student who fits profile B may be better served by shorter, more frequent quizzes in the midst of lectures. Pegging students early on and tweaking assignments lets algorithms “personalize” a course to fit each student, mimicking the way a good teacher adapts to her students mid-semester. On the second count, deducing better online pedagogy can lead to better, more marketable products, valuable not just to MOOC providers but also to the third parties who process and sometimes buy aggregate data. That could inject some much-needed cash into the industry.

The Privacy Risks

Online monitoring may seem more benign, or at least more expected, than close observation on a physical campus, but it also sharply increases the opportunities for infringing on privacy. The MOOC data collected ranges from basic, non-identifiable data points to personal information like your address and birthday. That is where data mining provokes apprehension. Many people worry about privacy and the security of sensitive information, and data mining has sparked backlash before. inBloom, the Gates Foundation-funded data processing and storage service for K-12 schools, announced its closure in April, citing unrelenting outrage from parents over potential dangers to their children’s privacy. And one of the major causes of distrust of the Common Core has been the data collection it entails.

What exactly do MOOCs monitor? It starts with registration data. To create an account with EdX, the MOOC consortium started by Harvard and MIT, you need to provide only a first and last name, email address, and country of residence. Participants also have the option of divulging their highest level of education, gender, birth year, mailing address, and reason for joining EdX. Coursera’s registration requirements, a full name and an email address, are simpler still.

Inside the course, however, the data observed is extensive. EdX, for instance, tracks the hyperlinks that directed the user to the site; the user’s IP address (and from this, sometimes, the Internet Service Provider and the user’s geographic location), operating system, and browser software; the user’s performance on assignments and behavior when progressing through course material; emails and other communication with the professor and with company representatives; and other activity in the course. Users can prevent some of this collection by disabling the “cookies” that underlie the system, but doing so may disrupt “many functions and conveniences,” the privacy policy warns.

Sometimes the intentions behind such collections are benevolent and aim to help participants, as in the quizzes tailored to students’ past performance. Identifying deviations from habits, too, can help detect violations of the honor code. Another goal is to match people whose profiles show similar interests. EdX acknowledges in its privacy policy that it will “recommend specific study partners or connect potential student mentees and mentors” by giving them each other’s usernames, though not their real names or email addresses.

EdX also hopes that by drawing on aggregate data, it can adjust and improve its courses over time. Among the uses of data that users authorize when they agree to its privacy policy is “scientific research, particularly, for example, in the areas of cognitive science and education.” Coursera’s policy includes a similar stipulation. The idea is to experiment with course structure, types of quiz questions, quiz length, and other features, and see how students react. Experimentation online can then lead to improvements back on the physical campus. (Improving pedagogy, according to EdX CEO Anant Agarwal, is one of the three main reasons EdX was formed.)

Useful or not, the collection comes with its own risks. EdX users, for instance, give the consortium the right to re-use (anonymously) anything they write and post in a MOOC forum. Ed-tech companies also track students’ performance over time, generating a detailed educational record that can demonstrate impressive accomplishments but can also create a permanent data trail of youthful mistakes. And while aggregate data can theoretically be stripped of personally identifiable material, it is unclear whether this safeguard is routinely applied. A study this summer found that while MOOCs yield treasure troves of educational data, most of it, once stripped of personally identifiable characteristics, is useless for the kinds of profitable functions MOOC providers hope it will perform.

The bigger concern is the security of private data, especially when it is transmitted to third parties such as university researchers or outside companies. Now that many MOOC providers are targeting high school students, parents worry about their children’s security. The Family Educational Rights and Privacy Act (FERPA) protects high school students’ privacy only with respect to material used as part of their official schooling; most students who take MOOCs use them as supplements that don’t qualify. This has prompted the National Association of Secondary School Principals to issue recommendations encouraging schools to treat all online student work, whether officially assigned as homework or not, as part of an “education record” covered by FERPA. The recommendations also encourage districts to appoint a chief privacy officer responsible for protecting students’ information.

More than fifty K-12 ed-tech companies signed a pledge in October meant to assuage these fears. Signatories promised to disclose what data they collect, to let parents review and correct their child’s information, and never to sell the data or use it to target advertising at children, and to apply these commitments to all the data they collect, not just the data directly related to schoolwork. The pledge drew criticism, though, because while it restricted how signatories could share the data, it still permitted them to keep collecting it. Critics also pointed to the number of ed-tech companies that declined to sign, including Khan Academy, Apple, Google, EdX, and Coursera. A number of states are also considering legislation to shore up privacy laws.

Those are steps in the right direction, but the tension between knowledge and privacy is unavoidable. Companies will find ways to collect the data their operations depend on. Until stronger protections are in place, users must be vigilant about their security.

