The Ethics of Digital Phenotyping

June 11, 2023

Your Phone Can’t Read Your Mind. Or Can It? The Ethics of Using Digital Data to Infer Signs and Symptoms of Psychological and Neurological Diseases

By Cassandra Miller

Imagine a future where your iPhone can detect early signs of Alzheimer’s, or where a doctor can tell when a patient with depression is having a depressive episode solely from their phone usage. Though these possibilities may seem far-fetched, they could soon become realities: Apple is currently conducting studies that hope to find signs of depression and cognitive decline in iPhone and Apple Watch data, such as users’ facial expressions, typing metrics, app usage, sleep patterns, and movement. Soon, medically oriented apps may collect various forms of data, such as GPS location, phone usage, and touchscreen interactions, and use this data to infer a patient’s behaviors and routines, such as their sleep patterns, mobility, and social activity. Then, algorithms or doctors can note deviations in a patient’s behavioral patterns, which could reveal signs and symptoms of a mental health issue. How much of our privacy are we willing to sacrifice for the potential benefits and ease of use of these tools? Could patients reveal more sensitive information than they want to? Just because we may be able to use phones to reveal behaviors and illnesses, should we?

Abstract

When asked to explain the idea behind his company, Anmol Madan, the co-founder and CEO of Ginger.io, said: “If someone is depressed, for instance, they isolate themselves, have a hard time getting up to go to school or work, they’re lethargic and don’t like communicating with others the way they typically do. [It] turns out you see those same features change in their mobile-phone sensor data in their movement, features, and interactions with others” (Matheson). Madan is describing the central idea behind digital phenotyping (DP), an emerging field that uses data collected by digital devices to detect possible signs and symptoms of various illnesses. Smartphones can provide an immense amount of passive data, such as GPS location, light exposure, phone usage times, typing metrics, and message or call logs. This data can be analyzed to infer a user’s behavioral patterns – such as their sleep habits, their social activity level, or their mobility patterns – and can ultimately serve as a tool to detect signs and symptoms of disease. For example, one study of people diagnosed with schizophrenia used each patient’s smartphone data to infer their behavioral patterns. Researchers found that, for patients who had a relapse, their digital patterns had 71% more anomalies in the two weeks prior to relapse than in other periods; such a finding could be used to predict relapses (Barnett et al.). Similarly, in a recent clinical trial, doctors used the behavioral patterns inferred by CompanionMx, a DP app similar to Ginger.io, to “predict mania up to two weeks in advance” (Bender). Furthermore, one study used GPS location and phone usage data from 40 subjects to develop a model that recognized that users were demonstrating signs or symptoms of depression with 86% accuracy (Saeb et al.). 

Since DP shows great promise, many mental health apps are beginning to incorporate DP techniques, and institutions and prominent technology companies, including Apple, are beginning to design DP features. These tools could prompt individuals to seek medical or psychological care, could give patients more insight into their illness and its progression, and could help predict mental health episodes or the onset of certain diseases. However, these benefits come at great costs. This paper will explore various ethical concerns regarding DP models: whether the potential benefits are worth sacrificing one’s privacy, how privacy and autonomy are affected differently for various stakeholders, fairness and equity, slippery slope possibilities, and trust.

How Digital Phenotyping Works

Some consumer devices can already make inferences about physical health. For example, Apple Watches have features to detect atrial fibrillation, a type of irregular heart rhythm; to determine when a user is washing their hands and subsequently set a timer; and to detect car crashes (Apple).

While these features are mainly focused on physical health and trauma, the emerging field of digital phenotyping (also known as mobile sensing, personal sensing, or behavioral sensing) seeks to go further: it aims to use phone data collected in the background to support the diagnosis and treatment of mental illnesses, such as depression, post-traumatic stress disorder (PTSD), schizophrenia, bipolar disorder, and addiction; neurological disorders, such as Alzheimer’s and epilepsy; and behavioral disorders, such as autism.

By and large, DP platforms follow three key steps when translating raw phone data into behavioral insights, which can then be applied clinically: collecting raw data, summarizing important aspects of the data as low-level features, and making behavioral inferences. First, the platform collects raw data from digital devices, such as smartphones or smartwatches; for this paper, I will largely focus on data garnered from smartphones, and more specifically, passive smartphone data. Whereas active data requires the user to do something beyond their normal phone use, such as completing a mood survey in an app, passive data is collected without any added input from the user; just by using their phone as they already do, the user generates passive data (Cornet and Holden).

Passive data comes in two main forms: usage data and sensor data. When an individual interacts with their phone, they create usage data. This includes touchscreen interactions (also known as human-computer interaction [HCI] or the activity log), a timed sequence of when and how the user touches their phone (tap vs. swipe); call and text logs, which reveal the time and length of calls and texts but not their content; app usage; and typing metrics, which describe how someone interacts with their keyboard. DP platforms may also collect sensor data, which comes from the phone’s built-in sensors. Common types include GPS location; accelerometer data, which shows the phone’s acceleration in three dimensions; gyroscope data, which shows the phone’s rotation in three dimensions; and ambient light sensor data. Though the sensors themselves collect information about the phone’s environment and motion, they often reveal where the user is (GPS location) or what they are doing (accelerometer and gyroscope) because phones are so often near or with their owners (Melcher et al.). For example, a current iPhone feature uses accelerometer and gyroscope data to infer when a person is in a car. Though some DP efforts consider both smartphone and smartwatch data or both active and passive data from phones, this paper will focus mostly on DP studies and platforms based on passive phone data. Such data is usually collected through an app or feature on the individual’s phone, and since it doesn’t require any added input from them, it can be collected completely in the background.

Second, DP platforms turn raw data into low-level features, which act as numerical summaries of this raw data (Mohr et al.). For example, raw data on GPS location might show the user’s location every 10 minutes (the interval depends on the platform and how often it samples). With this raw data, an algorithm can find low-level features such as the total distance traveled per day, total time in transit per day, and total time the user spends at home. To find the last of these, the algorithm must first find another low-level feature, significant locations, which tags any location that a user is at for an extended period and infers the location of their home and of their school or work. Though the algorithms that do this are complex, the idea behind them is relatively simple: the majority of users (and their phones) are at their house from, say, 10 pm until 6 am on most nights; as a result, an algorithm can look for the location that the user is typically at overnight and infer that it is their home location. The same concept applies for determining the location of someone’s school or office, just for different hours of the day. Algorithms that find significant locations are often quite accurate: one study found that “determining an anonymized user’s home location from GPS and accelerometer data has been shown to be successful in 79% of users” (Mulvenna et al.). For other data sources, like text logs, the process can be a bit simpler: raw data on a user’s text logs could reveal low-level features like number of texts sent, number of texts received, average length of texts, and average amount of time it took the user to respond to texts in a given week.
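To make this second step concrete, here is a minimal Python sketch of how GPS samples might be turned into low-level features. It is an illustration only, not any platform’s actual algorithm: the function names, the 10 pm to 6 am window, the rounding used to group nearby coordinates, the 0.1 km home radius, and the sample day are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

SAMPLE_MINUTES = 10  # assumed sampling interval, as in the example above


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def infer_home(samples, night_start=22, night_end=6):
    """Guess home as the most common (rounded) location seen overnight."""
    overnight = [
        (round(lat, 3), round(lon, 3))
        for ts, lat, lon in samples
        if ts.hour >= night_start or ts.hour < night_end
    ]
    return Counter(overnight).most_common(1)[0][0] if overnight else None


def daily_features(samples, home, home_radius_km=0.1):
    """Summarize one day of samples into two low-level features."""
    points = [(lat, lon) for _, lat, lon in samples]
    distance = sum(haversine_km(p, q) for p, q in zip(points, points[1:]))
    at_home = sum(1 for p in points if haversine_km(p, home) <= home_radius_km)
    return {"distance_km": distance, "hours_at_home": at_home * SAMPLE_MINUTES / 60}


# Made-up day: at home overnight and in the evening, at an office from 8 am to 6 pm.
HOME, OFFICE = (42.0, -71.0), (42.02, -71.0)
start = datetime(2023, 6, 1)
day = []
for i in range(144):  # 144 samples = 24 hours at 10-minute intervals
    ts = start + timedelta(minutes=SAMPLE_MINUTES * i)
    lat, lon = OFFICE if 8 <= ts.hour < 18 else HOME
    day.append((ts, lat, lon))

home = infer_home(day)
features = daily_features(day, home)
print(home, features)
```

Even this toy version shows why the step is sensitive: a handful of timestamped coordinates is enough to recover where someone sleeps and how long they stay there.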

Third, the platform uses the low-level features to infer aspects of the user’s behavior; such inferences are called behavioral markers, behavioral patterns, or high-level markers. Whereas low-level features provide a summary of one or two data sources over a given period of time, behavioral markers combine low-level features from various data sources to describe elements of a user’s behavior on a given day. Some behavioral markers include mobility, which looks at how much the user travels, where they go, and whether they are present at work, school, or other activities; social activity (also called sociability); sleep patterns; and mood. For example, the low-level features found from a user’s GPS location, phone usage, and call and text logs could be used to draw inferences about the user’s sociability, a behavioral marker. 
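As a rough illustration of this third step, the sketch below combines several low-level features into a single sociability score by comparing each feature with the user’s own recent history. This is a hypothetical scheme chosen for simplicity; the feature names, the equal weighting, and the numbers are assumptions, and real platforms likely use more sophisticated models.

```python
from statistics import mean, pstdev


def z_score(value, history):
    """How many standard deviations a value sits from the mean of its history."""
    sd = pstdev(history)
    return 0.0 if sd == 0 else (value - mean(history)) / sd


def sociability_marker(today, history):
    """Average the z-scores of several low-level features into one score."""
    return mean(z_score(today[name], [d[name] for d in history]) for name in today)


# Made-up low-level features from call logs, text logs, and GPS.
history = [
    {"calls_out": 4, "texts_sent": 20, "places_visited": 2},
    {"calls_out": 6, "texts_sent": 30, "places_visited": 4},
    {"calls_out": 4, "texts_sent": 20, "places_visited": 2},
    {"calls_out": 6, "texts_sent": 30, "places_visited": 4},
]
today = {"calls_out": 3, "texts_sent": 15, "places_visited": 1}

score = sociability_marker(today, history)
print(score)  # negative values mean less social activity than usual
```

Every feature here is two standard deviations below the user’s norm, so the combined marker is strongly negative.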

Though this process typically happens in these three main steps, it is not always linear and stepwise. For example, raw data on touchscreen interactions can often directly reveal a user’s sleep patterns: if someone doesn’t touch their phone from, say, 11 pm until 6 am, they are likely asleep for most of that period. See Table 1, the image at this link, and the image at this link for some examples of the process.
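The sleep example can be sketched in a few lines of Python: take the timestamps of every touch and find the longest stretch with no interaction. The function name and the timestamps are invented for illustration, and a real system would presumably add checks, such as requiring the gap to fall at night.

```python
from datetime import datetime


def longest_idle_gap(touch_times):
    """Return (start, end) of the longest gap between consecutive touches."""
    ordered = sorted(touch_times)
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


# Made-up touchscreen timestamps for one evening and the next morning.
touches = [
    datetime(2023, 6, 1, 21, 30),
    datetime(2023, 6, 1, 22, 45),
    datetime(2023, 6, 1, 23, 5),
    datetime(2023, 6, 2, 6, 10),
    datetime(2023, 6, 2, 6, 12),
    datetime(2023, 6, 2, 7, 30),
]
asleep_from, awake_at = longest_idle_gap(touches)
hours_idle = (awake_at - asleep_from).total_seconds() / 3600
print(asleep_from, awake_at, round(hours_idle, 2))
```

The function needs at least two touches to work, and it can only say the phone was untouched, not that its owner was actually asleep.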

Table 1: The following table summarizes various passive data sources (column 1), examples of low-level features inferred from each source (column 2), and the behavioral markers that each source may contribute to (column 3):

| Passive data source | Examples of low-level features found from the data source (there are many others for each source; these are just some examples) | Behavioral markers that the data source contributes to |
| --- | --- | --- |
| GPS location | Classification of specific location coordinates as places important to the user (e.g., their home, office/school); time spent at home per day; distance travelled per day | Mobility; social behavior |
| Call log | Number of incoming calls; number of outgoing calls; number of calls not answered; time and duration of each call | Social behavior |
| Text log | Number of texts sent; number of texts received; number of different phone numbers texted; length of each text; average amount of time to answer each text | Social behavior |
| Keyboard | Typing speed; rate of typos | Mental awareness; physical impairment or injury |
| Ambient light sensor | n/a | Sleep patterns |
| Accelerometer: orientation of phone in 3D space | Phone position and phone movement | Activity information and mobility |
| Touchscreen interactions / activity log / human-computer interaction (HCI): a timed sequence of when the user touches their phone throughout the day | Long periods when the user is not on their phone; speed between touches; number of phone visits; type of touch (tap vs. swipe) | Sleep patterns; cognitive state/awareness |
| App usage | The time a user spends on their phone and on each app | Mood |
| Bluetooth | Can pick up other devices in range | Social behavior |

Current and Developing Real-World Digital Phenotyping Platforms

Ginger.io

Though DP is still in its early stages, some companies and institutions are beginning to use its techniques to improve their mental health services. One example is Ginger.io, a telehealth app that connects users with therapists or psychiatrists and is currently available on iPhones and Androids (Ginger.io).

The app incorporates DP technology to detect and even predict mental health illnesses: “By passively analyzing mobile data, the app [Ginger] can detect if a patient with mental illness — such as depression, anxiety and bipolar disorders, or schizophrenia — is acting symptomatic” (Matheson).

Visual of how Ginger.io’s digital phenotyping works

Here’s how Ginger.io’s DP works: When a patient downloads Ginger.io, they first complete a survey about their “conditions, treatment, and health care provider” (Matheson). The app begins to collect their passive phone data – including GPS location, accelerometer data, call and text logs, and app usage. Ginger.io uses this information to infer behavioral patterns, such as their social activity; mobility and presence at work, school, or other activities; and sleep habits (Ginger.io). Over time, Ginger.io gains an understanding of the patient’s typical routines, and as the app continues to collect data, it will use this to adjust and update its understanding of these typical patterns. At the same time, Ginger.io looks out for “significant deviations” in the user’s behaviors, and if it detects large enough deviations, it will notify the patient and their healthcare provider (Matheson). Ginger now boasts that its “technology is powered by over 1.3 billion data points” (Ginger.io).
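Ginger.io has not published its algorithm in the sources cited here, but the general idea of learning a personal baseline and flagging “significant deviations” can be sketched as follows. Everything specific in this example is an assumption made for illustration: the class name, the 28-day window, the 7-day warm-up, the threshold of 2.5 standard deviations, and the data.

```python
from collections import deque
from statistics import mean, pstdev


class PersonalBaseline:
    """Rolling baseline for one behavioral feature of one person."""

    def __init__(self, window_days=28, threshold=2.5, min_days=7):
        self.history = deque(maxlen=window_days)  # oldest days drop off automatically
        self.threshold = threshold
        self.min_days = min_days

    def update(self, value):
        """Check today's value against the baseline, then fold it in."""
        flagged = False
        if len(self.history) >= self.min_days:
            mu, sd = mean(self.history), pstdev(self.history)
            flagged = sd > 0 and abs(value - mu) / sd > self.threshold
        self.history.append(value)
        return flagged


# Made-up daily "hours at home" values; the last day is unusually high.
hours_at_home = [14, 15, 13, 14, 16, 14, 15, 14, 23]
baseline = PersonalBaseline()
flags = [baseline.update(h) for h in hours_at_home]
print(flags)
```

Because each new day is added to the history after it is checked, the baseline keeps adapting to the person’s routine, which mirrors the updating behavior described above.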

One study that enrolled 15 individuals diagnosed with schizophrenia helps exemplify how detecting “significant deviations” may be useful. In the study, researchers collected passive phone data – such as GPS data, accelerometer data, communication logs (call and text logs), and phone usage – and used it to infer 15 daily mobility features and 16 social activity features for each patient (see these features here). Throughout the study, patients filled out biweekly surveys on their symptoms, such as depression, sleep, and psychosis, to provide a conventional measure of their symptoms (Barnett et al.). Using the mobility and social activity features, the researchers looked for anomalies in patient behavior. They found that the patients who relapsed had 71% more anomalies in the two weeks prior to their relapse than during other periods (Barnett et al.). Because Ginger.io’s algorithm searches for “significant deviations,” it may be able to detect such anomalies and notify the patient and their doctor accordingly.

CompanionMx

CompanionMx, a spinoff from the company Cogito, is a developing app that aims to “help clinicians help people with mood disorders” and is “designed to help clinicians make smarter decisions for their mood disorder patients” (Bender). CompanionMx’s system has three parts: First, the CompanionMx app, which the patient downloads on their mobile phone, collects data from users. This data is mostly passive phone data, such as GPS location, communication logs, and accelerometer data, but also includes a small amount of active data: patients are able to go into the app and provide voice recordings about their day (Bender). Second, the Companion artificial intelligence (AI) and a series of algorithms analyze the passive data and voice recordings to generate behavioral scores in four categories: mood, mobility, social isolation, and fatigue. When analyzing the voice recordings, the Companion AI examines how the patient is speaking, such as their tone of voice and pace of speaking, but does not analyze what the person says. Third, Companion displays the behavioral scores to the patient through the CompanionMx app and to their clinician through a dashboard (see the patient and clinician view here) (Bender). 

So far, the CompanionMx app has been used “by more than 1,500 patients” in various trials and studies (Business Wire). For example, Brigham and Women’s Hospital conducted a trial in which approximately 200 patients used the app to enhance their work with a clinician at the hospital. Lara Sullivan, a social worker at Brigham and Women’s Hospital who was involved in the study, noted that “it’s not that any particular score means a patient is depressed” (Bebinger). Instead, the app helps psychiatrists learn a patient’s typical patterns and scores, which makes it clearer when a patient’s condition deviates or worsens. As Sullivan puts it, “‘We start to learn people’s patterns. For instance, if [a patient] is always an 98, 99, and now they suddenly became a 20, the question is […] what happened?’” (Bebinger). This highlights a key difference between DP platforms: both Ginger.io and CompanionMx infer behavioral patterns from passive data, but Ginger.io uses an algorithm to detect deviations in these behaviors, while CompanionMx shows the behaviors to clinicians and leaves it to them to detect any deviations.

Some studies have shown CompanionMx’s efficacy and possible benefits when compared to typical care.

One study showed that CompanionMx may help doctors predict relapses or mental health episodes: “In a recent clinical study at a Harvard teaching hospital, clinicians used insights from the Companion dashboard to predict mania up to two weeks in advance” (Bender).

Furthermore, in a clinical trial at Brigham and Women’s Hospital, 68 patients were randomized into two groups. Over the six-month study, one group received usual care from a clinician, while the other group received usual care from a clinician bolstered by the CompanionMx system (Place et al.). At the start and end of the study, patients completed “the Patient Health Questionnaire (PHQ) sum score for depression severity, ranging from no depressive symptoms (0-4) to severe major depression (>20), and the Schwartz Outcome Scale sum score, ranging from 0 to 60, with higher scores indicating better overall psychological health” (Place et al.). The study found that the patients who received usual care bolstered by CompanionMx showed more improvement on both the PHQ and the Schwartz Outcome Scale than patients who received only usual care, as shown in this graph from the study (Place et al.).

QuantActions

Whereas CompanionMx and Ginger.io focus mainly on psychological issues like depression and PTSD and are aimed at patients seeking treatment, the company QuantActions is more focused on brain health and is aimed at general phone users. Their app, also called QuantActions, is available currently on Android phones and “is coming soon” to iOS devices (QuantActions).

QuantActions collects only one form of passive data: touchscreen interactions. QuantActions claims that, with this information, it is able to “derive powerful insights into the user’s brain health” because “the speed of tapping provides insights into the user’s cognitive speed” and other aspects of their life (QuantActions).

One study suggests that this may, in fact, be true: in the weeklong study, researchers collected touchscreen interaction data from 27 subjects and then inferred 45 event patterns from the raw data. Researchers also administered a “gold standard” neuropsychological assessment to each subject (which included the digits backwards test, the animal fluency test, and other common tests on brain health) and then gave each subject a score for their working memory, memory, executive function, language, and intelligence. Next, researchers used the touchscreen data and event patterns to “identify digital biomarkers that were correlated with neuropsychological performance [and] found a family of digital biomarkers that predicted test scores with high correlations” (Dagum). As shown in Table 1 at this link, the researchers constructed graphs with each subject on the x-axis and their Z-score on a given test on the y-axis. The blue squares represent their score from the “gold standard” neuropsychological assessment, and the red circles represent the predicted score based on their digital data and the detected biomarkers. Though there are some discrepancies, the predictions overall appear to reflect the actual scores.
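A “high correlation” between predicted and measured scores is usually summarized with a correlation coefficient. The sketch below computes Pearson’s r for two short lists of Z-scores; the numbers are invented for illustration and are not taken from the study, and the source does not specify which correlation measure the researchers used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up Z-scores for five subjects on one test.
measured = [-1.2, -0.4, 0.0, 0.5, 1.1]   # from the in-person assessment
predicted = [-0.9, -0.6, 0.2, 0.3, 1.0]  # from touchscreen-derived biomarkers

r = pearson_r(measured, predicted)
print(round(r, 2))
```

A value near 1 means the predictions rise and fall with the measured scores, though with only a few dozen subjects such estimates remain uncertain.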

In addition to revealing someone’s cognitive state, touchscreen interactions can also show someone’s sleeping patterns: if a patient doesn’t touch their phone for an extended period at night, they are likely asleep during that time. Furthermore, QuantActions claims that a user’s tapping speed before bed and in the morning may show “how tired the user is before sleeping, and how fresh after waking up” (QuantActions). By analyzing a user’s raw touchscreen data, QuantActions generates daily scores for the user’s “sleep quality,” “cognitive fitness,” and “energy level” (QuantActions). It displays these scores to the user on the QuantActions app, as shown here.

QuantActions is also working toward two future goals: a disease screening program and a program for employers and employees. First, with its disease screening program, it hopes to use touchscreen data to flag users who might be suffering from cognitive decline and alert these users “early,” so “a physician can subsequently make a thorough assessment and diagnosis” (QuantActions). In addition to detecting cognitive decline, QuantActions also hopes to be able to detect and even predict abnormal brain activity, called epileptiform activity, and seizures in epilepsy patients. In one study involving 8 patients with epilepsy, researchers collected touchscreen interaction data from the patients’ phones and simultaneously gathered data on their brain activity (electrographic data) from implanted devices (vagus nerve stimulators) that the patients already had to monitor their epilepsy (Duckrow et al.). The researchers then created artificial neural networks to predict epileptiform activity based on touchscreen interactions. Each patient had a unique AI model built specifically from their own touchscreen data and electrographic data. After the models were trained, researchers continued to collect smartphone data and fed it into the models. The image at this link compares the model’s prediction based on the touchscreen data (black) with the actual electrographic data (purple dotted line) for each patient as two weeks of new phone data came in. For some subjects, the models had questionable accuracy, but for others, like Subjects #1 and #2, the predictions appear quite similar to the real data. Overall, the researchers concluded that “the personalized model outputs based on smartphone behavioral inputs corresponded well with the observed electrographic data” (Duckrow et al.).

Second, QuantActions is also developing a program for employers and employees. As their usual app does, this platform would use touchscreen interactions to give users (employees) daily scores on their cognitive function and behavior. What differentiates this platform, however, is that, “with the employee’s consent,” employers could “receive anonymized, aggregated data” on the scores of their employees, with the idea that this could help employers detect trends of burnout, stress, and fatigue among their workers (QuantActions).
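What “anonymized, aggregated data” means in practice is not specified in the sources cited here. One plausible reading is sketched below: only consenting employees are included, only a group-level summary is reported, and nothing is reported when the group is too small to hide individuals. The minimum group size of 5 and all names and numbers are assumptions for this example, not QuantActions’ design.

```python
from statistics import mean

MIN_GROUP_SIZE = 5  # assumed suppression threshold, not from QuantActions


def aggregate_scores(records):
    """Return a group-level summary of consenting employees, or None if too few."""
    consenting = [r["score"] for r in records if r["consented"]]
    if len(consenting) < MIN_GROUP_SIZE:
        return None  # too few people: a summary could expose individuals
    return {"n": len(consenting), "mean_score": round(mean(consenting), 1)}


# Made-up daily "energy level" scores for one team.
records = [
    {"id": 1, "consented": True, "score": 72},
    {"id": 2, "consented": True, "score": 65},
    {"id": 3, "consented": False, "score": 90},
    {"id": 4, "consented": True, "score": 80},
    {"id": 5, "consented": True, "score": 58},
    {"id": 6, "consented": True, "score": 75},
]
summary = aggregate_scores(records)
print(summary)
```

Even with such safeguards, small teams or repeated daily reports could still let an employer guess at individual scores, which is part of why this program raises ethical questions.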

Apple

Apple, which serves more than a billion iPhone users globally, is currently conducting two studies to detect digital signals for mental health issues and neurological issues (Winkler). The studies collect data from both smartphones and smartwatches, including keyboard metrics; phone usage; touchscreen interactions; GPS location; video camera data, which will be used to reveal “facial expressions;” microphone data, which will be used to infer how the user is speaking (not what they are saying); and biometric data from Apple Watches, such as heart rate and steps (Winkler, Kisliuk).

According to the Wall Street Journal, these sorts of data “could give researchers clues about device users’ emotions, concentration, energy level, state of mind and more,” as well as “daily routines” (Kisliuk, Winkler).

One of the studies, which is a partnership between Apple and UCLA, hopes to find digital “warning signs” – also known as digital biomarkers – for depression (Kisliuk). Its pilot phase took place in 2020 with 150 participants, and the main phase began in 2021 with about 3,000 participants. The other study is a partnership between Apple and Biogen and aims to find digital biomarkers for mild cognitive decline. The pilot phase of the study, which took place in 2019 and observed 31 adults, showed that those with mild cognitive impairment “exhibited different behavior on their Apple devices than healthy older adults” (Winkler). The main phase, which will have around 20,000 people and last two years, began recently (Winkler). In addition to collecting digital data, both studies will also collect conventional diagnostic measures: for the UCLA study, these include mental health questionnaires and tests on the level of cortisol, a stress hormone, in subjects’ hair. For the Biogen study, these include typical cognitive assessments and brain scans to “track plaque buildup” (Winkler).

If either study successfully finds digital signals for depression or mild cognitive decline, “the hope is to turn those signals into an app or feature that could warn people they might be at risk and prompt them to seek care” (Winkler). 

In addition to its attempts to create features for user devices, Apple is also conducting a study with the goal of creating tools for clinicians. The study, which is a partnership with Duke, hopes to “create an algorithm to detect childhood autism” in young children based on their eye gaze and concentration when doing certain tasks on an iPhone (Winkler). Already, an unrelated study that analyzed 993 toddlers—40 of whom were subsequently diagnosed with autism by a clinician—has shown promise for DP in detecting autism. Using a special app, researchers showed toddlers certain videos while simultaneously using an algorithm to analyze their eye gaze patterns, as captured by the iPhone video camera. The algorithm then compared the eye gaze patterns of the toddlers diagnosed with autism to those without it, and “the app reliably measured both known and new gaze biomarkers that distinguished toddlers with ASD [autism] vs typical development” (Chang et al.). For example, “whereas 6- to 11-month-old typically developing infants visually track the conversation between 2 people, toddlers with ASD showed less-coordinated gaze patterns” during this part of the video. In the future, this app could be used to help doctors diagnose autism (Chang et al.). This study provides a basis to believe that Apple’s study to create platforms to detect autism may prove effective.

iSee by Michigan State University

Given that suicide is the second leading cause of death on college campuses, it’s no surprise that colleges are now looking into DP; specifically, with a $1 million grant from the National Science Foundation and help from a team at Microsoft, Michigan State University (MSU) is working to develop a platform called iSee (Michigan State University, Zhang et al.). 

According to the Principal Investigator of the study, Mi Zhang, iSee will “leverage the sensors in the smartphone and wristband [smartwatch] to continuously track students’ daily behaviors” (Michigan State University).

iSee will track passive data sources from both smartphones, such as GPS location, ambient light sensor, accelerometer, and touchscreen interactions, and smartwatches, such as steps, heart rate, and precise sleep data. This raw data will be used to find behavioral information in five categories: activity, travel, diet, sleep, and social. Finally, the behavioral results “will be translated into meaningful analytics results for identifying the student’s depression severity;” for example, the platform will predict the student’s PHQ-9 score on a given day (Zhang et al.). In theory, a clinician at MSU will be able to view both the behavioral data in the five categories and the subsequent depression severity findings for students through a dashboard; the student will have access to the same information through the app (Zhang et al.). This link shows a graphic of how the clinician would view a student’s scores via the iSee platform. On the left, there are the five behavioral categories mentioned: “activity,” “travel,” “diet,” “sleep,” and “social;” on the right is information about the student’s profile (Zhang et al.). This link shows a visual of how the clinician would view the depression severity findings from iSee. As visible in the image, the scores are generated at different time periods, which can show changes over time in student wellbeing (Zhang et al.).

iSee’s creators are still developing the platform. In a 2015 study, the researchers established a proof of concept: over two weeks, they collected passive phone data on GPS location and phone usage from 28 subjects. At the start of the two-week period, they administered a PHQ-9 survey to each participant.

After analyzing the data and PHQ-9 results, the researchers found that “the same six features (circadian movement, normalized entropy, location variance, home stay, phone usage duration, and phone usage frequency) were significantly different between the participants with no sign of depression (PHQ-9 <5) and the rest (PHQ-9 ≥5)” (Saeb et al.).
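Some of these features have compact mathematical definitions. The sketch below follows definitions along the lines of those used in this literature (location variance as the logarithm of the summed latitude and longitude variances, normalized entropy as the entropy of time spent across places divided by its maximum, and home stay as the share of samples at home); readers should consult Saeb et al. for the exact formulas. The function names and sample data are assumptions for illustration.

```python
from collections import Counter
from math import log
from statistics import pvariance


def location_variance(lats, lons):
    """Log of the combined variance of latitude and longitude samples."""
    return log(pvariance(lats) + pvariance(lons))  # undefined if the phone never moves


def normalized_entropy(place_labels):
    """Entropy of time across places, scaled to 0 (one place) .. 1 (even split)."""
    counts = Counter(place_labels)
    if len(counts) < 2:
        return 0.0
    total = len(place_labels)
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return entropy / log(len(counts))


def home_stay(place_labels, home_label="home"):
    """Fraction of samples spent at the location labeled as home."""
    return place_labels.count(home_label) / len(place_labels)


# Made-up samples: six at home, two at work.
labels = ["home"] * 6 + ["work"] * 2
lats = [42.0, 42.0, 42.02, 42.02]
lons = [-71.0, -71.0, -71.0, -71.0]

print(round(normalized_entropy(labels), 3), home_stay(labels))
print(round(location_variance(lats, lons), 2))
```

In the study’s framing, someone who spends nearly all their time in one place would show low location variance, low entropy, and a high home stay.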

To put these results in more understandable terms, one of the researchers said that, “People with more severe depression symptoms tended to move from place to place less and stay at home more than people with fewer depression symptoms – or none at all. The movements of more severely depressed people also tended to be less regular, and [they] were more likely to use their phones frequently and for longer durations” (Zhang et al.). Ultimately, the researchers concluded that “features extracted from mobile phone sensor data, including GPS and phone usage, provided behavioral markers that were strongly related to depressive symptom severity” (Zhang et al.). During the study, they also developed a model to classify whether a subject was depressed or not based on the previously mentioned features; the model had an accuracy of 86.5% (Saeb et al.). iSee highlights another important difference between DP platforms: this study suggests that comparing an individual’s behavioral patterns to population-level trends may reveal signs of depression, whereas platforms like Ginger.io compare the individual’s behavioral patterns to their own previous trends (see Table 2).
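The difference between the two kinds of comparison can be shown with a small, hypothetical example: the same day looks very different depending on whether it is measured against a population or against the person’s own history. The numbers below are invented, and neither comparison is claimed to be how iSee or Ginger.io actually computes its results.

```python
from statistics import mean, pstdev


def z(value, reference):
    """Standard deviations between a value and the mean of a reference group."""
    return (value - mean(reference)) / pstdev(reference)


# Made-up daily "hours at home" values.
population_home_hours = [12, 13, 14, 15, 16]  # typical values across many people
own_home_hours = [19, 20, 21, 20, 20]         # this person's own recent days
today = 20

vs_population = z(today, population_home_hours)
vs_self = z(today, own_home_hours)
print(round(vs_population, 2), round(vs_self, 2))
```

A person who habitually stays home, perhaps because they work remotely, would stand out against the population but look entirely normal against their own baseline, which matters when weighing the fairness of each approach.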

iSee’s creators believe that the platform could improve college counseling services in several ways.

First, according to an MSU news bulletin, iSee’s developers hope that the platform could help college counseling centers “‘be more accurately informed with the severity of each student’” through remote monitoring (Michigan State University). Second, iSee could boost “‘the quality of clinician-student communication’” by allowing counselors to check in on their patients between scheduled appointments.

Third, the iSee app itself will give suggestions to students in between formal counseling sessions if their behavior appears concerning. For example, “if iSee notices a socially isolated person is alone at home on a sunny Saturday afternoon, it might suggest calling some friends or going out for a walk” (Zhang et al.). Like other DP platforms, iSee hopes to bring mental health care beyond just formal appointments in a clinic and into continuous monitoring in everyday life. iSee’s developers boast that it could “serve as a model for college counseling centers across the nation,” and therapists and psychologists may eventually desire iSee’s system for their own patients.

Variations in How DP Can Work

There are fundamental differences in how these various DP platforms and apps work or hope to work—and such variations are crucial when considering the ethical implications of each platform and of DP as a whole. 

Table 2: Variations between the DP platforms discussed

Platform | Designed for patients and users | Who/what analyzes the behavioral inferences | How the algorithms that analyze behavioral inferences work | Who is notified of DP conclusions or behavioral findings
Apple: developing depression and cognitive decline detection features | User | Algorithm | User vs. population-level data | User?
Ginger.io | Patient | Algorithm: looks out for large deviations in patient behavior. | Patient vs. their own prior trends | Patient and health care professional are notified if the algorithm detects any large deviations in the patient’s behavior.
CompanionMx | Patient | Health care professional: sees patient’s behavioral scores and can make their own conclusion. | n/a | Patient and health care professional see behavioral scores in four categories.
QuantActions | User: anyone who downloads the app | Algorithm | Unclear | User
iSee | Students – may include both users and patients | Algorithm and health care professional. Algorithm: hypothesizes a depression severity score (PHQ-9) for each student. Health care professional: can view behavioral inferences and make their own conclusions. | Student vs. population-level trends | Student and health care professional see behavioral results and depression severity (PHQ-9) scores in the app and dashboard, respectively.

The platforms are aimed at different individuals: some platforms, like CompanionMx, aim to monitor individuals who are in treatment for a mental health issue; since the individuals have already been diagnosed by a clinician, I will refer to them as “patients.” Alternatively, other platforms, like Apple’s developing DP, seek to detect mental health conditions in individuals for the first time; since these individuals do not already (or may not) have a clinical diagnosis, I will refer to them as “users.” CompanionMx’s platform and other platforms for patients often incorporate clinicians into their models and aim “to help clinicians help people with mood disorders” (Bender). Alternatively, other platforms, like Apple’s developing system, would “warn people they might be at risk and prompt them to seek care;” these features would prompt users to initiate a relationship with a clinician, instead of operating on the assumption that they are already working with a clinician. 

The platforms also give clinicians a different role in the process of data analysis. All the platforms discussed analyze raw passive data to find behavioral patterns, but each uses a different technique to translate behavioral data into clinically relevant information.

For example, while CompanionMx leaves it to clinicians to analyze the inferred behavioral patterns (which they see through the dashboard), Ginger.io and Apple use algorithms to analyze the inferred behavioral information. Other platforms combine these two approaches: for example, iSee both shows clinicians the patient’s behavioral scores and shows them a depression severity score generated by an algorithm. Among platforms that use algorithms to analyze behavioral patterns, there are different methods of doing so: Ginger.io, for example, compares a patient’s behavioral patterns to their own prior trends, whereas iSee and Apple compare a patient’s patterns to broader population-level trends. These variations, though often minute, can have far-reaching ethical implications for patients, users, clinicians, and society; this paper will attempt to cover many of such ethical implications. 
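
The difference between the two comparison strategies can be illustrated with a minimal Python sketch. It is not the method used by Ginger.io, iSee, or Apple, whose algorithms are not described in the sources cited here. The feature (hours spent at home per day), every number, and the cutoff of two standard deviations are hypothetical values chosen only for illustration.

```python
from statistics import mean, stdev

def z_score(value, reference):
    """How many standard deviations `value` lies from the mean of `reference`."""
    return (value - mean(reference)) / stdev(reference)

# Hypothetical feature: hours per day spent at home, inferred from GPS.
own_history = [13.0, 14.0, 12.5, 13.5, 14.5, 13.0, 13.5]  # this person's prior week
population = [10.0, 16.0, 12.0, 18.0, 14.0, 20.0, 11.0]   # other people, same feature
today = 17.0

THRESHOLD = 2.0  # hypothetical cutoff for a "large deviation"

# Strategy 1: compare the person to their own prior trends.
within_person = z_score(today, own_history)
# Strategy 2: compare the person to population-level trends.
vs_population = z_score(today, population)

flag_within = abs(within_person) > THRESHOLD
flag_population = abs(vs_population) > THRESHOLD

print(f"vs. own history: z = {within_person:.2f}, flagged = {flag_within}")
print(f"vs. population:  z = {vs_population:.2f}, flagged = {flag_population}")
```

In this toy data, the same day is flagged as unusual relative to the person’s own routine but not relative to the population, because the population varies far more than the individual does. Neither comparison says anything about why the day was unusual.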

Though many studies suggest that the field of DP has much promise and potential, these studies also have notable limitations. Many use small sample sizes and have a short time frame: for example, one review article about 25 DP studies found that they used an average of 81 participants for 42 days (Melcher et al.). In addition, many of the studies don’t include healthy controls, are based on convenience sampling, and use samples of just younger subjects. Furthermore, instead of looking at “how all of the data streams may be combined to gain a more holistic view” of an individual’s health and well-being, most of the existing studies focus on detecting or monitoring one specific disease with DP (Melcher et al.). Since many psychological issues have similar symptoms, an algorithm designed for a specific illness may, when used in the broader population, confuse other diseases with the disease it was designed to detect. Because the field is so new, the studies also have inconsistent methodology and data reporting strategies—things that may solidify as the field advances (Melcher et al.).

In addition to limitations on DP studies, there are technical challenges when creating DP platforms. First, large-scale, ongoing data collection will drain phone battery significantly. For example, continuously gathering GPS location data “can drain the phone battery in a couple of hours,” which is part of the reason that apps like Waze and Google Maps tend to drain phone batteries so much (Onnela). To avoid draining someone’s phone battery too much, DP platforms will often sample data at certain intervals, like gathering the user’s GPS location every 10 minutes (Onnela). Second, DP algorithms assume that a phone is a perfect proxy of its owner, but this is not always true. This underlying assumption may cause errors: for example, if the GPS location of the phone is the user’s home all day, this could indicate the user didn’t leave their house (potentially a sign/symptom of a mental health episode), but could also indicate that the user left their phone at home (Onnela). Third, platforms that compare an individual’s patterns to population-level trends may be inaccurate because people use their phones differently and experience mental health episodes differently; for example, some depressed patients experience insomnia whereas others oversleep (Sawchuk). Alternatively, platforms that compare an individual’s patterns to their own prior routines, like Ginger.io, may be able to detect deviations in the user’s behavior, but cannot determine the cause for such deviations: a departure from usual routines could signify a mental health episode or relapse, but it could also signify that the user is on vacation, took a day off from work, had a physical illness like the flu, began a new job, moved houses, or more. Fourth, DP gathers a massive amount of data from users and patients, and much of this data is noisy, or meaningless. A challenge for researchers is sifting through all this data and differentiating noisy data from valuable trends with a connection to health. 
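
The battery trade-off behind interval sampling can be made concrete with a short Python sketch. The 10-minute interval comes from the example above; treating “continuous” collection as one location fix per second is an assumption made here for comparison, and `sample_times` is a hypothetical helper, not part of any DP platform.

```python
from datetime import datetime, timedelta

def sample_times(start, end, interval):
    """Return the timestamps at which a location fix would be requested."""
    times = []
    t = start
    while t < end:
        times.append(t)
        t += interval
    return times

day_start = datetime(2023, 1, 1, 0, 0)
day_end = day_start + timedelta(days=1)

# One fix every 10 minutes, as in the example from the text.
every_ten_minutes = sample_times(day_start, day_end, timedelta(minutes=10))

# Assumed "continuous" rate of one fix per second (hypothetical).
every_second_count = int((day_end - day_start).total_seconds())

print(len(every_ten_minutes))                        # fixes per day when sampling
print(every_second_count)                            # fixes per day at one per second
print(every_second_count // len(every_ten_minutes))  # reduction factor
```

Sampling cuts the number of fixes sharply, but anything the user does between two fixes is invisible to the platform, which adds to the noise and proxy problems described above.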

Government Regulations

Though there are no government regulations on DP itself, DP platforms may be impacted by government regulations on (1) the efficacy of medical devices and (2) digital data. In this section, I will explain existing regulations in these areas, largely focusing on US regulations, and how they may cover or control DP.

Regulating Efficacy

In the US, the Food and Drug Administration (FDA) is tasked with, among other things, ensuring “the safety, efficacy, and security” of “medical devices” (“Food and Drug Administration”). To respond to the growing number of digital health tools, the FDA ran a Digital Health Software Precertification Pilot Program from 2017 to 2022, which included Apple, Fitbit, and seven other companies (“Digital Health Software Pilot Program”). It also launched the Digital Health Center of Excellence in 2020, which, among other things, approved Apple Watch features to detect atrial fibrillation (“FDA Launches Digital Health Center”).

With regard to smartphone apps, the Federal Trade Commission states that the FDA “regulates the safety and effectiveness” of “certain mobile medical apps:” specifically, those “that could pose a risk to consumers if they don’t work as intended” (“Mobile Health App Interactive”).

Given that the FDA considered the atrial fibrillation detection feature on Apple Watches to be within its jurisdiction, I believe that it would likely consider any more advanced DP features on Apple iPhones or DP apps to also be within its realm; furthermore, at the launch of the Digital Health Center of Excellence, the FDA vowed “dedication to the advancement of digital health technology, including mobile health devices, Software as a Medical Device (SaMD), wearables when used as a medical device, and technologies used to study medical products” (“FDA Launches Digital Health Center”). As a result, it seems likely that the FDA would seek to regulate the efficacy and safety of DP platforms to ensure that they don’t harm consumers.

Regulating Data

In addition to regulating DP platforms for their efficacy, the government may (or may not) seek to oversee how these platforms collect, use, and store consumer data. Whereas the European Union has robust data regulations to protect consumer privacy, America’s regulations are much looser: they cover “only specific types of data in special circumstances” (Klosowski). One of these “specific types of data” and paired “special circumstances” is protected health information (PHI) when it is handled by certain “covered entities:” the Health Insurance Portability and Accountability Act (HIPAA) stipulates that “covered entities” (such as doctors, hospitals, pharmacies, and insurers) can only share PHI when it is “for the purposes of treatment, payment, and health care operations and when a business associate agreement is in place” (Klosowski and “Health Insurance Portability and Accountability Act”).

Though “people tend to think HIPAA covers all health data, it doesn’t” (Klosowski). For example, at present, health data collected and stored by Fitbit watches are not covered under HIPAA, since Fitbit is not a covered entity and does not involve covered entities (Klosowski).

As a result, HIPAA may regulate some, but not all, DP platforms. Any DP data that is handled by a covered entity would fit under HIPAA’s jurisdiction; for example, if CompanionMx reveals PHI to the clinician, the clinician would be considered a covered entity and therefore could not share the PHI except for the specific reasons enumerated above. However, DP may “create sensitive health information outside of contexts covered by HIPAA” when companies that are not covered entities and do not involve covered entities are collecting and storing the data (Martinez-Martin et al.). For example, just as data collected on Fitbit watches is not regulated by HIPAA, DP data collected by companies like Apple and QuantActions would not be regulated by HIPAA, since these companies are not covered entities.

If no new regulations are put in place for digital health data, these companies, like the many US businesses that fall outside of HIPAA and other niche data regulations, will be “free to do what they want with [consumer] data” and can “use, share, or sell any data they collect” without notifying users (Klosowski).

For example, one study found that “52% of apps share [user] data with third parties” and that Instagram “shares a staggering 79% of [user] data with other companies,” including data on “search history, location, contacts and financial info” (Dimitrov and Cuthbertson).

These other companies, known as third parties, can then further share the user’s data without regulation (Klosowski). DP platforms that fall outside HIPAA’s jurisdiction could share data in similar ways. Some US states, partially inspired by the EU’s stricter data regulations, have enacted their own laws to protect consumer data and privacy: California, Virginia, and Colorado now have “comprehensive consumer privacy laws” and Massachusetts, New York, North Carolina, and Pennsylvania are all in the process of making similar laws (Klosowski). This image shows privacy legislation (or the lack thereof) in various US states.

Ethics: Introduction

As shown above, DP is a viable possibility and is already being used by some apps and doctors. The remainder of this paper will explore the question: is DP with mainly passive data ethical? The key considerations that I believe are relevant to this question are discussed in the sections below. In the first section, which is called potential benefits and efficacy, I discuss the possible benefits of DP when compared to typical care. Though doctors administering typical care may use patient self-reports to try to predict mental health episodes or gain a sense of how the patient is doing between sessions, DP would allow doctors to do such things without the patient needing to fill out self-reports. However, the benefit that a given DP offers is dependent on its efficacy and accuracy in assessing patient behavior and/or mental state. The second section, called privacy and autonomy, explores how these possible benefits come at a high cost: privacy and/or autonomy. This section discusses privacy and autonomy from the perspective of a user, patient, employee/student, and minor. It considers the various types of DP and how they each affect privacy and autonomy differently, and how DP may offer the possibility for “forced” monitoring or usage. The third section, called slippery slope, explores how DP could assist totalitarian governments in population surveillance because it creates the technology to infer a user’s behavioral patterns and mental state based on their phone data. The next section, called fairness and equity, explores the potential biases that could arise from DP since it relies on algorithms and AI, which could have biases depending on who is (or isn’t) included in the studies that the models are made from. The last section, called trust, discusses how DP may reduce a doctor’s trust in their patient and a patient’s trust or user’s trust in their own self-awareness.
In the conclusion, I will recap the ethical points made, discuss my own thoughts on DP’s ethicality, and make recommendations for ways that the US government can appropriately regulate DP.

Ethics: Potential Benefits and Efficacy

A patient considering whether to use a DP platform like CompanionMx or Ginger.io to assist their work with a therapist faces the question: Are the potential benefits of these platforms for treating and monitoring a mental or neurological illness worth sacrificing aspects of one’s privacy?

Both the potential benefits and the privacy violations stem from the fact that DP allows for continuous tracking of the patient’s behaviors and symptoms in situ, or in the patient’s daily life.

On the one hand, this may be seen as an improvement when compared to typical care, which often relies on periodic check-ins between the patient and their doctor in a clinical setting. For example, Laura Sullivan—a social worker who used CompanionMx during Brigham and Women’s trial of the app—felt that the daily tracking from DP was beneficial as she worked with patients. She said that, “historically, we [clinicians] would never know what happens to patients between sessions,” but with the CompanionMx app, Sullivan was able to see how a patient fared between appointments, which “sometimes” led to a “deeper [and] richer conversation” during the sessions (Bebinger). For example, Sullivan said that, after “watch[ing]” a patient’s trends via CompanionMx, she may ask during a session, “‘I noticed, during this time period, things [scores] changed for you. What does that mean, what was going on?’” (Bebinger).

The story of a patient referred to as Client B exemplifies how, by continuously monitoring patients in situ, DP can have benefits. Client B was diagnosed with “moderate to moderately severe depression” and “severe anxiety,” and had a history of suicidal ideation. Though he “worked with several therapists throughout his life,” they “all [had] little effect,” and in the first three months of treatment with the new clinical team, “client B appeared to be presenting with treatment resistant depression.” At this point, the clinical team explained the mindLAMP app to Client B, who agreed to begin using the app while continuing weekly therapy sessions with the clinical team (Rauseo-Ricupero et al.). Among other things, mindLAMP collects various passive data streams – such as GPS location, call logs, text logs, Bluetooth, and app usage – and uses them to characterize the user’s behaviors – including their sleep patterns, sociability, mood, and mobility. The mindLAMP dashboard allows the patient, their clinician(s), and certain family members whom the patient authorizes to view the patient’s inferred behavioral patterns (“MindLAMP” and “How Does MindLAMP Work”). About one month after Client B began using mindLAMP, his clinical team made a pivotal observation based on the platform’s behavioral inferences:

When the clinical team “review[ed] additional captured data (sleep log) during session, it became clear that Client B had severe sleep issues, though the client initially reported no sleep issues” (Rauseo-Ricupero et al.). The clinical team then “reflect[ed] on discrepancies between app and [Client B’s] self-reported summaries in a manner that facilitated a pivotal discussion” that confirmed Client B’s sleep issues.

After this discussion, the clinical team “referred Client B for testing for Obstructive Sleep Apnea (OSA), which returned positive” (Rauseo-Ricupero et al.). As Client B began treatment for OSA, his mental state also improved, as his sleep issues were exacerbating his psychological challenges. The study’s authors stated that Client B’s “mood and anxiety scores were expected to continue decreasing with further OSA treatment” (Rauseo-Ricupero et al.).

As Client B’s case demonstrates, DP has the potential to improve mental health treatment for patients by revealing their behaviors in between sessions to their health care provider. Additionally, with its continuous data collection, DP may also be able to predict mental health episodes before they begin, particularly in patients with conditions like schizophrenia, epilepsy, and bipolar disorder. Importantly, it could be a doctor or an algorithm analyzing patient behavioral data found via DP that makes this prediction.

An example of the former is that, in a study at a Harvard teaching hospital, “clinicians used insights from the Companion dashboard to predict mania up to two weeks in advance” (Bender).

For users, Apple’s DP features may have benefits: they may notify users of a mental health illness or neurological challenge that they weren’t otherwise aware of, which may allow them to initiate treatment.
However, the potential benefits of DP are contingent on its efficacy and accuracy: DP platforms have less potential benefit if they have low efficacy and low accuracy. In addition, false positives for features that detect aspects of psychological or neurological health could be quite harmful. Some current health-related features have made mistakes, but because these features are related to physical health, the false positives are not as harmful. For example, the car crash detection feature on Apple Watches and iPhones sometimes called 911 for skiers, as the crash detection algorithms confused the fast motions, quick jolts, and sudden stops of skiing with those of car crashes. Since these features are related to physical health and trauma, it is fairly simple to tell when they are wrong: to the skiers, it was obvious that they were not in a car crash and that the Apple feature made a mistake. However, if Apple implements features focused on mental and neurological health, mistakes might not be as obvious to users who receive a false positive notification. As a result, healthy users who receive a false positive notification that they might have depression may face a “placebo effect” of sorts, where the notification makes them start to feel worse. Furthermore, just as the car crash detection feature confused skiing with a car crash, a platform like Ginger.io might confuse a vacation or a day off from work/school with a mental health episode.
In summary, this section introduces the conflict central to evaluating DP’s ethicality: are the potential benefits of DP compared to typical care worth sacrificing one’s privacy? DP can continuously monitor a user and infer their behavioral patterns, which creates both benefits and privacy concerns. This section explores the possible advantages of DP: first, that it can reveal a patient’s behavior in between sessions and in situ, or in daily life. For example, with Client B, the use of DP revealed his sleep issues to his doctors, which led to an OSA diagnosis and was pivotal in treating his depression. Second, DP may be able to predict or reveal signs of mental health episodes before they begin; for example, in a trial of CompanionMx, Companion’s behavioral inferences enabled doctors to predict episodes of mania before they began. However, these potential benefits are contingent on how effective and accurate DP platforms are; if they are inaccurate, they may instead cause harm. The next section will go more in-depth on the risks that DP presents to privacy and autonomy.

Ethics: Privacy and Autonomy

As DP algorithms advance, they offer individuals and institutions—ranging from doctors, whose motives (hopefully) are to help the patient, to large companies, whose motives may lie elsewhere—the capability to “watch” the behavior and health of patients and users through their passive phone data, which can be collected without any input from the user. This section explores how DP may violate the privacy and autonomy of a user—an individual who uses a DP intended to detect illness and who is not working with a doctor—and a patient—an individual who uses DP intended to monitor and help treat a known illness and who is working with a doctor. This section will also discuss privacy and autonomy from the perspective of a minor and of a student or employee. The ideas posed in this section are based on the assumption that the patient, user, minor, or student/employee is in the US, as digital regulations in other countries are beyond the scope of this paper. 

In the US, many companies and apps use privacy policies to gain consent to collect, use, and share consumer data. However, these often have little effectiveness in teaching consumers and gaining informed consent, since they are often dense, lengthy, and difficult to comprehend.

One study from Pew Research Center found that 9%, 13%, and 38% of Americans reported that they read privacy policies always, often, and sometimes, respectively; another 36% reported that they “never” read privacy policies. Of the 60% who said that they ever read privacy policies, 22% and 35% reported reading them entirely or partially, respectively, and the remaining 43% reported “glanc[ing] over [them] without reading closely” (Pew Research Center). Of the same 60% who ever read privacy policies, 13% reported understanding them “a great deal,” while 55%, 29%, and 3% reported understanding them “some,” “very little,” and “none” (Pew Research Center). Because privacy policies are so often ignored, skimmed, or incomprehensible, the “consent” that they offer is often not informed consent.
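
To see what these nested percentages imply for Americans as a whole, the arithmetic can be worked through in a few lines of Python. The inputs are the figures quoted above; the two derived shares are my own illustrative calculation, not numbers reported by Pew, and they assume that the follow-up questions were asked of exactly the 60% who ever read privacy policies.

```python
# Figures quoted in the text (Pew Research Center), as percentages of Americans.
always, often, sometimes, never = 9, 13, 38, 36
ever_read = always + often + sometimes  # share who ever read privacy policies

# Percentages among those who ever read privacy policies.
read_entirely = 22
understand_great_deal = 13

# Derived shares of all Americans (illustrative arithmetic, not reported by Pew).
share_read_entirely = ever_read * read_entirely / 100
share_understand_great_deal = ever_read * understand_great_deal / 100

print(f"Ever read privacy policies: {ever_read}%")
print(f"Read them entirely: about {share_read_entirely:.0f}% of all Americans")
print(f"Understand them a great deal: about {share_understand_great_deal:.0f}% of all Americans")
```

Under these assumptions, only a small minority of all Americans both read privacy policies in full and report understanding them well, which is the gap that makes the resulting “consent” hard to describe as informed.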

Privacy and Autonomy for the Patient

As a patient considers downloading and using a DP platform, they will likely be faced with a privacy policy. If the patient is using a platform like CompanionMx, their clinician will be involved in the DP monitoring (will be able to view their behavioral scores on a dashboard), and so this clinician may be able to guide the patient through the process of downloading the app and understanding its privacy implications.

In the case study that included Client B, the authors note that “in introducing the app to all three clients, each wanted to understand how their privacy would be protected and who would have access to their data” (Rauseo-Ricupero et al.).

Because the clinical team considered it “important” to “establish trust around the app,” they “outlin[ed] types of data that mindLAMP collected, review[ed] how and where the data was stored [and] who had access to patient data, show[ed] patients how to access their own data at any time, and [explained] when and how the gathered data could be deleted” (Rauseo-Ricupero et al.). In addition, the clinical team who helped Client B enlisted a “technology specialist” to answer any patient questions regarding data collection and their privacy. Though these clinicians followed stringent protocols to ensure informed consent and to protect patient privacy, not all clinicians may do so. Furthermore, because privacy policies are grounded in technical language and niche knowledge, not all clinicians may be able to understand them and help “translate” the privacy policies to their patients.

After the patient begins to use the DP app, much of the discussion surrounding privacy depends on intricacies in how the app works, how the clinicians use it, and how it stores data.

Does the platform reveal behavioral patterns to the patient and clinician in real-time, or in set intervals? If the former, do the clinicians look at the data frequently, or only in set intervals? In the case study with Client B, for example, the clinical team only “review[ed] collected app data during clinical sessions” to build the patient’s trust in both the app and the clinicians (Rauseo-Ricupero et al.). How is the data being sent and stored? The ways that data can be sent, encrypted, and stored, and related privacy concerns such as hacks and breaches, are beyond the scope of this paper. However, it is still vital to consider that the transmission and storage of DP data present a challenge to a patient’s privacy. Will the data be deleted after a set period of time, and if so, after how long? What happens to a patient’s behavioral data after they die? Should it be deleted, or could it be used for further research into DP? These and other questions about the logistics and technicalities of DP platforms will influence their ethicality.

In addition to privacy concerns, there are also concerns regarding autonomy: What if a patient wants to use DP, but their doctor doesn’t want to? Can a patient “require” a doctor to use CompanionMx or a similar platform? Conversely, imagine a patient with depression who has suicidal ideation (SI). What if their doctor wants to “require” the patient to use DP, so they can continuously monitor the patient? Which is more important: the patient’s autonomy and privacy, or the desire to keep them safe?

Privacy and Autonomy for the User

Even after all those privacy concerns, privacy for the user seems much more Orwellian than privacy for the patient. Many of the questions and privacy concerns mentioned above (when discussing the patient), such as the transmission and storage of data, are still relevant for the user; as a result, this subsection outlines additional concerns specific to the user. 

If Apple’s developing features—one to detect depression and another to detect mild cognitive decline—are ever implemented into Apple devices, many of the privacy concerns will, just like with the patient-focused DP platforms, depend on the logistics and technicalities of the features.

Will Apple use opt-in or opt-out consent? If the latter, will users even be aware that such features are on their devices? Among the health-related tools on current Apple devices, Apple Watches have a fall detection feature, which can be turned on always, only during workouts, or never. The feature is, by default, on always for those aged 55 and older and on only during workouts for those aged 18-55; users can change this setting via the Watch app (Apple, “Use Fall Detection”). Apple Watches also have two handwashing-related features, which are off by default (Apple, “Wash Your Hands with Apple”).

In addition, Apple Watch Series 8 and iPhone 14s have a feature to detect car crashes, which is on by default (opt-out consent) (Apple, “Use Crash Detection”). Given that some of these features are on by default and others are not, it is unclear how Apple would treat its DP features if they are implemented onto devices. As discussed above, users rarely read and understand privacy policies when downloading an app, purchasing a new device, or updating a current device. As a result, many individuals don’t know what changes are made when they update their phone or purchase a new phone, especially because many people in the US choose to buy an iPhone simply because it is the norm. Even if DP features are mentioned in a privacy policy, then, they may still slip under the radar, since few readers read privacy policies.

Regardless of whether users are aware of these features, any conclusions they draw present even more privacy concerns and may be harmful to users. If the current studies with UCLA and Biogen reveal any digital signals of depression or cognitive decline, Apple “hope[s] to turn those signals into an app or feature that could warn people they might be at risk and prompt them to seek care” (Winkler). If Apple plans to “warn people they might be at risk” with an iPhone notification, as they do for Apple Watch’s health features, users may find these warnings impersonal and insensitive. Given the user’s prior life experience, receiving such a notification could be particularly upsetting: for example, for a user who has witnessed their parent deteriorate due to Alzheimer’s, seeing a notification that they may have mild cognitive decline through their phone—without a doctor present and without mental preparation to receive a diagnosis—could be very upsetting and harmful. In addition, if a user suspects that they suffer from depression or mild cognitive decline, but doesn’t seek diagnosis or treatment because they can’t afford it, getting a notification that may confirm their suspicion would also be harmful. 

Beyond just “warn[ing] people they might be at risk,” Apple plans to “prompt them to seek care,” which brings its own set of ethical implications. The study to create the DP tool to detect mild cognitive decline is a collaboration between Apple and Biogen; according to the Wall Street Journal, “Biogen is collaborating on the study because it hopes it can help Apple develop an iPhone feature to detect mild cognitive impairment early and encourage relevant users to seek care earlier” (Winkler). Mild cognitive decline can develop into Alzheimer’s disease, and in 2021, the FDA approved Aduhelm, a drug created by Biogen to treat early-stage Alzheimer’s. The approval of Aduhelm was quite controversial for various reasons, and many suspected wrongdoing during the approval process. A congressional investigation found that the “F.D.A.’s approval process for Aduhelm was ‘rife with irregularities’ and criticized Biogen for setting an ‘unjustifiably high price;’” reports suggest that Biogen raised the price of the drug “because it wanted a ‘blockbuster’ that would ‘establish Aduhelm as one of the top pharmaceutical launches of all time’” (Belluck).

This information indicates that Biogen may have a conflict of interest when designing algorithms to detect mild cognitive decline: Biogen may be motivated to collaborate on this algorithm not with the desire to help patients, but instead with the desire to sell Aduhelm to a relevant audience. 

After Apple “prompt[s] [users] to seek care,” it is unclear what will happen next: Will Apple continue to perform DP on the user to see if the algorithm detects any improvement in their symptoms? If the algorithm detects no changes after a certain period of time, will Apple notify the user again? This would breach user privacy and could exacerbate the psychological damage they felt after the first notification, especially if the user is unable to afford medical care but may desire it. If Apple notices no changes on the part of the user, could they also notify an emergency contact of the user? Doing so without the user’s approval evokes aspects of attempting to force treatment and would violate the user’s privacy.

Privacy and Autonomy for the Student and Employee

While patients and users interact with a DP system as individuals, some emerging applications of DP are designed for groups: for example, iSee’s platform will allow college counselors to monitor students. iSee’s technology introduces a relationship that the patient and user perspectives do not cover, since college counselors relate to their student body very differently than a clinician relates to a patient or a company relates to its users.

The creators of iSee boast that their technology will “leverage the sensors in the smartphone and wristband [smartwatches] to continuously track students’ daily behaviors,” since this will “help clinicians identify students with the most urgent needs” and “will allow college counseling centers to be more accurately informed with the severity of each student’” (Michigan State University).

Image by Chanut Is industries from IconScout

The idea of college counselors “continuously track[ing] students’ daily behaviors” is quite frightening. Would students be able to opt in to and opt out of such tracking, and if so, how? Making this monitoring the default for students, or not allowing students to opt out at all, would put the college counseling service in the position of “Big Brother” and would violate student privacy and autonomy. At the same time, though, there are approximately 1,100 suicides on college campuses annually, making suicide the second leading cause of death for college students in the US (UMich Counseling). As such, technologies like iSee create a conflict between a student’s desire for privacy and the desire to prevent suicide and mental health struggles on college campuses.

Just as iSee hopes to monitor groups of students with DP, QuantActions is developing a platform that lets employees and employers use DP. QuantActions boasts that its “daily scores and trends objectively [would] help both employees and the employer understand the impact of health measures and support early detection of conditions that affect cognitive performance” (QuantActions). To try to ensure employee privacy, “employers [would] receive anonymized, aggregated data, and only with the employee’s consent” (QuantActions). In the future, however, other companies and platforms may not adopt these safeguards, which could jeopardize employee privacy.

Privacy and Autonomy for the Minor

Image by 46173 from Pixabay

Minors, whether they are patients, users, or students, further complicate questions about privacy and autonomy. At what age should a child be allowed to give consent to DP monitoring? If a parent is giving consent, should the child have to assent? Or can a parent force their child to be monitored with DP against the child’s will? With an app like CompanionMx, should the child, the parent, or both be able to view the patient’s behavior scores? If DP algorithms or apps detect problematic behaviors or health issues in a minor, should the child, the parent, or both be notified? As with DP for college students, performing DP on minors creates a conflict between the child’s right to privacy and the need to care for a depressed or suicidal teen. Fundamental to DP is creating trust in the platform and in the clinical team: in the case study about Client B, the clinical team found that “building trust around the app [is] critical” to success (Rauseo-Ricupero et al.). However, if a minor knows that their parent will be able to see their behavioral patterns, they may resist or distrust the technology; they may even try to alter their phone usage to hide signs of a disease, which undermines the purpose of the algorithms in the first place.

In summary, this section discussed the ways that DP may violate privacy and autonomy by allowing doctors and institutions to continuously track an individual’s behavioral patterns through the person’s passive phone data. It first looked at privacy and autonomy from the perspective of a patient, who is being monitored to treat a known illness with the guidance of a clinician, and a user, who is being monitored to detect an illness that they are not currently working with a clinician to address. Major concerns around patient privacy and autonomy include that the patient may not read or understand a privacy policy; various technical questions about how DP data is transmitted, stored, and kept (or deleted) in the long term; and the desire of a patient or clinician to “require” the other to partake in DP. The user faces these privacy and autonomy concerns and several others, such as not knowing about a DP feature on their device if the feature uses opt-out consent; sustaining psychological harm after receiving a health-related warning through their phone; and being recommended certain medications by the DP system. Next, this section explored privacy and autonomy for a college student monitored by iSee and an employee using QuantActions’ employee and employer program, either of whom may feel that their university or company is trying to infringe on their privacy under the guise of assisting them medically. Lastly, this section looked at privacy and autonomy from the perspective of a minor, including whether the minor should consent or assent to DP and who should be notified of the minor’s DP scores and metrics.

Ethics: Slippery Slope

As DP algorithms that infer an individual’s behavioral patterns from passive phone data advance, they offer the potential to improve health outcomes, but they also create a way for totalitarian governments to “watch” individuals, not through telescreens, as Big Brother did in 1984, but through their phones.

Without “careful consideration, an approach that was developed for medical management [DP] could become a tool for population surveillance” (Martinez-Martin et al.).

Image by PhotoMIX-Company from Pixabay

Totalitarian governments are already taking advantage of novel technologies to monitor their populations. For example, China’s government now uses a “rapidly expanding network” of surveillance cameras and “advanced facial recognition technology” in order to “track and control the Uighurs, a largely Muslim minority” (Mozur). The facial recognition technology, which uses AI models, “looks exclusively for Uighurs based on their appearance and keeps records of their comings and goings” and “screen[s] for people included in databases of the mentally ill or crime suspects” (Mozur). Since China’s existing facial recognition technology looks for those who are “mentally ill,” the government may be interested in DP technologies as a way to learn about individuals’ mental states as well as their behaviors. Passive phone data can be collected in the background, without the user knowing, and emerging DP algorithms can translate this data into information about a user’s behavioral patterns and mental state. As a result, the Chinese government could advance its population surveillance by creating (or purchasing) DP algorithms and tools and using them to learn about members of its population through their phones while citizens are unaware of the tracking. Could it use sleep inferences or cognitive function inferences to determine who will make the best employee? Could it use mental health inferences to make decisions about individuals’ futures? Other totalitarian governments may seek to do the same. In summary, this section explored the dystopian possibility that totalitarian governments may create their own DP algorithms to detect behavioral patterns of citizens and thereby carry out secretive population surveillance.

Ethics: Fairness and Equity

Because DP relies on algorithms, machine learning, and AI, it may inherit the kinds of bias in data analysis that have frequently shown up in medical settings. For example, racial bias has repeatedly appeared in AI algorithms used in healthcare and facial recognition. In 2019, a study published in Science revealed that “one widely used algorithm” for scoring how sick a patient is has racial bias. The algorithm, which is commonly used to determine which chronically ill patients can get access to high-risk healthcare management, affects approximately 200 million patients per year. With this algorithm, “black patients have to be much sicker than White patients to be recommended the same level of care” (Grant). This is because the model predicts the patient’s expected costs and uses this prediction as a proxy for the severity of their illness, but black patients tend to spend less on healthcare than white patients despite being equally ill (Grant and Obermeyer).

Image by Nick Youngson from Alpha Stock Images

For DP algorithms that compare individuals to population-level trends, bias could easily arise against older individuals: different age groups tend to use their phones differently, but many DP studies use primarily young subjects. Additionally, algorithms based on population-level data may overlook patients with irregular symptoms of a disease, which could raise their barriers to care.

What is even considered “regular” and “irregular” might be biased depending on how diverse sample groups are in terms of race, gender, ethnicity, background, socio-economic status, age, job, and lifestyle. 

Furthermore, data from microphones and cameras are a potential source of bias for platforms like CompanionMx (microphone) and Apple’s developing tools (microphone and camera). Male and female voices often differ in pitch, which could affect the algorithms that analyze voice signals to determine how someone is speaking. Different languages also sound distinct and can seem to convey different moods; for example, listeners unfamiliar with Chinese sometimes perceive it as sounding angry or aggressive. As a result, tools that infer mood from voice data, like CompanionMx, may only work for English speakers. In addition, prior facial recognition algorithms have been shown to be biased based on race, and a similar bias could arise in Apple’s possible algorithms to analyze facial expressions. On the other hand, rather than contributing to bias, DP may help undo it by shedding light on mental illness in men. Men frequently hide mental illness because of stigma, and doctors may ignore signs of depression or anxiety in men, yet men account for about three-quarters of suicide deaths. By looking into individuals’ lives, DP could therefore help raise awareness about mental health issues among men. In summary, because DP depends on algorithms and AI, it could have biases against certain groups, largely depending on who is (or is not) included in the preliminary studies on which the models are based.

Ethics: Trust

Image by geralt from Pixabay

DP could affect the trust between a clinician and their patient, and could weaken a patient’s trust in their own self-awareness. Especially for psychiatric illnesses, effective treatment relies on a trusting relationship between the patient and the clinician, whether that clinician is a psychologist, psychiatrist, therapist, or any other health care professional.

Discrepancies between DP inferences and patient self-reports may threaten this trust, yet also may have benefits for the patient’s wellbeing.

The latter occurred, for instance, in Client B’s situation: The “discrepancies between app and self-reported summaries” on the patient’s sleep log led to a “pivotal discussion” where Client B and his clinical team examined the discrepancies. As a result of the DP sleep log and the discussion, Client B was diagnosed with OSA and began treatment for it, which helped improve his mental state. In addition, because various studies suggest that self-reporting and human memory are flawed and subjective, doctors may welcome DP inferences as a supplement to self-reports (Robson). But over time, given the fallibility of human recall, doctors may begin to trust DP inferences (which are often seen as “objective,” even though they too can be erroneous and flawed) more than they trust patient self-reports, which could break the trust that otherwise would bind the patient and doctor and contribute to successful treatment.

In addition to affecting the trust between a patient and clinician, DP could lessen a patient’s or user’s trust in themselves. If a DP platform like CompanionMx tells a patient they are at a certain score for their mobility, mood, or social activity, they may begin to trust the app’s scores more than they trust their own self-awareness about how they are doing in these areas. Moreover, because human memory can be short and malleable, the app’s reports may influence the patient’s retrospective emotions about a given day or event. Finally, if Apple’s features come to fruition, a healthy user who receives a notification that they may have depression may experience the nocebo effect (the negative counterpart of the placebo effect), where the notification makes them question their well-being and feel worse. In summary, this section addressed how DP may weaken the trust between a patient and their clinician and may degrade a patient’s or user’s trust in their own self-awareness, which could be detrimental to the process of treatment.

Conclusion

Summary of Ethical Points

Image by Peggy_Marco from Pixabay

This paper explored whether DP using passive phone data is ethical. The first ethics section introduced the central tension of DP: are its potential benefits worth sacrificing one’s privacy? This section explored the possible advantages of DP when compared to typical care, such as predicting mental health episodes before they occur and allowing a doctor to monitor a patient between sessions and in situ. Client B’s story exemplifies these benefits, since DP revealed his sleep issues to his clinicians, leading to an OSA diagnosis and improvements in his mental state. The second section, called privacy and autonomy, explored how DP may violate privacy and autonomy for a patient, a user, a student or employee, and a minor. Among other things, this section discussed the inadequacy of privacy policies; fundamental questions about how DP data is stored, transmitted, and dealt with in the long term; ways that users may be unaware of and harmed by DP algorithms and any warnings that they send; and the potential for “forced” DP monitoring of an individual. The third section, called slippery slope, delved into dystopian possibilities: since DP algorithms translate passive phone data into information about a user’s behavioral patterns and mental state, they could assist totalitarian governments in population surveillance. The fourth section, called fairness and equity, discussed how DP, because it relies on algorithms and AI, may be biased depending on who is (or isn’t) included in the preliminary studies used to build the models. The last ethics section, called trust, explored how DP could degrade the trust between a patient and their doctor and even a patient’s or user’s trust in themselves.

Recommendations

I believe that it is ethical to use DP for patients, that is, to support patient-clinician work to treat or predict episodes of an already-diagnosed illness, provided that the patient gives informed consent. For example, I view what CompanionMx and Ginger.io hope to do as ethical. However, I believe that it is unethical to use DP on users, that is, to detect a disease and nudge the user to seek care.

To sufficiently protect the privacy of patients, I believe the US should create the following regulations on DP platforms and their privacy policies, some of which are based on the General Data Protection Regulation (GDPR) in the EU.

First, the privacy policies of DP platforms must inform patients of the “types of data collected, the inferences that can be drawn from the data, the reports made from the data, who the data and reports would be shared with, the potential risks and benefits, and the limitations that apply to the findings” (Martinez-Martin et al.).

Image by IO-Images from Pixabay

Second, this information must be specific and unambiguous: the data sources and behavioral patterns inferred from them should be explicitly named. Currently, the GDPR in the EU stipulates that the statement “we may use your personal data to develop new services” is inadequate, since it is too vague. Instead, GDPR tells companies and organizations to say something like, “‘we will retain your shopping history and use details of the products you have previously purchased to make suggestions to you for other products which we believe you will also be interested in’” (“Writing a GDPR-Compliant Privacy Notice”). I believe that the US must create similar requirements for DP to ensure that the privacy policies for DP platforms are explicit and clear when listing the types of data collected and the inferences drawn from them.

Third, the GDPR stipulates that certain organizations, such as those that perform “regular and systematic monitoring of personal or sensitive data on a large scale,” must have a Data Protection Officer (“History of GDPR”). Based on this idea, I believe that companies that perform DP must have a Data Protection Officer or a similar type of official to ensure patient privacy.

Fourth, as is required by the GDPR, companies that use DP must guarantee data protection rights and explain them in the privacy policy; these rights include the patient’s right to request a copy of their data from the company (right to access) and the right to request that the company erase their prior data (right to erasure).

Fifth, to begin using a DP platform or DP feature, patients must opt in at least twice, with the two opt-ins at least a week apart, which would force them to take the privacy policy of a DP platform more seriously than they may take other privacy policies.

Sixth, as mandated by the GDPR, privacy policies for DP must present all information in a “concise, transparent, intelligible, and easily accessible” form, and they must be “written in clear and plain language” (“Writing a GDPR-Compliant Privacy Notice”).

Seventh, privacy policies for DP should come both in written form and in the form of a video explanation, as offering both will boost the patient’s understanding.

Eighth, in addition to the patient agreeing to the privacy policy, their healthcare provider must also agree to it, particularly so that the provider sees its information on the “potential risks and benefits, and the limitations that apply to the findings” (Martinez-Martin et al.).

In short, these are the measures that I think the US government should require DP platforms and companies to take in order to ensure consumer privacy regarding their digital data.

Image by Wallusy from Pixabay

In addition to the regulations for consumer privacy described above, I also believe that the US should take measures to regulate the efficacy of DP platforms. As discussed above, this task would currently fall under the jurisdiction of the FDA if it concludes that DP apps “could pose a risk to consumers” (“Mobile Health App Interactive”). Personally, I think that DP platforms could pose such a risk, and so I think the FDA should monitor DP platforms for efficacy. Clinical trials for DP platforms must have both an Institutional Review Board (IRB) to ensure that subjects give informed consent and are partaking voluntarily, and a Data Safety Monitoring Board (DSMB) to monitor the platform’s efficacy during the study (“Fact Sheet: Data Safety Monitoring Boards”). The subjects enrolled in clinical trials for DP apps must be of diverse backgrounds, demographics, and lifestyles to help prevent DP platforms from developing bias and only working for a certain type of person. The FDA’s regulatory review boards for DP technologies must include technological experts, data security experts, medical experts, regular users, and patients who have or have had mental health issues; these review boards must be independent entities and not connected to or funded by the companies in question. After a DP platform has been approved and released to the public, the FDA must have review boards periodically evaluate the technology’s performance and efficacy. If a review board determines that the DP platform is not effective, is not working as intended, or has flaws or biases, then it should address these issues with the company or revoke the platform’s approval (whichever is warranted depending on the scope of the issues).
