White Paper

COVID-19 Tools
November 11, 2020


Covid activity assessment


As our experience with the novel coronavirus (SARS-CoV-2) has grown, scientists and clinicians have developed a better understanding of how the disease spreads. Several studies have examined modes of transmission, and epidemiologists have published articles analyzing risk factors inherent in everyday activities. The MyDataHelps digital health platform leverages this information to help users make informed choices about the activities they participate in.

Activity transmission risk

The app focuses on activity-specific risk of viral transmission. For example, is going to the bar riskier than going to the park, all else being equal? How much riskier, and why? 

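Deliberately excluded are factors specific to the person or the community, such as an individual’s immune status, whether the community is a hot spot, and whether other people wear masks.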

While these factors are important to an individual’s overall risk, conflating them with the activity can undermine the user’s appreciation of activity-based risks. To an individual in a hot spot with a compromised immune system, almost everything might pose significant risk; there is nevertheless value in understanding which activities are riskier than others.

By focusing on activity-specific risk, we empower users. They cannot control whether their community is a hot spot or whether other people will wear masks, but they can control which activities they choose to participate in and can be aware of the risks they are taking. According to epidemiologist Emily Landon(20), tools like this can provide people with “a much better idea about how much risk is associated with the things that they’re going to do.”

App workflow

The app prompts the user to select an activity, such as going to the bar or grocery shopping, and then displays a risk profile for that activity. This profile includes:

  1. An overall activity transmission risk score of Low, Medium Low, Medium, High, or Very High.
  2. A summary of characteristics that influenced the risk score, such as crowd size and location. These factors are discussed in more detail in the following section.

The user can adjust the activity characteristics and see the transmission risk score change in real time. This helps the user understand how each condition affects risk.

Scoring methodology and risk factors

To determine the transmission risk scoring system and activity characteristics, we first reviewed a number of articles in which panels of expert epidemiologists rated the risks of everyday activities, as well as the CDC’s general activity guidance.(1, 2, 3, 4, 5, 6, 7)

Two of the articles, from Michigan(1) and Texas(2), provided numerical scores for approximately forty different activities. These scores served as our quantitative data for training and validating our scoring algorithm. The numerical analysis is discussed in detail in the following section.

In addition to the numerical scores, the experts provided a narrative discussing what made certain activities riskier than others. For example, on stadiums the Michigan panel stated, “…sports stadiums have crowding and alcohol. People are also likely to cheer, yell and sing, among other noises, which also makes the spread easier.”(1) On indoor restaurants, infectious disease expert Elizabeth Connick said, “I think the biggest risk is being in a closed space and breathing the same air that other people are breathing, and also not wearing masks.”(3)

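In reviewing these articles, common themes emerged. The Japanese Ministry of Health based its public information campaign around avoiding the “Three C’s”(8): closed spaces with poor ventilation, crowded places with many people nearby, and close-contact settings such as close-range conversations.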

The Michigan panel cited similar factors, stating, “…whether it’s inside or outside; proximity to others; exposure time; likelihood of compliance; and personal risk level.”(1)

Dr. William Miller, an epidemiologist at Ohio State University, concurred: “We can think of transmission risk with a simple phrase: time, space, people, place.”(4) Another study, from a researcher at the University of Denver, compared the risk of holiday gatherings based on the number of people, the size of the room, and whether the gathering was inside or outside.(22)

Based on these consistent themes, we selected four characteristics for our activity rankings: location, duration, crowd size, and close contact.

In addition to these four primary characteristics, two special situations were mentioned repeatedly in the articles: respiratory droplets and shared items.

The Michigan panel explained the role of respiratory droplets: “When people talk loud or sing, it potentially emits more of the virus into the environment, further increasing the risk level.”(1) This conclusion is reinforced by studies examining coronavirus outbreaks in a restaurant in Guangzhou, China(9), a call center in South Korea(10), a choir in Washington state(11), and a summer camp in Georgia(12). There is a growing body of evidence to suggest caution for these kinds of activities.(13, 14)

The other special situation involved sharing items. While many environments may contain shared touch surfaces like doorknobs, experts in several articles cited certain kinds of shared serving utensils or equipment as a particular risk. For example, the Texas Medical Association ranks eating at a restaurant as a “7,” but eating at a buffet as an “8.”(2) The CDC guidance advises people to consider whether they will need to “share any items, equipment, or tools with other people.”(6) The Michigan panel emphasized the need to wipe down shared gym equipment before use.(1)

Our app considers activities that involve shared items, shouting, loud talking, singing, exertion, or aerosolizing medical/dental procedures as being at a higher risk than other activities with equivalent primary characteristics.

Taking all these factors into consideration, the app generates an activity-specific risk score from 1 to 9. This score is presented to the user as a descriptive rating according to the following table:

Risk Score    Risk Title
9             Very High
7-8           High
5-6           Medium
3-4           Medium Low
1-2           Low
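
The score-to-title lookup in the table can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the names RISK_BANDS and risk_title are hypothetical, and the rounding and clamping of fractional or out-of-range scores is an assumption, since the paper does not say how such values are handled.

```python
# Score bands copied from the table above; all names here are illustrative.
RISK_BANDS = [
    (9, 9, "Very High"),
    (7, 8, "High"),
    (5, 6, "Medium"),
    (3, 4, "Medium Low"),
    (1, 2, "Low"),
]


def risk_title(score: float) -> str:
    """Return the descriptive rating for a risk score.

    Assumption: fractional scores are rounded half up to the nearest
    integer and clamped to the 1-9 scale before the lookup.
    """
    rounded = int(score + 0.5)          # round half up for positive scores
    clamped = max(1, min(9, rounded))   # keep the value on the 1-9 scale
    for low, high, title in RISK_BANDS:
        if low <= clamped <= high:
            return title
    raise ValueError(f"unmapped score: {score}")


print(risk_title(3), "/", risk_title(4))  # Medium Low / Medium Low
```

As noted in the discussion of expert disagreement, a “3” and a “4” land in the same band, so small differences in score often do not change the rating shown to the user.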

It should be noted that the Michigan and Texas panels sometimes disagreed on scores, as did experts within each panel. For example, the Michigan article stated, “There were varying opinions on the safety of flying in an airplane during a pandemic – two experts called it medium risk, one said it’s low risk and the other [said] it’s high risk.”(1) Michigan gave libraries a “3” while Texas gave them a “4.”(1, 2)

Some of this disagreement can be attributed to different assumptions about the characteristics of the activity (different kinds of flights, for instance, or whether “flight” includes time spent in the airport or just the time spent on the aircraft), and in many cases it does not affect the final results (a “3” and a “4” are both categorized as a “Medium Low” risk in the app). Nonetheless, we stress that these are inherently subjective numbers on which even experts in epidemiology have differing opinions.

Algorithm development

Starting with the forty activities rated by the Michigan(1) and Texas(2) panels, we assigned ratings in each of the four primary characteristics (location, duration, crowd size, and close contact), the two special situations (shared items and respiratory droplets), and the expected risk score (based on the expert ratings).

We then developed approximately twenty of our own activity ratings to add more granular scenarios or to fill in gaps. For example, the panel data listed “Air Travel” as a single category, but we split this into multiple scenarios for “short non-stop flight”, “multi-stop layover”, and “international flight.” Neither panel included yoga classes or book clubs, so we defined those activities ourselves.

A linear regression model was developed using the Python statistics package statsmodels. Discrete values were assigned to categorical predictors, where lower numbers were less risky and higher numbers were more risky. The assignments used were:


Predictor    Scoring

Close Contact
  1 = Yes: No social distancing is performed; participants count as “close contacts” by the CDC definitions (<6 ft; >= 15 minutes).
  0 = No: Social distancing is performed, or the nature of the activity (e.g., hiking or pumping gas) does not anticipate close contact with anyone outside the household.

Crowd Size
  3 = Big Crowd (more than 25 people). Examples: a crowded bar, big party, concert.
  2 = Medium Crowd (11-25 people). Examples: grocery shopping, backyard barbecue, team sport/activity.
  1 = Small Group (5-10 people). Examples: dinner with another family, shopping at a small business.
  0 = Individuals (less than 5 people). Examples: walk with a friend, tennis, curbside pickup.

Duration
  2 = Long (More than 2 hours)
  1 = Short (1-2 hours)
  0 = Quick (Less than 1 hour)

Location
  2 = Indoors (Small). Examples: a home, restaurant, bar, or small business.
  1 = Indoors (Large). Examples: a museum, library, or arena.
  0 = Outdoors. Examples: a park, stadium, or outdoor restaurant seating.

Respiratory Droplets
  1 = Yes
  0 = No

Shared Items
  1 = Yes
  0 = No
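
The predictor assignments above can be written down as a small lookup table. The sketch below is illustrative only: the dictionary keys, label strings, and the encode_activity helper are hypothetical names, and the “crowded bar” example assumes a duration of 1-2 hours, loud talking, and no shared items, none of which the paper specifies.

```python
# Discrete codes copied from the predictor scoring above.
# Key names and label strings are hypothetical, chosen for readability.
ENCODING = {
    "close_contact": {"no": 0, "yes": 1},
    "crowd_size": {"individuals": 0, "small_group": 1, "medium_crowd": 2, "big_crowd": 3},
    "duration": {"quick": 0, "short": 1, "long": 2},
    "location": {"outdoors": 0, "indoors_large": 1, "indoors_small": 2},
    "respiratory_droplets": {"no": 0, "yes": 1},
    "shared_items": {"no": 0, "yes": 1},
}


def encode_activity(activity: dict) -> list:
    """Turn category labels into the numeric predictor vector, in ENCODING order."""
    missing = set(ENCODING) - set(activity)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return [ENCODING[name][activity[name]] for name in ENCODING]


# Hypothetical characterization of a crowded bar (assumed values, see lead-in).
crowded_bar = {
    "close_contact": "yes",
    "crowd_size": "big_crowd",
    "duration": "short",
    "location": "indoors_small",
    "respiratory_droplets": "yes",
    "shared_items": "no",
}
print(encode_activity(crowded_bar))  # [1, 3, 1, 2, 1, 0]
```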


The data set (62 observations) was split 70%/30% into training and test sets. The training set was used to develop the algorithm. The test set was then used to validate the algorithm’s results against the expected scores.
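
A 70%/30% split of 62 observations can be sketched as follows. The paper does not describe how the split was performed, so the random shuffle, the seed, and the rounding down of the training-set size are all assumptions, and split_train_test is a hypothetical helper.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)        # seeded for repeatability
    cut = int(len(shuffled) * train_fraction)    # rounds down (assumption)
    return shuffled[:cut], shuffled[cut:]


# Stand-ins for the 62 rated activities; the real rows are not reproduced here.
observations = list(range(62))
train, test = split_train_test(observations)
print(len(train), len(test))  # 43 19
```

With a test set of this size, a single miscategorized activity corresponds to roughly 5% of the test activities.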

In our validation, the mean absolute error (the average absolute difference between the algorithm’s score and the expert score) was 0.5. In 95% of the activities, the algorithm-assigned risk category (Low, Medium Low, Medium, High, or Very High) was the same as the category based on the experts’ rating. The sole outlier, “Big Backyard Party,” was expected to be “High” (risk score 7) but was categorized as “Medium” (risk score 6). Overall, the algorithm closely reproduced the expert scores.
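
The two validation measures described above, mean absolute error and category agreement, can be computed as in the following sketch. Only the “Big Backyard Party” row (expert score 7, algorithm score 6) comes from this paper; the other rows are made-up placeholders, and the function names are hypothetical. The example also assumes that algorithm scores are compared as whole numbers.

```python
def risk_category(score):
    """Map a 1-9 risk score to its descriptive rating."""
    if score >= 9:
        return "Very High"
    if score >= 7:
        return "High"
    if score >= 5:
        return "Medium"
    if score >= 3:
        return "Medium Low"
    return "Low"


def mean_absolute_error(expected, predicted):
    """Average of |expert score - algorithm score| over all activities."""
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)


def category_agreement(expected, predicted):
    """Fraction of activities whose two scores fall in the same risk category."""
    matches = sum(risk_category(e) == risk_category(p)
                  for e, p in zip(expected, predicted))
    return matches / len(expected)


# (activity, expert score, algorithm score)
rows = [
    ("Big Backyard Party", 7, 6),        # from the paper
    ("Hypothetical activity A", 3, 4),   # placeholder
    ("Hypothetical activity B", 5, 5),   # placeholder
    ("Hypothetical activity C", 9, 9),   # placeholder
    ("Hypothetical activity D", 2, 2),   # placeholder
]
expected = [row[1] for row in rows]
predicted = [row[2] for row in rows]
print(mean_absolute_error(expected, predicted))  # 0.4
print(category_agreement(expected, predicted))   # 0.8
```

Note that activity A differs by a full point yet still agrees on category, while the backyard party differs by the same amount and does not, because 6 and 7 straddle a band boundary.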

The final algorithm had an adjusted R-squared value of 0.910, indicating that the six predictors account for most of the variance in the expert risk scores. The p-values for all six predictors are below 0.01:


Predictor               P > |t|
Close Contact           0.000
Crowd Size              0.000
Duration                0.000
Location                0.000
Respiratory Droplets    0.000
Shared Items            0.003
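
For reference, a six-predictor linear model and its adjusted R-squared take the standard forms below. The notation is introduced here for illustration only: the x terms are the discrete predictor codes, the beta terms are fitted coefficients (not reported in this paper), the intercept is an assumption, n is the number of observations used in the fit, and p = 6 is the number of predictors.

```latex
\hat{y} = \beta_0
        + \beta_1 x_{\text{contact}}
        + \beta_2 x_{\text{crowd}}
        + \beta_3 x_{\text{duration}}
        + \beta_4 x_{\text{location}}
        + \beta_5 x_{\text{droplets}}
        + \beta_6 x_{\text{shared}}

\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Because the adjustment penalizes each additional predictor, adjusted R-squared is a fair basis for comparing the six-predictor model against reduced models.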

We also explored the use of continuous predictors, such as a crowd size ranging from 1 to 100 people, and contact duration in hours. The continuous predictors did not perform as well as the discretely assigned predictors, and would often generate risk predictions that were greater than the maximum risk value of 9. Additionally, continuous predictors decreased usability of the app; it is easier for a user to select from a list of ranges than to guess specifically how many people will be in attendance.

A correlation analysis showed a moderate relationship (Pearson correlation coefficient = 0.4) between close contact, crowd size, and location, so we explored a combined predictor called “crowd density” to encompass these factors. Crowd density did not perform as well as the individual predictors.
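
The Pearson correlation coefficient between two encoded predictors can be computed as in this sketch. The pearson function is a plain implementation of the textbook formula, and the two sample columns are made-up values for illustration, not the paper's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical encoded columns for six activities (not the paper's data).
crowd_size = [0, 1, 2, 3, 3, 0]
location = [0, 2, 1, 2, 0, 0]
print(round(pearson(crowd_size, location), 2))
```

The coefficient is defined for a pair of variables, so a relationship among three predictors would be summarized by computing it for each pair.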

Model reduction was explored, with the net result that the adjusted R-squared was lowered in all cases. The original model, using the six predictors, gave the best performance of all alternatives considered.

Further validation

The expert scores used to drive the algorithm were all subjective opinions, and no formal studies to date have explored a scoring system for activities such as the one used here. However, data from several sources helps us to validate these ratings.

A study of urban mobile phone data examined the effects of mobility on virus spread.(18) The researchers’ model predicted that “a small minority of ‘superspreader’ POIs (points of interest) account for a large majority of infections.” Some of the locations with the highest predicted impact on infections in their model were restaurants/cafes, fitness centers, and churches.

In Pennsylvania, a contact tracing report from the Allegheny County Health Department cited bars, restaurants, parties, gyms, weddings, and funerals as among the activities most responsible for coronavirus cases.(15) Louisiana’s contact tracing dashboard similarly highlighted bars, restaurants, assembly lines, and casinos as generating high numbers of cases.(16) Reports from the White House Coronavirus Task Force, cited by the Washington Post(19), point to house parties and other small-scale gatherings as a source of coronavirus clusters.

The activities identified as significant drivers of infection in each of these reports are also highlighted as “High” or “Very High” risk by our app. Although further study is warranted, this data lends credence to the expert analysis on which our algorithm is based.

Location-based incidence warning

Although community prevalence is excluded from the activity risk assessment, the app does provide a separate, optional assessment of Covid prevalence in the activity location (specified by the user as a postal code). This assessment, supplied by the website covidactnow.org, gives an alert level (from “Low” to “Critical”) based on a number of factors, including daily new cases and positive test rate.(21)


Conclusion

Our app provides advice about the transmission risk of everyday activities, with high correlation to the ratings given by expert epidemiologists. Although this data is currently highly subjective, it is our hope that ongoing studies and contact tracing metrics will provide additional data to refine the algorithm. Our goal is to provide a tool at the user’s fingertips to help them make good decisions about what activities they engage in, and potentially reduce the spread of COVID-19.


References

  1. DesOrmeau, Taylor. “From Hair Salons to Gyms, Experts Rank 36 Activities by Coronavirus Risk Level.” Mlive.com, 8 June 2020, www.mlive.com/public-interest/2020/06/from-hair-salons-to-gyms-experts-rank-36-activities-by-coronavirus-risk-level.html.

  2. Texas Medical Association. “Know Your Risk During COVID-19.” Texmed.org, 3 July 2020, www.texmed.org/uploadedFiles/Current/2016_Public_Health/Infectious_Diseases/309193%20Risk%20Assessment%20Chart%20V2_FINAL.pdf.

  3. Cimons, Marlene. “How Fauci, 5 Other Health Specialists Deal with Covid-19 Risks in Their Everyday Lives.” The Washington Post, 3 July 2020, www.washingtonpost.com/health/how-fauci-5-other-health-specialists-deal-with-covid-19-risks-in-their-everyday-lives/2020/07/02/d4665ed6-b6fb-11ea-a510-55bf26485c93_story.html.

  4. Aubrey, Allison, et al. “From Camping To Dining Out: Here’s How Experts Rate The Risks Of 14 Summer Activities.” NPR, 23 May 2020, www.npr.org/sections/health-shots/2020/05/23/861325631/from-camping-to-dining-out-heres-how-experts-rate-the-risks-of-14-summer-activit.

  5. North, Amanda, and German Lopez. “How to Weigh the Risk of Going out in the Coronavirus Pandemic, in One Chart.” Vox, 22 May 2020, www.vox.com/2020/5/22/21266756/coronavirus-pandemic-covid-risks-social-distancing-chart.

  6. Centers for Disease Control and Prevention. “Deciding to Go Out.” Centers for Disease Control and Prevention, 30 July 2020, www.cdc.gov/coronavirus/2019-ncov/daily-life-coping/deciding-to-go-out.html.

  7. Emanuel, Ezekiel. “COVID-19 Activity Risk Levels.” ezekielemanuel.com, 30 June 2020, www.ezekielemanuel.com/writing/all-articles/2020/06/30/covid-19-activity-risk-levels.

  8. Ministry of Health, Labour and Welfare (Japan). “Avoid the Three C’s.” Mhlw.go.jp, 2020, www.mhlw.go.jp/content/10900000/000615287.pdf.

  9. Centers for Disease Control and Prevention. “COVID-19 Outbreak Associated with Air Conditioning in Restaurant, Guangzhou, China, 2020.” Emerging Infectious Diseases, vol. 26, no. 7, Centers for Disease Control and Prevention, 7 July 2020, wwwnc.cdc.gov/eid/article/26/7/20-0764_article.

  10. Centers for Disease Control and Prevention. “Coronavirus Disease Outbreak in Call Center, South Korea.” Emerging Infectious Diseases, vol. 26, no. 8, Centers for Disease Control and Prevention, 8 Aug. 2020, wwwnc.cdc.gov/eid/article/26/8/20-1274_article.

  11. Centers for Disease Control and Prevention. “High SARS-CoV-2 Attack Rate Following Exposure at a Choir Practice – Skagit County, Washington, March 2020.” Centers for Disease Control and Prevention, 14 May 2020, www.cdc.gov/mmwr/volumes/69/wr/mm6919e6.htm.

  12. Centers for Disease Control and Prevention. “SARS-CoV-2 Transmission and Infection Among Attendees of an Overnight Camp – Georgia, June 2020.” Centers for Disease Control and Prevention, 6 Aug. 2020, www.cdc.gov/mmwr/volumes/69/wr/mm6931e1.htm?s_cid=mm6931e1_w.

  13. Jayaweera, Mahesh, et al. “Transmission of COVID-19 Virus by Droplets and Aerosols: A Critical Review on the Unresolved Dichotomy.” Environmental Research, 13 June 2020, www.ncbi.nlm.nih.gov/pmc/articles/PMC7293495/.

  14. Speade, Mark, and James Weaver. “Performing Arts Aerosol Study.” National Federation of State High School Associations, 16 July 2020, www.nfhs.org/media/4029974/preliminary-testing-report-7-13-20.pdf.

  15. Allegheny County Health Department. “Common Characteristics Among New Cases.” Twitter, 6 Aug. 2020, twitter.com/HealthAllegheny/status/1291466180386062340.

  16. Louisiana Department of Health. “COVID-19 Outbreaks.” State of Louisiana, 14 Aug. 2020, ldh.la.gov/index.cfm/page/3997.

  17. Centers for Disease Control and Prevention. “How CDC Determines the Risk Level for COVID-19 Travel Health Notices.” Centers for Disease Control and Prevention, 25 August 2020, www.cdc.gov/coronavirus/2019-ncov/travelers/how-level-is-determined.html.

  18. Chang, S., Pierson, E., Koh, P.W. et al. Mobility network models of COVID-19 explain inequities and inform reopening. Nature (2020), doi.org/10.1038/s41586-020-2923-3.

  19. Brulliard, Karin. “At Dinner Parties and Game Nights, Casual American Life Is Fueling the Coronavirus Surge.” The Washington Post, 12 Nov. 2020, www.washingtonpost.com/health/2020/11/12/covid-social-gatherings.

  20. Chiu, Allyson. “Why Experts Urge Caution in Using Covid Risk and Tracking Tools.” The Washington Post, 13 Nov. 2020, www.washingtonpost.com/lifestyle/wellness/understanding-risk-covid-tracker-tools/2020/11/13/95adb654-2504-11eb-952e-0c475972cfc0_story.html.

  21. “America’s COVID Warning System.” Covid Act Now, 17 Nov. 2020, covidactnow.org/about.

  22. Huffman, J. Alex. “Thanksgiving Dinner during COVID: Overview of Aerosol Transmission Risk Modeling.” University of Denver via KDVR News, 18 Nov. 2020, kdvr.com/wp-content/uploads/sites/11/2020/11/Thanksgiving-modeling-overview-v1.pdf.