Using machine learning to detect disease before symptoms manifest

Prof. Alfred Hero speaks to ECE about his work using data to predict the transmission of infectious disease among people who are pre-symptomatic or asymptomatic and how it relates to COVID-19.

ECE: How does your research relate to COVID-19?

AH: We have been working on predicting health and disease in people exposed to infectious viral pathogens since early 2007. Of particular interest to us are the asymptomatic and pre-symptomatic spreaders, i.e., people who are infected but don’t feel ill and still spread the virus (index cases often called “Typhoid Marys”). This is especially relevant to COVID-19, for which symptoms are often very mild and the incubation period is relatively long.

ECE: How did your research evolve into what it is now?

AH: Our early work applied machine learning methods to data collected from a human population to develop genomic, metabolomic and proteomic predictors of asymptomatic and pre-symptomatic infectious disease, using blood assays and assays of other bodily fluids. One outcome of this work was the discovery of a small panel of genes in whole blood that can be used to detect early signs of acute respiratory viral infection (ARVI), for which we were awarded a patent [1].

We established in [2] that this panel yields a characteristic signature that, after viral exposure, clearly differentiates infected individuals who become sick from those who don’t.

In another paper [3] we used sparse machine learning to establish the benefit of pairing a patient’s test sample with a healthy reference (baseline) sample for classifying the patient’s state of infection. When applied to classifying a sample as coming from an asymptomatic infected subject, a pre-symptomatic infected subject or a symptomatic infected subject, the reference-based classifier [3] did significantly better than the standard classifier, which has no access to the baseline sample. Thus, the availability of a personalized baseline reference improves our ability to detect these disease states.
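
The benefit of a personalized baseline can be illustrated with a toy simulation. The sketch below is not the sparse classifier from [3]. It assumes a single synthetic "gene" whose level varies widely from person to person, and it uses a simple nearest-mean classifier. All names and parameter values (simulate_subjects, effect, subject_sd, noise_sd) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_subjects(n, rng, effect=2.0, subject_sd=3.0, noise_sd=0.3):
    """Return (labels, target_samples, baseline_samples) for n synthetic subjects.

    Assumption: each subject has a personal expression level that varies a lot
    between people; infection adds a fixed shift to the target sample only.
    """
    labels, targets, baselines = [], [], []
    for _ in range(n):
        infected = rng.random() < 0.5
        personal_level = rng.gauss(0.0, subject_sd)
        baseline = personal_level + rng.gauss(0.0, noise_sd)
        shift = effect if infected else 0.0
        target = personal_level + shift + rng.gauss(0.0, noise_sd)
        labels.append(int(infected))
        targets.append(target)
        baselines.append(baseline)
    return labels, targets, baselines


def nearest_mean_accuracy(train_x, train_y, test_x, test_y):
    """Fit one mean per class on the training data and score the test data."""
    mean0 = statistics.fmean(x for x, y in zip(train_x, train_y) if y == 0)
    mean1 = statistics.fmean(x for x, y in zip(train_x, train_y) if y == 1)
    correct = 0
    for x, y in zip(test_x, test_y):
        predicted = 1 if abs(x - mean1) < abs(x - mean0) else 0
        correct += int(predicted == y)
    return correct / len(test_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
labels, targets, baselines = simulate_subjects(800, rng)
half = len(labels) // 2

# Paired feature: the target sample minus the same subject's baseline sample.
paired = [t - b for t, b in zip(targets, baselines)]

acc_target_only = nearest_mean_accuracy(
    targets[:half], labels[:half], targets[half:], labels[half:]
)
acc_paired = nearest_mean_accuracy(
    paired[:half], labels[:half], paired[half:], labels[half:]
)

print(f"target sample only:    {acc_target_only:.2f}")
print(f"target minus baseline: {acc_paired:.2f}")
```

In this toy setting, subtracting the baseline removes the person-to-person variation that otherwise hides the infection signal, so the paired feature is expected to classify far more accurately than the target sample alone.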

This early work was done in collaboration with other engineering researchers and medical clinicians and was supported from 2007 to 2011 by a DARPA/DSO grant under the Predicting Health and Disease (PHD) program. Later, we were supported by another grant, under the DARPA/DSO Biochronicity program, that expanded the scope of PHD to include steroidal, physiological and cognitive assays collected over a longer and more densely sampled timeline.

Among other findings, our analysis of the Biochronicity data has revealed that a person’s circadian chronotype is strongly associated with severity of infection (manuscript in preparation).

Most recently, supported by a grant from the DARPA/BTO Prometheus program, we have focused our research directly on early detection of viral shedding, which leads to transmission and spread of infection. Prometheus data on heart rate, skin temperature and other physiological signals was collected from wearable devices (smart watches). We determined that a set of features extracted from these signals can be used to predict that a person will be shedding virus 24 hours before shedding occurs [4].
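
As a rough illustration of what "features extracted from these signals" might look like, the sketch below summarizes the most recent window of a wearable signal relative to the wearer's own earlier readings. It is an assumption made for illustration, not the feature set or model from [4]. The function name window_features, the window lengths and the heart-rate values are all hypothetical.

```python
import statistics


def window_features(signal, baseline_len, window_len):
    """Summarize the latest window of a signal against the wearer's own baseline.

    Hypothetical feature set: window mean, window standard deviation, and the
    shift of the window mean from the baseline mean in baseline-SD units.
    """
    if len(signal) < baseline_len + window_len:
        raise ValueError("signal is too short for the requested windows")
    baseline = signal[:baseline_len]
    window = signal[-window_len:]
    base_mean = statistics.fmean(baseline)
    base_sd = statistics.stdev(baseline)
    win_mean = statistics.fmean(window)
    return {
        "mean": win_mean,
        "std": statistics.stdev(window),
        "z_shift": (win_mean - base_mean) / base_sd if base_sd > 0 else 0.0,
    }


# Made-up hourly resting heart rate: 48 ordinary hours, then 12 elevated hours.
heart_rate = [60, 62] * 24 + [66, 68] * 6
features = window_features(heart_rate, baseline_len=48, window_len=12)
print(features)
```

Features like these, computed per signal and per time window, could then be passed to a predictive model. Which features and which model actually work is an empirical question that the study data would have to answer.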

ECE: How could this research potentially be used to address what’s going on with the current, or a future, pandemic?

AH: Our research is relevant in several ways:

  1. If large-scale blood testing of the population emerges and rapid, low-cost genomic assay technology becomes available, our classifier in [1] and [2] could be applied to a test sample to predict whether a person will become an asymptomatic spreader. That information could help reduce the spread of the infection.

  2. As we get more data from populations wearing smart watches through this pandemic, our predictive models will be refined. These refined models would enable non-invasive prediction (without tests) of asymptomatic spreading.

ECE: What are the biggest challenges with this work?

AH: Our available data has the following deficiencies in terms of applicability to COVID-19:

  1. The sample size is small (fewer than 300 study participants), so until we get larger data samples, we do not know whether our results will generalize to the general population.

  2. So far we have not observed a detectable pre-symptomatic signal in other bodily fluids, for example in proteomic assays of breath condensate or nasal mucosa. Thus our methods are not effective for the current nasal swab test for COVID-19. When combined with blood or wearable data, there may be a detectable signal.

ECE: What are the next steps?

AH: We are working on folding the social contact network into the subject-specific assay data. This combination of contact data and assay data can reduce the sampling requirements for detecting pre-symptomatic spreaders. However, there are caveats. One cannot track all contacts. Much exposure occurs through fomites, i.e., contaminated surfaces such as doorknobs, switches and elevator buttons, which cannot be tracked effectively.

ECE: Have the new social distancing protocols affected your ability to do this research? If so, how did you adapt?

AH: My research group at U-M works on analyzing data that has already been collected. Thus social distancing does not affect us. However, I anticipate that it will affect the availability of COVID-19 host response data that would allow us to refine our models for this kind of virus.

References

[1] G. Ginsburg, J. Lucas, C. Woods, L. Carin, A. Zaas, and A. Hero, “Methods of identifying infectious disease and assays for identifying infectious disease,” US Patent 8,821,876. Filed May 22, 2010. Issued Sept. 2, 2014.

[2] Y. Huang, A.K. Zaas, A. Rao, N. Dobigeon, P. Woolf, T. Veldman, N.C. Oien, M.T. McClain, J. Varkey, B. Nicholson, L. Carin, S. Kingsmore, C.W. Woods, G.S. Ginsburg, A.O. Hero, “Temporal Dynamics of Host Molecular Responses Differentiate Symptomatic and Asymptomatic Influenza A Infection,” PLoS Genetics, 2011.

[3] T.-Z. Liu, T. Burke, L.P. Park, C.W. Woods, A.K. Zaas, G.S. Ginsburg, and A.O. Hero, “An individualized predictor of health and disease using paired reference and target samples,” BMC Bioinformatics, vol. 17, no. 1, 15 pages, 2016.

[4] X. She, Y. Zhai, R. Henao, C.W. Woods, G.S. Ginsburg, P. Song, A.O. Hero, “An unsupervised transfer learning algorithm for sleep monitoring,” arXiv:1904.03720.

About the researcher

Alfred O. Hero III

Alfred O. Hero is the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science and the R. Jamison and Betty Williams Professor of Engineering. He has been a leader in the development of the theoretical foundations of signal processing for decades. His research has been applied to network data analysis, personalized health, multi-modality information fusion, data-driven physical simulation, materials science, dynamic social media, and database indexing and retrieval, among other areas.

Hero chairs the National Academies of Sciences, Engineering, and Medicine (NASEM) Committee on Applied and Theoretical Statistics. He has chaired several international conferences and served as President of the IEEE Signal Processing Society and Director of IEEE Division IX.

He has received the 2020 IEEE Fourier Award; the IEEE Signal Processing Society Society Award, Technical Achievement Award, and Meritorious Service Award; and numerous best paper awards. He co-authored the textbook Foundations and Applications of Sensor Management and co-edited Big Data Over Networks. He has published more than 600 journal and conference papers and holds four patents.



The research featured in this story is related to some of the issues facing individuals around the world due to COVID-19. See our COVID-19 page for additional resources and news from Electrical & Computer Engineering.