From Knoesis wiki
Latest revision as of 18:41, 5 April 2023

Quantifying Vascular Calcification and Predicting Patient Outcome with Synthetic Data, Deep Neural Networks, and Logic Programming

PIs: Forest Agostinelli, Susan Lessner, Homayoun Valafar, University of South Carolina (UofSC)

1 Objectives

Peripheral arterial disease (PAD) is a growing medical burden to the aging population in the United States, especially in states such as South Carolina that have high rates of diabetes and tobacco use [1–3]. It is known that excessive calcium buildup is a common cause of PAD, and computed tomographic angiography (CTA) can be used to image the arteries in the periphery. However, there is currently no way to quantify calcification in these arteries, which forces practitioners to determine treatment and prognosis without objective measurements. The barrier to obtaining objective measurements lies in tracking the aorta, the largest artery in the human body, and the arteries that branch from it as they travel from the heart to the feet, dividing into smaller and smaller vessels along the way. This tracking must be done across about 500 CTA slices. A visualization of this is shown in Figure 1. If the arteries could be tracked, a score to quantify calcification could then be computed in a manner similar to the Agatston score used for coronary artery calcification [4]. Furthermore, explainable artificial intelligence (AI) methods that draw on expert knowledge and inductive logic programming [5] could be used to learn a model that gives a prognosis based on the calcification in the arteries and the patient's medical history. Preliminary methods based on computer vision techniques can track arteries only for a short distance and eventually fail because they have difficulty recovering from small mistakes [6]. On the other hand, deep neural networks (DNNs) [7] that explicitly model temporal relationships in data, such as long short-term memory (LSTM) networks [8] and transformers [9], have been very successful at segmenting sequences of images, both in medical imaging and in video generally [10–12]. However, DNNs require a large amount of labeled data for training, ranging from thousands to millions of examples.
Unfortunately, no hand-labeled data set completely labels the arteries from the heart to the feet, and hand-labeling thousands of scans would be too time-consuming to be practical. To address this challenge, we propose using synthetically generated data to train our DNNs. This synthetically generated data will mimic arteries moving and splitting in a sequence of scans. Since exact replicas of real scans may be difficult to generate, the synthetic arteries can be combined with arbitrary backgrounds, forcing the DNN to focus on tracking the arteries while remaining robust to irrelevant idiosyncrasies of the background. A visualization of the process for training a DNN to track arteries is shown in Figure 2a. Once the arteries are successfully tracked, a score to quantify calcification can be computed in a manner similar to the Agatston score used for coronary artery calcification [4]. Furthermore, the tracked arteries can be used to build explainable AI (XAI) models for prognosis following a femoral endarterectomy (FEA), predicting patient outcome from which arteries are affected, the characterization of calcification in each artery, patient biomarkers, and patient medical history. Since the amount of patient data is small (we currently have 25 patients participating in this study), we will have to use AI techniques that can learn from few examples. Therefore, we turn to inductive logic programming (ILP) [5], which can be used both for XAI and for learning from few examples [13]. ILP techniques learn programs by combining logical predicates. Each predicate can be designed to be understandable by humans, so that a program consisting only of these predicates is also understandable. Furthermore, because ILP uses logic, expert knowledge can be encoded in the form of a logic program. ILP can then leverage this expert knowledge to learn concise and generalizable programs from small datasets.
Such tasks have been successfully carried out with ILP in drug design and discovery [14, 15], protein folding [16], and automated scientific discovery [17]. A visualization of the process of training an XAI model for prognosis is shown in Figure 2b. By using ILP to build a model for prognosis, we will address the current gap in knowledge related to understanding patient outcomes following FEA. Patients who have a successful technical outcome (i.e., blood flow to the lower limb is re-established) may nonetheless experience continued worsening of lower-limb circulation or go on to require amputation. This disparity in patient outcomes is not well understood [18]. Furthermore, we will examine the association of several novel protein markers, including follistatin-like 3 (FLRG) and GDF-8, with patient outcomes (ankle-brachial index (ABI) at follow-up, reported symptoms, and occurrence of major adverse limb events). Identifying protein biomarkers associated with adverse outcomes will improve non-invasive risk stratification and may also provide mechanistic insights that lead to the development of novel therapeutics.

1.1 Relation to ADAPT Research Thrusts

Our proposal is related to the following ADAPT research thrusts: AI-Enabled Biomedical Devices for Prognosis and/or Treatment, and XAI-Enabled Biomedical Devices for Diagnostic Applications.

1.1.1 AI-Enabled Biomedical Devices for Prognosis and/or Treatment

Obtaining objective measurements for the prognosis and treatment of PAD requires segmenting the arteries in about 500 CTA slices. This task is both too laborious for humans and too challenging for standard computer vision techniques. Furthermore, there is not enough hand-labeled data to train machine learning models, such as DNNs, to complete this task successfully (as is often the case with medical data). Our approach of synthetically generating data to train DNNs addresses the dearth of real-world data and opens up the possibility of obtaining objective measurements for the prognosis and treatment of PAD.
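The synthetic-data idea can be made concrete with a small generator. The following is a minimal sketch in plain Python, assuming the recipe described in Section 3.1 of this proposal: artery-like ellipses that drift from slice to slice, split into smaller objects, and sit on an arbitrary background. The function names (`synthetic_sequence`, `ellipse_mask`, `union`), the grid size, drift rates, split slice, and intensity model are hypothetical choices for illustration, not the team's actual generator.

```python
import random


def ellipse_mask(size, cx, cy, rx, ry):
    """Binary mask (list of rows) of an axis-aligned ellipse centred at (cx, cy)."""
    return [[1 if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0 else 0
             for x in range(size)]
            for y in range(size)]


def union(a, b):
    """Pixel-wise OR of two binary masks."""
    return [[max(p, q) for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def synthetic_sequence(n_slices=40, size=32, split_at=20, seed=0):
    """Hypothetical generator: one artery-like ellipse drifts across slices
    and splits into two smaller, diverging ellipses at slice `split_at`.
    Returns (images, masks), one size x size grid per slice."""
    rng = random.Random(seed)
    objects = [(size / 2, size / 2, 5.0, 4.0)]  # (cx, cy, rx, ry)
    images, masks = [], []
    for t in range(n_slices):
        if t == split_at:
            cx, cy, rx, ry = objects[0]
            # Branching: replace the parent with two smaller children.
            objects = [(cx - 2, cy, 0.6 * rx, 0.6 * ry),
                       (cx + 2, cy, 0.6 * rx, 0.6 * ry)]
        moved = []
        for k, (cx, cy, rx, ry) in enumerate(objects):
            dx = rng.uniform(-0.5, 0.5)  # random drift between slices
            if len(objects) == 2:
                dx += -0.4 if k == 0 else 0.4  # children move apart
            moved.append((cx + dx, cy + rng.uniform(-0.5, 0.5), rx, ry))
        objects = moved
        mask = [[0] * size for _ in range(size)]
        for cx, cy, rx, ry in objects:
            mask = union(mask, ellipse_mask(size, cx, cy, rx, ry))
        # Arbitrary background: uniform noise; object pixels are brighter.
        image = [[0.5 * rng.random() + 0.5 * m for m in row] for row in mask]
        images.append(image)
        masks.append(mask)
    return images, masks


images, masks = synthetic_sequence()
print(len(images), sum(map(sum, masks[0])), sum(map(sum, masks[-1])))
```

Each mask is the label a tracking network would be trained to predict for the corresponding image. Varying the seed, object sizes, split points, and background model would be the natural way to produce many diverse training sequences.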

1.1.2 XAI-Enabled Biomedical Devices for Diagnostic Applications

Once the CTA slices are segmented, the arteries can be examined for calcification. Known methods for computing a calcification score [4] can be adapted to compute a calcification score for arteries in the lower extremities. Going further, we seek to give a prognosis for each patient. However, the challenge of scarce real-world data presents itself again. Furthermore, it is important that the resulting decision can be explained to the practitioner. To address this, we use inductive logic programming to leverage expert knowledge and create explainable decisions for prognosis.

1.2 Senior Personnel

Our multidisciplinary team consists of two computer science researchers specializing in artificial intelligence and a vascular biology researcher. Furthermore, we are collaborating with Dr. Kathryn Fong (letter of support available upon request), a surgeon at Prisma Health in Columbia, South Carolina. PI Agostinelli has expertise in applying AI to other disciplines, including bioinformatics, and has already successfully combined synthetically generated data with DNNs in bioinformatics research [19, 20]. Co-PI Lessner is an expert in vascular biology with extensive research on arterial structure [21–23]. Co-PI Valafar has expertise in bioinformatics and artificial intelligence; his prior research has investigated computer vision techniques for tracking the arteries in the lower extremities [6]. Dr. Kathryn Fong is an expert on the diagnosis and treatment of PAD, and we will draw on her expertise to encode expert knowledge into our inductive logic programming approach.

Figure 1 panels: (a) Slice 1, (b) Slice 75, (c) Slice 136, (d) Slice 223

Figure 1: Top: CTA images from the heart to the legs. Bottom: The most prominent arteries highlighted in blue with regions of potential calcification highlighted in red.


2 Prior Relevant Research

2.1 Background

Peripheral arterial disease (PAD), which results from atherosclerotic plaque buildup with or without calcification in the large arteries of the extremities, constitutes a growing medical burden to the aging population in the United States. The estimated prevalence of PAD increases dramatically with age, from 0.9% in people 40–49 years of age to 14.5% in people older than 69 [1]. In 2001, the estimated cost to the US Medicare program for PAD-related treatment was greater than $4.3 billion [2]. This figure does not take into account lost wages and productivity in PAD patients resulting from decreased mobility. The clinical presentation and outcomes of PAD are highly variable, ranging from asymptomatic disease to intermittent claudication (IC, limb pain during exercise) or rest pain in the affected limb. In the most severe cases, chronic limb-threatening ischemia (CLTI) may result in tissue loss or gangrene and the need for amputation. Vascular calcification, which is commonly observed in PAD patients with comorbid diabetes or chronic kidney disease (CKD), complicates interventional endovascular treatment and correlates with increased morbidity and mortality [24–26]. Currently, there is no validated, widely accepted metric to quantify arterial calcification in the lower extremities [26] comparable to the widely accepted Agatston score for coronary artery calcification [4]. Thus, there is an urgent need for better methods to identify the patients at greatest risk of the most severe clinical outcomes, so that these patients can be treated more aggressively. The ultimate goal of our work is to use machine learning (ML) to analyze medical images and non-invasively predict the clinical progression and outcomes of PAD in individual patients.

Figure 2 panels: (a) Synthetically generated data used to train a chain of deep LSTMs; (b) real patients, calcification score, and prognosis via XAI, drawing on expert knowledge, computer vision techniques, and inductive logic programming.

Figure 2: Left: While deep neural networks have achieved excellent performance on object tracking tasks, they require a lot of data for training. Since hand-labeled real-world data is almost non-existent, we will synthetically generate data of aorta-like objects moving and branching into smaller pieces to mimic real-world behavior. Right: Once we can successfully track the aorta, we will compute a calcification score. Furthermore, we will build an explainable AI model that leverages expert knowledge, computer vision techniques, and inductive logic programming to learn rules for prognosis from limited data.

2.2 Significance

Peripheral arterial disease (PAD), characterized by narrowing of the large arteries supplying the limbs, is a growing health issue for the rapidly aging population of the United States. PAD results from atherosclerotic plaque formation, often accompanied by calcification or bone-like mineral deposition, in the peripheral arteries such as the iliacs and femorals. Besides advanced age, risk factors for developing PAD include tobacco use, diabetes, hypertension, and elevated cholesterol levels, all of which are highly prevalent in South Carolina. PAD severely impairs quality of life, as patients increasingly avoid the pain associated with walking as the disease progresses. Advanced PAD can progress to chronic limb-threatening ischemia (CLTI), characterized by pain in the affected limb at rest, ulceration, and ultimately gangrene, frequently requiring amputation. However, it is currently difficult to predict outcomes for individual PAD patients. The incidence of lower extremity amputation for PAD in the South Atlantic Region, including South Carolina, is significantly higher than the national average [3], as shown in Figure 3. CLTI is associated with extremely high morbidity and mortality, with up to 40–50% of patients undergoing amputation or dying within one year of initial diagnosis and treatment [27–29].
Vascular calcification is a serious complication in PAD patients, particularly prevalent in those with diabetes or chronic kidney disease, that increases the risk of adverse outcomes and hinders treatment by endovascular interventions. A recent observational study in the UK demonstrated that vascular calcification in symptomatic PAD patients is predictive of future cardiac morbidity and mortality (major adverse cardiovascular events, MACE) as well as all-cause mortality [25]. A reliable, validated metric for calcification of the lower extremities, comparable to the widely accepted Agatston score for coronary artery calcification, has not yet been developed and is critically needed. Furthermore, because disease progression and response to treatment vary between individuals, there is a significant unmet clinical need for prognostic tools that risk-stratify PAD patients undergoing FEA according to expected outcomes.



Figure 3: Rates of lower limb amputation for PAD across the US for Medicare patients, 2000–2008. Adapted from [28].

Figure 4: Results of an LSTM trained to track the aorta.

3 General Research Plan

3.1 Artery Tracking using Synthetically Generated Data and Deep Neural Networks

Tracking arteries is easy for human vision: a person simply needs to be told what to track and that the artery will split into smaller arteries, and this can be done without any prior medical experience. However, this task is very difficult for a DNN to learn from a limited number of examples. Therefore, we seek to give the DNN an inductive bias similar to that of a human by generating artery-like objects (circles and ovals) that move over time and split into smaller objects. We can then combine these objects with arbitrary backgrounds, which could be randomly generated or modified from other medical images. This will ensure that the DNN learns to segment the aorta in a background-agnostic manner, making it robust to individual differences in CT scans and ensuring that it focuses only on the task of tracking the artery. It is important that a DNN model be able to capture the temporal relationship between CTA slices. A simple DNN does not do this, as it takes only a single slice as input and outputs a single segmentation. In contrast, long short-term memory models (LSTMs) [8] explicitly model time series data: at each timestep the network also outputs a context (its hidden and cell state), which it takes as input at the next timestep. A visualization of this process is shown in Figure 2a. LSTMs have been used to successfully segment volumetric medical images and videos [10, 11]. We have preliminary results on small labeled datasets showing that the LSTM is able to successfully track the aorta, as shown in Figure 4. In the future, we also plan to apply transformers [9] to this task due to their success at modeling time series data, including object tracking [12].
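The recurrence described above, in which the network passes a context forward from one slice to the next, can be illustrated with a toy LSTM cell in plain Python. This is a minimal sketch assuming each CTA slice has already been reduced to a small feature vector; a practical tracker would more likely use a convolutional LSTM or a similar image-based model trained on the synthetic data. The class name `TinyLSTMCell`, the layer sizes, and the random, untrained weights are illustrative assumptions only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class TinyLSTMCell:
    """Untrained toy LSTM over per-slice feature vectors (illustration only)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        n_cat = n_in + n_hidden
        self.n_hidden = n_hidden
        # One weight matrix and bias vector per gate.
        self.w = {g: [[rng.uniform(-0.5, 0.5) for _ in range(n_cat)]
                      for _ in range(n_hidden)] for g in self.GATES}
        self.b = {g: [0.0] * n_hidden for g in self.GATES}

    def step(self, x, h, c):
        z = list(x) + list(h)  # current slice features + previous hidden state
        pre = {g: [a + b for a, b in zip(matvec(self.w[g], z), self.b[g])]
               for g in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        # Cell state: keep part of the old context, add new information.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, slices):
        """Process slices in order, carrying (h, c) from one slice to the next."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in slices:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


cell = TinyLSTMCell(n_in=3, n_hidden=4)
a = cell.run([[0.1, 0.2, 0.3], [0.0, 0.5, 0.1], [0.2, 0.2, 0.2]])
b = cell.run([[0.9, -0.4, 0.7], [0.0, 0.5, 0.1], [0.2, 0.2, 0.2]])
print(a[1] != b[1])  # second-slice output depends on the first slice
```

Changing only the first slice changes the output at the second slice, because the context (h, c) carries information forward. A model that processes each slice independently would not show this dependence.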
After the arteries are successfully tracked, calcification in all of the arteries can be quantified by aggregating a score similar to the Agatston score [4]. This will give practitioners the first objective quantification of calcification across the entire lower extremities. Providing this objective measurement will have a significant positive impact on practitioners’ ability to diagnose, give a prognosis for, and treat PAD.
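As a rough illustration of what a score similar to the Agatston score could look like once artery masks are available, the sketch below applies the standard Agatston recipe inside the tracked artery masks. Each lesion of at least 1 mm² above 130 HU contributes its area times a density factor of 1 to 4 based on peak attenuation, and contributions are summed over slices. The function names and the restriction to artery masks are our assumptions. The thresholds come from coronary scoring on non-contrast CT; in contrast-enhanced CTA the opacified lumen itself can exceed 130 HU, so a real lower-extremity adaptation would need a different threshold or a way to exclude the lumen. Slice-thickness normalization is omitted here.

```python
from collections import deque


def density_weight(peak_hu):
    """Agatston density factor from a lesion's peak attenuation (HU)."""
    if peak_hu >= 400:
        return 4
    if peak_hu >= 300:
        return 3
    if peak_hu >= 200:
        return 2
    if peak_hu >= 130:
        return 1
    return 0


def agatston_like_score(slices_hu, artery_masks, pixel_area_mm2,
                        threshold_hu=130, min_area_mm2=1.0):
    """Sum of (lesion area x density weight) over 4-connected lesions inside
    the tracked artery masks, over all slices."""
    total = 0.0
    for hu, mask in zip(slices_hu, artery_masks):
        rows, cols = len(hu), len(hu[0])
        # Candidate pixels: inside the artery mask and above the HU threshold.
        cand = [[bool(mask[y][x]) and hu[y][x] >= threshold_hu
                 for x in range(cols)] for y in range(rows)]
        seen = [[False] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if seen[r][c] or not cand[r][c]:
                    continue
                # Flood-fill one lesion, tracking its size and peak HU.
                seen[r][c] = True
                queue = deque([(r, c)])
                n_pixels, peak = 0, hu[r][c]
                while queue:
                    y, x = queue.popleft()
                    n_pixels += 1
                    peak = max(peak, hu[y][x])
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and cand[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                area = n_pixels * pixel_area_mm2
                if area >= min_area_mm2:  # ignore lesions below 1 mm^2
                    total += area * density_weight(peak)
    return total


hu = [[0, 140, 250, 0],
      [0, 0, 200, 0],
      [0, 0, 0, 0],
      [300, 0, 0, 600]]
mask = [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 0]]  # the 600 HU pixel lies outside the artery
print(agatston_like_score([hu], [mask], pixel_area_mm2=0.5))  # 3.0
```

In this toy slice, the three connected pixels form one 1.5 mm² lesion with peak 250 HU (weight 2), giving 3.0. The isolated 300 HU pixel is below the minimum area, and the 600 HU pixel falls outside the artery mask.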

Figure 5: Schematic study timeline showing patient participation and sample analysis.

Figure 6: Selected plasma protein levels in PAD patients vs. normal control by antibody array.

3.2 Inductive Logic Programming for Explainable Prognosis

Inductive logic programming [5] has the appeal of being inherently explainable, as a logic program is made up of logical predicates that have symbolic meaning. This contrasts with the sub-symbolic approach of DNNs. For example, a logical predicate can be true if there is calcification in a certain region of an artery or if the calcification is of a certain severity. Furthermore, by including patient medical history and biomarkers, logical predicates can capture symbolic information about a patient's related illnesses, age, or protein markers, such as FLRG and GDF-8. Such logical predicates can then be composed to create a program that gives a prognosis. In particular, we are interested in patient outcome following a femoral endarterectomy (FEA). Because the patient biomarker data could play a significant role in prognosis, we describe this data and how it is obtained in detail below. An overview of patient participation is shown in Figure 5. The ILP software we will use is Popper [30]. Popper learns logic programs by learning from failures: each failed hypothesis allows it to prune other programs that are guaranteed to fail as well. Furthermore, Popper allows expert knowledge to be specified flexibly, either by defining logical predicates (background knowledge) or by defining constraints on the type of programs to be learned. Popper's ability to handle highly expressive constraints can lead to more succinct logic programs learned from very small amounts of data.
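Popper itself reads Prolog background knowledge, examples, and a bias file; the sketch below does not use Popper. It is a deliberately simplified, propositional illustration in Python of the learning-from-failures idea. Candidate rules are conjunctions of hypothetical patient predicates. A rule that covers a stable patient is too general, so its generalizations can be pruned. A rule that misses an adverse-outcome patient is too specific, so its specializations can be pruned. The predicate names and patient facts are invented for illustration and are not study data. Real Popper learns first-order, possibly multi-clause programs.

```python
from itertools import combinations


def covers(body, facts):
    """A conjunctive rule body covers a patient if every literal in it holds."""
    return body <= facts


def learn_rule(predicates, positives, negatives, max_size=3):
    """Toy learning-from-failures search for one rule
    `adverse_outcome :- p1, ..., pk` that covers every positive example
    and no negative example. Returns the rule body or None."""
    too_general = []   # bodies that covered a negative (stable) patient
    too_specific = []  # bodies that missed a positive (adverse) patient
    for size in range(max_size + 1):
        for combo in combinations(predicates, size):
            body = frozenset(combo)
            # Generalizations (subsets) of a too-general body also cover that
            # negative (redundant under this size-ordered search, kept for completeness).
            if any(body <= g for g in too_general):
                continue
            # Specializations (supersets) of a too-specific body also miss that positive.
            if any(body >= s for s in too_specific):
                continue
            hits_negative = any(covers(body, n) for n in negatives)
            misses_positive = any(not covers(body, p) for p in positives)
            if not hits_negative and not misses_positive:
                return body
            if hits_negative:
                too_general.append(body)
            if misses_positive:
                too_specific.append(body)
    return None


# Hypothetical predicates and patients, for illustration only.
PREDICATES = ["calcified_femoral", "high_gdf8", "diabetic", "smoker"]
ADVERSE = [{"calcified_femoral", "high_gdf8", "diabetic"},
           {"calcified_femoral", "high_gdf8"},
           {"calcified_femoral", "high_gdf8", "smoker"}]
STABLE = [{"calcified_femoral", "diabetic"},
          {"high_gdf8", "smoker"},
          {"diabetic", "smoker"},
          set()]

rule = learn_rule(PREDICATES, ADVERSE, STABLE)
print("adverse_outcome :- " + ", ".join(sorted(rule)))
```

Here the search rejects the empty rule and every single-predicate rule before finding a rule with the body `calcified_femoral, high_gdf8`. That rule reads directly as an explanation a clinician could check.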

3.2.1 Patient Biomarker Data

We recruited 25 PAD patients into a pilot study and measured plasma biomarker levels in heparinized blood samples drawn at the time of FEA surgery. We also obtained patient lower-body CTAs to develop and test methods to automate calcification analysis. To date, we have sent out three patient plasma samples for unbiased protein biomarker screening using a 1000-human-protein antibody array platform (Raybiotech). These samples were compared to a pooled control plasma derived from 10 normal male subjects. There were 41 proteins elevated at least 3-fold in PAD patient samples relative to the control, and 384 proteins reduced by at least 3-fold in PAD patients (see examples in Figure 6). Of these, we have chosen to focus on known inhibitors of calcification (e.g., fetuin A [31, 32]) and proteins involved in either osteogenesis (e.g., osteoactivin [33–35], FLRG [36, 37]) or skeletal muscle growth and maintenance (osteocalcin [38, 39], myostatin (GDF-8) [40, 41]). Of particular interest, GDF-8, an inhibitor of skeletal muscle growth, was found to be upregulated in the plasma of PAD patients, while one of its known inactivators, FLRG, was downregulated. These findings suggest a potential role for GDF-8 in the skeletal muscle atrophy typically seen in PAD [42].
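The 3-fold screening criterion above can be expressed as a simple filter. This is a minimal sketch assuming that each protein's patient level is summarized by the mean over the screened samples and compared with the pooled control. The function name and all protein levels shown (including the placeholder `protein_X`) are hypothetical, not values from the Raybiotech array.

```python
from statistics import mean


def classify_fold_changes(patient_levels, control_levels, fold=3.0):
    """Split proteins into those elevated or reduced at least `fold`-fold
    in patients relative to pooled control plasma.

    patient_levels: protein -> list of levels, one per screened patient
    control_levels: protein -> level in the pooled control plasma
    """
    elevated, reduced = [], []
    for protein, levels in patient_levels.items():
        ratio = mean(levels) / control_levels[protein]
        if ratio >= fold:
            elevated.append(protein)
        elif ratio <= 1.0 / fold:
            reduced.append(protein)
    return sorted(elevated), sorted(reduced)


# Hypothetical signal intensities (arbitrary units), not study data.
patients = {"GDF-8": [800, 900, 1000], "FLRG": [30, 40, 50], "protein_X": [90, 100, 110]}
control = {"GDF-8": 250, "FLRG": 200, "protein_X": 80}
print(classify_fold_changes(patients, control))  # (['GDF-8'], ['FLRG'])
```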

We assume that the standard deviation in measured plasma biomarker levels will be similar to that observed for GDF-8 in our pilot study (28% of the patient mean). We also assume that the difference in levels of each biomarker between patients who have adverse outcomes and those who remain stable after FEA will be at least 25%, or one-fourth of the minimum observed difference between PAD patients and normal control plasma for all selected biomarkers. Then, if the true difference between the experimental and control means is 0.25 (a 25% effect size), we will need to study 15 experimental subjects (defined here as those who have adverse outcomes within 1 year after FEA) and 30 control subjects (defined as those who remain stable after FEA) to be able to reject the null hypothesis that the population means of the experimental and control groups are equal with probability (power) 0.8. The Type I error probability associated with the test of this null hypothesis is 0.05. Thus, we need to recruit at least 45 patients to show a statistically significant association between biomarker levels and patient outcomes if the effect size is 25%. If the ratio of patients who remain stable to those who worsen increases to 3:1, a total of 56 patients (42 stable and 14 with progressive disease) will be needed to detect the same effect size; this number is still attainable given current FEA caseloads at Prisma Health. The current caseload is approximately 120 patients/year, and approximately 50% of PAD patients consent to participate in PAD studies when asked, which yields roughly 60 participants per year, enough for our analysis.
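The sample sizes above can be reproduced with the standard normal-approximation formula for comparing two means with unequal group sizes, written out in the docstring below. Here d = 0.25 / 0.28 is the effect size in standard-deviation units, and the ratio is the number of stable patients per progressing patient. We have not confirmed which software or exact test the original calculation used. This sketch simply shows that the normal approximation gives the same 15/30 and 14/42 splits; the function name `sample_sizes` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_sizes(delta, sd, ratio, alpha=0.05, power=0.80):
    """Normal-approximation sample sizes for a two-sided, two-sample
    comparison of means with n_control = ratio * n_experimental:

        n_exp = (1 + 1/ratio) * (z_{1-alpha/2} + z_{power})**2 / d**2
        d = delta / sd  (standardized effect size)
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    d = delta / sd
    n_exp = math.ceil((1 + 1 / ratio) * (z_alpha + z_beta) ** 2 / d ** 2)
    return n_exp, ratio * n_exp


# Effect size and SD expressed as fractions of the patient mean (25% and 28%).
print(sample_sizes(0.25, 0.28, ratio=2))  # (15, 30)
print(sample_sizes(0.25, 0.28, ratio=3))  # (14, 42)
```

Both results match the totals stated above (45 and 56 patients). At about 120 FEA cases per year with roughly 50% consenting, enrollment would be about 60 patients per year.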

3.2.2 Monitoring Patient Outcomes

FEA patients are routinely followed post-surgery at Prisma Health for both short-term complications and long-term clinical outcomes. We will also obtain baseline demographic and clinical data from electronic medical records (EMR) at the time of patient enrollment to assess potential confounding factors such as smoking status, diabetes, and current medications. Peri-operative (within 30 days) adverse clinical outcomes include thrombosis of the treated artery. Longer-term adverse outcomes can include restenosis or disease progression, as indicated by a decreased ankle-brachial index (ABI), decreased pain-free walking distance, need for revascularization of a downstream vessel, or development of CLTI. For this pilot study, we will include PAD patients requiring FEA for a diagnosis of either non-responsive intermittent claudication (i.e., disease that has not responded to lifestyle interventions and/or medical treatment [43, 44]) or CLTI. Disease progression in CLTI patients is on average more rapid than in patients presenting with intermittent claudication. In CLTI, roughly 40–50% of patients progress to limb amputation or death within one year of initial diagnosis and treatment [27, 29]. It has also been observed that up to 60% of patients treated for CLTI by peripheral revascularization in one limb will require treatment of the contralateral limb within one year. The rapid disease progression in CLTI patients will improve our ability to detect significant differences in calcification biomarkers between patients who remain stable and those who require further intervention. Patient follow-up after surgery will follow the clinical standard of care, with patient assessment at 30 days post-operatively and then at 3-month intervals. Standard patient follow-up includes a focused history, physical exam, and measurement of ABI.
Adverse clinical outcomes following FEA will be defined to include a decrease of at least 20% in ABI, a measured decrease of at least 20% in pain-free walking distance, and major adverse limb events (MALE), such as the need for subsequent lower-limb revascularization or tissue loss in the affected limb requiring amputation.
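The outcome definition above maps directly onto a labeling rule that could produce the positive (adverse) and negative (stable) examples for the prognosis model. In this minimal sketch, the record layout and field names are hypothetical. Measuring both decreases relative to the enrollment baseline is an assumption, as is the small tolerance so that a decrease of exactly 20% is not lost to floating-point rounding.

```python
from dataclasses import dataclass

EPS = 1e-9  # tolerance so a decrease of exactly 20% counts despite rounding


@dataclass
class FollowUp:
    """Hypothetical per-patient record comparing baseline and a follow-up visit."""
    baseline_abi: float
    followup_abi: float
    baseline_walk_m: float   # pain-free walking distance, metres
    followup_walk_m: float
    male_event: bool         # e.g., repeat lower-limb revascularization or amputation


def relative_drop(before, after):
    """Fractional decrease from `before` to `after` (negative if it increased)."""
    return (before - after) / before


def is_adverse(visit, threshold=0.20):
    """Adverse outcome: >=20% ABI decrease, >=20% decrease in pain-free
    walking distance, or any major adverse limb event."""
    return (relative_drop(visit.baseline_abi, visit.followup_abi) >= threshold - EPS
            or relative_drop(visit.baseline_walk_m, visit.followup_walk_m) >= threshold - EPS
            or visit.male_event)


print(is_adverse(FollowUp(0.9, 0.6, 300, 290, False)))   # ABI fell by about 33%
print(is_adverse(FollowUp(0.9, 0.85, 300, 280, False)))  # small changes only
```

Records flagged `True` could serve as positive examples for the ILP prognosis model and the remainder as negative examples.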