S1E23 Good and Bad Data (ft. Anatoly Postilnik, First Line Software)

Anatoly Postilnik, Head of Healthcare Practice at First Line Software, explores what qualifies as good or bad data.

Transcript:

0:0:0.0 –> 0:0:7.80
Jordan Cooper
I'm here with Anatoly Postilnik, Head of Healthcare Practice at First Line Software. Anatoly, thank you for joining us today. How are you doing?

0:0:7.750 –> 0:0:10.260
Anatoly (Guest)
Good afternoon, Jordan. I'm doing great.

0:0:11.120 –> 0:0:30.950
Jordan Cooper
So today we'll be discussing good data and bad data. In particular, we'll be discussing clinical data and data governance with Anatoly. Anatoly, I'd like to ask you to elaborate on what data quality means to you, and specifically, can you address how to differentiate good data from bad?

0:0:33.180 –> 0:0:33.810
Anatoly (Guest)
Sure.

0:0:34.820 –> 0:0:54.950
Anatoly (Guest)
This is a fascinating topic that comes up all the time in different scenarios, and one of the major questions that usually comes up is, you know, what is good data, what is bad data, and how do we define what is

0:0:55.650 –> 0:1:19.200
Anatoly (Guest)
good data, and how do we deal with situations when data is not good? I think what we need to do first is give a definition: how would we define good data? From my perspective and from experience, I'd say that good data is data that serves its purpose. That's a very simple, straightforward definition.

0:1:20.760 –> 0:1:49.180
Anatoly (Guest)
Data is created by people, data is created by systems, and it is designed to be used for a particular purpose. In the majority of cases, the data is good as long as it serves the purpose it was created for. So when, for example, we look at the data that is stored in an EHR system and try to extract this data from the EHR system,

0:1:49.390 –> 0:2:3.390
Anatoly (Guest)
we'll find lots of things that we don't like. In reality, this is only because we're trying to use this data for a purpose that deviates from the purpose this data was originally created for.

0:2:4.290 –> 0:2:36.600
Anatoly (Guest)
So this is one fundamental misconception. Data is really good for the specific system it was designed for, and as long as it works for that system, as long as it supports the functionality and the purpose of that system, it is good. So let's talk about different scenarios where data comes across as questionable and not sufficiently good. Typically, one of the major scenarios, one of the major reasons the data

0:2:36.830 –> 0:2:38.920
Anatoly (Guest)
comes across as being of

0:2:39.720 –> 0:2:49.570
Anatoly (Guest)
poor quality is when we extract the data. When we do something with the data, in reality we transform the data every time, and we don't think about it.

0:2:50.290 –> 0:3:21.240
Anatoly (Guest)
For example, when we look at interfaces, whether it is an HL7 interface or a FHIR interface: when we expose the data from an EHR system, or from any other system, through interfaces, that is not the data that is in the EHR system. That is massaged data, converted data, transformed data for the purpose of the interface. Obviously, you may store data in a relational format or a non-relational format

0:3:21.360 –> 0:3:42.530
Anatoly (Guest)
in a particular database, but when you extract it using FHIR, it's JSON, and it is disconnected from that particular system. There are no internal identifiers, there is no referential integrity, things like that. But at the same time, this data extracted from the system may perfectly serve the purpose it is used for.
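
For readers who want to see what that detachment looks like in practice, here is a minimal sketch of pulling a single resource over FHIR; the endpoint URL and resource ID are hypothetical placeholders, not a specific vendor's server.

```python
# Minimal sketch: pulling a Patient resource over FHIR and noting what is lost.
# The base URL and resource ID are placeholders; any FHIR R4 endpoint behaves similarly.
import json
import requests

FHIR_BASE = "https://example-fhir-server/fhir"  # hypothetical endpoint

resp = requests.get(
    f"{FHIR_BASE}/Patient/123",
    headers={"Accept": "application/fhir+json"},
)
patient = resp.json()

# What comes back is standalone JSON: the EHR's internal row IDs, foreign keys,
# and referential constraints are gone. Only logical references remain,
# e.g. "Patient/123" strings inside other resources.
print(json.dumps(patient, indent=2)[:500])
```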

0:3:43.350 –> 0:3:49.600
Jordan Cooper
So I think what I hear you saying is that data is good if it serves the specific purpose

0:3:51.100 –> 0:3:54.600
Jordan Cooper
Yeah, that it was designed for. And then

0:3:54.740 –> 0:4:2.330
Jordan Cooper
data quality loss is often due to transformation of data, and that leads to bad data. Is that correct?

0:4:4.170 –> 0:4:4.710
Jordan Cooper
So.

0:4:2.800 –> 0:4:10.240
Anatoly (Guest)
That's correct. That's correct. But there are other reasons why the data may be questionable and may not be perfect. But please go ahead.

0:4:10.950 –> 0:4:19.110
Jordan Cooper
So I think many of our listeners, who again are often the CIOs or their counterparts at large healthcare delivery systems in the United States,

0:4:20.350 –> 0:4:51.180
Jordan Cooper
would be interested in hearing what you have to say about data trends you've seen across the industry. As the Head of Healthcare Practice at First Line Software, you have many different customers across the United States, and you've been able to see at these customer sites how their data practices are put to use. I think our listeners would love to hear some anecdotes about what you've seen, what trends you've seen when data standards have been put into practice, and what has led to good quality data

0:4:51.270 –> 0:5:0.310
Jordan Cooper
that could be replicated by our listeners, and what mistakes have you seen leading to bad quality data that would be best avoided by our listeners?

0:5:1.170 –> 0:5:30.760
Anatoly (Guest)
Sure, absolutely. So let's take a look at a few examples. Let's imagine we extract data to an external system for research purposes, using some ETL process to pull the data into a common data model, for example to search for eligible patients for clinical trials or studies. The common data models typically used for these purposes are pretty well defined.

0:5:30.890 –> 0:6:2.140
Anatoly (Guest)
They have pretty well defined rules that say what good quality data is, and those may include, for example, the attachment of standard terminologies, standard concept IDs. When we pull this data from EHR systems, very often we don't have that, because the data stored in the EHR system might not have those identifiers; they're not needed for the function of that system. So what we need to do is actually do

0:6:2.420 –> 0:6:26.840
Anatoly (Guest)
concept mapping, data normalization, data transformation, data analysis, and quality analysis in the context of that particular system. But that's not the only case. There are cases where the data inherently has variability. For example, when you get a lab result, by the nature of the processes associated with testing,

0:6:27.880 –> 0:6:59.270
Anatoly (Guest)
even if you send the same sample to two different labs, you may have some variability in the results. That is because the labs are using different equipment, they are using this equipment with different levels of precision, and so on. There will be some variability in those results, and it is common. Up until recently, up until COVID, all of the results had to be reviewed by the provider, and that's one of the reasons: because the results are not perfect, there is variability in those results.

0:6:59.400 –> 0:7:29.300
Anatoly (Guest)
And sometimes two different labs will send you data that looks different. For example, the result may be incomplete or partial. From the user's perspective, when the clinician looks at it, it's pretty much the same thing, but the system does not understand that this is the same thing. So what a human sees and what the system sees are very different things, and those two entities understand data differently. When we send data as, let's say, a fax, a human can

0:7:29.400 –> 0:8:1.90
Anatoly (Guest)
completely and happily, well, not necessarily happily, but can look at this data, look at the record, and say, well, it's clear what's going on with this patient. But try to fit this data into a system and it's completely useless in many scenarios. The data is just not structured, and unstructured data is not very useful for these systems, even though it is completely consumable by humans. Moreover, there are technologies today that can extract meaningful insights from this data. But again, this depends on the technology that is used in this particular case.
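
Going back to the concept-mapping step Anatoly describes for ETL into a common data model, here is a minimal sketch of the idea; the local lab codes, target concept IDs, and mapping table are hypothetical stand-ins for what a real terminology service or OMOP-style vocabulary would provide.

```python
# Minimal sketch of concept mapping during an ETL into a common data model.
# The local lab codes and target concept IDs below are hypothetical placeholders.
LOCAL_TO_STANDARD = {
    "GLU-POCT": 3004501,  # e.g. a local glucose code mapped to a standard concept ID
    "HGBA1C": 3005673,    # hypothetical mapping; real values come from a terminology service
}

def map_lab_code(local_code: str) -> int | None:
    """Return the standard concept ID for a local lab code, or None if unmapped."""
    return LOCAL_TO_STANDARD.get(local_code)

source_rows = [{"code": "GLU-POCT", "value": 98}, {"code": "XYZ-UNKNOWN", "value": 12}]
mapped, unmapped = [], []
for row in source_rows:
    concept_id = map_lab_code(row["code"])
    if concept_id is None:
        unmapped.append(row)  # flag for manual review rather than silently loading
    else:
        mapped.append({**row, "concept_id": concept_id})

print(f"mapped={len(mapped)}, unmapped={len(unmapped)}")
```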

0:8:2.260 –> 0:8:21.770
Jordan Cooper
So you also mentioned in previous conversations that another reason for poor data quality is the source system itself: its organic growth and evolution, and the process by which it is configured and maintained. Would you mind elaborating on that within the context of an electronic health record implementation project?

0:8:22.650 –> 0:8:31.610
Anatoly (Guest)
That's a great point, Jordan. There are several reasons why the data that exists in an EHR system by itself is not perfect.

0:8:32.170 –> 0:8:53.450
Anatoly (Guest)
Large EHR systems are configured and built by consultants, by groups of people who do not necessarily talk to each other. So consultants may be configuring the EHR system for different departments, and often inadvertently creating data elements, creating

0:8:53.610 –> 0:9:18.190
Anatoly (Guest)
elements that are duplicated or conflicting. One example: we worked with one healthcare institution and found seven length-of-stay metrics. Think of operational analytics that relies on length of stay, which is a very important metric. You will get all kinds of results depending on which metric you use in a particular case.

0:9:19.310 –> 0:9:49.420
Anatoly (Guest)
The other example: in another instance, we found a data warehouse that had very substantial variability in the data, because the data that comes into that data warehouse comes from multiple interfaces, those interfaces are not necessarily reconciled, and the designers of the data warehouse had not put proper data governance in place at the time the data is being committed to the data warehouse. For example, you might get ethnicity coming from

0:9:49.490 –> 0:10:19.130
Anatoly (Guest)
different interfaces expressed in different terms. It may be 'HC', it may be 'Hispanic', or maybe something else, maybe a coded value, coming from different interfaces. And unless somebody reconciles this data, this will not be good quality data. And why nobody reconciles this data could be a separate topic as well, because when you send an order to a lab and the lab comes back with

0:10:19.650 –> 0:10:39.690
Anatoly (Guest)
the result, for the source system the ethnicity doesn't make much of a difference, because they already have the patient ID and all of the demographic data is already stored in the system. Yet the HL7 message contains demographic information. What is this system going to do with that demographic information? Who knows? It depends on the particular system.
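
As a concrete illustration of the reconciliation Anatoly is describing, here is a minimal sketch that normalizes ethnicity values arriving from different feeds to a single canonical set; the alias table is hypothetical, and the target codes are shown only as illustrative examples.

```python
# Minimal sketch of reconciling ethnicity values arriving from different interfaces.
# The source spellings and the canonical codes below are illustrative examples.
CANONICAL = {
    "hispanic or latino": "2135-2",      # illustrative target code
    "not hispanic or latino": "2186-5",  # illustrative target code
}

ALIASES = {
    "hc": "hispanic or latino",          # hypothetical feed-specific abbreviation
    "hispanic": "hispanic or latino",
    "non-hispanic": "not hispanic or latino",
}

def normalize_ethnicity(raw: str) -> str | None:
    """Map a raw feed value to a canonical code; None means 'send to reconciliation queue'."""
    key = ALIASES.get(raw.strip().lower(), raw.strip().lower())
    return CANONICAL.get(key)

for incoming in ["Hispanic", "HC", "Non-Hispanic", "???"]:
    print(incoming, "->", normalize_ethnicity(incoming))
```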

0:10:41.160 –> 0:11:10.130
Jordan Cooper
So, moving on to the third reason why you might have poor data quality: you have previously mentioned that you attribute poor data quality to the aggregation of data from multiple systems without harmonization or reconciliation, which you've also referred to as normalization and deduplication. You just gave one example involving multiple different systems. Would you be able to elaborate on a particular concrete example that you've worked on in the past,

0:11:10.210 –> 0:11:14.310
Jordan Cooper
And what you did to resolve that poor data quality issue with your customers?

0:11:17.410 –> 0:11:31.940
Anatoly (Guest)
The process of aggregating data from multiple sources is basically a multi-step process. There is the source data, and the source data usually goes through some kind of

0:11:32.290 –> 0:11:52.590
Anatoly (Guest)
transformation process, typically done through an integration engine like InterSystems HealthConnect. Not a lot of people realize that an integration engine is not just a tool to transform the data; it's also a workflow engine. You can analyze data as it is being

0:11:53.150 –> 0:12:6.560
Anatoly (Guest)
converted, transformed, and sent to the destination system. You can do all kinds of interesting things in this process. You can analyze the quality of the data, you can generate alerts, you can

0:12:6.640 –> 0:12:36.430
Anatoly (Guest)
quarantine the data for further manual analysis, reconciliation, and resending to the destination system. There are lots of different things that you can do at the time the data is being committed to the destination system. It's a lot easier to do it then than to try to analyze it later in the destination system.

0:12:36.750 –> 0:12:47.230
Anatoly (Guest)
But do that while the data is being committed to the destination system. One example: we were working with one institution where they had lots of duplicate records.
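
Here is a minimal sketch of that "validate while committing" pattern in plain Python; it is not an InterSystems HealthConnect configuration, and the message fields and rules are hypothetical, but it shows the shape of quarantining and alerting at ingest time rather than cleaning up afterwards.

```python
# Minimal sketch of validating messages as they flow to a destination system,
# rather than cleaning up afterwards. The fields and rules are hypothetical; a
# production integration engine would implement the same pattern with its own
# routing, quarantine, and alerting tools.
from dataclasses import dataclass, field

@dataclass
class Pipeline:
    committed: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)

    def ingest(self, message: dict) -> None:
        problems = []
        if not message.get("patient_id"):
            problems.append("missing patient_id")
        if message.get("ethnicity") not in {"2135-2", "2186-5", None}:
            problems.append(f"unrecognized ethnicity {message.get('ethnicity')!r}")
        if problems:
            # Hold the message for manual reconciliation and raise an alert.
            self.quarantined.append({"message": message, "problems": problems})
            print("ALERT:", "; ".join(problems))  # stand-in for a real alerting hook
        else:
            self.committed.append(message)

p = Pipeline()
p.ingest({"patient_id": "A1", "ethnicity": "2135-2"})
p.ingest({"patient_id": "", "ethnicity": "HC"})
print(len(p.committed), "committed,", len(p.quarantined), "quarantined")
```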

0:13:13.10 –> 0:13:43.380
Jordan Cooper
So I'd like to talk about broader trends. We've spoken about specific use cases where data quality has been poor or of better quality. What have you seen generally? What sort of data practices have you seen that are widespread across many customers in the United States, where they're all doing something to get good data? Or what's a common error that you see many institutions making?

0:13:43.520 –> 0:13:50.220
Jordan Cooper
What sort of trends in data quality practices have you been seeing across many institutions in the US?

0:13:51.770 –> 0:14:5.260
Anatoly (Guest)
Well, obviously, the most important thing that an organization can do with respect to data quality is to actually set up data governance processes

0:14:6.520 –> 0:14:8.790
Anatoly (Guest)
and define

0:14:10.150 –> 0:14:26.180
Anatoly (Guest)
methodologies and best practices to keep data of good quality, and perform periodic sanity checks, periodic reviews of the existing data and of existing scenarios where the data may be of poor quality.

0:14:27.300 –> 0:14:55.430
Anatoly (Guest)
One expression of this poor data quality, and an opportunity to be proactive in improving data quality, is reporting. A lot of institutions, in fact most healthcare organizations, generate operational reports used for multiple different purposes. What we found out is that organizations that do not impose

0:14:56.720 –> 0:15:9.400
Anatoly (Guest)
rigid and meaningful data quality practices end up with reports that are duplicated and conflicting, and not a lot of people know about that.

0:15:10.460 –> 0:15:37.600
Anatoly (Guest)
It is important to create, for example, good quality reporting catalogs where you can classify different types of reports, which are basically analytical assets, very important assets for the organization. We've helped organizations set up data governance over their reporting assets and organize reports in a way that they know what these reports are about. It's not just

0:15:38.0 –> 0:15:42.520
Anatoly (Guest)
the report itself, it's the metadata around those reports that is important.
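
As one possible illustration of the metadata Anatoly mentions, here is a minimal sketch of a report-catalog entry; the field names and values are hypothetical, not any particular product's schema.

```python
# Minimal sketch of a report-catalog entry: the report plus the metadata that
# makes it governable. Field names and values are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class ReportCatalogEntry:
    report_id: str
    title: str
    owner: str                 # who answers for the numbers
    source_systems: list[str]  # where the data comes from
    metric_definitions: dict   # e.g. which length-of-stay definition is used
    refresh_schedule: str
    last_reviewed: str         # supports the periodic sanity checks mentioned above

entry = ReportCatalogEntry(
    report_id="rpt-0042",
    title="Inpatient Length of Stay by Unit",
    owner="analytics-team@example.org",
    source_systems=["EHR ADT feed", "data warehouse"],
    metric_definitions={"length_of_stay": "discharge datetime minus admission datetime"},
    refresh_schedule="daily",
    last_reviewed="2024-01-15",
)
print(entry.title, "-", entry.metric_definitions["length_of_stay"])
```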

0:15:43.710 –> 0:15:56.240
Jordan Cooper
So when one of our listeners, perhaps the CIO of a large health system in the US, finds that they do have data quality issues, and that it's widespread across the organization,

0:15:56.780 –> 0:16:17.570
Jordan Cooper
what sort of steps should they take to remediate those issues, which might lead them, for example, to speak to First Line Software? What sort of steps have you seen those sorts of customers take in the past to resolve their data quality issues?

0:16:18.330 –> 0:16:35.600
Anatoly (Guest)
Well, the first step, of course, is to understand what the impact is. When people say, well, we discovered the data is of poor quality, they obviously discovered that in a particular context. So one of the

0:16:36.220 –> 0:17:5.790
Anatoly (Guest)
most important steps is to define the criteria: what is good, what is important, what is not good. As I mentioned, the data can happily live in a particular system, and as long as the system functions correctly, that's not a major problem; that's not a problem at all. So what is important is to understand in which cases the quality of the data impacts operations. So as I mentioned, we

0:17:5.860 –> 0:17:28.550
Anatoly (Guest)
found, for reporting, a lot of duplicate reports. We found a lot of inability for the organization to recognize poor data quality. So setting up practices, defining in which contexts the data needs to be of good quality, generating analysis, and doing periodic reviews of certain elements of the data would serve organizations very well.

0:17:29.970 –> 0:17:33.510
Jordan Cooper
So as we approach the end of this podcast episode,

0:17:34.950 –> 0:17:50.860
Jordan Cooper
I'd like to ask you kind of a playful question. If you were to rub a magic lamp and have a genie come out of it who's willing to grant you a wish to solve any problem confronting a large healthcare organization, what would you wish for?

0:17:52.110 –> 0:17:56.420
Anatoly (Guest)
Well, yeah, that's an existential question.

0:17:58.420 –> 0:18:18.670
Anatoly (Guest)
Well, I think your question is very loaded, because one topic is data quality, and the other topic is of course the systems that support the operations of a particular institution. And to me, in today's world, what is

0:18:19.770 –> 0:18:22.780
Anatoly (Guest)
impacting us the most is the kind of

0:18:24.100 –> 0:18:54.390
Anatoly (Guest)
closed environment we have. The fact is that our EHR systems are kind of monolithic closed environments, unlike our mobile phones. The beauty of our mobile phones is that you pick best-of-breed applications, apps, download them, and they all work together very well. And the glue between those apps is the operating system and the data models that connect those

0:18:54.550 –> 0:19:14.20
Anatoly (Guest)
apps together. So in an ideal world, those systems should become open, and the data models that they operate on should become more standard, more exposed, more governed, more controlled. In that case, the apps will work much better, and we will all benefit from that.

0:19:15.610 –> 0:19:27.540
Jordan Cooper
Well, I think we've covered a lot of topics today. We've spoken about good data and bad. I think we've learned that data is contextual, that it can be good in one context and bad in another. We've learned

0:19:27.620 –> 0:20:0.790
Jordan Cooper
that good data is defined as data that serves the functionality of the system it was designed for. There are three main causes of poor data quality that we covered. One was data quality loss due to transformation of the data. The second was poor quality data because the source system itself was not configured or maintained properly. And the third was poor quality data due to aggregation from multiple systems without harmonization and reconciliation.

0:20:1.130 –> 0:20:7.880
Jordan Cooper
And then finally, we talked about how, basically, when you have the correct

0:20:7.970 –> 0:20:37.160
Jordan Cooper
connective tissue between different systems, the right interface engine, for example, that may be the best way to preserve and maintain good quality data. Since, again, as you spoke about in one example, good quality data in an EHR may be poor quality data in some other use case, translating it and making sure that you have a deduplicated, normalized set of data is important to maintaining good quality data. So,

0:20:37.950 –> 0:20:45.800
Jordan Cooper
Anatoly, this has been Anatoly Postilnik, the Head of Healthcare Practice at First Line Software. Anatoly, I'd like to thank you very much for joining us today.

0:20:46.440 –> 0:20:47.630
Anatoly (Guest)
Thank you, Jordan. It was a pleasure.