Federated data elements and the PHQ9

PHQ9 (Patient Health Questionnaire) provides a few use-cases that we would like to explore with respect to Federated Data Elements. A federated data element is a variable representing a response value that can be compared across datasets.

The three forms of “PHQ-9”.

  1. [orig] Some people will use the standard description: “Over the last two weeks …”.
  2. [deriv1] Some people use an alternate description: “Over the past week …”
  3. [deriv2] Some people drop question 9 (on suicide).

How should we deal with the FDEs associated with the responses to the questions?

Our current approach:

  1. the FDEs of Questions 1 - 8 of version-orig and version-deriv2 will be the same.
  2. the FDEs for total_score will be different for all 3 (orig sums 9 questions, deriv2 sums 8, and deriv1 changes the time window over which the respondent has to recall and accumulate symptoms).
  3. we think the FDE for categorical label should also be different
  4. the FDEs of version-deriv1 questions are all new FDEs.
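To make the four rules above concrete, here is a rough sketch of how they could be encoded. All identifiers (the `fde:` prefix, `VARIANTS`, `item_fde`, `summary_fde`) are made up for illustration and do not come from any existing registry.

```python
# Hypothetical sketch of the assignment rules; none of these names
# or id strings come from an existing FDE registry.
VARIANTS = {
    "orig": {"n_questions": 9, "recall_window": "two weeks"},
    "deriv1": {"n_questions": 9, "recall_window": "one week"},
    "deriv2": {"n_questions": 8, "recall_window": "two weeks"},
}


def item_fde(variant: str, question: int) -> str:
    """Return the FDE id for the response to one question of a variant."""
    spec = VARIANTS[variant]
    if not 1 <= question <= spec["n_questions"]:
        raise ValueError(f"{variant} has no question {question}")
    if spec["recall_window"] == "two weeks":
        # Rule 1: orig and deriv2 share item FDEs, keyed by question number.
        return f"fde:phq_item_{question}"
    # Rule 4: a changed recall window makes every item a new FDE.
    return f"fde:phq_item_{question}_{variant}"


def summary_fde(variant: str, kind: str) -> str:
    """Rules 2 and 3: total_score and categorical label differ per variant."""
    if variant not in VARIANTS:
        raise KeyError(variant)
    return f"fde:phq_{kind}_{variant}"
```

With this encoding, `item_fde("orig", 3)` and `item_fde("deriv2", 3)` return the same id, while `summary_fde(v, "total_score")` is distinct for each of the three variants.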

we think some comparisons are apples to apples, and some are gala apples to granny smiths. hence our current approach.

Would love to hear comments from folks.

It seems to me that the question has more to do with FDE’s representation of different versions of the same questionnaire, and the distinguishing of aggregate scores that result from a combination of question answers.

In the case of a questionnaire that has many variations and versions, how are those variations being represented by the FDE for questionnaire?

In the case of individual questions that are associated with multiple versions of the questionnaire, what is the relation, e.g. part-of, that defines that link? And is there something special about that relation vis-a-vis questionnaires as distinct from other part-of relations?

In the case of individual questions that have multiple versions as they are iterated upon, how can those variations be captured by the FDE for an individual question?

Am I going on an unrelated tangent, or are these questions at all relevant?

The concept of a Federated Data Element is to collect measures of the same thing (e.g. handedness) so that cohort discovery is made easier. There still might be issues with using some of the data (e.g. categorical handedness versus a scale-based measure) - but that depends on the potential use of the data. If one changes the FDE for every change in the underlying common or personal data elements, the FDE will just be another representation of the CDE.

The FDEs should allow comparison of different types of apples and the CDE mappings should allow comparison of the same type of apples.

@tedstrauss - i do think your questions are related.

@jgrethe - FDEs would always require provenance or additional metadata to resolve differences, right? but aren’t we saying that FDEs have consistent valuesets (numeric ranges, or category labels)?

for example an FDE for “depression level” across all possible instruments that capture this (e.g., PHQ9, PHQ8, HAMD, …) may not be very useful by itself, unless we can map these scales to each other.

FDE_depression = T1(CDE_PHQ9_orig)
FDE_depression = T1(CDE_PHQ9_deriv1)
FDE_depression = T2(CDE_PHQ8)
FDE_depression = T3(CDE_HAMD)

where T1, T2, and T3 are transforms (which could be nonlinear).
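A rough sketch of what such a transform registry might look like. The linear rescaling used for T1 and T2 is only a placeholder, not a validated crosswalk between instruments, and `linear_rescale`, `TRANSFORMS`, and `to_fde_depression` are hypothetical names. T3 is left unregistered because no HAMD mapping has been proposed here.

```python
from typing import Callable, Dict


def linear_rescale(lo: float, hi: float) -> Callable[[float], float]:
    """Build a transform mapping [lo, hi] onto [0, 1]. Placeholder only."""
    def transform(score: float) -> float:
        if not lo <= score <= hi:
            raise ValueError(f"score {score} outside [{lo}, {hi}]")
        return (score - lo) / (hi - lo)
    return transform


# Assumption: totals are plain sums of items scored 0-3.
T1 = linear_rescale(0, 27)  # PHQ-9 total: 9 items
T2 = linear_rescale(0, 24)  # PHQ-8 total: 8 items

TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "CDE_PHQ9_orig": T1,
    "CDE_PHQ9_deriv1": T1,
    "CDE_PHQ8": T2,
    # "CDE_HAMD": T3 would be registered here once a mapping is agreed on.
}


def to_fde_depression(cde: str, value: float) -> dict:
    """Map a CDE value to FDE_depression and keep the source as provenance."""
    return {
        "fde": "FDE_depression",
        "value": TRANSFORMS[cde](value),
        "source_cde": cde,
    }
```

Keeping `source_cde` on every mapped value means the transform that produced it can always be traced, which is the provenance requirement discussed above.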

It seems we should have the criterion that the concept value is comparable independent of how it was derived.

As a secondary example, let’s take the FDE for left caudate volume. The value could be computed by different processes, the caudate could be defined differently. As such the FDE is an abstraction over those details, but still comparable. Here we take the implicit notion that the combination of left caudate and volume measure is a reasonable merger instead of simply saying brain_volume_measurement and letting people figure out through secondary attributes what the measurement is for.

As a third example we take handedness. The only FDE possible if we use the criterion of comparability would be FDE_handedness_categorical. The numerical version from something like the Edinburgh inventory would be a CDE and not an FDE.

@jgrethe - Do these examples make sense?

@tedstrauss - regarding part-of: we are currently using JSON-LD to connect common elements across assessments or their variants, and PROV to track derivations at the assessment level. the deriv1 variant is something for which we have not yet found a clean representation. if the response to question 1 is a CDE, then the CDE should be different if the context of question 1 (i.e. duration) changes. We are trying to figure out which relationship to use so that we minimize having to copy all the questions.

Does the response and the question together make a CDE? or is it only the response part?

the CDE variable captures the response/value, but the details of the CDE are with respect to the details of the question/process.

CDEs can come from questionnaires, assessments, data processing, and others.

@satra - The FDEs would require provenance, and the fidelity of the mapping may vary (we have previously discussed also including metadata on the quality of the mapping - this relates to your handedness example as well). I would agree with the caudate volume example. For handedness I think a scale would still be possible - Edinburgh and Annett would map, and the categorical question could map with a low quality indicator or a metadata element that outlines the issue of converting from categorical to scale.
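As a sketch of what a mapping-quality indicator could look like for handedness: the example below assumes a scale FDE on the Edinburgh laterality quotient range (-100 to 100). The field names, the "high"/"low" vocabulary, and the placement of categories on the scale are all hypothetical, and an Annett mapping is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedValue:
    fde: str
    value: float
    source_cde: str
    mapping_quality: str  # hypothetical vocabulary: "high" or "low"


# Hypothetical placement of category labels on the -100..100 scale.
CATEGORY_TO_SCALE = {"left": -100.0, "ambidextrous": 0.0, "right": 100.0}


def from_edinburgh(lq: float) -> MappedValue:
    """An Edinburgh laterality quotient is already on the target scale."""
    if not -100 <= lq <= 100:
        raise ValueError(f"laterality quotient {lq} outside [-100, 100]")
    return MappedValue(
        "FDE_handedness_scale", float(lq), "CDE_edinburgh_lq", "high"
    )


def from_category(label: str) -> MappedValue:
    """A categorical answer is mapped too, but flagged as low quality."""
    return MappedValue(
        "FDE_handedness_scale",
        CATEGORY_TO_SCALE[label],
        "CDE_handedness_categorical",
        "low",
    )
```

A cohort query could then include or exclude `mapping_quality == "low"` values depending on how the data will be used.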