STARDUSTT Approach

Presentation

Watch Dr. Danielle Boyce present STARDUSTT-ALS — a new approach to capturing, managing, and analyzing “big data” in ALS natural history studies.

White Paper

The STARDUSTT Framework: Addressing the Curse of Knowledge in OMOP Implementation

Authors: Danielle Boyce^1,2,3, Pavel Goriacko⁴, Pamela Dasher⁵, William Roddy⁵, Wesley Anderson⁵, Smith F. Heavner^5,6

Corresponding Author: Danielle Boyce

Affiliations:
1. ALS Therapy Development Institute
2. Johns Hopkins University School of Medicine
3. Tufts University School of Medicine
4. Montefiore Einstein, Center for Health Data Innovations
5. Critical Path Institute
6. Clemson University, Department of Public Health Sciences

Executive Summary

Adopting the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) requires collaboration across diverse stakeholders—executives, clinical informaticists, and technical teams—each with different expertise. Too often, the curse of knowledge impedes effective communication, slowing progress.

The STARDUSTT Framework—Secure Data, Technology Awareness, Active Clinical Representation, Relational Databases, Data Quality, Utilization of Git, Standardized Vocabularies, Training & Documentation, Translation & Communication—offers a structured approach to build readiness and overcome these barriers.

Background

Large-scale OMOP CDM projects such as the National COVID Cohort Collaborative (N3C), All of Us, and EHDEN have created unprecedented opportunities for observational research. Yet, many sites face challenges due to: - Limited technical readiness - Gaps in shared understanding - Lack of role-specific training

One important contributor to these challenges is the curse of knowledge—a cognitive bias first described by Camerer, Loewenstein, and Weber (1989) in The Curse of Knowledge in Economic Settings: An Experimental Analysis. This bias occurs when individuals with expertise find it difficult to imagine what it is like for others not to possess that knowledge, often leading to overestimations of others’ understanding and ineffective communication.

In OMOP projects, this bias can manifest when experienced informaticists, engineers, or OHDSI community members assume that all team members are familiar with prerequisites such as relational databases, GitHub workflows, or the OHDSI tool stack. Without intentional, role-specific onboarding, new participants can become overwhelmed and disengaged.

Our work on the CURE ID project provided direct, practical experience with these challenges and informed the development of the STARDUSTT Framework as a structured approach to address them.

The STARDUSTT Framework

Figure 1: Overview of the STARDUSTT Framework

STARDUSTT Components

Secure Data — De-identification, privacy protection, and regulatory compliance.
Technology Awareness — Familiarity with Docker, SQL, and OHDSI tools.
Active Clinical Representation — Inclusion of clinicians in decision-making and workflow mapping.
Relational Databases — Understanding schema, joins, and OMOP table relationships.
Data Quality — Processes for completeness, accuracy, and plausibility.
Utilization of Git — Version control, collaboration, and issue tracking.
Standardized Vocabularies — SNOMED, LOINC, RxNorm, and use of Athena for mapping.
Training & Documentation — SOPs, MOPs, and curated learning paths.
Translation & Communication — Plain-language messaging, feedback loops, and artifacts for different personas.

Implementation Insights

The framework emerged from interviews with both successful and hesitant OMOP adopters. Three key findings shaped its design:

Tailor training by role: Executives, clinicians, and engineers have different needs.
Focus on prerequisites: Don’t assume baseline skills in databases, GitHub, or coding.
Use shared language: Maintain glossaries and avoid unnecessary jargon.

Common Problems and Strategies

Problem	Impact	Strategy
Varying expertise levels	Poor decision-making	Start with foundational training
Overwhelming resources	Loss of engagement	Curated, role-specific materials
Technical jargon	Miscommunication	Accessible language + shared glossary
Complex topics too soon	Frustration	Incremental learning
Passive learning	Low retention	Active exercises and case studies
Infrequent feedback	Persistent gaps	Regular check-ins
One-size-fits-all	Uneven outcomes	Diverse teaching methods
Poor documentation	Limited autonomy	Clear, versioned resources
Isolated learning	Slow problem-solving	Mentorship and peer learning
Theory without practice	Weak skill transfer	Real-world application
Static programs	Outdated skills	Continuous updates

Applying STARDUSTT to Personas

Component	C-Suite Executives	Clinical Informaticists	Data Engineers
Secure Data	Compliance and strategic value	IRB/PHI safeguards	Advanced security techniques
Technology Awareness	ROI of tools	OHDSI tools in clinical workflows	Deep technical training
Active Clinical Representation	Sponsor collaboration	Bridge to technical teams	Integrate clinical context
Relational Databases	Decision-making role	Basic querying	Dialect-specific scripting
Data Quality	Impact on trust	Use of DQD/Achilles	Automated DQ pipelines
Utilization of Git	Governance	Simplified workflows	Advanced branching & CI
Standardized Vocabularies	Interoperability strategy	Vocabulary use in practice	Mapping & Athena mastery
Training & Documentation	Executive dashboards	Guides & glossaries	Tutorials & peer reviews
Translation & Communication	Strategic framing	Clinical translation	Technical action from requirements

Personas in Practice

C-Suite Executives need strategic summaries, compliance implications, and ROI metrics.
Clinical Informaticists benefit from practical workflow examples and clear connections to patient care.
Data Engineers require technical depth and context to align builds with project goals.

Conclusion

The STARDUSTT Framework bridges gaps in knowledge, builds shared understanding, and accelerates OMOP adoption. By addressing role-specific needs and the curse of knowledge, it improves efficiency, reduces frustration, and increases sustainability in data harmonization projects.

Acknowledgements

We thank Irina Titkova, PhD (Show-cell.com, Barts Cancer Institute, Queen Mary University of London) for graphic design and Grace Zaikoski for editorial support.

References

Heavner SF, Anderson W, Kashyap R, Dasher P, Mathé EA, Merson L, Guerin PJ, Weaver J, Robinson M, Schito M, Kumar VK, Nagy P. A Path to Real-World Evidence in Critical Care Using Open-Source Data Harmonization Tools. Crit Care Explor. 2023;5(4):e0893. doi:10.1097/CCE.0000000000000893.
Camerer C, Loewenstein G, Weber M. The curse of knowledge in economic settings: An experimental analysis. J Polit Econ. 1989;97(5):1232–1254.
Hripcsak G, Mirhaji P, Low AF, Malin BA. Preserving temporal relations in clinical data while maintaining privacy. J Am Med Inform Assoc. 2016;23(6):1040–1045. doi:10.1093/jamia/ocw001.
Kahn MG, Callahan TJ, Barnard J, et al. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS. 2016;4(1):1244.
Observational Health Data Sciences and Informatics (OHDSI). The Book of OHDSI. 2023. https://ohdsi.github.io/TheBookOfOhdsi/