Summary

Many healthcare companies use Databricks, which is built on Apache Spark, as their data platform. Databricks provides a unified analytics platform for data engineering, data science, and machine learning at scale, with Apache Spark for big data processing and collaborative workspaces for teams.

However, Databricks does not have built-in capabilities for handling FHIR. Unlike tabular data, FHIR is a nested JSON format with a predefined schema. Today, customers must do substantial custom work to use FHIR with Databricks.

The b.well FHIR SDK for Databricks enables companies using or evaluating Databricks to map and process FHIR natively in Databricks.

The b.well Open Source FHIR SDK addresses several key challenges in healthcare data management, including:

Seamless Data Mapping: Facilitating the conversion between different data formats, particularly from tabular or flat file formats to nested JSON structures like FHIR, and vice versa.
Scalable but Simple Data Pipelines: Simplifying the creation of robust and scalable data pipelines using Apache Spark (including integration with Databricks) while removing much of the complexity of Apache Spark.
High-Performance FHIR Communication: Optimizing communication with FHIR servers for faster data retrieval and transmission.
Testing and Validation: Enabling thorough testing and validation of data processing logic through unit tests and gated check-ins.
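
To make the data-mapping challenge above concrete, the following is a minimal sketch in plain Python of converting one flat record into a nested FHIR Patient resource and back. It does not use the SDK and is not the SDK's API: the function names and the column names (patient_id, last_name, first_name, gender, birth_date) are hypothetical and chosen only for illustration. The nested field names (resourceType, name, family, given, birthDate) follow the FHIR Patient resource.

```python
import json
from typing import Any, Dict

# Hypothetical flat record, such as one CSV row or one table row as a dict.
# The column names are assumptions for this illustration only.
FlatRow = Dict[str, str]


def flat_row_to_fhir_patient(row: FlatRow) -> Dict[str, Any]:
    """Map a flat record to a nested FHIR Patient resource (as a dict)."""
    return {
        "resourceType": "Patient",
        "id": row["patient_id"],
        # In FHIR, "name" is a list of name objects and "given" is a list
        # of strings, which is why a single flat row cannot hold it directly.
        "name": [
            {
                "family": row["last_name"],
                "given": [row["first_name"]],
            }
        ],
        "gender": row["gender"],
        "birthDate": row["birth_date"],
    }


def fhir_patient_to_flat_row(patient: Dict[str, Any]) -> FlatRow:
    """Flatten a FHIR Patient into one record, keeping only the first name entry."""
    primary_name = patient["name"][0]
    return {
        "patient_id": patient["id"],
        "last_name": primary_name["family"],
        "first_name": primary_name["given"][0],
        "gender": patient["gender"],
        "birth_date": patient["birthDate"],
    }


row = {
    "patient_id": "example-1",
    "last_name": "Doe",
    "first_name": "Jane",
    "gender": "female",
    "birth_date": "1980-01-31",
}

patient = flat_row_to_fhir_patient(row)
round_trip = fhir_patient_to_flat_row(patient)

print(json.dumps(patient, indent=2))
```

Even in this small case, the flattening direction has to choose which of several possible names to keep, and repeated elements such as multiple names or identifiers need an explicit rule. Writing and maintaining such rules by hand for every resource type and source table is the kind of work the list above refers to.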

This document provides comprehensive guidance on using the b.well Open Source FHIR SDK for mapping data to and from the FHIR format, as well as for efficiently interacting with the b.well high-performance FHIR server.

Additionally, this document offers detailed, step-by-step instructions on how to implement and utilize the b.well Open Source FHIR SDK within Databricks.

Why?

Several healthcare companies use Databricks (or Apache Spark) as their data platform. Databricks offers a comprehensive analytics platform that supports large-scale data engineering, data science, and machine learning, using Apache Spark for big data processing and collaborative workspaces for teams.

However, Databricks lacks built-in features to manage FHIR data. Unlike traditional tabular data, FHIR is a nested JSON format with a predefined schema.

The b.well FHIR SDK for Databricks allows companies that are using or considering Databricks to natively map and process FHIR data within the platform.