At some point, most users of social networking applications such as Facebook or LinkedIn have been pleasantly surprised to see an old school friend, or an ex-colleague from that first job, pop up in the “people you may know” recommendations. The tech-savvy among us might guess that these recommendations “… work off of common connections … especially if two people have the same set of friends …”. If you’ve wondered about the technology behind this, read on. Social networking applications use graph technology to store and process vast amounts of data.
Simply put, a graph is a way to store and represent “connected” data. For example, consider a patient who visits a doctor and subsequently a pharmacy. This interaction can be represented as a triangle, where each vertex (referred to as a “node”) represents the patient, the doctor or the pharmacy, and each side (referred to as an “edge”) represents a connection between two of them.
This same interconnected structure can be used to represent data for hundreds of thousands of patients, doctors and pharmacies in a healthcare graph. Graph databases are thus very useful when extremely large volumes of disparate but connected data, both structured and unstructured, have to be consolidated, processed and analyzed in real time.
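The patient-doctor-pharmacy triangle described above can be sketched in a few lines of Python. This is only an illustrative in-memory structure, not how any particular graph database stores data; the class name `Graph` and the node names such as `patient_1` are made up for the example.

```python
from collections import defaultdict


class Graph:
    """A tiny undirected graph kept as an adjacency map (illustrative only)."""

    def __init__(self):
        self.node_types = {}               # node name -> type label
        self.adjacency = defaultdict(set)  # node name -> set of neighbouring nodes

    def add_node(self, name, node_type):
        self.node_types[name] = node_type
        self.adjacency[name]  # accessing the key creates an empty neighbour set

    def add_edge(self, a, b):
        # Undirected edge: record the connection in both directions.
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def neighbours(self, name):
        return sorted(self.adjacency[name])


# Hypothetical nodes: one patient, one doctor, one pharmacy.
care = Graph()
care.add_node("patient_1", "patient")
care.add_node("doctor_1", "doctor")
care.add_node("pharmacy_1", "pharmacy")

# The three sides of the triangle.
care.add_edge("patient_1", "doctor_1")
care.add_edge("patient_1", "pharmacy_1")
care.add_edge("doctor_1", "pharmacy_1")

print(care.neighbours("patient_1"))  # ['doctor_1', 'pharmacy_1']
```

Scaling this to many patients, doctors and pharmacies only means adding more nodes and edges; the structure itself does not change.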
Once data is represented in the form of a graph, it allows us to analyze and identify insights in ways that are not possible when the data is stored in other formats (e.g. tabular). Such analysis can be done in two ways, either separately or in a combined manner:
- Using sophisticated “graph algorithms” that can mine the graph to return specific insights
- Using “graph visualization tools” which allow analysts to perform “graph discovery” i.e. uncover useful connections and patterns to solve business problems
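As a small illustration of a graph algorithm, the "people you may know" idea from the introduction can be approximated by counting common connections. The sketch below assumes a made-up friendship map and a hypothetical `suggest` function; real social networks combine many more signals than this.

```python
from collections import Counter

# Hypothetical friendship graph: person -> set of direct friends.
friends = {
    "asha": {"ben", "chen", "dev"},
    "ben": {"asha", "chen", "ela"},
    "chen": {"asha", "ben", "ela"},
    "dev": {"asha"},
    "ela": {"ben", "chen"},
}


def suggest(friends, person):
    """Rank non-friends of `person` by the number of friends they share."""
    counts = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            # Skip the person themselves and anyone already connected.
            if candidate != person and candidate not in friends[person]:
                counts[candidate] += 1
    # Most common connections first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(suggest(friends, "asha"))  # [('ela', 2)]
```

Here "ela" is suggested to "asha" because they share two friends, "ben" and "chen".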
In the rest of this article, we’ll take a brief look at a few use cases where graph technology can help companies in their quest to become customer-centric.
Precision medicine
Graph networks are a topic of intense research in the area of precision medicine, where researchers try to understand disease-drug-gene-patient interactions. For example, multiple topic-specific networks could be constructed as follows:
- A drug network, where each node represents a drug and the connection between nodes could be based on common compounds, similar diseases, reactions, etc.
- A disease network, where each node represents a disease and connections are based on the similarity of symptoms
- A gene network, where nodes represent individual genes and connections are based on proteins
- A patient network, where patients could be connected based on family relationships
Once these topic-specific networks are created, they are connected with each other to help answer several useful questions: How does a specific drug interact with a certain target patient, based on drug and gene networks? Will the patient suffer adverse reactions? Graph algorithms can traverse such massively interconnected graphs and perform automated reasoning to reveal insights that help researchers answer these questions and more.
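One simple form of traversal over such connected networks is a breadth-first search for the shortest chain of links between a drug and a patient. The sketch below is a toy under stated assumptions: every node name (`drug:D1`, `gene:G1`, and so on) and every edge is invented, and a real system would use typed, weighted edges and far richer reasoning.

```python
from collections import deque

# Hypothetical edges. Comments say which network each edge would belong to.
edges = [
    ("drug:D1", "drug:D2"),        # drug network: common compound
    ("drug:D2", "gene:G1"),        # cross-network link between a drug and a gene
    ("gene:G1", "gene:G2"),        # gene network: related proteins
    ("gene:G2", "patient:P1"),     # cross-network link between a gene and a patient
    ("patient:P1", "patient:P2"),  # patient network: family relationship
    ("disease:X", "gene:G2"),      # cross-network link between a disease and a gene
]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a shortest path as a list of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


adjacency = build_adjacency(edges)
print(shortest_path(adjacency, "drug:D1", "patient:P1"))
```

The returned path shows which drugs and genes link the starting drug to the patient, which is the kind of chain a researcher would then inspect.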
Drug retargeting
One of the key questions that has come up during the race to discover a vaccine for COVID-19 is whether certain existing drugs can be repurposed to fight the novel coronavirus. This approach is called drug retargeting or repositioning. Drug retargeting is especially useful for orphan diseases, i.e. diseases that afflict only a very small number of people.
This means that new drug research for these orphan diseases is not always a priority for pharma companies. In such situations, given that most pharmaceutical drugs act by targeting proteins (enzymes, receptors etc. are proteins), research could be expedited through a drug retargeting graph, constructed with three types of nodes: disease, protein and drug nodes. A connection is created between a disease and a protein, where it is known that a problem with the protein causes the disease.
Likewise, a connection is created between a drug and a protein, where the protein is a known target for the drug. By analyzing such a graph with multiple types of nodes, new target proteins for existing drugs can be identified, which can in turn lead to new treatments for diseases, based on clinical trials.
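The disease-protein-drug graph described above can be sketched as two edge maps joined on the protein nodes. This is a deliberately simplified illustration: all names are hypothetical, the `approved_for` map is an extra assumption added to filter out existing uses, and the sketch only surfaces drugs whose known targets are already linked to a disease. Predicting genuinely new targets would need more sophisticated graph analysis.

```python
# Hypothetical disease -> proteins whose malfunction is known to cause it.
disease_protein = {
    "disease_A": {"protein_1"},
    "disease_B": {"protein_2", "protein_3"},
}

# Hypothetical drug -> proteins it is known to target.
drug_protein = {
    "drug_X": {"protein_1", "protein_2"},
    "drug_Y": {"protein_3"},
}

# Assumption for the example: diseases each drug is already used for.
approved_for = {
    "drug_X": {"disease_A"},
    "drug_Y": set(),
}


def retargeting_candidates(disease_protein, drug_protein, approved_for):
    """List (drug, disease, shared proteins) where a drug's targets overlap a
    disease's causal proteins and the drug is not already used for it."""
    candidates = []
    for drug, targets in sorted(drug_protein.items()):
        for disease, proteins in sorted(disease_protein.items()):
            shared = targets & proteins
            if shared and disease not in approved_for.get(drug, set()):
                candidates.append((drug, disease, sorted(shared)))
    return candidates


for candidate in retargeting_candidates(disease_protein, drug_protein, approved_for):
    print(candidate)
```

Each candidate is only a hypothesis to investigate; whether the drug actually treats the disease is a question for clinical trials.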
Genomics and disease-risk prediction
Certain kinds of diseases are known to be caused by multiple genes, referred to as “disease genes”. Understanding the connection between a gene and a disease is important to help diagnose and treat the disease. However, it is not easy to identify the disease-gene association. Graph methods can help overcome this challenge by leveraging two ideas:
- Diseases are often related and occur together; these are called comorbid diseases.
- Genes are often related based on the proteins they encode.
So, analyzing a graph constructed using both diseases and genes – with connections between comorbid diseases, related genes and known disease-gene associations – could help uncover new disease-gene associations. Such knowledge could be used to predict the possibility of a patient suffering from a certain disease in the future and come up with suggestions on lifestyle interventions in the present.
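One very simple way to act on these two ideas is a "guilt by association" score: a candidate disease-gene pair scores higher when the disease (or a comorbid disease) is already known to be associated with the gene (or a related gene). The sketch below is a toy heuristic with invented data and a hypothetical `association_score` function, not a validated prediction method.

```python
# Hypothetical graph edges, stored as sets of pairs.
comorbid = {("disease_A", "disease_B")}   # diseases that occur together
related_genes = {("gene_1", "gene_2")}    # genes related via their proteins
known = {                                  # known disease-gene associations
    ("disease_A", "gene_1"),
    ("disease_B", "gene_2"),
    ("disease_C", "gene_3"),
}


def neighbourhood(pairs, item):
    """Return the item plus everything directly linked to it in `pairs`."""
    linked = {item}
    for a, b in pairs:
        if a == item:
            linked.add(b)
        if b == item:
            linked.add(a)
    return linked


def association_score(disease, gene):
    """Count known associations between the disease's neighbourhood and the
    gene's neighbourhood, excluding the candidate pair itself."""
    diseases = neighbourhood(comorbid, disease)
    genes = neighbourhood(related_genes, gene)
    return sum(
        1
        for d in diseases
        for g in genes
        if (d, g) in known and (d, g) != (disease, gene)
    )


print(association_score("disease_A", "gene_2"))  # 2
print(association_score("disease_C", "gene_1"))  # 0
```

The pair (disease_A, gene_2) scores 2 because disease_A is already linked to the related gene_1, and the comorbid disease_B is already linked to gene_2. The pair (disease_C, gene_1) has no such supporting evidence.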
Fraud detection
Health insurance companies focus on fraud for multiple reasons. Detecting fraudulent claims and not paying them helps keep medical costs down, which in turn leads to smaller increases in premium rates. Moreover, some treatments, tests and medical equipment have a quarterly or annual limit. If nefarious parties file a fraudulent claim in a customer’s name without the customer’s knowledge, that treatment, test or equipment may not be available to the customer when they need it.
Graph technology helps health insurance companies go beyond traditional fraud detection methods that involve looking at a single claim or a single doctor. Graph methods now allow looking at the entire network of doctors to detect collusion, cartelization, kickbacks, fake referrals, etc.
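To make the network view concrete, the sketch below looks for one pattern that could hint at fake referrals: pairs of doctors who repeatedly refer patients to each other. The referral data, the threshold and the function name `reciprocal_pairs` are all assumptions for illustration, and a flagged pair is only a lead for investigation, not evidence of fraud.

```python
from collections import Counter

# Hypothetical referral log: (referring doctor, receiving doctor).
referrals = [
    ("dr_a", "dr_b"), ("dr_a", "dr_b"), ("dr_a", "dr_b"),
    ("dr_b", "dr_a"), ("dr_b", "dr_a"),
    ("dr_a", "dr_c"),
    ("dr_c", "dr_d"), ("dr_c", "dr_d"),
]


def reciprocal_pairs(referrals, min_each_way=2):
    """Return (doctor, doctor, count one way, count back) for pairs that refer
    to each other at least `min_each_way` times in both directions."""
    counts = Counter(referrals)  # directed edge -> number of referrals
    flagged = []
    for (src, dst), n in counts.items():
        back = counts.get((dst, src), 0)
        # `src < dst` reports each pair once rather than in both orders.
        if src < dst and n >= min_each_way and back >= min_each_way:
            flagged.append((src, dst, n, back))
    return sorted(flagged)


print(reciprocal_pairs(referrals))  # [('dr_a', 'dr_b', 3, 2)]
```

Looking at a single claim or a single doctor would not reveal this pattern; it only appears once the referrals are treated as edges in a graph.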
Graph databases for EHR
The electronic health record, or EHR, is a digital compendium of a patient’s medical history, treatment protocols, lab tests and results, etc. The EHR contains health information that is created by doctors and hospitals, and permits sharing of the same. One of the key benefits of EHR is the instantaneous availability of information to doctors as they treat patients.
However, as EHR adoption has increased over the years, both the type and volume of data have grown manifold, exposing several barriers: lack of interoperability, poor usability and high costs. Usability includes system response time, and costs include storage for the large volume of data; both are problems for the conventional databases used by EHR vendors.
Graph databases thrive in these situations where a large volume of structured and unstructured data needs to be stored and retrieved rapidly at low costs. This has led to a growing trend towards using graph databases to store the data behind modern EHR applications.
Conclusion
When it comes to adoption, graph is a sticky enterprise-level technology choice that requires the right CXO-level sponsorship (to help drive data consolidation), the right choice of technology vendor partner (given the varying tradeoffs in commercially available graph database options) and the right team (given that the talent pool with the right know-how is limited).
However, what makes the investment worthwhile is the payoff in terms of the kinds of problems that become tractable with graph technology. It is precisely this kind of potential that offers opportunities for architects, engineers and data scientists, who are looking to gear themselves up for the next wave that graph technology brings with it.
Whether it is learning how to architect an enterprise-level graph database, understanding how to devise embeddings using knowledge graphs for natural language processing problems, or knowing how to use a graph query language to query data stored as a graph, there are useful, in-demand skills that the workforce can equip themselves with to do their life’s best work.
By Rajesh Sabapathy, Director of Data Science, Optum Global Solutions (India) Pvt. Ltd