The use of Knowledge Graph technology to uncover connections within and across data sets has major potential in financial markets. We met with senior data scientists to discuss how to encourage wider adoption.
- Applying Natural Language Processing and Knowledge Graph technology to data sets can uncover previously hidden connections and insights.
- Companies are a gold mine of information, but many find it near impossible to access and connect data from across different departments and regions.
- The Thomson Reuters Knowledge Graph feed helps with data relationships, discovery and exploration needs across a range of business requirements.
In today’s complex digital world, the ability to organize and establish links between diverse data types has the potential to solve real business challenges across the financial industry.
The growth in volume and varieties of data is prompting firms across all industries to look for automated ways to generate insight and decisions from this data.
One way to achieve this is by applying Natural Language Processing (NLP) and Knowledge Graph technology to data sets, in order to label, tag and present information (particularly difficult to manage textual data sets) to uncover previously hidden connections and insights.
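As a minimal sketch of that idea, the snippet below builds a tiny co-occurrence graph from text: entities mentioned together in a document become linked nodes. The documents, entity names and substring-matching "tagger" are all invented for illustration; a real pipeline would use a proper NLP entity recognizer and a graph database rather than an in-memory dictionary.

```python
import itertools
from collections import defaultdict

# Toy documents; in practice these would be news stories, filings or reports.
docs = [
    "Acme Corp announced a merger with Beta Ltd.",
    "Beta Ltd supplies components to Gamma Inc.",
    "Acme Corp opened a new office in London.",
]

# Hypothetical entity list standing in for a trained entity recognizer.
entities = ["Acme Corp", "Beta Ltd", "Gamma Inc"]

def tag(doc):
    """Return the known entities mentioned in a document (naive matching)."""
    return [e for e in entities if e in doc]

# Build the graph: entities co-occurring in a document are connected.
graph = defaultdict(set)
for doc in docs:
    for a, b in itertools.combinations(tag(doc), 2):
        graph[a].add(b)
        graph[b].add(a)

print(sorted(graph["Beta Ltd"]))  # entities linked to Beta Ltd
```

Even in this toy form, the graph surfaces a connection no single document states: Acme Corp and Gamma Inc are two hops apart via their shared counterparty.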
Graph databases are the fastest growing category across data management, according to consultancy DB-Engines, and they are predicted to power more and more emerging technology as well as being “an excellent solution for better and faster real-time big data analytical queries”.
Indeed, according to Forrester, 51 percent of global data and analytics technology decision makers are either “implementing, have already implemented, or are upgrading their graph databases.”
Thomson Reuters is committed to helping our customers understand and adopt emerging technology, like graph databases, to stay competitive.
And so to further our efforts to understand the challenges our customers are facing and how we could work together to find solutions, we invited senior data scientists from our biggest customers to Imperial College London to discuss the practical application of Knowledge Graphs and NLP for financial markets.
We explored use cases, learned about where our customers are on their Knowledge Graph journey, and shared insights on management of our Enterprise Data Warehouse.
Living up to the AI hype
With all the buzz surrounding Artificial Intelligence (AI) and its potential to streamline workflows and uncover new insights, the data scientists in the room highlighted their struggle with inflated expectations from across their respective companies.
They are often given project briefs that require integrating data sets that are simply not designed to be integrated, or proving connections between data sets that are too disparate.
The data scientists said that managing and educating stakeholders was necessary to demonstrate the realities of implementing an AI project: there is no 'simple fix' or 'package' to buy that will do the job.

The reality behind any AI project is that there is a long tail of work to be done in the lead up (e.g. sourcing the data), implementing it with the help of a cross-functional team across the business, and an ongoing process of learning, development and investment afterwards in order to ensure long-lasting success.
Waning stakeholder enthusiasm
Our attendees explained that, perhaps because of the inflated expectations businesses hold regarding this emerging technology, once the complex requirements of a project were shared with stakeholders, enthusiasm often waned.
To quote one customer: “Everyone likes the idea of it, but no one actually wants to do it.”
This is a particular frustration, as data science teams need the support and collaboration of teams across the business to source, manage and connect the data needed for a project.
The first and biggest challenge was gathering the data necessary to create the initial corpus.
We heard that it can take years to navigate the corporate bureaucracy of larger financial institutions, which causes many projects to grind to a halt or results in their complete cancellation.
One attendee described their company as a “gold mine of information”, rich with potential for creating a Knowledge Graph, but they were finding it almost impossible to access and connect data from across different departments and regions.
Constantly changing regulatory landscapes and requirements only exacerbate this struggle.
Difficulties with labeling data
Once data has been sourced, the next challenge our customers face is finding efficient ways to label it accurately.
Some have experimented with crowdsourcing and web scraping tools to do this, but were doubtful about the quality of the results.
One attendee shared that up to 70 percent of their data was badly labeled, with human reviewers spot-checking only one record in 25.
The time and expertise required to manage this process in-house pushed project costs beyond budget, so they had to decide what level of accuracy they were willing to accept in order to run their semantic web programs.
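One way to turn a spot-check like that into a defensible accuracy figure is to treat the reviewed records as a sample and estimate the error rate with a confidence interval. The numbers below are invented to echo the figures mentioned above; this is a rough sketch using a normal approximation, not a prescribed methodology.

```python
import math

# Hypothetical spot-check: roughly 1 in 25 records is reviewed by a human.
reviewed = 200      # records checked
found_bad = 140     # records found to carry incorrect labels (~70 percent)

error_rate = found_bad / reviewed

# Normal-approximation 95% confidence interval for the true error rate.
margin = 1.96 * math.sqrt(error_rate * (1 - error_rate) / reviewed)

print(f"estimated error rate: {error_rate:.2f} +/- {margin:.2f}")
```

An estimate with an explicit margin makes it easier to agree with stakeholders on an accuracy threshold, rather than debating anecdotes.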
A further challenge was finding and establishing consistent identifiers, or definitions, to ensure they would be viable and could be found over time.
Linking point-in-time definitions involved tracking the evolution of topic classifications across time, for example ‘Brexit’ to ‘Frexit’, and then extracting these consistent definitions that connect to relevant data.
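A minimal sketch of point-in-time identifiers is a stable ID whose human-readable label varies with the date being queried. The topic, labels and dates below are entirely illustrative, not real reference data.

```python
import bisect
from datetime import date

# Hypothetical label history for one stable topic identifier.
# The label changes over time; the underlying concept keeps one ID.
TOPIC_HISTORY = {
    "TOPIC:UK_EU_EXIT": [
        (date(2013, 1, 1), "UK EU referendum"),  # illustrative dates
        (date(2016, 6, 24), "Brexit"),
    ],
}

def label_at(topic_id, on):
    """Return the label a topic carried on a given date, or None."""
    history = TOPIC_HISTORY[topic_id]
    dates = [d for d, _ in history]
    i = bisect.bisect_right(dates, on) - 1
    return history[i][1] if i >= 0 else None

print(label_at("TOPIC:UK_EU_EXIT", date(2017, 1, 1)))
```

Querying by the stable identifier rather than the label of the day is what lets documents tagged years apart connect to the same node in the graph.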
Graph partitioning and permissioning
Even in the cases we discussed where data was adequate to start building into Knowledge Graphs, the challenge of ensuring data governance and entitlements emerged.
How could companies balance the desire for a linked database with the requirement to enforce restrictions to stay compliant?
This requires partitioning the overall Knowledge Graph of both internal and external (third-party) data, and restricting end-users' permissions to access each part.
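One common pattern is to tag every edge with the partition it belongs to and filter query results by a user's entitlements. The sketch below uses invented partitions, users and entitlements; production graph databases enforce this server-side rather than in application code.

```python
# Each edge is tagged with the partition (data source or region) it belongs to.
EDGES = [
    ("Acme Corp", "Beta Ltd",  "internal"),
    ("Beta Ltd",  "Gamma Inc", "vendor_feed"),
    ("Acme Corp", "Delta LLC", "emea_only"),
]

# Hypothetical entitlements: which partitions each user may see.
ENTITLEMENTS = {
    "analyst_uk": {"internal", "emea_only"},
    "analyst_us": {"internal", "vendor_feed"},
}

def visible_edges(user):
    """Return only the edges within the user's permitted partitions."""
    allowed = ENTITLEMENTS.get(user, set())
    return [(a, b) for a, b, part in EDGES if part in allowed]

print(visible_edges("analyst_us"))
```

The key design point is that entitlements are evaluated at query time, so the same underlying graph can serve differently scoped views without duplicating data.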
We imagine this concern will become increasingly prominent, and so predict a corresponding rise in our graph partners adding such partitioning capabilities in the coming months and years (for example, Neptune, Neo4j, etc.).
We’ve recently noticed that one of our potential graph partners (TigerGraph) has added a new ‘MultiGraph’ feature that can offer organizations control over which parts of the graph a user can access, in order to maintain security and data integrity.
Knowledge Graph examples
Although the majority of the attendees were still at the ‘educating’ and ‘exploring’ stages of Knowledge Graph projects, a few examples of application were discussed.
We heard about a project that investigated difficulties with a new client-facing chat bot, which was achieving very low satisfaction rates.
The customer found that the problem was not the chat bot technology itself, but the complexity of the underlying knowledge base, spread over thousands of PDFs and various regions with differing rules of access, which made it near impossible for the system to manage the content and draw out answers.
The data couldn’t be fed into a traditional relational database, but needed a graph database to help find the connections and make sense of the data.
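The advantage of a graph here is that multi-hop connections become a simple traversal, where a relational store would need a chain of self-joins of unknown depth. The knowledge base below is invented, and a real system would run this as a graph query rather than an in-memory breadth-first search.

```python
from collections import deque

# Toy knowledge base as an adjacency map: document topics that reference
# each other. The topics and links are illustrative only.
LINKS = {
    "refund policy":     ["regional rules", "payment terms"],
    "regional rules":    ["EMEA annex"],
    "payment terms":     [],
    "EMEA annex":        ["refund exceptions"],
    "refund exceptions": [],
}

def path(start, goal):
    """Breadth-first search for a chain of references between two topics."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        p = queue.popleft()
        if p[-1] == goal:
            return p
        for nxt in LINKS.get(p[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(p + [nxt])
    return None

print(path("refund policy", "refund exceptions"))
```

A chat bot backed by such a graph can follow the reference chain to the document that actually answers the question, instead of stopping at the first match.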
Keeping the conversation going
We’re keen to keep this conversation going and would like to hear whether you are experiencing similar challenges and opportunities.
The Thomson Reuters Knowledge Graph feed can help with your data relationship, discovery and exploration needs across a range of business requirements, from investment research and business development through to sales intelligence and financial crime and risk.