As director of Big Data Quantitative Research for the Thomson Reuters StarMine Quantitative Analytics product line, I sometimes feel a bit like a quant customer of Thomson Reuters myself. That’s because StarMine – like our hedge fund clients – is constantly searching for untapped sources of “alpha,” or “excess return” against a fund’s benchmark.
The text mining dilemma and Intelligent Tagging
Text mining often leads us to novel content sets outside of the Thomson Reuters ecosystem. In these cases, in addition to all the fancy natural language processing and machine learning needed to distill text into a numerical signal, we also have to grapple with more basic challenges such as: What company is the text primarily about? And: How can we map the company to a returns time series so we can back-test our trading strategy?
Intelligent Tagging is a particularly helpful tool for connecting external text sources to the wealth of data sets within the Thomson Reuters ecosystem. In fact, when you consider all the possibilities of tagged and linked data, the types of studies we could conduct are only limited by our imagination.
This satisfies our first objective of figuring out which company the document is about. But because no PrimaryRIC is listed for “Burger King Corporation,” we can’t immediately identify which stock to trade. Intelligent Tagging helps here too: the “Top Most Public Parent” section shows that “Restaurant Brands International Inc.” is tagged with a PermID of 5043951565 and a PrimaryRIC of QSR.TO.
This useful tagging can also be applied to micro-sized text such as this tweet about Kendall Jenner visiting Burger King:
Once again, Intelligent Tagging provides the Top Most Public Parent of “Restaurant Brands International Inc.” along with its PermID and PrimaryRIC.
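The fallback described above (use the company's own PrimaryRIC when it has one, otherwise use the RIC of its Top Most Public Parent) can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and field names are hypothetical stand-ins, not the actual Intelligent Tagging response format. The values come from the Burger King example.

```python
from typing import Optional

# Hypothetical, simplified stand-in for one tagged company.
# Field names are assumptions made for this sketch, not the service's schema.
burger_king = {
    "name": "Burger King Corporation",
    "primary_ric": None,  # no PrimaryRIC listed for the tagged company itself
    "top_most_public_parent": {
        "name": "Restaurant Brands International Inc.",
        "perm_id": "5043951565",
        "primary_ric": "QSR.TO",
    },
}


def tradable_ric(entity: dict) -> Optional[str]:
    """Return the entity's own RIC, else its top-most public parent's RIC."""
    own_ric = entity.get("primary_ric")
    if own_ric:
        return own_ric
    parent = entity.get("top_most_public_parent") or {}
    return parent.get("primary_ric")  # None if there is no public parent


print(tradable_ric(burger_king))  # QSR.TO
```

A document whose company has neither its own RIC nor a public parent returns None here, which is the case you would set aside as untradable.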
The RIC alone could help you connect to a returns series for your back-testing needs. You could also use the new PermID tables in QA Direct (centered around PermSecMapX) to hook into QA Direct’s symbology tables (SecMapX/GSecMapX), map to InfoCode, and from there connect to DataStream’s ReturnIdx tables. All of these mapping tables are point-in-time. That matters to anyone who wants to back-test without survivorship bias, the distortion introduced by historical data sets that omit companies which no longer exist today.
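To make the point-in-time idea concrete, here is a minimal Python sketch. The row layout, the date ranges and every identifier in it are invented for illustration. They are not the actual schema or contents of PermSecMapX, SecMapX/GSecMapX or the ReturnIdx tables. The only point is that a lookup takes an "as of" date and returns the mapping that was valid on that date.

```python
from datetime import date
from typing import Optional

# Hypothetical point-in-time mapping rows: (perm_id, info_code, start, end).
# end=None means the row is still valid. All values are made up.
MAPPING = [
    ("PERMID_A", 1001, date(2010, 1, 1), date(2014, 12, 31)),
    ("PERMID_A", 2002, date(2015, 1, 1), None),
]


def info_code_as_of(perm_id: str, as_of: date) -> Optional[int]:
    """Return the code that was mapped to perm_id on the given date."""
    for pid, code, start, end in MAPPING:
        if pid == perm_id and start <= as_of and (end is None or as_of <= end):
            return code
    return None  # no mapping existed on that date


print(info_code_as_of("PERMID_A", date(2012, 6, 1)))  # 1001
print(info_code_as_of("PERMID_A", date(2016, 1, 1)))  # 2002
```

Querying with the historical date rather than today's date is what keeps a company that has since been delisted or renamed in the back-test universe.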
A new way to gauge the influence of celebrities
Because Intelligent Tagging tags people as well, perhaps we could study the influence of celebrities on stock price. Does a celebrity affect the stock price of any company they share a strong co-occurrence with? Or do celebrities only affect stock prices for companies in a particular industry? Are there any celebrities who actually bring the stock price down? If we understand the influencers, then maybe reading celebrity gossip could yield a profitable trading strategy!
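A first step toward such a study would be counting how often each person and each company are tagged in the same document. The Python sketch below does that under stated assumptions: the document structure is hypothetical, and apart from the Kendall Jenner and QSR.TO pairing taken from the tweet example, the names and tickers are placeholders.

```python
from collections import Counter
from itertools import product

# Hypothetical tagged documents: each lists the people and company RICs
# found in one piece of text. The structure and most values are placeholders.
tagged_docs = [
    {"people": ["Kendall Jenner"], "rics": ["QSR.TO"]},
    {"people": ["Kendall Jenner", "Celebrity B"], "rics": ["QSR.TO"]},
    {"people": ["Celebrity B"], "rics": ["RIC_X"]},
]


def cooccurrence_counts(docs: list) -> Counter:
    """Count documents in which a (person, RIC) pair appears together."""
    counts = Counter()
    for doc in docs:
        # set() ensures a pair is counted at most once per document
        for pair in product(set(doc["people"]), set(doc["rics"])):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(tagged_docs)
print(counts[("Kendall Jenner", "QSR.TO")])  # 2
```

Pairs with high counts would be the candidates to test against the returns series. Raw counts alone say nothing yet about whether the celebrity moves the price.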
The bottom line is that Intelligent Tagging tags text documents intelligently. The correct company and tradable stock are identified, making it possible to link to pretty much any other traditional financial content set out there today. Because you are no longer throwing away “untaggable” documents, you gain opportunities for more robust quant studies and a stronger model.
Can your trading strategy benefit from Intelligent Tagging?
Currently, text mining – the process of distilling high-quality information from text – is a particularly popular method for finding alpha in the quant world, including for us at StarMine. To complement our text mining, we also use structured content sets from Thomson Reuters, mainly those provided by QA Direct (Thomson Reuters’ normalized financial database with pricing; fundamentals; estimates; ownership; environmental, social and governance (ESG) criteria; and index data).
We also tap into the same unstructured (free-form) text feeds our customers consume, such as news, broker research, financial filings, transcripts and patents. Thankfully, these unstructured text sources come with structured metadata describing which companies are mentioned in each document.
And with the help of Thomson Reuters Open PermID (a machine-readable, 64-bit number used to create a unique reference to a piece of information, one that will never change over time) and the mapping tables within QA Direct, we are able to tie all those data sets together. That lets us create quant signals and back-test their efficacy by simulating a trading strategy on historical data to see how it would have performed. In turn, we at StarMine use this combined data to explore, research and deliver robust quantitative finance models.
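For readers new to back-testing, the idea of simulating a strategy on historical data can be shown with a deliberately tiny Python sketch. It is not StarMine's methodology. The signal values and returns are made up, and the rule (go long on a positive signal, short on a negative one, stay flat on zero, with no costs) is an assumption chosen for simplicity.

```python
# Made-up inputs: signals[i] is known at the close of period i,
# next_returns[i] is the stock's return over the following period.
signals = [1, -1, 0, 1]
next_returns = [0.01, -0.02, 0.005, 0.03]


def backtest(signals: list, next_returns: list) -> float:
    """Compound the returns of a long/short/flat rule; return total return."""
    equity = 1.0
    for signal, ret in zip(signals, next_returns):
        # +1 = long, -1 = short, 0 = no position
        position = 1 if signal > 0 else -1 if signal < 0 else 0
        equity *= 1 + position * ret
    return equity - 1


total_return = backtest(signals, next_returns)
print(f"{total_return:.4%}")
```

Pairing each signal with the return of the following period, never the same period, is what keeps the simulation from peeking at information that was not available when the trade would have been placed.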