Financial Data Management
Yi Wang, Morningstar, Inc.
My topic today is financial data management. First of all, I’d like to give you a quick introduction to Morningstar.
Morningstar is a leading global provider of independent investment research. We have operations in 26 countries.
7.4 million individual investors served worldwide
270,000 financial advisors
4,300 institutional clients
400,000 investment offerings
We support 7.4 million individual investors, 270,000 financial advisors, and 4,300 institutional clients. Morningstar provides data on approximately 400,000 investment offerings, including stocks, mutual funds, and similar vehicles, along with real-time global market data on more than 5 million equities, indexes, futures, options, commodities, and precious metals, in addition to foreign exchange and Treasury markets. Our business runs on data.
What is a time series?
TP: With so many different types of financial data, my focus today is on time series. What is a time series? 1) A time series is a sequence of data points, 2) typically measured at successive times spaced at uniform intervals. 3) A time series has a natural temporal ordering.
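The definition above can be made concrete with a small sketch. It assumes a time series is held as a list of (date, value) pairs; the helper name `is_uniformly_spaced` and the price values are illustrative only, not part of any Morningstar system.

```python
from datetime import date, timedelta


def is_uniformly_spaced(timestamps):
    """Return True if timestamps strictly increase by one constant step."""
    if len(timestamps) < 2:
        return True
    # Collect the distinct gaps between consecutive observations.
    steps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return len(steps) == 1 and next(iter(steps)) > timedelta(0)


# Hypothetical daily closing prices (made-up values, not real market data).
start = date(2024, 1, 1)
series = [(start + timedelta(days=i), price)
          for i, price in enumerate([100.0, 101.5, 99.8, 102.2])]

timestamps = [t for t, _ in series]
print(is_uniformly_spaced(timestamps))
```

The check covers both properties from the slide: a single constant step (uniform intervals) and a positive step (temporal ordering).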
Usage of a time series
TP: A stock price chart is a typical example of where time series are used. Those in the audience with a statistics or math background may start to think of stationarity, serial correlation, moving averages … <next slide>
TP: … and other modeling techniques in order to extract meaningful characteristics and to forecast what the future may be like.
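Two of the techniques just mentioned, moving averages and serial correlation, can be sketched in a few lines. This is a minimal illustration in plain Python, assuming an evenly spaced series with no missing values; the function names are made up for this example.

```python
def moving_average(values, window):
    """Simple moving average: the mean of each run of `window` consecutive points."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def autocorrelation(values, lag=1):
    """Sample serial correlation between the series and itself shifted by `lag`."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        return 0.0  # a constant series carries no correlation information
    covariance = sum((values[i] - mean) * (values[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance


prices = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
print(moving_average(prices, 3))
print(autocorrelation(prices, lag=1))
```

A moving average smooths short-term noise, while a lag-1 autocorrelation near 1 suggests that each value strongly resembles its predecessor.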
Morningstar time series database
Coverage
1984: First print product; 400 mutual funds
1991: First electronic product; 2,300 mutual funds; 10,000 time series
Now: Multiple lines of products; 143,000 mutual funds; over 100 million time series
Variety of data
Intra-day price
Equity fundamentals
Fees, expenses
Economic series
Weather
…
Here is a brief overview of what we have in the Morningstar time series database. When we published our first print product in 1984, it covered 400 mutual funds; today we cover 143,000 mutual funds and over 100 million time series.
Challenges & Solutions
Having such a vast amount of time series data in our database enables us to provide powerful research capabilities, but at the same time it poses challenges in various areas. I’d like to share with you the major challenges we encountered and the solutions we’ve explored.
Collection and processing
Challenges: Multiple data sources; Identification; Consistency; Labor-intensive efforts
Solutions: Intelligent data consolidation; Dependency awareness
Starting with data collection and processing, the first challenge we encounter is that we get data from multiple sources, so the first questions we need to answer are which copy is the right one and how to aggregate them. Since the investment identifiers we receive can change over time and vary by provider, we also need to figure out how to link the information for the same time series to our permanent identifier. As most of the time series in our system are derived from one or more raw data series, keeping all related time series consistent when the raw data is corrected is another area we need to look into. And given the sheer volume of time series data, collecting and maintaining a quality database requires a large amount of work. Our solution focuses on two areas: building an intelligent data consolidation system, and creating a dependency-conscious processing mechanism.
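One way to picture dependency awareness: when a raw series is corrected, every series derived from it, directly or indirectly, has to be recomputed, and in an order that respects the dependencies. The sketch below is an assumption about how this could be done, not a description of Morningstar's system; the series names and the `DEPENDS_ON` map are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: derived series -> the series it is computed from.
DEPENDS_ON = {
    "daily_return": {"raw_price"},
    "monthly_return": {"daily_return"},
    "total_return_index": {"daily_return", "raw_dividend"},
}


def series_to_recompute(corrected, depends_on):
    """List the derived series affected by a correction, in a safe recompute order."""
    affected = {corrected}
    ordered = []
    # static_order() yields each series only after all of its dependencies.
    for name in TopologicalSorter(depends_on).static_order():
        if depends_on.get(name, set()) & affected:
            affected.add(name)
            ordered.append(name)
    return ordered


print(series_to_recompute("raw_price", DEPENDS_ON))
print(series_to_recompute("raw_dividend", DEPENDS_ON))
```

A correction to the hypothetical `raw_dividend` series touches only `total_return_index`, while a correction to `raw_price` cascades through all three derived series, with `daily_return` recomputed first.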
Storage and dissemination
Challenges: Latency and throughput requirements; Delivering data to meet different demands (with low latency)
Solution: System designed for time series
When the time comes to store and deliver data, we have to deal with latency and throughput requirements, and with delivering data to meet different demands, because the same data needs to be delivered in different formats and through different delivery mechanisms.
Size: Access speed; Network bandwidth
Delivery challenge: Granularity; Format; Delivery method
We researched many existing solutions, but none seemed to meet our needs, so we ended up developing an in-house solution.
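To make the granularity demand concrete: the same stored daily series may need to be delivered as a monthly series to one consumer and as-is to another. The following is a minimal sketch under the assumption that the monthly value is the last observation in each month; the function name and sample values are illustrative, and the in-house system itself is not described in this talk.

```python
from datetime import date


def to_monthly_last(daily):
    """Collapse (date, value) points to one point per month, keeping the last observation."""
    monthly = {}
    for day, value in sorted(daily):
        # Later days in the same month overwrite earlier ones.
        monthly[(day.year, day.month)] = (day, value)
    return [monthly[key] for key in sorted(monthly)]


# Illustrative daily values only.
daily = [
    (date(2024, 1, 30), 10.0),
    (date(2024, 1, 31), 11.0),
    (date(2024, 2, 1), 12.0),
]
print(to_monthly_last(daily))
```

Other aggregation rules (average, sum, first observation) would follow the same shape; which one is right depends on the kind of data being delivered.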
Globalization
Challenges: Regulatory; Standardization; Data representation
Solution: Market and culture sensitive
Examples: European currency change; Time zone; Expectation on data quality
TP: With offices and services in 26 countries, how do we localize with a global perspective?
How did we expand our data coverage from US-based to global? The first area we need to look into is regulatory. For example, when European countries started to adopt the euro, how do we handle the different adoption dates for each country, and how do we support pre-euro currencies that no longer exist? Aside from regulatory concerns, standardization is another area that requires a lot of consideration. Beyond what we commonly understand by standardization, which is finding the commonality among information that is presented in different ways in different locations, we also need to be aware of what not to standardize, because standardizing information that only appears to be the same will compromise data quality and interpretation. Lastly, data representation: the unit of data and the different formats of the same data when presented in different countries are examples we have to take into account.
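The euro example can be sketched as a small lookup of legacy currencies with their adoption dates. The conversion rates and dates below are the official fixed ones for the German mark, French franc, and Greek drachma; everything else (the table layout, the function names, the choice to report in the legacy currency before adoption) is an assumption made for illustration.

```python
from datetime import date
from decimal import Decimal

# Legacy currency -> (units of legacy currency per 1 EUR, euro adoption date).
LEGACY_CURRENCIES = {
    "DEM": (Decimal("1.95583"), date(1999, 1, 1)),
    "FRF": (Decimal("6.55957"), date(1999, 1, 1)),
    "GRD": (Decimal("340.750"), date(2001, 1, 1)),
}


def reporting_currency(currency, as_of):
    """Return the currency in force on `as_of`: legacy before adoption, EUR after."""
    if currency in LEGACY_CURRENCIES and as_of >= LEGACY_CURRENCIES[currency][1]:
        return "EUR"
    return currency


def to_euro(amount, currency):
    """Convert a legacy-currency amount to EUR at the fixed rate, rounded to cents."""
    rate, _ = LEGACY_CURRENCIES[currency]
    return (amount / rate).quantize(Decimal("0.01"))


print(reporting_currency("GRD", date(2000, 6, 1)))  # Greece had not yet adopted the euro
print(reporting_currency("DEM", date(2000, 6, 1)))
print(to_euro(Decimal("195.583"), "DEM"))
```

Keeping the adoption date per currency is what lets the same rule handle countries that joined at different times, and keeping the legacy codes lets historical data in currencies that no longer exist stay queryable.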
A closer look
In the next couple of slides, I’ll show you how we put all the aforementioned solutions together in our system, using market price as an example.
From when the price reaches us through exchanges until it gets to products like this
Data Consolidation
[Diagram labels: Data source (NASDAQ, data vendor); Data consolidation (mapping rule, clean-up rule, merge rule, quality rule, machine learning, user interface); Time series system; Calculation; Morningstar products]
When the price for a company, say Apple, is delivered to us through an exchange, we… Next, let’s take a closer look at how our proprietary system works once time series data is stored in it.
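As an aside on the consolidation step named on this slide, the mapping, clean-up, and merge rules could look roughly like the sketch below. It is a guess at the general shape, not Morningstar's implementation: the permanent identifier `PERM-0001`, the source names, the priority order, and the prices are all hypothetical.

```python
# Hypothetical mapping rule: (source, source identifier) -> permanent identifier.
ID_MAP = {
    ("exchange", "AAPL"): "PERM-0001",
    ("vendor", "US0378331005"): "PERM-0001",
}

# Hypothetical merge rule: the lower number wins when sources disagree.
SOURCE_PRIORITY = {"exchange": 0, "vendor": 1}


def consolidate(records):
    """Merge (source, source_id, day, price) records into {(permanent_id, day): price}."""
    best = {}
    for source, source_id, day, price in records:
        perm_id = ID_MAP.get((source, source_id))
        # Clean-up rule (assumed): drop unmapped identifiers and non-positive prices.
        if perm_id is None or price <= 0:
            continue
        key = (perm_id, day)
        rank = SOURCE_PRIORITY[source]
        if key not in best or rank < best[key][0]:
            best[key] = (rank, price)
    return {key: price for key, (rank, price) in best.items()}


# Illustrative records only; the prices are made up.
records = [
    ("vendor", "US0378331005", "2024-01-02", 185.50),
    ("exchange", "AAPL", "2024-01-02", 185.64),
    ("vendor", "UNKNOWN", "2024-01-02", 1.00),
    ("exchange", "AAPL", "2024-01-03", -1.00),
]
print(consolidate(records))
```

Here two differently identified feeds for the same company collapse onto one permanent identifier, and the exchange copy is preferred over the vendor copy for the same day.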
Time series engine
[Diagram labels, top to bottom: Data Interface (data filter, data assembler, data formatter); Query Engine (market adapter, time aggregator, localization adapter); Storage (metadata, time series content)]
At the bottom level, we have the storage layer, which archives and indexes the data values as well as the metadata. It is built on a distributed storage scheme on a cloud platform to optimize data compression. In the middle is where most of our intelligence… that handles all the custom queries and optimization, as well as business rules. On top is where all the formatting and transformation happens before the data is presented to the user.
Time series is definitely an interesting and unique kind of data that deserves its own place for discussion. I’m sure many of you have a different take on it than we do, and it would be great if we could discuss it further.