Machine learning models are being increasingly adopted across the finance community in general and the mortgage market in particular. Modeling and data analytics have been key to this market's turnaround; however, anyone who has worked with mortgage loan data knows it is notorious for errors and gaps. Despite industry-wide efforts to incorporate robust quality control programs, challenges with mortgage data persist. Fortunately, combining machine learning with cloud computing (Google's TensorFlow library and Amazon Web Services are examples of the tools and platforms involved) shows promise in addressing mortgage data gaps and producing more accurate results than traditional approaches.


Machine learning is a form of artificial intelligence in which computers learn without being explicitly programmed. It uses advanced statistical methods to process large, high-dimensional datasets and detect patterns and behaviors. Because it finds these patterns without being told what to look for, machine learning can uncover otherwise hidden insights. It also allows an algorithm to adapt independently as new data surfaces.

Machine learning has already been adopted in our everyday lives. Facebook, for example, uses machine learning in its algorithms to adjust newsfeeds as new browsing patterns are detected.1 Online recommendation models like Amazon’s and Netflix’s are also based on machine learning; they predict future habits from past behaviors and get better at predicting as more data is gathered. Self-driving cars, the latest craze in artificial intelligence, are yet another example of machine learning in action.2

Adoption of Machine Learning in Finance

Machine learning and artificial intelligence are fairly recent entrants into financial services. Their influence, however, is poised to expand significantly in 2017. Wells Fargo recently announced the formation of a new Artificial Intelligence Enterprise Solutions team focused on AI and its impact on the future of its business and its customers. JP Morgan employs a machine learning program, COIN,3 to interpret commercial-loan agreements in seconds. More firms are expected to follow this trend. Nasdaq CEO Adena Friedman predicts that machine learning is the one technology that will change the industry the most.4

To explore the business application of machine learning in enhancing the quality of mortgage loan data, RiskSpan's Sudhakar Chelikani, PhD, teamed up with Yale University professor Hemant D. Tagare, PhD. Although addressing gaps in mortgage loan data may seem like a niche quality assurance issue, as we near completion of our proof-of-concept study, we are convinced it’s the ideal place for an institution to start applying machine learning. The application of machine learning to mortgage data is intuitive, relatively easy to implement, and can be quite impactful. Further, this business application is a natural extension of existing business processes focused on data validation and normalization.

Using Machine Learning to Overcome Unreliable Data

The persistence of unreliable data renders accompanying analysis and model results suspect and may lead to business decisions that are based on incomplete analysis or misleading conclusions. Financial institutions routinely base critical decisions on model results, which in turn depend on reliable data. These decisions shape policy, guide portfolio management, and direct pricing and trading of loans and securities. Models that predict loan defaults, losses, and prepayments are only as good as their data.

In the current state, analysts are frequently tempted to rationalize the impact of bad modeling data and develop shortcuts to mitigate it. Perhaps the erroneous data, in the context of an enormous dataset, is immaterial to the overall results. Perhaps analysts can “paper over” missing LTV ratios or credit scores by filling them in with straight averages or medians that reflect the whole portfolio.

These optimistic approaches are fraught with peril. What if the missing data is concentrated in a particular region or time period that is not representative of the overall average? What if the gaps are concentrated in a particularly unrepresentative acquisition portfolio, leaving mortgage servicers exposed to entire portfolio segments shaped by a prior servicer's poor management or by sloppy origination practices? Machine learning has the potential to identify and overcome all of these issues.
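To make the risk concrete, here is a minimal Python sketch that contrasts a portfolio-wide median fill with a simple nearest-neighbor fill. It is an illustration only: the loan records, the field names (orig_year, fico, ltv), and the choice of k-nearest neighbors are all assumptions made for this example, not the data or the technique used in the proof of concept described above.

```python
from math import dist
from statistics import mean, median, pstdev

# Hypothetical loans. The missing LTVs (None) are concentrated in the
# 2006-2007, lower-FICO segment, which is not typical of the portfolio.
loans = [
    {"orig_year": 2012, "fico": 760, "ltv": 60.0},
    {"orig_year": 2012, "fico": 745, "ltv": 62.0},
    {"orig_year": 2013, "fico": 770, "ltv": 58.0},
    {"orig_year": 2013, "fico": 750, "ltv": 65.0},
    {"orig_year": 2012, "fico": 755, "ltv": 61.0},
    {"orig_year": 2006, "fico": 640, "ltv": 92.0},
    {"orig_year": 2006, "fico": 655, "ltv": 95.0},
    {"orig_year": 2007, "fico": 630, "ltv": 90.0},
    {"orig_year": 2006, "fico": 645, "ltv": None},
    {"orig_year": 2007, "fico": 650, "ltv": None},
]
FEATURES = ("orig_year", "fico")  # assumed predictors of LTV


def zscore_vectors(rows):
    """Return one standardized feature vector per loan."""
    columns = []
    for feature in FEATURES:
        values = [row[feature] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        columns.append([(v - mu) / sigma if sigma else 0.0 for v in values])
    return list(zip(*columns))


def knn_impute(rows, target="ltv", k=3):
    """Fill each missing target with the mean of its k most similar loans."""
    vectors = zscore_vectors(rows)
    known = [i for i, row in enumerate(rows) if row[target] is not None]
    filled = []
    for i, row in enumerate(rows):
        value = row[target]
        if value is None:
            nearest = sorted(known, key=lambda j: dist(vectors[i], vectors[j]))[:k]
            value = mean([rows[j][target] for j in nearest])
        filled.append(value)
    return filled


# The "paper over" shortcut: one median for the whole portfolio.
portfolio_median = median([r["ltv"] for r in loans if r["ltv"] is not None])
knn_filled = knn_impute(loans)

print("portfolio median fill:", portfolio_median)
print("nearest-neighbor fills:", knn_filled[8], knn_filled[9])
```

In this toy portfolio the median fill assigns the two incomplete loans an LTV typical of the recent, high-FICO majority, while the neighbor-based fill borrows from the similar 2006-2007 loans and lands much higher. A production approach would need far more features, validation against known values, and careful treatment of data that is missing for systematic reasons.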

Machine Learning: Beyond the Data

While critical, getting the data right is just the tip of the iceberg. RiskSpan is also investigating the use of machine learning to validate model results and perform continuous model monitoring, a regulatory requirement that many institutions struggle to meet.

RiskSpan is also researching the use of machine learning to produce an automated attribution analysis. Often a laborious, time-consuming process, attribution analysis typically involves reviewing daily changes in model results (valuation or risk measures) and generating exception reports when results fall outside of some pre-defined risk tolerance. Exceptions are handed to an analyst who analyzes the variance and manually creates a report indicating which underlying variables or economic factors may have caused the breach.
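The conventional workflow described above can be sketched in a few lines of Python. This is a rule-based illustration of the manual process, not a machine learning method and not RiskSpan's implementation: the 2% tolerance, the factor names, and the sensitivities are hypothetical, and the attribution is a simple first-order approximation (sensitivity times factor change).

```python
from dataclasses import dataclass


@dataclass
class DailyResult:
    date: str
    value: float   # model output, e.g. a portfolio valuation
    factors: dict  # hypothetical economic factor levels on that day


def exception_report(results, sensitivities, tolerance):
    """Flag day-over-day moves beyond tolerance and rank likely drivers."""
    report = []
    for prev, curr in zip(results, results[1:]):
        change = curr.value - prev.value
        pct_change = change / prev.value
        if abs(pct_change) <= tolerance:
            continue  # within tolerance, no exception raised
        # First-order attribution: sensitivity * change in each factor.
        contributions = {
            name: sensitivities[name] * (curr.factors[name] - prev.factors[name])
            for name in sensitivities
        }
        drivers = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
        report.append({
            "date": curr.date,
            "pct_change": pct_change,
            "drivers": drivers,
            "unexplained": change - sum(contributions.values()),
        })
    return report


# Hypothetical inputs for illustration only.
results = [
    DailyResult("day 1", 100.0, {"mortgage_rate": 4.00, "hpi": 200.0}),
    DailyResult("day 2", 100.5, {"mortgage_rate": 3.95, "hpi": 200.0}),
    DailyResult("day 3", 97.0, {"mortgage_rate": 4.30, "hpi": 199.0}),
]
sensitivities = {"mortgage_rate": -8.0, "hpi": 0.5}  # value change per unit of factor
report = exception_report(results, sensitivities, tolerance=0.02)

for item in report:
    print(item["date"], round(item["pct_change"], 4), item["drivers"])
```

Here only the third day breaches the tolerance, and the rate move is ranked as the largest driver. The residual in "unexplained" is the part an analyst would still have to investigate by hand, which is the kind of effort an automated attribution analysis would aim to reduce.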

Applied machine learning techniques and cloud computing can streamline all of this on a scale that was cost-prohibitive just a few years ago. Although our application is in the proof-of-concept stage, we expect to assist several institutions in implementing their own POCs this year.

We will publish a white paper that explains our findings and specific machine learning techniques applied as we explore the potential benefits of machine learning to modeling and analytics in our markets.
