Advantages and Disadvantages of Open Source Data Modeling Tools

Whether to use open source data modeling tools has become a topic of debate as large organizations, including government agencies and financial institutions, face increasing pressure to keep pace with technological innovation and remain competitive. Organizations must be flexible in development and identify cost-efficient gains to reach their goals, and using the right tools is crucial. They must often choose between open source software, i.e., software whose source code can be modified by anyone, and closed source software, i.e., proprietary software that grants no permission to alter or distribute the underlying code.

Mature institutions often have employees, systems, and proprietary models entrenched in closed source platforms. For example, SAS Analytics is a popular provider of proprietary data analysis and statistical software for enterprise data operations among financial institutions. But several core computations SAS performs can also be carried out using open source data modeling tools, such as Python and R. The data wrangling and statistical calculations are often fungible and, given the proper resources, will yield the same result across platforms.

Open source is not always a viable replacement for proprietary software, however. Factors such as cost, security, control, and flexibility must all be taken into consideration. The challenge for institutions is picking the right software, or mix of platforms, to streamline software development.

Advantages of Open Source Programs

The Cost of Open Source Software

The low cost of open source software is an obvious advantage. Compared to the upfront cost of purchasing a proprietary software license, open source programs cost almost nothing to acquire because they can be distributed freely (subject to some possible restrictions on copyrighted work). Downloading open source programs and installing the necessary packages is easy, and adopting this process can expedite development and lower costs. Indirect costs, however, can be difficult to quantify. A proprietary software license may bundle setup and maintenance fees covering day-to-day operation, the support needed to solve unexpected issues, and a guarantee that the promised capabilities are fully implemented. Enterprise applications, while accompanied by a high price tag, come with ongoing and in-depth product support. The comparable cost of managing and servicing open source programs, which often have no dedicated support, is difficult to determine.

Open Source Talent Considerations

Another advantage of open source is that it attracts talent drawn to the idea of shareable, community-driven code. Students and developers outside of large institutions are more likely to have experience in open source since access is widespread and easily available. Developers in open source are free to experiment and innovate, gain experience, and create value outside of the conventional industry focus. This flexibility naturally produces developers with broader, interdisciplinary skills. As the chart below from Indeed’s Job Trend Analytics tool shows, there has been strong growth in open source talent, especially in Python.

From an organizational perspective, the pool of potential applicants with relevant programming experience is significantly wider than the limited pool of developers with closed source experience. For example, one may be hard-pressed to find a new applicant with development experience in SAS since comparatively few have had the opportunity to work with the application. Key-person dependencies become increasingly problematic as knowledge of the proprietary software erodes to a shrinking handful of developers.

Job Seeker Interest via Indeed

*Indeed searches millions of jobs from thousands of job sites. The jobseeker interest graph shows the percentage of jobseekers who have searched for SAS, R, and Python jobs.

Support and Collaboration

The collaborative nature of open source facilitates learning and adapting to new programming languages. While open source programs are usually not accompanied by the extensive documentation and user guides typical of proprietary software, the constant peer review that comes from other developers’ contributions can be more valuable than a user guide. In this regard, adopters of open source may have the talent to learn, experiment with, and become knowledgeable in the software without formal training.

Still, the lack of support can pose a challenge. In some cases, the documentation accompanying open source packages is thin and usage examples in forums are scarce, so neither offers a full picture. For example, RiskSpan built a model in R whose design was driven by the available data infrastructure packages (a precursor to performing statistical analysis) and their functionality. R has no dedicated support line, and a response from a package’s author is unlikely. With so few resources in place, RiskSpan had to vet packages thoroughly.


Flexibility and Innovation

The flexibility of open source is also an attractive advantage. Python users can choose among integrated development environments (IDEs) with different characteristics and functions, whereas SAS Analytics provides only SAS EG or Base SAS. R makes web-based interfaces possible for server-based deployments. These capabilities give more users access at a lower cost, which allows broader firm-wide participation in development. The ability to change the underlying structure of open source software makes it possible to mold it to the organization’s goals and improve efficiency.

Another advantage of open source is the sheer number of developers working to improve the software, creating many functionalities not found in closed source equivalents. For example, R and Python can usually perform many functions like those available in SAS, but also have many capabilities not found in SAS: downloading specific packages for industry-specific tasks, scraping the internet for data, or web development (Python). These specialized packages are built by programmers seeking to address the inefficiencies of common problems. A proprietary software vendor has neither the expertise nor the incentive to build equivalent specialized packages, since its product aims to be broad enough to suit uses across multiple industries.

At RiskSpan, we utilize open source data modeling tools and operating systems for data management, modeling, and enterprise applications. R and Python have proven to be particularly cost effective in modeling. R provides several packages that serve specialized techniques. For example, R boasts an archive of packages devoted to estimating the statistical relationship among variables using an array of techniques, which cuts down on development time. The ease of searching for these packages, downloading them, and researching their use incurs nearly no cost.

Open source makes it possible for RiskSpan to expand on the tools available in the financial services space. For example, a leading cash flow analytics software firm offers several proprietary solutions for modeling structured finance transactions, but these lacked the full functionality RiskSpan was seeking. To reduce licensing fees and gain flexibility in structuring deals, RiskSpan developed deal cashflow programs in Python for STACR, CAS, CIRT, and other consumer lending deals. The flexibility of Python allowed us to choose our own cashflow formats and build different functionalities into the software. Python, unlike closed source applications, allowed us to focus on innovating ways to interact with the cash flow waterfall.
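To make the idea of a cash flow waterfall concrete, the sketch below shows a generic sequential-pay allocation in Python. It is an illustration only, not RiskSpan’s implementation or the structure of any deal named above; the function name, tranche names, and balances are all hypothetical.

```python
def sequential_pay(available_cash, tranche_balances):
    """Allocate cash to tranches in order of seniority (hypothetical helper).

    tranche_balances is a list of (name, outstanding_balance) pairs,
    most senior tranche first. Returns (payments_by_tranche, leftover_cash).
    """
    payments = {}
    remaining = available_cash
    for name, balance in tranche_balances:
        # Each tranche receives what is left, capped at its own balance.
        paid = min(remaining, balance)
        payments[name] = paid
        remaining -= paid
    return payments, remaining


# Illustrative balances only; not taken from any real transaction.
tranches = [("A", 100.0), ("B", 50.0), ("C", 25.0)]
payments, leftover = sequential_pay(120.0, tranches)
print(payments, leftover)
```

Because the allocation rule is an ordinary function, changing the payment order or the output format is a code change rather than a vendor request, which is the kind of flexibility described above.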

Disadvantages of Open Source Programs

While users may have a conceptual understanding of the task at hand, knowing which tools yield correct results, whether open or closed source, is another dimension to consider. Default parameters may differ, new limitations may surface during development, or code structures may be entirely different. Further challenges may arise when translating a closed source program to an open source platform. Introducing open source requires new controls, requirements, and development methods.
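Variance is a common case where defaults differ. NumPy’s var divides by n unless ddof is set, while R’s var() and pandas’ Series.var() divide by n - 1. The sketch below uses only Python’s standard library to show the size of the gap; the return figures are made up for illustration.

```python
import statistics

# Hypothetical sample of periodic returns, for illustration only.
returns = [0.02, -0.01, 0.03, 0.00, 0.01]
n = len(returns)

# Population variance: sum of squared deviations divided by n.
population_variance = statistics.pvariance(returns)

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = statistics.variance(returns)

print(population_variance, sample_variance)
# The two differ by a factor of n / (n - 1), which matters for small samples.
print(sample_variance / population_variance, n / (n - 1))
```

A model ported between platforms without checking this setting can produce slightly different outputs from identical inputs, so defaults of this kind are worth documenting explicitly during a migration.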

Redundant code is an issue that might arise if a firm does not use open source strategically. Across different departments, functionally equivalent tools may be derived from distinct packages or code libraries. There are several packages offering the ability to run a linear regression, for example. However, nuanced differences in the initial setup or syntax of the function can propagate problems down the line. In addition to redundant code, users must be wary of “forking,” where the development community splits on an open source application. For example, the R community maintains multiple packages that perform the same tasks or calculations, sometimes derived from the same code base, and users must verify that the package they rely on has not been abandoned by its developers.
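One such setup difference is whether a regression includes an intercept. In Python, scikit-learn’s LinearRegression fits an intercept by default, whereas statsmodels’ OLS does not add a constant unless the user supplies one. The sketch below reproduces both behaviors with plain Python so that it runs without either library; the function names and data are hypothetical.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x. Returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_through_origin(x, y):
    """Ordinary least squares for y = b*x, with no intercept. Returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Illustrative data generated from y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_with_intercept(x, y)
slope_no_intercept = fit_through_origin(x, y)
print(intercept, slope, slope_no_intercept)
```

The same data yield a different slope depending on the setup, so two departments using different regression packages with default settings could report different coefficients without either having made a coding error.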

Users must also take care to track the changes and evolution of open source programs. The core calculations of commonly used functions, or of those specific to regular tasks, can change. Maintaining a working understanding of these functions in the face of continual modification is crucial to ensuring consistent output. Open source documentation is frequently lacking. In financial services, this can be problematic when seeking to demonstrate a clear audit trail for regulators. The need to verify that the right function is being sourced from a specific package or repository of authored functions, rather than from another function that may have an identical name, constrains the unfettered use of these functions within code. Proprietary software, on the other hand, provides a static set of tools, which allows analysts to more easily determine how legacy code has worked over time.
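One lightweight control is to record where each function used in a calculation comes from. The sketch below, which uses only the Python standard library, logs the defining module for two functions that share the name median but behave differently. The in-house median and the provenance helper are hypothetical names introduced for illustration.

```python
import statistics
import sys


def median(values):
    """Hypothetical in-house function sharing its name with statistics.median.

    It returns the upper middle element instead of averaging the two
    middle values, so it disagrees with the library on even-length input.
    """
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def provenance(func):
    """Describe where a function was defined, for an audit log."""
    return {
        "function": func.__qualname__,
        "module": func.__module__,
        "python": ".".join(str(part) for part in sys.version_info[:3]),
    }


data = [1, 2, 3, 4]
library_result = statistics.median(data)  # averages the two middle values
in_house_result = median(data)            # takes the upper middle value

audit_log = [provenance(statistics.median), provenance(median)]
for entry in audit_log:
    print(entry)
```

For third-party packages, the installed version can be recorded alongside the module name with importlib.metadata.version, and pinning those versions in a requirements file helps keep results reproducible over time.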

Using Open Source Data Modeling Tools

Deciding whether to go with open source programs directly impacts financial services firms as they compete to deliver applications to the market. Open source data modeling tools are attractive because of their natural tendency to spur innovation, ingrain adaptability, and propagate flexibility throughout the firm. Proprietary software, however, provides the support and clearly defined uses that may fit neatly within an organization’s goals. The considerations offered here should be weighed appropriately when deciding between open source and proprietary software.

Questions to consider before switching platforms include:

  • How does one quantify the management and service costs for using open source programs? Who would work on servicing it, and would that be costlier than utilizing a vendor?
  • In the long term, could costs be lower with open source? After the initial setup and foundation of security, support, and infrastructure, can open source result in an overall cost lower than licensing?
  • When might it be prudent to move away from proprietary software? In a scenario where moving to a newer open source technology appears to yield significant efficiency gains, when would it make sense to end terms with a vendor?
  • Does the institution have the resources to institute new controls, requirements, and development methods when introducing open source?
  • Does the open source application or function have the necessary documentation required for regulatory and audit purposes?

Open source is certainly on the rise as more professionals enter the space with the necessary technical skills and a new perspective on the goals financial institutions want to pursue. As competitive pressures mount, financial institutions are faced with a difficult yet critical decision of whether open source is appropriate for them. Open source may not be a viable solution for everyone; the considerations discussed above may block the adoption of open source for some organizations. However, often the pros outweigh the cons, and there are strategic precautions that can be taken to mitigate any potential risks.