Advantages and Disadvantages of Open Source Data Modeling Tools

Using open source data modeling tools has been a topic of debate as large organizations, including government agencies and financial institutions, are under increasing pressure to keep up with technological innovation to maintain competitiveness. Organizations must be flexible in development and identify cost-efficient gains to reach their organizational goals, and using the right tools is crucial. Organizations must often choose between open source software, i.e., software whose source code can be modified by anyone, and closed software, i.e., proprietary software with no permissions to alter or distribute the underlying code.

Mature institutions often have employees, systems, and proprietary models entrenched in closed source platforms. For example, SAS Analytics is a popular provider of proprietary data analysis and statistical software for enterprise data operations among financial institutions. But several core computations SAS performs can also be carried out using open source data modeling tools, such as Python and R. The data wrangling and statistical calculations are often fungible and, given the proper resources, will yield the same result across platforms.

Open source is not always a viable replacement for proprietary software, however. Factors such as cost, security, control, and flexibility must all be taken into consideration. The challenge for institutions is picking the right mix of platforms to streamline software development.  This involves weighing benefits and drawbacks.

Advantages of Open Source Programs
 

The Cost of Open Source Software

The low cost of open source software is an obvious advantage. Compared to the upfront cost of purchasing a proprietary software license, using open source programs seems like a no-brainer. Open source programs can be distributed freely (with some possible restrictions to copyrighted work), resulting in virtually no direct costs. However, indirect costs can be difficult to quantify. Downloading open source programs and installing the necessary packages is easy and adopting this process can expedite development and lower costs. On the other hand, a proprietary software license may bundle setup and maintenance fees for the operational capacity of daily use, the support needed to solve unexpected issues, and a guarantee of full implementation of the promised capabilities. Enterprise applications, while accompanied by a high price tag, provide ongoing and in-depth support of their products. The comparable cost of managing and servicing open source programs that often have no dedicated support is difficult to determine.
 

Open Source Talent Considerations

Another advantage of open source is that it attracts talent who are drawn to the idea of sharable and communitive code. Students and developers outside of large institutions are more likely to have experience with open source applications since access is widespread and easily available. Open source developers are free to experiment and innovate, gain experience, and create value outside of the conventional industry focus. This flexibility naturally leads to more broadly skilled inter-disciplinarians. The chart below from Indeed’s Job Trend Analytics tool reflects strong growth in open source talent, especially Python developers.

From an organizational perspective, the pool of potential applicants with relevant programming experience widens significantly compared to the limited pool of developers with closed source experience. For example, one may be hard-pressed to find a new applicant with development experience in SAS since comparatively few have had the ability to work with the application. Key-person dependencies become increasingly problematic as the talent or knowledge of the proprietary software erodes down to a shrinking handful of developers.

Job Seekers Interests via Indeed

*Indeed searches millions of jobs from thousands of job sites. The jobseeker interest graph shows the percentage of jobseekers who have searched for SAS, R, and python jobs.

*Indeed searches millions of jobs from thousands of job sites. The jobseeker interest graph shows the percentage of jobseekers who have searched for SAS, R, and python jobs.

Support and Collaboration

The collaborative nature of open source facilitates learning and adapting to new programming languages. While open source programs are usually not accompanied by the extensive documentation and user guides typical of proprietary software, the constant peer review from the contributions of other developers can be more valuable than a user guide. In this regard, adopters of open source may have the talent to learn, experiment with, and become knowledgeable in the software without formal training.

Still, the lack of support can pose a challenge. In some cases, the documentation accompanying open source packages and the paucity of usage examples in forums do not offer a full picture. For example, RiskSpan built a model in R that was driven by the available packages for data infrastructure – a precursor to performing statistical analysis – and their functionality. R does not have an active support solutions line and the probability of receiving a response from the author of the package is highly unlikely. This required RiskSpan to thoroughly vet packages.

 

Flexibility and Innovation

Another attractive feature of open source is its inherent flexibility. Python allows users to use different integrated development environments (IDEs) that have multiple different characteristics or functions, as compared to SAS Analytics, which only provides SAS EG or Base SAS. R makes possible web-based interfaces for server-based deployments. These functionalities grant more access to users at a lower cost. Thus, there can be more firm-wide development and participation in development. The ability to change the underlying structure of open source makes it possible to mold it per the organization’s goals and improve efficiency.

Another advantage of open source is the sheer number of developers trying to improve the software by creating many functionalities not found in their closed source equivalent. For example, R and Python can usually perform many functions like those available in SAS, but also have many capabilities not found in SAS: downloading specific packages for industry specific tasks, scraping the internet for data, or web development (Python). These specialized packages are built by programmers seeking to address the inefficiencies of common problems. A proprietary software vendor does not have the expertise nor the incentive to build equivalent specialized packages since their product aims to be broad enough to suit uses across multiple industries.

RiskSpan uses open source data modeling tools and operating systems for data management, modeling, and enterprise applications. R and Python have proven to be particularly cost effective in modeling. R provides several packages that serve specialized techniques. These include an archive of packages devoted to estimating the statistical relationship among variables using an array of techniques, which cuts down on development time. The ease of searching for these packages, downloading them, and researching their use incurs nearly no cost.

Open source makes it possible for RiskSpan to expand on the tools available in the financial services space. For example, a leading cash flow analytics software firm that offers several proprietary solutions in modeling structured finance transactions lacks the full functionality RiskSpan was seeking.  Seeking to reduce licensing fees and gain flexibility in structuring deals, RiskSpan developed deal cashflow programs in Python for STACR, CAS, CIRT, and other consumer lending deals. The flexibility of Python allowed us to choose our own formatted cashflows and build different functionalities into the software. Python, unlike closed source applications, allowed us to focus on innovating ways to interact with the cash flow waterfall.
 

Disadvantages of Open Source Programs

Deploying open source solutions also carries intrinsic challenges. While users may have a conceptual understanding of the task at hand, knowing which tools yield correct results, whether derived from open or closed source, is another dimension to consider. Different parameters may be set as default, new limitations may arise during development, or code structures may be entirely different. Different challenges may arise from translating a closed source program to an open source platform. Introducing open source requires new controls, requirements, and development methods.

Redundant code is an issue that might arise if a firm does not strategically use open source. Across different departments, functionally equivalent tools may be derived from distinct packages or code libraries. There are several packages offering the ability to run a linear regression, for example. However, there may be nuanced differences in the initial setup or syntax of the function that can propagate problems down the line. In addition to the redundant code, users must be wary of “forking” where the development community splits on an open source application. For example, R develops multiple packages performing the same task/calculations, sometimes derived from the same code base, but users must be cognizant that the package is not abandoned by developers.

Users must also take care to track the changes and evolution of open source programs. The core calculations of commonly used functions or those specific to regular tasks can change. Maintaining a working understanding of these functions in the face of continual modification is crucial to ensure consistent output. Open source documentation is frequently lacking. In financial services, this can be problematic when seeking to demonstrate a clear audit trail for regulators. Tracking that the right function is being sourced from a specific package or repository of authored functions, as opposed to another function, which may have an identical name, sets up blocks on unfettered usage of these functions within code. Proprietary software, on the other hand, provides a static set of tools, which allows analysts to more easily determine how legacy code has worked over time.
 

Using Open Source Data Modeling Tools

Deciding on whether to go with open source programs directly impacts financial services firms as they compete to deliver applications to the market. Open source data modeling tools are attractive because of their natural tendency to spur innovation, ingrain adaptability, and propagate flexibility throughout a firm. But proprietary software solutions are also attractive because they provide the support and hard-line uses that may neatly fit within an organization’s goals. The considerations offered here should be weighed appropriately when deciding between open source and proprietary data modeling tools.

Questions to consider before switching platforms include:

  • How does one quantify the management and service costs for using open source programs? Who would work on servicing it, and, once all-in expenses are considered, is it still more cost-effective than a vendor solution?
  • When might it be prudent to move away from proprietary software? In a scenario where moving to a newer open source technology appears to yield significant efficiency gains, when would it make sense to end terms with a vendor?
  • Does the institution have the resources to institute new controls, requirements, and development methods when introducing open source applications?
  • Does the open source application or function have the necessary documentation required for regulatory and audit purposes?

Open source is certainly on the rise as more professionals enter the space with the necessary technical skills and a new perspective on the goals financial institutions want to pursue. As competitive pressures mount, financial institutions are faced with a difficult yet critical decision of whether open source is appropriate for them. Open source may not be a viable solution for everyone—the considerations discussed above may block the adoption of open source for some organizations. However, often the pros outweigh the cons, and there are strategic precautions that can be taken to mitigate any potential risks.