Navigating the Pitfalls of Third Party Data Integration
Jed Gordon, Partner & Global Head, IP Transactions Practice, McDermott Will & Emery
Edward (Jed) Gordon is a Boston-based intellectual property Partner and serves as the head of McDermott Will & Emery’s IP Transactions practice. He counsels clients in a variety of industries on the intellectual property issues that arise in corporate transactions, including licensing arrangements, mergers, acquisitions, joint ventures and technology collaborations. Jed also advises on strategic patent portfolio development, patent enforcement strategies and patent infringement litigation defense. He has been recognized as a leading lawyer by IAM Patent 1000, The Legal500 US and Super Lawyers.
With the increasing dependence on data as a key component of products, companies have begun to rely on third-party data sources to accelerate their product development efforts. Unfortunately, appreciation of the legal framework governing the use of third-party data sources has not kept pace. The environment is reminiscent of the early days of open source software, when many software developers saw open source software as a cost-free shortcut to product development, not realizing that they could be introducing viral license terms into their code base. Discovery of “copyleft” open source code in commercial products has scuttled numerous M&A transactions and cost the software industry substantial sums in remediation efforts. Data-oriented companies should learn from the industry’s open source software experience to avoid similar pitfalls.
Integrating third-party data into your product without a clear understanding of the terms under which the data was provided can expose a company to risks similar to those created by incorporating open source software. Proprietary data sets could become subject to unwanted disclosure obligations. Sale of a product could result in claims of copyright infringement or breach of a license agreement. Moreover, once third-party data is incorporated into a product, extracting that data or undoing its use could entail substantial costs and delay product launches.
As a starting point, in-house counsel and corporate information officers should recognize that databases are generally protected by intellectual property rights. These rights may take the form of copyright protection in some jurisdictions, while other jurisdictions offer sui generis database protections. Even if a database creator makes a database publicly available for download or access, one’s rights to use, modify, copy, and distribute those data sets to third parties are limited to the rights and restrictions included in the license associated with the initial access to the data. One common and problematic limitation on the use of data made available by academic institutions is a prohibition on using the data for commercial purposes. While this clearly would prohibit resale of the data as a product, it also would arguably prohibit incorporating the data into a commercial product, whether directly or indirectly, such as by using it to train a machine learning model. Other common limitations are prohibitions on making derivative works of the data set and requirements to include attribution to the data set providers in any redistribution. Unless these license requirements are tracked, inadvertent violations can easily result.
Accordingly, a robust data source tracking process is recommended for any product that is built upon third-party data sets. The process should track limitations on how the data can be used, what attributions may be required and whether any fees or royalties may be due to a data provider in connection with its use. The process should also track a product’s compliance with these rights and obligations. As with many open source software components that are freely licensed under “copyleft” license terms, data sets that appear to be available only on problematic license terms can often also be obtained under commercial licenses that avoid these risks. By rigorously vetting third-party data source license terms and product usage of such data, companies can make publicly available data sets a boon to data-based products and services in much the same way that open source has benefited the broader software industry.
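For teams that want to operationalize such a tracking process, the sketch below shows one way the record-keeping might look in Python. It is illustrative only: the class names, field names, sample data set and product are hypothetical assumptions, not a standard schema. Whether a given use is "commercial" or creates a "derivative work" under a particular license is a legal judgment that counsel must make before the flags are recorded, so the code only compares recorded answers and does not interpret license text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSourceRecord:
    """License terms recorded for one third-party data set (hypothetical schema)."""
    name: str
    license_name: str
    commercial_use_allowed: bool
    derivatives_allowed: bool
    attribution_required: bool
    fees_or_royalties_due: bool


@dataclass(frozen=True)
class ProductUsage:
    """How one product uses a data set (hypothetical schema).

    The boolean flags reflect counsel's conclusions, entered by hand.
    """
    product: str
    source_name: str
    is_commercial: bool
    creates_derivative: bool  # e.g. a merged data set or a trained model
    attribution_included: bool
    fees_paid: bool


def find_issues(source: DataSourceRecord, usage: ProductUsage) -> List[str]:
    """Return the recorded license limitations that the recorded usage conflicts with."""
    if source.name != usage.source_name:
        raise ValueError("usage record refers to a different data source")
    issues: List[str] = []
    if usage.is_commercial and not source.commercial_use_allowed:
        issues.append("commercial use not permitted")
    if usage.creates_derivative and not source.derivatives_allowed:
        issues.append("derivative works not permitted")
    if source.attribution_required and not usage.attribution_included:
        issues.append("attribution missing")
    if source.fees_or_royalties_due and not usage.fees_paid:
        issues.append("fees or royalties outstanding")
    return issues


# Hypothetical example: an academic data set released for non-commercial use
# with attribution, used to train a model that ships in a commercial product.
academic_source = DataSourceRecord(
    name="example-academic-corpus",
    license_name="non-commercial research license",
    commercial_use_allowed=False,
    derivatives_allowed=True,
    attribution_required=True,
    fees_or_royalties_due=False,
)
model_usage = ProductUsage(
    product="example-product",
    source_name="example-academic-corpus",
    is_commercial=True,
    creates_derivative=True,
    attribution_included=False,
    fees_paid=False,
)

for issue in find_issues(academic_source, model_usage):
    print(f"{model_usage.product} / {academic_source.name}: {issue}")
```

Keeping the license terms and the product usage in separate records mirrors the two things the process needs to track: what each provider permits, and what each product actually does with the data. A flagged conflict is a prompt for legal review or for seeking a commercial license, not a legal conclusion in itself.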