Because of my long-standing association with the Apache Software Foundation, I’m often asked the question, “What’s next for open source technology?” My typical response ranges from “I don’t know” to “the possibilities are endless.”
Over the past year, we’ve seen open source technology make strong inroads into the mainstream of enterprise technology. Who would have thought that my work on Hadoop ten years ago would impact so many industries, from manufacturing to telecom to finance? They have all harnessed the power of the open source ecosystem not only to improve the customer experience, become more innovative and grow the bottom line, but also to support work toward the greater good of society. Genomic research, precision medicine and programs to stop human trafficking are just a few examples.
Below I’ve listed five tips for folks who are curious about how to begin working with open source and what to expect from the ever-changing ecosystem.
1. Embrace the Constant Change and Evolution of Open Source
Constant change: this is the first lesson anyone who is new to open source technology needs to learn, and it is one of open source’s biggest differentiators from traditional software. The nature of open source is fluid and flexible, with new projects regularly being invented for specific use cases. This dynamic cycle drives products to improve faster. So, in order for companies to reap the full benefits of open source, they must be open to this change. The Spark vs. MapReduce debate is a perfect illustration of why this is important:
It’s true that folks are building fewer new applications based on MapReduce and instead are using Spark as their default data-processing engine. MapReduce is gradually being replaced as the underlying engine in tools like Hive and Pig, but that doesn’t make MapReduce obsolete. It will continue to work well for existing applications for many years, and, for certain large-scale batch loads, may remain the superior tool. This trend follows the natural evolution of open source technology: MapReduce was the 1.0 engine for the open-source data ecosystem, Spark is its 2.0 engine, and someday there will be a 3.0 that will make Spark the legacy engine.
2. When Introducing a New Technology Stack, Start Small and Go From the Top Down
Rather than architecting and deploying point solutions, we now have general-purpose data platforms with many tools that can be combined flexibly for search, streaming, machine learning and more. Working with these platforms requires not just a different set of skills but also a cultural shift in management style and organizational structure. For this reason, it’s important to gain high-level support within an organization and to make data management a boardroom-level discussion. I’d also recommend starting with one specific use case and gradually building a new culture around a few new applications, rather than replacing everything at once, so that everyone has time to acclimate.
3. Avoid Cloud Vendor Lock-in by Opting for Open-Source Software
As more enterprise organizations and industries embrace the cloud, they should consider open-source software, which is not only becoming more robust, scalable and secure but can also help them avoid cloud vendor lock-in. By building on an open-source platform, organizations can employ cloud-vendor arbitrage to keep costs down, use different clouds in different regions, or use a combination of cloud-based and on-premises systems. In fact, open-source platforms have also proven technically superior and will likely gain more ground in 2017. It’s difficult for a single vendor to compete against a large number of institutions collaborating in open source. In addition, open-source data systems now lead in performance and flexibility, and they’re improving more rapidly.
4. For Job Seekers, Focus on the Forest and not the Trees in the Open-Source Ecosystem
Job hunters in the fields of IT, programming and data science shouldn’t fixate on mastering individual technologies, but focus instead on understanding the best use of each of the components of the open source data ecosystem and how they can be connected to solve problems. This high-level architectural understanding is the most valuable skill to companies innovating in technology. As new technologies arrive, it’s crucial to understand how they fit in, what they might replace and what they might enable.
5. Seek Opportunity in the Skills Gap
The skills gap in big data will remain relatively constant in the next year, but this shouldn’t deter people from adopting Hadoop and other open-source technologies. As most of us know, when new technologies are created and vie for users, they are known by few. Only once a particular type of software is a mature, standard part of the canon do we begin to have a substantial number of folks skilled in its use, but even then the skills gap can persist. It will disappear only when we stop seeing big improvements to the stack, which I doubt we want. In short, the skills gap is one of the primary factors gating the rate of platform change, but it’s also a sign that innovation is at hand.
Conclusion
The open source ecosystem and its implementation in meaningful projects will continue to expand over the coming years. As an impetus for collaboration, it brings together today’s brightest minds to move software development forward at a pace not possible ten years ago. If you have an idea for improving existing technologies or want to rally behind a notion for breaking the status quo, this is the place. I encourage everyone interested to get involved, and I encourage open source veterans to keep committing to the cause. Click here for more information on joining the ASF community.
The author, Doug Cutting, is the founder of the Apache Lucene, Nutch, Hadoop and Avro open source projects. He was on the board of the Apache Software Foundation for six years and is now the Chief Architect at Cloudera.