Lab 1: Investigating Open Source Development

October 06, 2018

While most people today have at least heard of Open Source software, many people, myself included, don't know much about how the development process works or how these communities are structured and managed. Today's post explores these topics through two Open Source projects under different licences: Privacy Badger, under the GNU General Public License, and Deeplearning4j, under the Apache 2.0 License.

Privacy Badger:

Privacy Badger is a browser extension aimed at striking a balance between a user's right to privacy and content providers' business interests. It only blocks ads and cookies that ignore the Do Not Track setting of the user's browser, so as long as a content provider isn't hosting those types of ads, they won't lose out on ad revenue.
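To make that rule concrete for myself, here is a minimal Python sketch of the decision as I described it above. This is not Privacy Badger's actual code (the extension itself is JavaScript and its real logic is more involved); the `ThirdParty` record, its fields and the `should_block` function are all names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThirdParty:
    """Hypothetical record of a third-party domain seen while browsing."""
    domain: str
    tracks_user: bool          # assumption: the domain appears to track the user
    honors_do_not_track: bool  # assumption: the domain respects Do Not Track


def should_block(party: ThirdParty) -> bool:
    """Block only third parties that track the user and ignore Do Not Track."""
    return party.tracks_user and not party.honors_do_not_track


# Three made-up domains covering the cases described in the post.
ignores_dnt = ThirdParty("tracker.example", tracks_user=True, honors_do_not_track=False)
respects_dnt = ThirdParty("ads.example", tracks_user=True, honors_do_not_track=True)
no_tracking = ThirdParty("cdn.example", tracks_user=False, honors_do_not_track=False)

for party in (ignores_dnt, respects_dnt, no_tracking):
    print(party.domain, "blocked" if should_block(party) else "allowed")
```

The point of the sketch is the middle case: an ad provider that respects Do Not Track is left alone, which is how content providers can keep their ad revenue.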

Privacy Badger is run primarily from its GitHub repository (link below), and it seeks to make contribution easy by keeping a publicly available list of issues it wants tackled. They further simplify things for new contributors by keeping a list of good issues to start out with, which helps newcomers get familiar with the code while still being productive.

Once you choose an issue to take on, Privacy Badger has two sets of guidelines to follow, one for developing and one for testing. The developer guide has two brief points. The first is to lint all of your changes according to their set of ESLint rules. The second is to write a good commit message to make reviewing the pull request easier. The testing guide outlines all the tests that will be run automatically for each pull request, so you can run those tests while developing to make sure your code is up to their standard. They use QUnit for their unit tests, which are shipped with the source code. For the functional tests you will need Python, Selenium and pytest, and these can be run automatically through Travis CI.

This open approach to contribution, with clearly defined standards and tests to check your code against them, allows Privacy Badger to leverage its community without having to monitor development along the way. It does come with the drawback of a more involved review process to make sure the proposed solution is an efficient and correct one even if it is operable.

Deeplearning4j:

Deeplearning4j is a deep learning library for Java and the JVM, as well as a framework that supports many deep learning algorithms. Its framework is composable, allowing shallow neural nets to be stacked together to form varied deep nets.

Deeplearning4j (DL4J) is run from GitHub (link below), but its community interacts on Gitter (link below) to discuss development issues. It also has its own website (link below) with tutorials on using DL4J, guides for development and other relevant community information. They have a few different ways for people to contribute to their work. They have a well-organized issue tracker on GitHub with tags marking issues for newcomers to start with. They also have a roadmap for the project, so there is always work that can be picked up. Lastly, you can talk with developers on Gitter to find issues that need help.

When working on an issue, DL4J outlines a few practices to be followed and software to be used. In particular, they use a specific suite of software to ensure consistency in both the form and function of new code. They use Maven as a dependency management and build tool, Git for version control, Project Lombok for code generation and annotations that reduce boilerplate code, and VisualVM to profile code and identify performance issues and bottlenecks. In terms of practices, they outline a set of standards for testing and documenting code. Firstly, all code has to be Java 7 compliant. Then, all code must have informative comments to help with code maintenance. When adding new functionality you must also add unit tests for it using JUnit, including tests for edge cases. Lastly, when adding significant functionality you should also add documentation to the website and provide an example. Once your code is complete, you create a new pull request on their GitHub repo, where it will be tested with the current set of unit tests to ensure it functions and cooperates with all other code. Then it will be reviewed by existing members, and any requested changes will be communicated through GitHub before it is approved and merged.
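The "unit tests including edge cases" requirement is the part I find easiest to picture with an example. DL4J's own tests are written in Java with JUnit; the sketch below shows the same idea using Python's built-in unittest module instead, purely as an analogue. The `mean` function and the test names are hypothetical and have nothing to do with DL4J's code base.

```python
import unittest


def mean(values):
    """Hypothetical function under test: arithmetic mean of a list of numbers."""
    if not values:
        # Edge case: an empty list has no mean, so fail loudly.
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    def test_typical_input(self):
        # The ordinary case a contributor would think of first.
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_single_value(self):
        # Edge case: the smallest valid input.
        self.assertEqual(mean([5]), 5)

    def test_empty_input_is_rejected(self):
        # Edge case: invalid input should raise, not return a made-up number.
        with self.assertRaises(ValueError):
            mean([])
```

Saved to a file, these tests can be run with `python -m unittest`. The new functionality arrives together with tests for both the normal path and the awkward inputs, which is what lets an automated run on the pull request catch regressions before a reviewer ever looks at it.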

DL4J's approach to community involvement is similar to Privacy Badger's, with the two main differences being their standards on code documentation and their developer chat group on Gitter. With close to seven thousand participants in their Gitter community, it is essential that DL4J maintains clear commenting of code so that it can all be understood and maintained cohesively. Having such a large Gitter presence also helps developers cooperate and solve problems more efficiently than a GitHub presence alone would.

By building communities around a project and leveraging a wide pool of experience and knowledge, Open Source development offers an efficient process for innovating functionality, identifying and fixing issues and thoroughly testing code. As long as there is a clearly defined set of standard practices and a careful review process, it can be an ideal way to develop projects.

Links:

Privacy Badger GitHub: https://github.com/EFForg/privacybadger

DL4J GitHub: https://github.com/deeplearning4j/deeplearning4j

DL4J Website: https://deeplearning4j.org/

DL4J Gitter: https://gitter.im/deeplearning4j/deeplearning4j/

Colin McManus SPO600
