We are a medical device company developing solutions for the diagnosis and treatment of lung cancer, with core technology in computer vision and augmented reality (AR). Most of our value proposition comes from software and algorithm development. To meet the algorithmic challenges in our work, we do a lot of research and iterate quickly to turn what we learn into working features.
Beyond implementing an idea quickly, we look for methodologies and tools that minimize the time between having a working prototype and having a production-ready feature in the product.
As part of our philosophy, all team members have a solid background in software engineering, so they can write production-quality code from the beginning.
Choosing the programming language
Selecting the right programming language is important because porting product and infrastructure code after months of development is costly. The programming language should fit the company and its mission while supporting the development culture and methodology of the R&D team.
The ideal programming language should:
Be a general purpose language:
Support solid modern programming concepts
Make common tasks such as multi-threading and networking simple
Allow fast prototyping
Have a large, active community
Have good tooling (IDE, build tools, CI)
Enable algorithm development in machine learning, computer vision and image processing, and numerical computation in general
Support building graphical user interfaces
Support both 2D and 3D visualization
Be interoperable with other languages
As the pace of development is a key parameter, we ranked the candidate languages by language level, degree of difficulty and likelihood of bugs.
Python and Go are comparable in terms of development speed, especially for web development. However, as we are focusing on data science programming, Python remains in first place.
Advantages of Python:
Python is simple, productive and readable, which makes code easier to maintain
Python is a popular, mature and modern language with object-oriented support and a huge active community
Python and its packages cover all software fields, allowing easy integration of numerous libraries in one code base. Many tasks related to web development, networking, cryptography, async programming, scientific programming and machine learning are already implemented, so it is easy to become productive with the standard library and a handful of well-known packages
As Python is the most common language in the research community, academics often publish Python source code alongside their research papers. This provides easy access to cutting-edge findings in the Computer Vision and Deep Learning research communities
Limitations of Python:
Performance issues due to:
Being an interpreted language
Global Interpreter Lock (GIL), which prevents threads from running Python code in parallel on multiple cores
Bugs due to the dynamic nature of the language (with duck typing, type errors surface only at run time)
Weak packaging system
pip and conda help with package management, but both have issues involving dependencies
Options that address these limitations:
For CPU-intensive tasks, multiprocessing can be used to sidestep the GIL, because each process runs its own interpreter. Native libraries such as NumPy are available for most computational tasks, so heavy number crunching rarely has to run in pure Python. If there is no native library for the task, the critical parts can be implemented in C++.
For I/O-bound tasks, Python's multithreading is adequate because the GIL is released while a thread waits on blocking I/O.
To address the duck typing issue, mature static analysis tools are available, and since Python 3.5 the language supports type hints, which modern IDEs use to find errors before the code runs.
Isolation tools such as Docker containers and virtual environments keep each component's dependencies separate and therefore reduce the chance of package dependency conflicts.
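The first three options above can be sketched with the standard library alone. This is a minimal illustration under our own assumptions, not production code: the names count_primes, fake_download and run_in_pool are hypothetical helpers invented for this example, and time.sleep stands in for a blocking network or disk call. The type hints are there so that an IDE or a static checker can flag a call such as count_primes("100") before the code runs.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def count_primes(limit: int) -> int:
    """CPU-bound toy workload: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def fake_download(delay_s: float) -> float:
    """I/O-bound stand-in (hypothetical): sleeping releases the GIL,
    much like waiting on a blocking socket read would."""
    time.sleep(delay_s)
    return delay_s


def run_in_pool(
    func: Callable[[T], R],
    items: Iterable[T],
    pool_factory: Callable[..., Executor] = ProcessPoolExecutor,
    workers: int = 4,
) -> List[R]:
    """Apply `func` to every item using a pool of workers.

    Pass ProcessPoolExecutor for CPU-bound work (one interpreter, and one GIL,
    per process) or ThreadPoolExecutor for I/O-bound work.
    Results come back in the same order as `items`.
    """
    with pool_factory(max_workers=workers) as pool:
        return list(pool.map(func, items))
```

In a script, the CPU-bound call would look like run_in_pool(count_primes, [200_000] * 4), placed under an if __name__ == "__main__": guard, which process pools need on platforms that start workers by launching a fresh interpreter. The I/O-bound call would pass ThreadPoolExecutor as the third argument instead, for example run_in_pool(fake_download, [0.5] * 4, ThreadPoolExecutor).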
Today Python is used as the default language by many medical imaging companies, with C++ as a fallback where more performance is needed.
The following list of libraries is a good start:
OpenCV - for image processing and some geometry related algorithms
ITK - for image segmentation and registration
VTK, Qt 3D - for 3D rendering
Qt with PyQt (Qt Quick, Qt Widgets) - for application UI development
NumPy - for all matrix and vector operations
SciPy - general purpose algorithms and optimization
Pandas - structured data manipulation and reporting
pydicom - for loading DICOM images
Numba - JIT compiler that translates a subset of Python and NumPy code into fast machine code
Nose - for automated testing
TensorFlow / Theano - backend engines for deep learning
Keras - research and production of deep learning algorithms
This software stack may be applicable to many companies beyond the field of medical imaging.