Most organizations, whether in the public or private sector, struggle to manage the data stored in the multitude of business systems they use. The major challenges include the high cost of supporting such systems and keeping them live, their poor integration with modern cloud applications, and the inability to locate, organize and productively use the information scattered across them. Our solution to the problem is to migrate the data from those legacy systems into a standardized cloud-native product, make the data searchable and easily accessible, and allow the data to be exported via open APIs when necessary.

As a data engineer, you will analyze client requirements and explore the client data to be migrated to Documaster’s own business-documentation management solution. You will apply established Python 3.x and Apache Airflow data pipelines to migration tasks for known cases, and develop custom automation for metadata extraction and generation to align new use cases with those pipelines. Ultimately, you will own the entire data migration process and tooling for new and existing clients. You will work with the broader team, an international product development team currently distributed across Sofia and Oslo that includes a product manager, UX designers, developers and fellow data engineers, to help define the most suitable data model for the needs of new and existing clients alike. You will advise developers on possible optimizations of data process flows and data APIs in the solution. You will need to independently analyze various file formats and extract metadata, investigate previously unseen data sets, and come up with a strategy for extracting the necessary structured data from them in an automated fashion. Learning and having fun go without saying!

You will be a good match for the job if you have experience scripting in Python or Bash, or any other relevant coding experience (academic, professional, or as a hobby), and we will be especially interested in talking to you if you have built data pipelines in Apache Airflow or similar open-source frameworks. You should be able to manipulate raw data using your favourite tools in your language or framework of choice, and be comfortable converting and manipulating tabular data in various formats, such as CSV and MS Excel, both in a GUI environment (MS Excel, LibreOffice etc.) and programmatically, e.g. using pandas. You will feel well-equipped for the position if you have experience processing data in both structured (e.g. JSON, XML, YAML) and unstructured form and are familiar with regular expressions. Any kind of language processing experience will be considered a plus, as will any degree of relational-database querying skill.

Proficiency in English goes without saying, as does the ability to communicate freely with stakeholders (Documaster employees and clients alike) to collect input and drive discussions around the most suitable data and metadata mappings for their specific use case. You should also be able to document your work in English accurately and in an orderly manner.

Good working knowledge of Linux and the ability to write simple scripts would be considered a plus, but if you lack the experience, we will gladly help you out with that.

We love seeing motivated and satisfied people in the office every day. If you want to work in a team that constantly challenges you, and the job described here genuinely sparks your interest, you should get in touch with us right away. If you are the right match for our team, we will offer you a competitive salary. On top of that, we have a stock options plan, a flexible benefits package tailored to your needs, and free training and certification programs. There are more goodies, but we would like to keep them as pleasant surprises.

