Roles:
Data
Must-have skills:
Python
Considering candidates from:
Argentina, Austria, Canada, Croatia, Czech Republic, Hungary, Poland, Romania, Serbia, Slovakia, Slovenia, United Kingdom and United States
Work arrangement: Remote
Industry: Software Development
Language: English
Level: Middle or senior
Required experience: 2+ years
Size: 51 - 200 employees
Company
Sorcero provides a smart enterprise platform: a deep language intelligence operating system for technical domains, including Insurance, Financial Services, and Life Sciences. Built by the former leadership of the MIT Media Lab, the platform harnesses AI and Natural Language Understanding to deliver new capabilities that augment human performance. Sorcero's NLU platform is a pre-built, no-code, drag-and-drop solution that reduces application deployment time from months to days.
Description
Sorcero is now looking for a Data Engineer experienced in orchestrating data and ML pipeline workflows and in writing batch and stream data processing pipelines for ingestion, ETL, and analytics.
Their app stack: HTML/CSS, Vue.js, Python, Machine Learning, data science, MLOps, Elasticsearch, Redis, various Graph databases, Cloud Storage, Cloud Composer, Cloud DataFlow, TensorFlow Extended, Vertex.
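To give a concrete flavour of the orchestration work described above, here is a minimal sketch of a Cloud Composer (Airflow) DAG in Python. The DAG id, task names and callables are hypothetical placeholders, not Sorcero's actual pipelines.

    # Minimal Cloud Composer (Airflow) DAG sketch; all names are illustrative.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_documents(**context):
        # Placeholder: pull raw documents from Cloud Storage for downstream processing.
        pass


    def transform_documents(**context):
        # Placeholder: normalize unstructured text into an analysis-ready form.
        pass


    with DAG(
        dag_id="example_etl_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_documents)
        transform = PythonOperator(task_id="transform", python_callable=transform_documents)

        extract >> transform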
Tasks:
- Collaborate with the VP of Engineering on the overall data strategy for the B2B SaaS applications and API functionality delivery, breaking tasks down into manageable and achievable deliverables
- Work cross-functionally with AI scientists, Backend Engineers, DevOps and Product Managers to design and implement new data models for the product line
- Expand the current data-processing framework to include all active projects across multiple environments
- Process unstructured data into a form suitable for analysis
- Mentor other data engineers on best practices
- Understand the products from a customer perspective and the software from an engineering perspective
- Own assignments from proof-of-concept to design, architecture, code delivery, and deployment
Must-have:
- Strong programming skills in Python
- Experience working with Google Cloud Composer (Airflow)
- Experience with Google Cloud DataFlow or Apache Beam and its runners (knowing how things work under the hood is a plus; see the sketch after this list)
- Experience with Elasticsearch and PostgreSQL, and an interest in learning graph databases (Neo4j and Dgraph)
- Experience with data lakes and Google Cloud Storage
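As a rough illustration of the Dataflow/Beam experience listed above, here is a minimal batch ETL sketch using the Apache Beam Python SDK. The bucket paths and transforms are hypothetical; the pipeline runs on the DirectRunner unless Dataflow options are passed in.

    # Minimal Apache Beam batch ETL sketch; paths and transforms are illustrative.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def parse_record(line):
        # Placeholder transform: split a CSV line into fields.
        return line.split(",")


    def run():
        # Pass --runner=DataflowRunner and the usual GCP flags to run on Cloud Dataflow.
        options = PipelineOptions()
        with beam.Pipeline(options=options) as pipeline:
            (
                pipeline
                | "Read" >> beam.io.ReadFromText("gs://example-bucket/input/*.csv")
                | "Parse" >> beam.Map(parse_record)
                | "DropEmpty" >> beam.Filter(lambda fields: len(fields) > 1)
                | "Write" >> beam.io.WriteToText("gs://example-bucket/output/records")
            )


    if __name__ == "__main__":
        run()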
Nice-to-have:
- Knowledge of other programming languages
- Knowledge of other workflow orchestration tools
- Knowledge of S3 and Delta Lake
- Familiarity with dashboard technologies (Looker, Plotly, Tableau, Grafana, ELK, etc.)
- Familiarity with scaling (Redis, SQL, Elasticsearch, PostgreSQL, graph databases, Kafka, etc.)
- Knowledge of data warehousing and ETL concepts
- Familiarity with the equivalent AWS stack: Glue, EMR, Data Pipeline
- Experience with data security, sub-second latency, Machine Learning, and NLP
Benefits:
- Fully-remote position
- Additional vacation week between Christmas and New Year
Interview process:
- Intro call with Toughbyte
- 1–2 hour technical interview with the VP of Engineering
- 1-hour call with the team