GitGuardian is a tech company, so engineering sits at the heart of everything we do. The department is working on solving challenging problems:
Scanning various data streams at scale to find secrets (more than 10M code patches, messages or images daily)
Developing components that are deployed on our customers’ infrastructure to securely collect and map non-human identities
Training and deploying models and algorithms to surface, aggregate and contextualize rich metadata around each secret, then integrating those insights into the product without compromising user experience.
You’ll join our Machine Learning squad, a team of four engineers within our engineering department of more than 50 people, working together to build and ship ML features for our products.
Today, our priority is helping SecOps teams that use GitGuardian to prioritize and navigate incidents. Some incidents, if abused, can cause hundreds of millions of dollars in damage.
We firmly believe that machine learning is essential to building an effective prioritization algorithm, and that this algorithm must leverage all available context, from information in the patch and repository to company-level and asset-level data. This is why we work closely with both the Secret Detection team, in charge of our secret detection engine, and the Incidents team, which owns the interface and incident management in the app.
Your daily responsibilities will be to:
Write code daily to make our platform smarter, faster and more reliable.
Train, evaluate and iterate on models using our large multi-modal dataset.
Drive end-to-end ML/AI projects from scoping and prototyping through deployment and monitoring.
Level up our MLOps deployment to serve larger models at our scale, with the added complexity of self-hosted compatibility.
Bring expertise and best practices: define conventions, review code, and mentor junior engineers.
Contribute to the continuous improvement of our existing deployment pipelines: optimize inference speed and bring any other ideas that improve our day-to-day work and reliability.
Technical environment
Languages & frameworks: Python, PyTorch/Transformers, ONNX Runtime, BentoML, scikit-learn, LiteLLM
Data & orchestration: DVC, SkyPilot, Snowflake, Dagster
Main Application: Celery, Django, PostgreSQL, Redis
Infrastructure & Deployment: AWS, Kubernetes, ArgoCD, GitLab
Collaboration: Slack, Linear, Notion
More details on our current stack here!
What makes this position unique?
GitGuardian is a tech-oriented company with a mission: making the world safer for developers. Thanks to very talented engineers, we sell a strong product to top-tier companies with high expectations. As a data-driven company from day one, we have more than 40B code patches in our databases, and we have been running our models at scale on a huge volume of data for years.