Data and DevOps Engineer, Architecture
Tenstorrent
Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.
Tenstorrent designs and optimizes high-performance computer systems using open standards and open-source software. Our Platform Architecture team leverages simulations, data analysis, and data-driven decision-making to prioritize features for our product roadmap.
We’re seeking a hands-on problem solver to manage data pipelines associated with performance metrics, to help monitor the Slurm cluster where simulations run to generate those metrics, and to improve workflows that support hardware architecture refinement. You'll work closely with experts across multiple disciplines to enable data-driven insights for optimizing future-generation CPUs and AI systems.
Join us to enable smarter workflows and optimize innovative hardware products shaping the future of computing.
This role is remote, based in the United States.
We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.
Responsibilities:
- Design dashboards and automate performance tracking for software workloads on simulations of multiple generations of future RISC-V CPUs and heterogeneous chiplet-based systems.
- Maintain or migrate custom data analysis workflows to scalable platforms like relational databases and Apache Superset.
- Efficiently manage, store, and enable the team to utilize large datasets.
- Take ownership of shared tools and workflows by engaging continuously with architecture and design projects as they move through the product development lifecycle.
- Collaborate with engineers and architects who are seeking to evaluate performance tradeoffs in our product design by obtaining data-driven insight.
- Streamline resource allocation and scheduling for CI/regression tests running on a Slurm cluster, ensuring optimal utilization of compute, storage, and network resources.
- Identify resource needs and participate in purchasing decisions and in introducing new software tools to our analysis workflows.
- Implement improvements to our methodologies for defining and launching simulation experiments and tests.
- Develop solutions to detect, categorize, and report workflow and job failures in Slurm, CI systems, and data pipelines.
Experience & Qualifications:
- Master’s degree in data engineering or another similar field.
- 5+ years of experience in developing data pipelines and automation.
- Proven track record of creating dashboards or interactive data visualizations, preferably ones that are easy to share, extend, or modify for further analysis, using Apache Superset or similar business intelligence tools.
- Proficiency in Python.
- Experience using and configuring Linux systems.
- Experience writing SQL queries (e.g., to query Postgres or SQLite).
- Basic familiarity with git version control.
- Some experience building C/C++ tools from source.
- Experience with automated pipelines that integrate with local or cloud-based services (e.g., with REST APIs), such as CI/CD or test automation systems.
- Experience in Linux cluster administration or working with batch schedulers (e.g., Slurm).
- Experience running and gathering data from any simulation or modeling tool.
- Knowledge of computer hardware design or CPU performance benchmarks (e.g., SPEC INT, Geekbench).
Compensation for all engineers at Tenstorrent ranges from $100k - $500k including base and variable compensation targets. Experience, skills, education, background and location all impact the actual offer made.
Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.
Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that are subject to licensing conditions set by the U.S. government.
Our engineering positions and certain engineering support positions require access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations. Please note that citizenship/permanent residency, asylee, and refugee information and/or documentation will be required and considered as Tenstorrent moves through the employment process.
If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, then the offer of employment will be rescinded.