Sr. DevOps Lead Engineer / Architect
San Carlos, CA
Bluescape helps companies create better. Its visual collaboration software gives teams a virtual workspace to meet, share, and develop ideas. Founded in 2012, Bluescape is a wholly owned subsidiary of Haworth.
Our company culture blends a passion for technology and rock-star output with an appreciation for a balanced and healthy lifestyle.
We are located right in the heart of the peninsula, minutes away from the freeway and restaurants. Our office is beautiful, snacks are plentiful, and innovation is in the air. We’re all entrepreneurs here, whatever our role, so if you want a hands-on, fast-paced environment to learn and thrive in, we’d love to hear from you.
About the Role
Reporting to the Director of Cloud Operations, this position is responsible for systems and application service uptime in a high-availability, customer-facing, business-critical 24x7 SaaS environment that requires immediate response to service-impacting issues. You must have a strong command of architectural design and of the tradeoffs in installation, configuration, and diagnostics, with extensive hands-on expertise in open source Linux systems in a large-scale DevOps environment. The right candidate will have excellent verbal and written communication skills and a demonstrated ability to work across departments towards a common goal. You should have a passion for implementing open source tools, systems/network/application diagnostics frameworks, and CI/CD environments for a SaaS enterprise, with a structured approach to achieving high-quality, sustainable production operations. You will also have knowledge of deploying applications built on Java, Node.js, and/or other typical enterprise application frameworks and languages.
Responsibilities
- Develop and manage consistent and coherent DevOps processes and practices to support software development, testing, builds, and deployment.
- Guide and develop infrastructure and tools architecture design to enable high uptime, minimize failures, ensure application and data security, and expedite diagnostics.
- Identify, diagnose, and resolve complex technical issues quickly in a live production environment, and use those events to improve current technology and processes so that such issues are prevented in the future.
- Work closely with the Engineering teams to escalate and/or triage issues to resolution.
- Review tickets and diagnostics in post-mortems to identify trends and chronic issues.
- Hands-on implementation & upgrade of tools for monitoring, trending & diagnostics.
- Audit proactive monitoring of all systems to detect and resolve problems and ensure uninterrupted operation of all infrastructure systems.
- Update corresponding documentation on installation processes and configurations.
- Automate, automate, automate everything.
Skills and Qualifications
- Four-year technical degree or equivalent experience, with a minimum of 5 years of combined experience working in a modern SaaS DevOps or related environment.
- Solid knowledge of architecture concepts and practices
- Knowledge of architectural design patterns, e.g. immutable production, fail fast, stateless, etc.
- Experience in architecture, implementation planning & project management of Infrastructure for SaaS applications in a large scale environment.
- Strong understanding of application release management and configuration, upgrades/patches, and support of Unix/Linux systems running applications on Node.js or similar in a SaaS environment.
- Passion for troubleshooting and triage of incidents, bringing issues to rapid resolution.
- Ability to apply detailed knowledge of organizational procedures to make independent decisions and serve as a credible resource for technology teams.
- Strong verbal and written communication skills, with the ability to work effectively across organizations
- Excellent problem-solving skills with the ability to analyze situations, identify existing or potential problems and recommend solutions
- Strong software engineering skills and computer science knowledge
- Excellent understanding of scalable, microservice-based architectures and experience in applying them to real-world problems
Extensive working knowledge of as many of the following technologies and areas as possible:
- Systems – Linux, Unix, Docker, OpenShift & open source software
- Command of popular scripting languages to automate release processes, monitoring, trending, and alerting, ideally with a working knowledge of Python and shell.
- Automation using Ansible in a cloud environment
- Working knowledge of databases
- Good networking fundamentals: protocols, load balancers, VPN, switches/routers/firewalls, LDAP, SNMP, SMTP
- Good understanding of filesystem technologies, to build filesystems and/or troubleshoot filesystem issues
- Virtualization/Cloud technologies – Strong working knowledge of AWS with a good understanding of other technologies like OpenStack, OpenShift, Google Cloud
- Web servers/reverse proxies such as Apache, nginx, and HAProxy
- Web application frameworks in Node.js, Python, etc.
- Monitoring, trending & diagnostics tools
- Logging tools such as Splunk, ELK stack, etc.
- Using source code control systems such as Git (or similar)
- Work/defect tracking & Wiki systems such as JIRA / Confluence
- Knowledge of the use and maintenance of continuous integration and continuous deployment systems.
- Ability to prioritize and balance activity between projects with longer-term impact and immediate production-critical requirements, with a customer focus.
- Must be a self-starter and require minimal guidance.
- Excellent verbal and written communication skills are essential.
- Ability to work in a collaborative environment is essential.
- Ability to take part in an on-call escalation rotation and coordinate work in production-critical situations is essential.
Benefits
- Competitive salaries
- Comprehensive health insurance available (medical, dental, and vision) for you and your family
- Life and AD&D coverage
- Long Term Disability coverage
- Paid vacation, sick time, and company holidays, plus a 401(k)
- Work in a new, bright, open, and collaborative office with plenty of snacks, parking, and high energy
- Choice of MacBook Air or Windows laptop