Site Reliability Engineer - CTJ - Secret
United States, Virginia, Reston
Overview

Do you have a passion for high-scale services and working with some of Microsoft's most critical customers? We're looking for a Site Reliability Engineer with the right mix of software development, online services experience, and passion for quality to envision, design, and deliver Office 365 government cloud service offerings.

Office 365 is at the center of Microsoft's cloud-first, devices-first strategy, as it brings together cloud versions of our most trusted communication and collaboration products, like Exchange, SharePoint, and Teams, with our cross-platform desktop suites and mobile apps. The Office 365 Enterprise Cloud team works with Microsoft's largest enterprise and government customers to deliver features that meet their specific needs and enable cloud adoption. As you would expect, our customers have the highest expectations for feature quality, security, reliability, availability, and performance.

The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. Collaboration skills will be required to work closely with other engineering teams to ensure services and systems are highly stable and performant, meeting the expectations of our government customers and users. We're building a new team focused on 24x7 coverage in US Government clouds, and this position will require work outside normal business hours.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities

You will develop a foundational understanding of distributed systems design, interactions between cloud technology layers and components, basic dependencies at scale, and the code that defines infrastructures.

You will develop an understanding of the code, features, and operations of specific products at scale as required to contribute to incremental improvements in product availability, reliability, efficiency, observability, and/or performance.

You will help test basic changes to optimize code and improve the observability, reliability, and operability of a defined range of platforms, systems, or product components or features, with direction from other engineers.

You will support ongoing engagements with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles.

You will develop an understanding of how to safely and reliably manage changes in production by using existing tools and automation to enable product engineering teams to implement changes across a defined range of components or features, with direction from other engineers.

You will use existing tools to troubleshoot problems or flaws affecting the availability, reliability, performance, and/or efficiency of components or features, with guidance from other engineers.