
Site Reliability Engineer

Location: United States
Date Posted: May 20, 2024



Title: Site Reliability Engineer
Location: Remote
Terms: Full-time 
Clearance: Public Trust 
Travel: 0-20% 

That’s RIVA. Our employee-first approach has created a culture that attracts the best and brightest. By investing in people first and providing a flexible work environment, we see higher employee morale, higher productivity, and lower turnover. At RIVA, people are our #1 priority.

Project Description:
RIVA is seeking to introduce Site Reliability Engineering (SRE) principles and practices into our Product Team ecosystem in support of one of our Federal Programs. While teams have made considerable progress in Agile delivery, the overall maturity of daily IT operations still has room for improvement. We therefore seek assistance in designing, training, communicating, implementing, and monitoring SRE practices within this Federal Program with USPTO. Specifically, we require assistance in four main areas: Monitoring/Observability/Performance Engineering, Training, Reacting, and Improving.

Job Description:
We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will possess a deep understanding of complex distributed systems and have a proven track record in software engineering, infrastructure management, and system reliability both in the Cloud and on-premises. You will play a crucial role in bridging the gap between legacy development and operations teams, fostering a culture of collaboration and continuous improvement to maximize system uptime and performance.
Primary Responsibilities:
  • Infrastructure and Configuration Management:
      • Deploy and configure services using Infrastructure as a Service (IaaS) providers.
      • Configure and manage servers and application stacks to serve dynamic websites.
      • Operate and integrate enterprise system monitoring and logging tools.
      • Operate and integrate configuration management tools to support continuous delivery while maintaining configuration control.
  • System Reliability and Optimization:
      • Debug cluster-based computing architectures.
      • Use scripting tools to automate standard recurring processes.
      • Perform analysis to optimize system uptime and performance.
      • Design and implement load balancing tools.
      • Address all cybersecurity vulnerability findings and develop Plans of Action and Milestones (POA&Ms).
  • Technology Integration and Development:
      • Provide development guidance for the integration of new technologies into the application framework.
      • Design, develop, operate, maintain, and administer current technologies as specified.
      • Lead research and development (R&D) activities as required.
      • Resolve problems requiring intimate knowledge of related technologies.
  • Performance Monitoring and Automation:
      • Apply IT service management processes and techniques.
      • Develop and implement monitoring and instrumentation to provide teams with necessary data.
      • Apply a data-driven approach to the analysis of operational concerns, using mathematical and statistical modeling to develop solutions.
  • Security and Protocols:
      • Apply firewall and authentication technologies.
      • Learn, develop, and maintain a thorough understanding of design protocols and engineering standards.
  • Incident Management
  • Managing Postmortems
Required Qualifications:
  • Minimum of 3 years of experience in an SRE role.
  • Minimum of 10 years of software engineering experience with skills in Java, Python, Angular, Node, etc.
  • In-depth understanding of systems from a reliability engineering point of view.
  • Expert-level understanding of architecture and complex distributed systems.
  • Demonstrated ability to understand and resolve complex system issues including instability and poor performance.
  • Strong technical infrastructure understanding, including containers, operating system internals, networking, storage, network load balancing and routing, and the TCP/IP stack.
  • Experience with Terraform IaC and AWS cloud technologies, including CloudFormation templates.
  • Strong communication skills and an understanding of organizational/team dynamics and negotiation.
Preferred Qualifications:
  • Automation-first mindset with the skills to match.
  • Experience with GitLab CI/CD.
  • Experience with Terragrunt.
  • Continuous Everything, Shift Left Everything mentality.
  • Experience in developing and implementing monitoring and instrumentation.
  • Strong use of the scientific method to develop solutions.
  • A good working knowledge of mathematical and statistical modeling.
  • Linux
  • Windows
  • Experience with feature flagging.
  • Experience with advanced deployment models such as blue/green and canary.


RIVA Solutions is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any protected class. If you need a reasonable accommodation to search for a job opening or to submit an online application, please email [email protected]. Only messages left for this purpose will be returned.
