Job Description: The IT support world is evolving, and it's clear that Site Reliability Engineers (SREs) will play a pivotal role in its future. The resulting demand means rapid career progression and fabulous benefit packages. We are offering you the opportunity to grow your SRE skills and experience.
You will:
Receive training in areas such as black-box troubleshooting and Golden Signal Monitoring
Work alongside experienced SREs who support and mentor new team members
Undertake a mix of project and business-as-usual (BAU) work to increase your level of experience
Being typical SREs, we want to automate everything and so we will help you develop those skills too.
You will come from one of two backgrounds:
Software Developer - you have been a developer for several years; you can do the job standing on your head and you are now ready for a new challenge
Application Analyst - you have a working knowledge of distributed systems, and you are hungry to develop your expertise
Crucial characteristics are that you love technology and always want to learn more.
Team Overview - The SRE Team provides services to our application and infrastructure support colleagues, namely:
Recurring Problem Diagnosis - where we determine the root cause of recurring problems when the technology at fault is not obvious
Golden Signal Monitoring - where we monitor the availability of application and infrastructure services, notifying and helping support teams deal with the issues we identify
Automation - where we provide automation consultancy and build assistance to infrastructure teams, as well as frameworks, common code and automation hosting services.
Community - where we promote and assist in the adoption of SRE concepts and techniques throughout the bank.
The remit for the SRE joining us will be to help deliver the Recurring Problem Diagnosis and Golden Signal Monitoring services.
We use advanced techniques. This is an exciting opportunity for the right candidate to learn those techniques and hone their problem diagnosis and Golden Signal Monitoring skills.
This Role - Recurring Problem Diagnosis (RPD):
You will help our team diagnose ongoing recurring problems, basing the investigation on a structured problem diagnosis method called RPR.
We will teach you the RPR method and then support you through the investigations.
We rarely get involved in ongoing incidents; we specialize in diagnosing recurring problems.
Occasionally (and this would only be three or four times each year), we are asked to attend a SWAT call (an ongoing significant incident).
Sometimes, this involves work outside of normal business hours.
Golden Signal Monitoring (GSM)
Where the RPD work can be thought of as being reactive (reacting to problems that arise), the Golden Signal Monitoring service provides a proactive approach to problem detection and diagnosis.
Our GSM objectives are clear:
Identify and resolve issues before they become service impacting
Reduce errors and transient response time issues to drive up service levels
We monitor Golden Signals using a system called Site Reliability Core (SRC).
You will use SRC to identify app and infra service issues, perform a preliminary investigation and then raise the matter with the service owner.
We will teach you how to use Site Reliability Core and how to deal with matters arising.
Personal Qualities - To undertake this role, you will need:
Keenness to learn new technologies, concepts and techniques.
Demonstrable critical thinking skills - we deal with complex issues, and this requires clear thought processes.
Ability to synthesize an approach to an issue from existing knowledge and the new techniques we teach you.
Drive and determination - the issues we deal with often take twists and turns demanding real stamina from our SREs.
Confidence to stand your ground using data to explain your conclusions and recommendations.
Qualifications - The following qualifications are essential for this role:
At least 5 years' experience in either:
Java EE / Jakarta EE application software development, or
Java EE / Jakarta EE application support
A demonstrable understanding of distributed systems.
A working knowledge of containerized applications.
A demonstrable basic understanding of TCP/IP.
A demonstrable understanding of an application layer protocol such as HTTP.
Nice to Have - The following qualifications would be beneficial to this role:
Experience developing or supporting applications based on Tomcat application servers.
Experience developing or supporting applications based on WebLogic application servers.
Experience developing or providing support in a microservice environment.
Knowledge of a messaging technology such as MQ (Message Queue), Solace or Kafka.
Experience in full stack support (application, data and infrastructure).
Knowledge of Oracle or Microsoft SQL Server relational database technologies.
Experience in analyzing data logs using Elastic Kibana.
Experience in analyzing data logs using Azure Log Analytics.
Experience in the use of Wireshark for the capture and analysis of network packet traces.
Experience (past or present) in the use of an automation platform such as Ansible, Puppet, Chef, Salt or vRA.
Experience developing or supporting applications based on Pivotal Cloud Foundry (Tanzu Application Service).
Knowledge of SRE concepts and techniques.
Experience with DevOps-related tasks; in particular, BAU support.
Experience in using ServiceNow.
An understanding of the regulatory landscape for financial services.
Tasks & Responsibilities - General:
Attend weekly team meetings.
Submit time records at the end of each week.
Undertake general tasks that may be allocated from time to time.
Recurring Problem Diagnosis (RPD)
The investigations will be based on our RPR problem diagnosis method which we will teach you.
The tasks and responsibilities are:
Conduct Discovery Calls to obtain a problem statement, a high-level understanding of the moving parts of the system to investigate, how the data flows around the system, and the diagnostic data sources available.
Produce a Diagnostic Capture Plan that describes how the data needed will be captured.
Help app and infra people to execute the Diagnostic Capture Plan.
Analyze the data that results to determine the root cause of the problem, or the next steps.
Issue periodic email-based status reports.
Attend investigation progress meetings with stakeholders.
Notify the team leader of blockers or other issues that may arise.
Assist other SREs in investigations.
Handle multiple RPDs at any time - this is possible as there can be long pauses in the investigations.
Undertake projects to improve our ability to solve problems.
Golden Signal Monitoring:
Use Site Reliability Core (SRC) to identify app and infrastructure services that are missing their Service Availability target or in danger of doing so.
For the services identified as having a problem, investigate using SRC and other data sources.
Assess the underlying issue against criteria that we have established and, where appropriate, create a ServiceNow problem record with details of the problem and assign it to the service owner.
Work with the team that owns the service to help them understand our findings and explain them to vendors.
Assist the service support team in determining the cause of the problem.
Assist in the operation of our SRC system including onboarding of services, setting availability metrics and fine tuning SLOs.
Undertake projects to improve our ability to monitor systems and deliver service availability information.
About Northern Trust:
Northern Trust provides innovative financial services and guidance to corporations, institutions and affluent families and individuals globally. With over 130 years of financial experience and nearly 20,000 partners, we serve the world's most sophisticated clients using leading technology and exceptional service.
Working with Us:
As a Northern Trust partner, you will be part of a flexible and collaborative work culture, which has a strong history of financial strength and stability. Movement within the organization is encouraged, senior leaders are accessible, and you can take pride in working for a company that is committed to strengthening the communities we serve!
We recognize the value of inclusion and diversity in culture, in thought, and in experience, which is why we are honored to have received the following awards in 2021:
Gender Equality Index Member, Bloomberg
Top Financial & Banking Company, Black EOE Journal, Hispanic Network Magazine, Professional WOMAN'S Magazine
We'd love to learn more about how your interests and experience could be a fit with one of America's best banks and most sustainable companies! Build your career with us and apply today.
Additional Information