Responsibilities

As part of PepsiCo's digital transformation agenda, we are seeking an IT Operations Center Engineer to join our expanding SRE team. As a POC Engineer, you'll work with a variety of talented PepsiCo SRE teammates and serve as a driving force for IT stability, providing intelligent, proactive monitoring to ensure end-to-end insight into mission-critical services for all PepsiCo Technology Platforms.

This IT Engineer role is part of the PepsiCo Operations Center (POC) in Plano, Texas. It is a key technical role that drives observability and automation across the technology stack to prevent incidents and reduce mean time to resolve (MTTR). The role participates in Incident and Problem Management, providing technical guidance and direction to Managed Service Providers (MSPs) on monitoring and automation. This lead role leverages AIOps capabilities to help PepsiCo IT Operations become a forward-looking, proactive organization that minimizes disruption to the business.

As a POC Engineer, you will partner with Observability engineers to help create highly available observability solutions, transforming and accelerating PepsiCo's AIOps journey. We're looking for someone with solid experience using observability data to debug systems, to reduce the frequency and length of production incidents, and to provide a cohesive overall view of system health. The POC Engineer will partner with ITSM teams, multiple Managed Service Providers, and Operations SMEs across applications and infrastructure to drive the prevention agenda through monitoring, automation, and process innovation.

Accountabilities:
Perform the role of a technical lead specialist in PepsiCo's Operations Center and drive toward overall IT stability.
Drive the success of an AIOps-based Command Center through anomaly detection, proactive identification of incidents, providing technical insights during major incidents, supporting the global major incident process, and providing senior management updates.
Work with Operations leads and SMEs to provide solutions following best practices for monitoring and automation to ensure maximum business benefit.
Work with DevSecOps teams to provide reliability for services and minimize business impact.
Work directly with Major Incident Managers (MIMs) and MSPs to identify root causes of high and critical incidents and opportunities for automation.
Provide technical and process guidance to team members and MSPs.
Identify opportunities to improve operational stability and performance.
Partner with the Problem Management team and MIMs to identify opportunities in monitoring and operations.
Focus on improving the Mean Time to Resolve (MTTR) for major incidents and the Mean Time to Detect (MTTD).

Qualifications

Bachelor's degree or higher in computer science, engineering, or a related field, or equivalent experience; master's degree preferred.

Experience:
12+ years of experience in an IT Support & Operations role
5+ years of experience in an IT Command Center
Experience implementing and delivering monitoring and automation solutions in a complex hybrid environment
Previous experience defining, creating, and supporting monitoring dashboards
Practical knowledge and appreciation of various aspects of distributed service design, including messaging protocols, caching strategies, and autonomous software design practices
Experience with monitoring and observability solutions and methodologies, including server and network performance, hardware, web synthetics, and application performance monitoring, is a plus
Experience with the tools and methodologies of observability and AIOps products such as Splunk, Elasticsearch, AppDynamics, Dynatrace, SolarWinds, Nagios, Graphite, Grafana, Prometheus, Solution Mgr., Focus Run, Datadog, BigPanda, Moogsoft, ITOM, etc.
Experience with implementation, operations, and maintenance of IT systems and/or administration of software functions in multi-platform and multi-system environments
Experience working with ITSM tools such as ServiceNow
Technically savvy, with very good hands-on knowledge of network, compute, cloud, and application architecture
Experience in shell scripting and high-level programming languages is a plus
Solid understanding of performance metrics, KPIs, statistical calculations, machine learning, and correlation
Ability to solve problems across the entire stack: operating systems (Linux/Unix/Windows), software, application, and network
Experience with a variety of modern distributed software tools, e.g. service discovery, containerization, messaging
A passion for data-driven decision making using our tools, as well as automating anything and everything

Soft skills
Experience working with multiple Managed Service Providers
Analytical and problem-solving skills
Experience with Agile/Scrum methodology
Understanding of ITSM processes, with a focus on Event, Major Incident, and Problem Management
Ability to handle complex situations through creative innovation
Excellent communication skills, with the ability to provide reports for senior management

Broad technical knowledge in one or more of the following:
OS and Platform: Azure, PCF, Kubernetes, Linux, Windows, VMware, AWS, Cisco, Infoblox, F5, Palo Alto
AIOps: Moogsoft, BigPanda, UiPath, ServiceNow Integrator HUB, Looms, Robotic Processing, Artificial Intelligence (AI) and Machine Learning (ML) frameworks
Automation/self-heal capability enablement & Observability and AIOps: Datadog, Grafana, Prometheus, ELK, Elastic, Kibana, Kafka, CloudWatch, Jaeger, Zipkin, Kinesis, Apache Airflow, Focus Run, TWS, SAP Solman, Nimsoft, AppDynamics, Splunk
API development using third-party libraries; REST/API development expertise in Java, Python, PowerShell, and shell scripting
Experience with Agile/Scrum methodology

Leadership Skills:
Servant leader, comfortable interacting in a consistently collaborative manner with all levels of the organization
Practitioner, empowering teams for success, with a belief in sharing successes and failures on a global basis
Ability to speak both IT and business, with a focus on representing the art of the possible
Capable of remaining composed and working well under pressure
Experience working with multiple Managed Service Providers
Excellent written and verbal communication with strong interpersonal skills

Note: COVID-19 vaccination is a condition of employment for this role. Please note that all such company vaccine requirements provide the opportunity to request an approved accommodation or exemption under applicable law.