- Location: Addison, Texas
- Type: Contract
- Job #26825
WHAT EXACTLY WILL THIS INDIVIDUAL BE WORKING ON?
As a member of the Site Reliability and Platform Engineering Team, this individual will act as a subject matter expert in the discipline of Site Reliability Engineering.
They will work closely with the DevOps Product Owner to define and establish a roadmap of activities to mature the capabilities of the organization's Site Reliability Engineering Practice.
They will work with the application, infrastructure, and security teams to establish a robust monitoring and notification scheme that gives IT staff visibility into and awareness of the health and availability of business-critical applications.
They will perform hands-on design, development, testing, documentation, and deployment of various monitors, automation, reporting, and dashboards.
They will collaborate closely with the Enterprise Monitoring and Service Now Group to evolve and develop the monitoring, alerting, and auto-remediation capabilities needed to support the Site Reliability Engineering Discipline.
They will also provide training and mentoring to other team members to develop the skills and competencies of our regional COE teams.
This role will be expected to help triage major incidents, assist in root-cause analysis, and use that information to help drive remediation activities that increase system stability and reliability.
In addition to the Site Reliability discipline, this individual will also help manage, maintain, and support the Mulesoft API Integration platform.
Candidates for this position should be willing to participate in on-call responsibilities and provide support outside of normal work hours as needed. They should also be willing to enroll their personal mobile phone to receive system alert notifications via SMS and company email.
SPECIFIC SKILL SETS AND EXPERIENCE REQUIREMENTS:
Requisite Technical Skills Requirement:
• Demonstrated ability to take a high-level objective, understand the “why” behind it, take ownership of it, break it down into execution tasks, identify dependencies, clarify and confirm critical aspects of the plan with a supervisor, and set reasonable commitment dates for delivery.
• Ability to take a new technology platform/system, self-learn through generally available documentation or vendor support resources, set up a POC of the technology, and assess and evaluate its critical features and functionality for fitness for purpose.
• Ability to provide practical, maintainable technical solutions that scale.
Site Reliability Engineering Experience:
• Broad experience with the design and implementation of application and infrastructure monitoring through APMs, web synthetic monitors, ticketing, notification, reporting and dashboarding platforms.
• Excellent troubleshooting skills and knowledge of the infrastructure, middleware, and application layers.
• Very comfortable using diagnostic tools such as Fiddler, Chrome Dev Tools, SPLUNK, Dynatrace Synthetics, etc. NOTE that a strong background in SPLUNK is required.
• Strong experience supporting complex, service-oriented, large-scale, web-based transactional systems, along with the common integration patterns they use and the associated protocols and interfaces, such as REST, SOAP, Message Queues, Custom Services, etc.
• Solid foundation in the security aspects of authentication and authorization schemes: SSL Certificates, Cookies, Integrated Auth, Basic Auth, OAuth, SAML.
• Hands-on experience in application load testing, analysis of load testing data, and providing recommendations on capacity management, availability, and performance.
• Deep experience in understanding and troubleshooting network layers, including DNS, CDN, Firewalls, VPNs, MPLS, Proxies/Reverse Proxies, and Load Balancers.
• Demonstrated ability to design, develop, test, and deploy automations related to maintaining and improving the health, stability, resiliency, and security of the application services and web sites.
• Experience managing automation code through repos such as Git, Bitbucket, TFS, etc.
• Experience defining relevant KPIs and producing reports and dashboards that provide the necessary insight into the health, stability, resiliency, and security of the application services, web sites, and SDLC activities.
• In addition to the Site Reliability Discipline, we would also want this individual to help augment the engineering and management of our Mulesoft API platform. Prior experience with Mulesoft is not required, but any experience with similar API gateway technologies would be helpful, and they will be expected to come up to speed quickly on the technology and implementation of the Mulesoft platform.
• Exceptional interpersonal and communication skills
• Ability to participate in a 24/7 on-call escalation rotation and respond to mission-critical issues as needed
• Willingness and ability to attend early-morning or late-night meetings as required when interfacing with our Regional IT teams in Europe, LATAM and APAC.
• Passion and drive to improve efficiencies in how we deliver IT services
SPECIFIC SKILL SETS PREFERRED/DESIRED, BUT NOT NECESSARILY REQUIRED:
• Mulesoft or other API Gateway platforms
• Azure DevOps or other CI/CD platforms
• Salesforce Service Cloud, Salesforce Marketing Cloud, Salesforce Community Cloud, Salesforce Commerce Cloud
• Heroku and AWS
• Mulesoft, CA API Gateways
• Let’s Encrypt
WHO WILL BE INVOLVED IN THE INTERVIEWING PROCESS? HOW LONG WILL EACH INTERVIEWER NEED WHEN INTERVIEWING CANDIDATES?
• Hiring Manager video conference – 60 minutes
• Technical Panel Interview onsite or via video conference – 2 hours
Workspace – TBD
Tools and equipment – Contractor is required to bring their own mobile phone