Mission IT Support – HPC System Hardware Support Engineer Job
SAIC has a career opportunity for a Mission IT Support - HPC System Hardware Support Engineer located in Albuquerque, NM. The candidate chosen for this position will become a full time employee of SAIC with full benefits.
Clearance Level: Position requires an active Department of Energy (DOE) Q clearance or active DOD TS clearance.
The Senior High Performance Computing (HPC) system hardware support engineer contributes to support activities involving parallel computing architectures, technologies, and supporting infrastructures. The primary responsibilities of this position are:
- Provide technical leadership within the Capacity Computing and Visualization (CapViz) support team to perform day-to-day monitoring and maintenance across several production Linux HPC clusters with over 9,000 nodes and complex supporting interconnect, networking, and storage infrastructures.
- Provide expertise in hardware configuration, troubleshooting, and repair of Dell, Hewlett Packard, Appro, DDN, LSI, Mellanox, Sun, and similar subsystems, including the tools used to configure, manage, and troubleshoot these systems, as well as InfiniBand and Ethernet system interconnects.
- Maintain and develop operational plans and tools for the rapid recovery of systems after scheduled/unscheduled outages.
- Maintain and develop operational plans and tools for hardware subsystem regression tests; monitor syslogs and other Unix diagnostic information.
- Responsible for cable management, rack power and heat management, and equipment layout.
- Train CapViz team members on the procedures, policies, and tools used to monitor, repair and maintain hardware and firmware.
- Maintain thorough documentation of the policies and procedures used within the team as well as hardware configuration diagrams.
- Interface with vendors to keep up to date on firmware changes, modified hardware and diagnostic procedures, and known bugs.
- Interface with vendors to manage warranty returns, which includes keeping the spare parts cache inventoried and up to date.
- Keep track of and provide reports on system failure rates and warranty periods.
- Work closely with System Developers to improve, test, and implement break-fix processes.
- Assist in the configuration, testing, deployment, and integration of new high performance computing clusters.
- Assist in the development/operation of system decommissions and hardware reapplication plans.
- As needed, provide customer service support to engineering and scientific customers.
The qualifications for this position are:
- Strong experience with Linux/Unix within large computing environments (e.g., HPC/Enterprise).
- 8+ years of experience troubleshooting, repairing, and supporting high-end computing systems and large storage systems.
- Proven ability to maintain hardware within production environments.
- Knowledge of the requirements for working in a secure Vault Type Room environment.
- Strong customer and communication skills.
- Ability to work in a team environment while requiring minimal supervision.
- Self-starter with the ability to adapt to new challenges.
- Must be able to work cooperatively as a team member.
- Desire to work in a fun, fast-paced, dynamic environment.
- Strong verbal and written communication skills.
- Perl and Shell scripting skills.
TYPICAL EDUCATION AND EXPERIENCE: Bachelor's degree in a related technical discipline and 8+ years of operating systems experience. An additional 4 years of experience in lieu of a degree may be considered.
SAIC is a FORTUNE 500® scientific, engineering, and technology applications company that uses its deep domain knowledge to solve problems of vital importance to the nation and the world, in national security, energy & environment, health and cybersecurity. The company's approximately 41,000 employees serve customers in the U.S. Department of Defense, the intelligence community, the U.S. Department of Homeland Security, other U.S. Government civil agencies and selected commercial markets. Headquartered in McLean, Va., SAIC had annual revenues of approximately $10.6 billion for its fiscal year ended January 31, 2012. For more information, visit www.saic.com. SAIC: From Science to Solutions®
Job Posting: Jun 20, 2013, 11:12:39 AM
Primary Location: United States-NM-ALBUQUERQUE
Clearance Level Must Currently Possess: DOE Q
Clearance Level Must Be Able to Obtain: DOE Q
Potential for Teleworking: No
Shift: Day Job