Dr. Gabriel Noaje, Senior Solutions Architect, NVIDIA
Gabriel Noaje has more than 10 years of experience in accelerator technologies and parallel computing. Having worked in both enterprise and public-sector roles, he has a deep understanding of users’ requirements for manycore architectures. Prior to joining NVIDIA, he was a Senior Solutions Architect at SGI and HPE, where he developed solutions for HPC and deep learning customers in APAC.
Mr. YIN Jianxiong (Terry), Deep Learning Solution Architect, NVIDIA AI Technology Center
YIN Jianxiong is currently a Deep Learning Solutions Architect with the NVIDIA AI Technology Center. He obtained his bachelor’s and master’s degrees from South China University of Technology (SCUT), China and Yonsei University, South Korea, in 2009 and 2012 respectively. While at Nanyang Technological University (NTU), he was awarded an ACM SIGCOMM Travel Grant in 2013. He was a lead member of the Cloud3DView project at NTU, which won the ASEAN ICT Awards Gold Award and a Datacenter Dynamics Award.
Attendee Set Up Requirements
To maximize your training time during your DLI training, please follow the instructions below before attending your first training session:
1. You must bring your own laptop (and its charger) in order to run the training.
2. A current browser is required. For optimal performance, Chrome, Firefox, or Safari (on Mac) is recommended. Internet Explorer works but does not provide the best performance.
3. Create an account at http://courses.nvidia.com/join. Click the “Create account” link to create a new account. If you are told your account already exists, please try logging in instead. If you are asked to link your “NVIDIA Account” with your “Developer Account”, just follow the on-screen directions.
4. Ensure your laptop will run smoothly by visiting http://websocketstest.com/. Confirm that WebSockets are supported under “Environment”, and that “Data Receive”, “Data Send”, and “Echo Test” all show Yes under “WebSockets”. If there are issues with WebSockets, try updating your browser.
If you have any questions, please contact [email protected].
Learn how to use multiple GPUs to train neural networks and effectively parallelize training of deep neural networks using TensorFlow.
The computational requirements of deep neural networks used to enable AI applications like self-driving cars are enormous. A single training cycle can take weeks on a single GPU, or even longer for the large datasets used in self-driving car research. Using multiple GPUs for deep learning can significantly shorten the time required to train on large amounts of data, making it feasible to solve complex problems with deep learning.
This course will teach you how to use multiple GPUs to train neural networks. You’ll learn:
• Approaches to multi-GPU training
• Algorithmic and engineering challenges to large-scale training
• Key techniques used to overcome the challenges mentioned above
Upon completion, you’ll be able to effectively parallelize training of deep neural networks using TensorFlow.
*Agenda is subject to change
Content Level: Beginner; Introduction to Deep Learning
Lab 1: Introduction to Multi-GPU Training
Define a simple neural network and a cost function, then iteratively calculate the gradient of the cost function with respect to the model parameters using the SGD optimization algorithm.
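The loop described above can be sketched in a few lines of plain Python (a toy illustration, not course material; the one-parameter model, the tiny dataset, and the learning rate are all invented for the example):

```python
import random

# Tiny dataset generated from y = 3x; the model y_hat = w * x has one parameter.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w = 0.0    # model parameter
lr = 0.1   # learning rate (assumed value)
random.seed(0)

for step in range(200):
    x, y = random.choice(data)      # stochastic: one random sample per step
    y_hat = w * x                   # forward pass
    grad = 2 * (y_hat - y) * x      # d/dw of the squared-error cost (y_hat - y)^2
    w -= lr * grad                  # SGD parameter update

print(round(w, 3))  # w converges toward 3.0
```

Each step uses only one sample, so individual updates are noisy, but the parameter still converges to the value that minimizes the cost.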
Lab 2: Algorithmic Challenges to Multi-GPU Training
Learn to transform a single-GPU implementation into a Horovod multi-GPU implementation, reducing the complexity of writing efficient distributed software. Understand the data loading, augmentation, and training logic using the AlexNet model.
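Conceptually, Horovod’s data-parallel approach amounts to each GPU computing gradients on its own shard of the data and averaging them (an allreduce) before every update, so all workers stay in sync. A toy single-process Python sketch of that averaging step (the data shards, model, and learning rate are invented for illustration; real Horovod performs the allreduce via `hvd.DistributedOptimizer` over MPI/NCCL):

```python
# Toy data-parallel SGD: each "worker" computes a gradient on its own data
# shard, the gradients are averaged (the allreduce step), and every worker
# applies the same update -- the core idea behind multi-GPU data parallelism.

shards = [[(1.0, 3.0), (2.0, 6.0)],    # worker 0's data (drawn from y = 3x)
          [(0.5, 1.5), (1.5, 4.5)]]    # worker 1's data

w = 0.0     # replicated model parameter
lr = 0.05   # learning rate (assumed value)

def shard_gradient(w, shard):
    # Mean squared-error gradient over this worker's local mini-batch.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

for step in range(300):
    grads = [shard_gradient(w, s) for s in shards]  # computed in parallel in reality
    avg_grad = sum(grads) / len(grads)              # "allreduce": average across workers
    w -= lr * avg_grad                              # identical update on every worker

print(round(w, 3))  # all workers hold the same w, converging toward 3.0
```

Because every worker applies the same averaged gradient, the replicas never diverge, which is what makes the result equivalent to large-batch single-GPU training.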
Lab 3: Engineering Challenges to Multi-GPU Training
Understand the data input pipeline, communication, and reference architectures, and take a deeper dive into the concepts of job scheduling.
Prerequisites: Experience with stochastic gradient descent, network architectures, and parallel computing.
Mr. Nicolas Walker, Senior Solutions Architect, NVIDIA
Nicolas Walker is a Senior Solutions Architect at NVIDIA. He supports Southeast Asia customers with data centre and workstation solutions in the areas of high-performance computing, deep learning, virtualised desktops, and professional graphics. Before joining NVIDIA in February 2016, Nicolas spent 15 years in solution architect roles at IBM and Lenovo, focusing on enterprise infrastructure and HPC.
IMPORTANT: Please follow these pre-workshop instructions.
Attendee Set Up Requirements
To maximize your training time during your DLI training, please follow the instructions below before attending your first training session:
1. You must bring your own laptop (and its charger) in order to run the training.
2. A current browser is required. For optimal performance, Chrome, Firefox, or Safari (on Mac) is recommended. Internet Explorer works but does not provide the best performance.
3. Create an account at http://courses.nvidia.com/join. Click the “Create account” link to create a new account. If you are told your account already exists, please try logging in instead. If you are asked to link your “NVIDIA Account” with your “Developer Account”, just follow the on-screen directions.
4. Ensure your laptop will run smoothly by visiting http://websocketstest.com/. Confirm that WebSockets are supported under “Environment”, and that “Data Receive”, “Data Send”, and “Echo Test” all show Yes under “WebSockets”. If there are issues with WebSockets, try updating your browser.
If you have any questions, please contact [email protected].
Learn how to apply convolutional neural networks (CNNs) to detect chromosome co-deletion and search for motifs in genomic sequences.
This course teaches you how to apply deep learning to detect chromosome co-deletion and search for motifs in genomic sequences. You’ll learn how to:
• Understand the basics of convolutional neural networks (CNNs) and how they work
• Apply CNNs to MRI scans of low-grade gliomas (LGGs) to determine 1p/19q chromosome co-deletion status
• Use the DragoNN toolkit to simulate genomic data and to search for motifs
Upon completion, you’ll be able to understand how CNNs work, evaluate MRI images using CNNs, and use real regulatory genomic data to research new motifs.
Lab #1: Image Classification with DIGITS (120 mins)
Learn the basics of convolutional neural networks and how they work by training an image classification model with NVIDIA DIGITS.
Lab #2: Deep Learning for Genomics using DragoNN with Keras and Theano (120 mins)
Learn to interpret deep learning models to discover predictive genome sequence patterns using the DragoNN toolkit on simulated and real regulatory genomic data.
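Motif search with a CNN rests on a simple operation: sliding a filter over a one-hot-encoded sequence and recording where it activates, exactly what a convolutional first layer learns to do. A minimal plain-Python sketch of that scan (the motif, sequence, and hand-set filter are invented for illustration; DragoNN wraps this idea in trainable Keras layers):

```python
# One-hot encode a DNA sequence and scan it with a fixed "filter" that
# matches the motif TATA -- the same convolution a CNN's first layer learns.

BASES = "ACGT"

def one_hot(seq):
    # Each position becomes a length-4 indicator vector over A, C, G, T.
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

# A hand-set 4x4 filter that scores 4.0 on an exact "TATA" match.
motif_filter = one_hot("TATA")

def scan(seq, filt):
    # Slide the filter along the sequence and record the score at each offset.
    x = one_hot(seq)
    k = len(filt)
    scores = []
    for i in range(len(seq) - k + 1):
        s = sum(x[i + j][c] * filt[j][c] for j in range(k) for c in range(4))
        scores.append(s)
    return scores

scores = scan("GGTATACCGT", motif_filter)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores[best])  # the motif sits at offset 2 with a perfect score of 4.0
```

In a trained model the filter weights are learned from labeled sequences rather than hand-set, and interpreting those learned filters is how predictive sequence patterns are discovered.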
Lab #3: Radiomics 1p19q Chromosome Image Classification with TensorFlow (120 mins)
Learn how to apply deep learning techniques to detect the 1p19q co-deletion biomarker from MRI imaging.
Mr. Gilad Shainer (Chairman, HPC-AI Advisory Council)
Gilad Shainer is an HPC evangelist who focuses on high-performance computing, high-speed interconnects, leading-edge technologies, and performance characterization. He serves as a board member in the OpenPOWER, CCIX, OpenCAPI, and UCF organizations, is a member of the IBTA, and has contributed to the PCISIG PCI-X and PCIe specifications. Mr. Shainer holds multiple patents in the field of high-speed networking and is a recipient of a 2015 R&D 100 award for his contribution to the CORE-Direct collective offload technology. He holds an M.Sc. degree and a B.Sc. degree in Electrical Engineering from the Technion Institute of Technology.
Presentation: Pave the Way to Exascale
TBC
Mr. Jeffrey Adie (Principal Solutions Architect, APJI Region, NVIDIA)
Jeff is an HPC specialist with over 25 years of experience in developing, tuning, and porting scientific codes and architecting HPC solutions. His primary areas of expertise are CFD and NWP, having previously worked at the New Zealand Oceanographic Institute (now NIWA), at Toyota Motor Corporation, and on FEA/CFD analysis for America’s Cup class yachts for Team New Zealand. Prior to joining NVIDIA, Jeff worked for SGI for 16 years in Asia. Before that, he worked for various post-production companies in his native New Zealand in visual effects artist, technical director, and software development roles. Jeff holds a post-graduate diploma in Computer Science from the University of Auckland, specialising in parallel programming and computer graphics.
Presentation: Engineering an HPC cluster solution for GPU-accelerated workloads
GPU-accelerated computing has become an integral part of HPC over the last few years and, when coupled with a high-performance InfiniBand interconnect, it is important to properly architect a solution to maximise productivity. This talk will cover the requirements for GPU accelerators and present best practices for designing and deploying GPU-based HPC solutions to deliver optimal results.
Mr. Oren Laadan (CTO, Excelero)
Oren Laadan serves as APAC technical lead/CTO at Excelero. Oren has extensive experience in research, innovation, and technological leadership from a professional tenure of more than 20 years in the field of computer systems, broadly defined. Prior to Excelero, he co-founded Cellrox, serving as Chief Technology Officer, to provide mobile virtualization solutions for security, privacy, and isolation use cases in the Android ecosystem. Before co-founding Cellrox, he was a researcher at Columbia University with a focus on computer systems, virtualization, operating systems, cloud systems, security, and mobile computing. A graduate of the Israel Defense Forces’ elite “Talpiot” program, Oren pioneered R&D of nascent technologies in cloud computing. Dr. Laadan holds a Ph.D. in Computer Science from Columbia University, as well as an M.Sc. in Computer Science and a B.Sc. in Physics and Mathematics from Hebrew University.
Presentation: Excelero NVMesh is a storage game changer for SuperComputing
Excelero’s NVMesh enables supercomputing centers to build high-performance, low-latency storage leveraging distributed NVMe for a variety of HPC use cases, including burst buffer, fast scratch, and Nastran analytics. NVMesh enables shared NVMe across any network and supports any parallel file system. Distributed workloads can leverage the full performance of NVMe SSDs with the convenience of centralized storage while avoiding proprietary hardware lock-in and reducing the overall storage TCO. NVMesh has enabled SciNet to build a petabyte-scale unified pool of distributed high-performance NVMe as a burst buffer for checkpointing. The SciNet NVMe pool delivers 230 GB/s of throughput and well over 20M random 4K IOPS, enabling SciNet to meet its availability SLAs.
Mr. Avi Telyas (Director, System Engineering, Mellanox)
Avi Telyas is a Director of System Engineering at Mellanox Technologies, leading the APAC Sales Engineering and FAE teams. Based in Tokyo, Avi is deeply involved in large HPC, machine learning, and AI deployments in Japan and across APAC. In his free time, Avi codes with AI frameworks and gets too excited talking about it. Avi holds a BSc (summa cum laude) in Computer Science from the Technion Institute of Technology, Israel.
Presentation: In-Network Computing in HPC System
The latest revolution in HPC is the move to a co-design architecture: a collaborative effort among industry, academia, and manufacturers to reach Exascale performance by taking a holistic, system-level approach to fundamental performance improvements. Co-design recognizes that the CPU has reached the limits of its scalability, and offers in-network computing to share the responsibility for handling and accelerating application workloads and to offload the CPU. By placing data-related algorithms on an intelligent network, we can dramatically improve data center and application performance.
Dr. Richard Graham (HPC Scale Special Interest Group Chair, HPC-AI Advisory Council)
Dr. Richard Graham is Senior Director, HPC Technology at Mellanox Technologies, Inc. His primary focus is on HPC network software and hardware capabilities for current and future HPC technologies. Prior to moving to Mellanox, Rich spent thirteen years at Los Alamos National Laboratory and Oak Ridge National Laboratory, in computer science technical and administrative roles, with a technical focus on communication libraries and application analysis tools. He is cofounder of the Open MPI collaboration, and was chairman of the MPI 3.0 standardization efforts.
Presentation: MPI Acceleration in HPC System
TBC
Dr. Yang Jian (Fellow, AMD)
Dr. Yang Jian graduated from the CAG&CG State Key Lab with a PhD in 2002. His previous industry experience includes several IC companies working on 3D graphics acceleration: Trident Multimedia Co. Ltd, Centrality Communications Co. Ltd, and S3 Graphics Co. Ltd. In 2006, Dr. Yang joined ATI/AMD, where he built a strong team for performance verification, analysis, and optimization of modern GPUs; the team has completed more than 40 ASIC tape-outs. Dr. Yang currently concentrates on computer architecture for HPC and artificial intelligence, deep learning algorithm optimization, the ROCm open-source platform, and HPC applications at AMD.
Presentation: AMD Radeon Instinct™ Platforms For HPC and Machine Intelligence
AMD is speeding up HW/SW platforms for virtualization, HPC, and machine intelligence with the 7nm CPU “Rome” and the 7nm GPUs MI60 and MI50. The AMD Radeon Instinct™ MI60 delivers 7.4 TFLOPS of FP64 compute capability, 64 GB/s of PCIe Gen4 bandwidth, and 200 GB/s Infinity Fabric Links. ROCm over OpenUCX provides low latency and high transmission bandwidth for MPI intra-node and inter-node communications. The rapidly evolving ROCm open-source software stack supports rapid porting of HPC applications and many machine intelligence frameworks. Many math libraries and various machine intelligence primitives have been developed and optimized in ROCm on AMD Radeon Instinct GPUs. AMD is working with many partners to promote ROCm in the computing market.
Mr. Ashrut Ambastha (Sr. Staff Solutions Architect, Mellanox)
Ashrut Ambastha is a Sr. Staff Architect at Mellanox, responsible for defining network fabrics for large-scale InfiniBand clusters and high-performance datacenter fabrics. He is also a member of the application engineering team that works on product designs with Mellanox silicon devices. Ashrut’s professional interests include network topologies, routing algorithms, and PHY signal integrity analysis and simulation. He holds an MSc and an MTech in Electrical Engineering from the Indian Institute of Technology Bombay.
Presentation: GPUDirect Accelerates HPC Systems
This talk is aimed at professionals interested in the role of upcoming interconnect technologies and network topologies in the fields of HPC and artificial intelligence. We will begin by analysing the latest “in-network computing” architecture of Mellanox network ASICs and software layers, then discuss network topologies and the associated resiliency mechanisms needed to meet the demands of high-performance yet flexible computing and AI systems. To conclude, we will dwell on a few offloading technologies built into the network components that can be applied to accelerate HPC and cloud-native workloads as well as storage systems.
Mr. Zivan Ori (CEO and Co-Founder, E8 Storage)
Mr. Zivan Ori is the co-founder and CEO of E8 Storage. Before founding E8 Storage, Mr. Ori was the IBM XIV R&D Manager, responsible for developing the IBM XIV high-end, grid-scale storage system, and served as Chief Architect at Stratoscale, a provider of hyper-converged infrastructure. Prior to IBM XIV, Mr. Ori headed software development at Envara (acquired by Intel) and served as VP R&D at Onigma (acquired by McAfee).
Presentation: Accelerating Machine Learning with NVMe over Fabrics for GPU Clusters
GPU clusters are the basic building block for machine learning, but typical GPU servers have little room for internal storage. Relying on external storage such as NAS or FC SAN does not deliver the performance the GPUs need, especially during the training phase. NVMe over Fabrics to the rescue! By connecting shared NVMe enclosures over 100G Ethernet or InfiniBand, it is now possible to saturate the GPUs’ bandwidth, so storage is no longer the bottleneck. E8 Storage will demo shared NVMe for a GPU cluster and its impact on machine learning performance.
Altair PBS Works™ is the market leader in comprehensive, secure workload management for high-performance computing (HPC) and cloud environments. It allows HPC users to simplify the management of their HPC infrastructure while optimizing system utilization, improving application performance, and maximizing ROI on hardware and software investments.
PBS User Group meetings are a regular feature in the USA. Asia, with its rapidly growing user base, more than qualifies for its own event, with many organisations now adopting this technology of choice.
Apart from introducing the latest features and solutions from the PBS stable and discussing new acquisitions and partnerships, the user group provides a platform for knowledge and experience sharing among peers. Past user groups in other geographies have been as much a learning experience for us as for the PBS users, and have resulted in product enhancements based on user feedback. With a critical mass of expert PBS Works users now in Asia, the user group also offers you an opportunity to list the requirements and features that can aid your business, and gives us insight into what we can do to help you achieve them.
The event will end by recognising some of the key contributors and users of the PBS suite of products: a small token of our appreciation for those contributing to the adoption of HPC in general, and PBS in particular, in the region.
For enquiries, please contact Mr Manjunath Doddam at [email protected]
Since the 1950s, the increasing use of computers has transformed many domains of research, from molecular dynamics to climate research and astronomy, through increasingly powerful computer simulations. Over the past decades, advances in instruments and sensors for data ingest, together with the spread of GPU computing, and enhanced by new approaches to data analysis and artificial intelligence, have provided scientists with a new set of tools that connect large empirical data and computation in a novel and exciting way.
Scalable storage environments that provide sufficient performance for data ingest and analysis are an important ingredient in this new approach to research computing. Our focus in this DDN User Group is on new applications and approaches, supported by scalable high-performance storage, that harness the power of Big Data and AI in industry and research.
For enquiries, please contact [email protected]
*Note: This agenda is subject to change.
The Spectrum Scale User Group is an event for Spectrum Scale users to gather and share their experience with the product. We will invite Spectrum Scale experts from around the world to share the latest Spectrum Scale news with the audience.
If you have experience with HPC, don’t hesitate to join this event. If you are an IBM Spectrum Scale user, you are welcome to listen to our presenters as well as share your experience with the user group.
For enquiries, please contact Chris at [email protected].