
Computer Science - Parallel Computing

Indian Institute of Technology Delhi

This course serves as an introductory exploration of parallel programming, designed for those without prior experience in the field. Key prerequisites include knowledge of data structures and operating systems. The course structure is as follows:

  • Credit structure (L-T-P): 3-0-2, i.e., 3 lecture hours, 0 tutorial hours, and 2 practical (lab) hours per week
  • Hands-on learning: substantial parallel programming assignments on compute clusters, multi-core CPUs, and many-core GPUs.

As the number of cores per processor keeps increasing, programs must be written to use those cores in parallel, which makes efficient parallel programming an essential skill. The course covers various topics, including:

  • Introduction to Parallel Computing
  • Parallel Programming Paradigms
  • Parallel Architecture and Case Studies
  • OpenMP and its applications
  • PRAM Model of Computation
  • Memory Consistency and Performance Issues
  • Shared Memory vs. Message Passing
  • Message Passing Interface (MPI)
  • Algorithmic Techniques
  • CUDA Programming
  • Advanced Algorithms: Merging and Sorting
  • Lock-Free Synchronization and Graph Algorithms

Join us to develop essential skills in parallel programming that will equip you for the evolving landscape of computing.

Course Lectures
  • Mod-01 Lec-01 Introduction
    Dr. Subodh Kumar

    This module serves as an introduction to parallel computing, aiming to familiarize students with the fundamental concepts and terminology in the field.

    Students will learn about:

    • The importance of parallel computing in modern computing environments.
    • Key challenges faced in parallel programming.
    • Real-world applications and scenarios where parallel computing is essential.

    By the end of this session, students should have a clear understanding of what parallel computing entails and its relevance in today’s technology landscape.

  • In this module, students will explore various parallel programming paradigms that are fundamental to developing efficient parallel applications.

    The primary paradigms covered include:

    • Shared memory models
    • Distributed memory models
    • Data parallelism
    • Task parallelism

    Each paradigm will be discussed with relevant examples and use cases, giving students a robust understanding of when and how to apply each approach effectively.

  • This module delves into the architecture of parallel systems, providing students with essential knowledge of hardware components that facilitate parallel processing.

    Key topics include:

    • Understanding multi-core and many-core architectures
    • Overview of parallel memory architectures
    • Memory consistency models

    By analyzing different architectures, students will learn how hardware affects the performance of parallel applications and which design considerations matter most when programming for it.

  • This module includes case studies that demonstrate the practical application of parallel architecture in real-life scenarios.

    Students will analyze:

    • Case studies from industry leaders in parallel computing
    • Lessons learned from successful and unsuccessful parallel applications
    • Design patterns that emerged from these case studies

    These insights will enable students to understand the challenges faced in real-world applications and enhance their problem-solving skills in parallel programming.

  • Mod-01 Lec-05 Open MP
    Dr. Subodh Kumar

    This module introduces OpenMP, a widely used API for parallel programming in shared memory environments, which simplifies the development of parallel applications.

    Students will learn:

    • Basic syntax and structure of OpenMP
    • How to parallelize code effectively using directives
    • Best practices for optimizing performance

    Hands-on exercises will allow students to implement OpenMP in simple programs, providing a practical understanding of its capabilities and limitations.
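    A minimal sketch of the directive style described above, in C: the hypothetical function `parallel_sum` below (not taken from the course material) parallelizes a loop with `#pragma omp parallel for` and a `reduction` clause. It assumes a compiler with OpenMP support (e.g. `gcc -fopenmp`); without that flag the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array with an OpenMP parallel loop. The reduction(+:total) clause
   gives each thread a private partial sum and combines the partial sums when
   the loop ends, so no explicit locking is needed. Compiled without OpenMP
   support, the pragma is ignored and the loop runs serially with the same
   result. */
long long parallel_sum(const int *a, long n)
{
    assert(a != NULL || n == 0);
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

    Called on an array holding the integers 1 to 100, `parallel_sum` returns 5050 whether or not OpenMP is enabled; only the distribution of loop iterations across threads changes.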

  • This module continues with OpenMP, building on the previous module to explore more advanced features and techniques.

    Students will cover:

    • Advanced synchronization techniques
    • Nested parallelism and its applications
    • Handling data dependencies in parallel programs

    Through practical examples, students will deepen their understanding of how to leverage OpenMP to effectively manage more complex parallel tasks.

  • This module focuses on OpenMP, a popular API for parallel programming in C, C++, and Fortran. Students will learn how to utilize OpenMP for shared-memory parallelism, enabling effective multi-threaded programming. Key topics include:

    • Introduction to OpenMP directives
    • Thread management and synchronization
    • Data sharing attributes
    • Performance optimization techniques

    Through practical exercises, learners will implement parallel constructs in real-world applications, enhancing their skills in handling multi-core processors efficiently.

  • This module introduces the PRAM (Parallel Random Access Machine) model, a theoretical framework for analyzing parallel algorithms. Students will explore:

    • The fundamentals of the PRAM model
    • Variants: EREW (exclusive read, exclusive write), CREW (concurrent read, exclusive write), and CRCW (concurrent read, concurrent write)
    • Algorithm design using the PRAM model
    • Complexity analysis of parallel algorithms

    By understanding the PRAM model, students will gain insights into the performance of parallel algorithms and how they compare to sequential counterparts.
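    To make the PRAM cost model concrete, here is a sketch of the classic Hillis-Steele prefix-sum algorithm, simulated in sequential C. The name `pram_prefix_sum` is illustrative; the inner loop stands for the PRAM processors acting in lockstep, and the copy into `prev` models the rule that every read in a step sees the values from before that step's writes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simulates the Hillis-Steele inclusive prefix sum on a PRAM with n
   processors. Each iteration of the outer loop is one synchronous PRAM step;
   the inner loop plays the role of the active processors, all of which read
   the previous step's values (held in prev) before any writes take effect.
   Processors i and i - d can both read prev[i - d] in the same step, so the
   algorithm as written needs concurrent reads (a CREW PRAM). It takes
   ceil(log2 n) steps but performs O(n log n) additions in total. */
void pram_prefix_sum(int *x, size_t n)
{
    assert(x != NULL || n == 0);
    if (n < 2) return;
    int *prev = malloc(n * sizeof *prev);
    assert(prev != NULL);
    for (size_t d = 1; d < n; d *= 2) {      /* one PRAM step per value of d */
        memcpy(prev, x, n * sizeof *x);      /* memory state at start of step */
        for (size_t i = d; i < n; i++) {     /* processors d..n-1, in parallel */
            x[i] = prev[i] + prev[i - d];
        }
    }
    free(prev);
}
```

    The scan finishes in ceil(log2 n) parallel steps versus n - 1 sequential additions, but it does O(n log n) total work; work-efficient scans bring the work down to O(n) at the cost of roughly twice as many steps.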

  • Mod-01 Lec-09 PRAM
    Dr. Subodh Kumar

    In this module, students will delve deeper into the PRAM model, examining its applications in parallel computing. The focus will be on:

    • Detailed case studies illustrating PRAM applications
    • Real-world implications of the PRAM model
    • Challenges and limitations of the PRAM model
    • Future directions for research in parallel computing

    This comprehensive overview will prepare students to understand the relevance of PRAM in modern parallel computing environments.

  • This module presents various models of parallel computation and their associated complexities. Topics covered include:

    • Different models of computation, such as PRAM and BSP (Bulk Synchronous Parallel)
    • Complexity classes in parallel computing
    • Impact of parallel architecture on algorithm design
    • Comparative analysis of models

    Students will engage in discussions and exercises to understand how models influence the design and efficacy of parallel algorithms.

  • This module covers the critical topic of memory consistency in parallel computing. Key aspects include:

    • Understanding memory models and their impact on performance
    • Exploring consistency models, from strong ones such as sequential consistency to weaker, relaxed models
    • Challenges in maintaining memory consistency
    • Real-world applications and implications for software design

    Students will learn about the trade-offs involved in choosing different memory consistency models and their effects on program behavior.

  • In this module, students will examine memory consistency and its performance issues in detail. Topics include:

    • Analysis of performance implications of various memory models
    • Techniques to optimize memory accesses
    • Trade-offs between consistency and performance
    • Case studies demonstrating performance impacts

    Students will engage in hands-on activities to identify and resolve performance bottlenecks related to memory consistency in parallel applications.

  • This module focuses on the fundamental principles of parallel program design, which are crucial for optimizing performance in modern computing environments. Students will explore:

    • The importance of parallelism in software development.
    • Common design patterns in parallel programming.
    • Strategies for decomposing problems into parallel tasks.
    • Performance metrics and benchmarks to evaluate parallel applications.

    By the end of this module, students will be equipped with the theoretical and practical knowledge necessary to design efficient parallel programs.

  • This module introduces students to two major paradigms in parallel programming: shared memory and message passing. Key topics include:

    • Understanding shared memory architectures and synchronization mechanisms.
    • Exploring message passing interfaces and their applications.
    • Comparing the advantages and disadvantages of both paradigms.
    • Hands-on exercises with programming examples to solidify understanding.

    Students will learn how to choose the appropriate paradigm based on specific application requirements and system architecture.

  • Mod-01 Lec-15 MPI
    Dr. Subodh Kumar

    This module focuses on the Message Passing Interface (MPI), a standard for parallel programming across distributed systems. Topics covered include:

    • Introduction to MPI and its core functionalities.
    • Understanding MPI communication patterns (point-to-point and collective).
    • Hands-on programming assignments to implement MPI in real-world scenarios.
    • Debugging and performance tuning in MPI applications.

    By the end of this module, students will have a practical understanding of how to leverage MPI for efficient parallel computing.

  • Mod-01 Lec-16 MPI(Contd.)
    Dr. Subodh Kumar

    This module continues the exploration of MPI with advanced topics and practices. Key areas of focus include:

    • Advanced MPI features like derived data types and non-blocking communications.
    • Scaling MPI applications for larger systems.
    • Case studies of MPI in high-performance computing environments.
    • Best practices for writing efficient MPI applications.

    Students will engage in projects that challenge their understanding and application of MPI concepts in more complex scenarios.

  • Mod-01 Lec-17 MPI(Contd..)
    Dr. Subodh Kumar

    This module continues with MPI, diving deeper into its functionalities and applications. Students will cover:

    • Further advanced MPI functionality such as one-sided communication (remote memory access).
    • Techniques for optimizing communication in distributed systems.
    • Exploration of parallel file I/O and its implications in data handling.
    • Challenges faced in large-scale MPI deployments and solutions.

    Through practical exercises, students will enhance their skills in managing and optimizing MPI applications.

  • This module delves into algorithmic techniques that are fundamental to parallel computing. Key topics include:

    • Parallel algorithms and their design principles.
    • Task scheduling and load balancing strategies.
    • Algorithms for sorting and searching in parallel environments.
    • Dynamic programming techniques adapted for parallel execution.

    Students will gain insights into how algorithmic efficiency can significantly impact performance in parallel systems, with hands-on examples to solidify learning.

  • This module continues the exploration of algorithmic techniques essential for parallel computing. Students will delve deeper into:

    • Advanced sorting algorithms
    • Graph algorithms suitable for parallel execution
    • Dynamic programming approaches adapted for concurrent processing

    Through practical assignments, students will implement these techniques in parallel environments, optimizing performance and efficiency. Real-world applications will be discussed to illustrate the relevance of these concepts in modern computing.

  • This module builds on the previous one, providing further insights into algorithmic techniques. Key topics include:

    • Refinement of previous algorithms
    • Introduction to competitive programming concepts
    • Evaluation and analysis of algorithmic performance

    Students will engage in coding exercises and collaborative projects to solidify their understanding, applying learned concepts to solve complex problems.

  • Mod-01 Lec-21 CUDA
    Dr. Subodh Kumar

    The focus of this module is on CUDA, a parallel computing platform and application programming interface (API) model created by NVIDIA. Key learning areas include:

    • Introduction to CUDA architecture
    • Writing simple CUDA programs
    • Understanding memory hierarchies in CUDA

    Students will complete exercises that allow them to apply CUDA in real-world scenarios, enhancing their programming skills on GPU architectures.
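    As a sketch of the thread-indexing scheme that first CUDA programs rely on, the C code below emulates a one-dimensional kernel launch on the CPU. It is not CUDA code: the hypothetical `saxpy_thread` function plays the part of a kernel body, and the two loops in `launch_saxpy` stand in for the grid of blocks and threads that a GPU would run in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C emulation (not CUDA) of how a 1-D CUDA grid maps threads to array
   elements. In a real kernel, blockIdx.x, blockDim.x and threadIdx.x are
   built-in variables and the GPU runs one thread per (block, thread) pair. */

/* Body of a hypothetical "saxpy" kernel: y[i] = a * x[i] + y[i]. */
static void saxpy_thread(unsigned block_idx, unsigned block_dim,
                         unsigned thread_idx, size_t n, float a,
                         const float *x, float *y)
{
    size_t i = (size_t)block_idx * block_dim + thread_idx; /* global index */
    if (i < n) {           /* the last block may have threads past the end */
        y[i] = a * x[i] + y[i];
    }
}

/* Emulates a launch like saxpy<<<grid_dim, block_dim>>>(n, a, x, y), with
   grid_dim = ceil(n / block_dim) so that every element gets one thread. */
void launch_saxpy(size_t n, float a, const float *x, float *y,
                  unsigned block_dim)
{
    assert(block_dim > 0);
    unsigned grid_dim = (unsigned)((n + block_dim - 1) / block_dim);
    for (unsigned b = 0; b < grid_dim; b++)        /* blocks */
        for (unsigned t = 0; t < block_dim; t++)   /* threads in a block */
            saxpy_thread(b, block_dim, t, n, a, x, y);
}
```

    In real CUDA the body would be a `__global__` function computing `i` from `blockIdx.x * blockDim.x + threadIdx.x`, launched as `saxpy<<<grid_dim, block_dim>>>(...)`; the `i < n` check is needed because the grid is rounded up to a whole number of blocks.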

  • Mod-01 Lec-22 CUDA(Contd.)
    Dr. Subodh Kumar

    This module continues the exploration of CUDA programming, diving deeper into:

    • Optimizing CUDA code for performance
    • Advanced memory management techniques
    • Utilizing shared memory effectively

    Through hands-on projects, students will learn to optimize their applications for high-performance computing, addressing common pitfalls in CUDA programming.

  • Mod-01 Lec-23 CUDA(Contd..)
    Dr. Subodh Kumar

    This module presents further advanced topics in CUDA, including:

    • Multi-GPU programming techniques
    • Stream processing and concurrency
    • Debugging and profiling CUDA applications

    Students will engage in projects that require implementing advanced features, preparing them for real-world challenges in parallel computing environments.

  • Mod-01 Lec-24 CUDA(Contd...)
    Dr. Subodh Kumar

    In this module, students will synthesize their knowledge of CUDA and parallel programming through comprehensive projects. Key activities will include:

    • Developing a complete application using CUDA
    • Collaborating in teams to tackle complex problems
    • Presenting solutions and receiving feedback

    This practical experience will solidify their learning and prepare them for future endeavors in high-performance computing and parallel programming.

  • The CUDA (Compute Unified Device Architecture) programming model is essential for harnessing the power of GPUs for parallel computing. In this module, students will continue their exploration of CUDA, focusing on advanced features and optimizations.

    Key topics will include:

    • Memory management techniques for efficient data transfer.
    • Kernel launches and execution configurations.
    • Performance tuning and profiling tools.

    By the end of this module, students will be equipped with the skills to write more efficient CUDA code and understand the underlying architecture better.

  • This module continues the in-depth study of CUDA programming, emphasizing the implementation of various algorithms on GPUs. Students will analyze different parallel algorithms and their efficiency when executed on CUDA.

    Topics covered will include:

    • Parallel reduction algorithms and their applications.
    • Matrix operations and their parallel implementations.
    • Understanding warp execution and its implications for performance.

    Students will engage in hands-on coding exercises to apply these concepts in real-world scenarios, enhancing their parallel programming proficiency.
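    The parallel reduction listed above is usually taught as a tree-shaped sum within one thread block. The following C sketch emulates that pattern sequentially; `block_reduce_sum` is a hypothetical name, the array `s` stands in for the block's shared memory, and the block size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C emulation of the tree reduction commonly used inside one CUDA
   thread block. In each step, "threads" 0..stride-1 add in the element
   stride positions away, halving the active range; on a GPU a
   __syncthreads() barrier would separate the steps. No second buffer is
   needed: a step writes only s[0..stride-1] and reads only each thread's own
   slot plus s[stride..2*stride-1]. Using tid and tid + stride (sequential
   addressing) keeps the active threads contiguous, which avoids the warp
   divergence of the interleaved (tid % (2*stride) == 0) variant. */
float block_reduce_sum(float *s, size_t block_dim)
{
    assert(block_dim > 0 && (block_dim & (block_dim - 1)) == 0);
    for (size_t stride = block_dim / 2; stride > 0; stride /= 2) {
        for (size_t tid = 0; tid < stride; tid++) {  /* active "threads" */
            s[tid] += s[tid + stride];
        }
        /* __syncthreads() would go here in a real kernel */
    }
    return s[0];
}
```

    A block of n threads finishes in log2 n steps; a full kernel would typically have thread 0 of each block write its partial sum to global memory and combine the per-block results in a second pass.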

  • In this module, students will delve deeper into the CUDA framework, covering additional advanced features and the optimization of existing code, with attention to how specific coding choices translate into performance gains.

    Key learning objectives include:

    • Dynamic parallelism in CUDA.
    • Stream management and asynchronous operations.
    • Shared memory utilization for faster data access.

    Students will participate in practical sessions to implement these features and analyze their impact on performance.

  • This module introduces students to parallel algorithms with a focus on merging and sorting techniques. Parallelizing these fundamental algorithms is crucial for handling large datasets efficiently.

    Topics include:

    • Introduction to parallel sorting algorithms (e.g., Bitonic Sort, Parallel Merge Sort).
    • Understanding the principles of divide-and-conquer in parallel computing.
    • Analyzing the performance and complexity of parallel algorithms compared to their sequential counterparts.

    Hands-on programming assignments will enable students to implement and test these algorithms on multi-core systems.
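    As an illustration of the first algorithm listed, below is an iterative bitonic sort in C. It is a sketch that assumes the input length is a power of two. For each pair of loop indices `k` and `j`, the compare-exchange operations in the innermost loop touch disjoint pairs of elements, so that loop is marked with an OpenMP pragma (ignored if OpenMP is not enabled).

```c
#include <assert.h>
#include <stddef.h>

/* Put a[i] and a[j] (i < j) in ascending or descending order. */
static void compare_exchange(int *a, size_t i, size_t j, int ascending)
{
    int out_of_order = ascending ? a[i] > a[j] : a[i] < a[j];
    if (out_of_order) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Sorts a[0..n) in ascending order; n must be 0 or a power of two. */
void bitonic_sort(int *a, size_t n)
{
    assert(n == 0 || (n & (n - 1)) == 0);
    for (size_t k = 2; k <= n; k *= 2) {         /* length of merged blocks */
        for (size_t j = k / 2; j > 0; j /= 2) {  /* compare distance */
            /* One parallel step: the pairs (i, i ^ j) are disjoint. */
            #pragma omp parallel for
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    /* Blocks of length k with bit k clear sort ascending, the
                       others descending, so adjacent blocks form bitonic
                       sequences for the next value of k. */
                    compare_exchange(a, i, partner, (i & k) == 0);
                }
            }
        }
    }
}
```

    The network performs O(n log^2 n) comparisons in O(log^2 n) parallel steps. It does more total work than a sequential O(n log n) sort, but its fixed, data-independent comparison pattern maps well onto many-core and SIMD hardware.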

  • Continuing from the previous module, this section further explores merging and sorting algorithms, emphasizing their implementation efficiencies and practical applications in real-world scenarios.

    Students will learn about:

    • Advanced merging techniques in parallel computing.
    • Performance benchmarking of various sorting algorithms.
    • Real-world applications of parallel sorting in large datasets.

    Through a series of coding labs, students will refine their implementations, ensuring optimal performance across varying data sizes and structures.
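    One common parallel merging technique splits the output, rather than the inputs, into equal slices and uses binary search to find where each slice begins in the two sorted inputs (often called co-ranking or merge path). The C sketch below is one such implementation, not necessarily the method used in the lectures; `co_rank` and `parallel_merge` are illustrative names, and the slice loop is marked with an OpenMP pragma that is ignored when OpenMP is disabled.

```c
#include <assert.h>
#include <stddef.h>

/* Co-rank: for output position k (0 <= k <= m + n), return the i such that
   the first k elements of the stable merge of a[0..m) and b[0..n) are
   a[0..i) together with b[0..k-i). Equal keys are taken from a first.
   Binary search for the largest feasible i whose last taken element of a
   does not exceed the first untaken element of b. */
static size_t co_rank(size_t k, const int *a, size_t m, const int *b, size_t n)
{
    size_t lo = k > n ? k - n : 0;   /* at most n elements can come from b */
    size_t hi = k < m ? k : m;       /* at most m elements can come from a */
    while (lo < hi) {
        size_t i = lo + (hi - lo + 1) / 2;  /* round up so lo advances */
        if (a[i - 1] <= b[k - i])           /* i > lo >= k - n, so k - i < n */
            lo = i;
        else
            hi = i - 1;
    }
    return lo;
}

/* Merges sorted a[0..m) and b[0..n) into out[0..m+n) by cutting the output
   into `parts` slices of near-equal length. Each slice locates its inputs
   with co_rank and merges them independently of the other slices. */
void parallel_merge(const int *a, size_t m, const int *b, size_t n,
                    int *out, size_t parts)
{
    assert(parts > 0);
    size_t total = m + n;
    #pragma omp parallel for
    for (size_t p = 0; p < parts; p++) {
        size_t k0 = total * p / parts, k1 = total * (p + 1) / parts;
        size_t i = co_rank(k0, a, m, b, n), i_end = co_rank(k1, a, m, b, n);
        size_t j = k0 - i, j_end = k1 - i_end;
        for (size_t k = k0; k < k1; k++) {   /* ordinary sequential merge */
            if (j >= j_end || (i < i_end && a[i] <= b[j]))
                out[k] = a[i++];
            else
                out[k] = b[j++];
        }
    }
}
```

    Locating a slice costs O(log(m + n)) and merging it costs time proportional to its length, so all slices take roughly equal time regardless of how the keys are distributed between the two inputs.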

  • In this module, students will consolidate their understanding of parallel algorithms by examining additional sorting techniques and their applicability in various computing environments. This will include a thorough analysis of the theoretical and practical aspects of parallel sorting methods.

    Key areas of focus will include:

    • Comparison of sequential vs. parallel sorting algorithms.
    • Case studies of parallel sorting applications in industry.
    • Future trends in parallel computing and sorting methodologies.

    Students will also be tasked with a capstone project that showcases their ability to implement learned concepts in practical scenarios.

  • This module delves deeper into algorithms, specifically focusing on merging and sorting techniques. Students will explore various algorithms that enhance data processing efficiency. Key concepts include:

    • Understanding the importance of sorting algorithms in data organization.
    • Analyzing different merging techniques used in parallel computing.
    • Implementing these algorithms in practical scenarios to boost performance.

    By the end of this module, students will gain hands-on experience and insight into how these fundamental algorithms are applied in parallel programming contexts.

  • This module continues the exploration of merging and sorting algorithms, further expanding on their applications and optimizations. Students will:

    • Investigate advanced sorting algorithms and their computational complexities.
    • Learn how to optimize merging techniques for better performance in multi-threaded environments.
    • Apply theoretical concepts through practical coding exercises to solidify understanding.

    A strong emphasis will be placed on performance analysis and understanding the trade-offs involved in different approaches.

  • This module introduces students to lower bounds in lock-free synchronization and load stealing techniques. Participants will learn:

    • The significance of lock-free algorithms in concurrent programming.
    • How to implement and analyze load stealing for efficient task distribution.
    • Real-world applications and scenarios where lock-free synchronization is crucial.

    By the end of this module, students will have a solid foundation in the principles of synchronization that can enhance performance in multi-threaded applications.
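    Lock-free synchronization replaces locks with atomic read-modify-write instructions such as compare-and-swap (CAS). A minimal sketch is the Treiber stack below, written with C11 `<stdatomic.h>`. The struct and function names are illustrative, and the memory-reclamation simplification noted in the comments is a deliberate assumption, not a complete solution.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal lock-free (Treiber) stack. push and pop retry a compare-and-swap
   on the head pointer instead of taking a lock, so a thread that stalls
   mid-operation never prevents other threads from making progress.
   Simplifying assumption: the caller owns node memory and does not free or
   reuse a popped node while other threads may still read it. A production
   version needs safe memory reclamation (e.g. hazard pointers) and must
   address the ABA problem. */
struct node {
    int value;
    struct node *next;
};

struct stack {
    _Atomic(struct node *) head;
};

void stack_init(struct stack *s) { atomic_init(&s->head, NULL); }

void stack_push(struct stack *s, struct node *n)
{
    assert(n != NULL);
    struct node *old = atomic_load(&s->head);
    do {
        n->next = old;            /* link to the head we last observed */
    } while (!atomic_compare_exchange_weak(&s->head, &old, n));
    /* on failure, the CAS stores the current head into old and we retry */
}

struct node *stack_pop(struct stack *s)
{
    struct node *old = atomic_load(&s->head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&s->head, &old, old->next)) {
        /* old now holds the current head; retry */
    }
    return old;                   /* NULL if the stack was empty */
}
```

    The retry loops are what make the structure lock-free rather than wait-free: an individual thread may keep losing the CAS race, but every failed attempt means some other thread's operation succeeded.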

  • This module covers lock-free synchronization in detail and explores various graph algorithms suitable for parallel environments. Key points include:

    • Understanding the principles and advantages of lock-free synchronization.
    • Learning various graph algorithms and their parallel implementations.
    • Discussing case studies that demonstrate the effectiveness of these algorithms in real-world applications.

    Students will engage in practical exercises to implement graph algorithms, reinforcing their understanding of both synchronization techniques and algorithmic efficiency.
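    As an example of a graph algorithm restructured for parallel execution, the sketch below performs a level-synchronous breadth-first search in C. It assumes the graph is stored in compressed sparse row (CSR) form, uses C11 atomics so that concurrent threads can claim vertices safely, and marks the frontier loop with an OpenMP pragma; the `bfs` signature is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Level-synchronous parallel BFS on a graph in CSR form: the neighbors of
   vertex v are adj[offsets[v] .. offsets[v+1]-1]. All vertices in the
   current frontier are expanded in one parallel step. A vertex is claimed by
   whichever thread first changes its distance from -1 with a
   compare-and-swap, so each vertex enters the next frontier exactly once.
   On return, dist[v] is the hop count from src, or -1 if v is unreachable. */
void bfs(int nv, const int *offsets, const int *adj, int src,
         _Atomic int *dist)
{
    assert(nv > 0 && src >= 0 && src < nv);
    int *frontier = malloc((size_t)nv * sizeof *frontier);
    int *next = malloc((size_t)nv * sizeof *next);
    assert(frontier != NULL && next != NULL);
    for (int v = 0; v < nv; v++) atomic_init(&dist[v], -1);
    atomic_store(&dist[src], 0);
    frontier[0] = src;
    int fsize = 1;
    for (int level = 0; fsize > 0; level++) {
        atomic_int nsize;                   /* size of the next frontier */
        atomic_init(&nsize, 0);
        #pragma omp parallel for
        for (int f = 0; f < fsize; f++) {
            int v = frontier[f];
            for (int e = offsets[v]; e < offsets[v + 1]; e++) {
                int w = adj[e], unvisited = -1;
                /* claim w only if no other thread has reached it yet */
                if (atomic_compare_exchange_strong(&dist[w], &unvisited,
                                                   level + 1))
                    next[atomic_fetch_add(&nsize, 1)] = w;
            }
        }
        int *tmp = frontier;                /* next frontier becomes current */
        frontier = next;
        next = tmp;
        fsize = atomic_load(&nsize);
    }
    free(frontier);
    free(next);
}
```

    For example, with undirected edges 0-1, 0-2, 1-3, 2-3, 3-4 and an isolated vertex 5, a search from vertex 0 yields distances 0, 1, 1, 2, 3 and -1. The order of vertices within each frontier may vary between parallel runs, but the distances do not.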