The long long journey...
Documenting some cross-platform issues I ran into while packaging client applications with PyInstaller.
Windows 7 Compatibility Issues
Error: `LoadLibrary: PyInstaller FormatMessageW failed.` Starting from Python 3.9, Windows 7 is no longer supported; if the software needs to run on Windows 7, the highest usable Python version is 3.8.
Missing Dynamic Runtime Libraries on Windows 7
Error: `This program can't start because api-ms-win-core-path-l1-1-0.dll is missing from your computer. Try reinstalling the program to fix this problem.` The fix is to supply what's missing: put the DLL in the program's root directory or in C:\Windows\System32. Generally this problem does not occur in isolation; it happens because PyInstaller failed to bundle some dynamically loaded libraries during packaging.
Review Outline
Chapter One, Amdahl's Law: understanding of the law (with the total workload fixed, the speed-up ratio gained by accelerating part of it) and the limits of acceleration.
Application questions, 6 points × 5.
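For reference, the standard statement of the law: if a fraction $f$ of the work can be parallelized across $n$ processors, the speed-up is

$$S(n) = \frac{1}{(1 - f) + f/n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - f},$$

so the serial fraction $1 - f$ caps the achievable acceleration no matter how many cores are added.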
Layout of grids and thread blocks, calculating the global id (the usual formulas are sketched below); parallelism vs. concurrency, thread warps, global id, multi-core CPU vs. many-core GPU.
Program analysis questions, 10 points × 2: write the output of the given code and explain why those results occur.
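A minimal sketch of the global-id formulas for the two common launch shapes (kernel names here are illustrative):

```cuda
// 1D grid of 1D blocks: one index covers everything.
__global__ void id1d(int *out) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    out[gid] = gid;
}

// 2D grid of 2D blocks, flattened row-major across the whole grid,
// where width = gridDim.x * blockDim.x.
__global__ void id2d(int *out, int width) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    int gid = y * width + x;
    out[gid] = gid;
}
```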
Multi-core CPU, 10 points × 2: data partitioning (specify the data range handled by each thread); task parallelism (the thread-pool experiment).
CUDA programming, 15 points × 2: for a specific problem, design the grid and thread blocks (or, given the thread blocks, design only the grid); the main function follows a fixed procedure, so the key lies in writing the kernel functions. A vector-add sketch of that fixed procedure follows.
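A minimal, self-contained sketch of the fixed main-function procedure — allocate, copy in, configure grid and block, launch, copy back, free — with vector addition standing in for the kernel and error checking omitted; the names are mine, not from the outline:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard against the last partial block
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;                                   // 1. allocate device memory
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);     // 2. copy inputs to the device
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    dim3 block(256);                                       // 3. pick a block, derive the grid
    dim3 grid((n + block.x - 1) / block.x);
    vecAdd<<<grid, block>>>(da, db, dc, n);                // 4. launch the kernel

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);     // 5. copy the result back
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);              // 6. release GPU memory
    free(ha); free(hb); free(hc);
    return 0;
}
```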
Parallel Computing: Concurrency and Parallelism
Serial: single machine, single core; instructions execute one after another in program order.
This article demonstrates how to write CUDA kernel functions, using five examples, such as matrix transposition, to accelerate the solution of large-scale problems on the GPU's many cores.
Matrix Transposition
Algorithm Process
1. Store the matrix to be transposed in GPU memory.
2. Allocate space on the GPU for the transposed matrix.
3. Define a CUDA kernel function that implements the transposition. The kernel uses thread blocks and a thread grid to cover every element of the matrix; within each block, threads stage data in shared memory, and the result is finally written back to global memory.
4. Call the CUDA kernel to perform the transposition.
5. Copy the transposed matrix from GPU memory to host memory.
6. Release the GPU memory.
Code Implementation
Starting from the traditional version, use shared memory to optimize access to global memory, and apply padding to avoid bank conflicts, as in the sketch below.
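A sketch of such a kernel under my own choice of a 32×32 tile (the article's exact code may differ). The extra padding column in the shared array keeps the column-wise reads from mapping to the same memory bank:

```cuda
#define TILE 32

// Transpose a rows x cols matrix stored row-major.
// tile has TILE+1 columns: the padding avoids shared-memory
// bank conflicts when the tile is read column-wise below.
__global__ void transpose(const float *in, float *out, int rows, int cols) {
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;   // column in the input
    int y = blockIdx.y * TILE + threadIdx.y;   // row in the input
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
    __syncthreads();

    // Write the tile back transposed: input block (bx, by)
    // becomes output block (by, bx).
    x = blockIdx.y * TILE + threadIdx.x;       // column in the output
    y = blockIdx.x * TILE + threadIdx.y;       // row in the output
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
}
```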
A Fascinating Interpretation of Bayes' Theorem
The Bayesian school holds that nothing is truly random: if something appears random, that only reflects a lack of information (cf. Shannon's information theory). The Bayesian school in statistics gave rise to Bayesian methods in machine learning.
Bayes' theorem gives us the ability, once an event has occurred, to draw inferences from the probabilities we held before it.
An accidental use of Bayes, as a joke: water is a deadly poison, because everyone who got cancer drank water. An example of being inadvertently misled by Bayes: a diagnostic test with very high accuracy (99.9%) can still have an extremely high misdiagnosis rate (>50%) among its positive results, because the prevalence of the disease in the general population is below one in a thousand (a worked example follows the formula below). Probability and statistics really are like a young girl whom anyone may dress up.
$$P(c|x) = \frac{P(c)P(x|c)}{P(x)}$$
Understanding Bayes' Theorem from a Machine Learning Perspective
The same formula as above, but in machine learning it defines the naive Bayes classifier, read as "P of c given x". The left side is the posterior probability, $P(c)$ is the prior probability, and $P(x|c)$ is the likelihood, which is the main focus of the model's learning. $P(x)$ is the same for all input samples and is used for normalization (it is expanded via the law of total probability during computation); the estimation of $P(x|c)$ can use Maximum Likelihood Estimation (see the Watermelon Book, p. 148). From a general perspective (which might not be entirely accurate): $P(c)$ is the original probability of an event; after something happens (or we know it has happened, which touches the dividing point between the Bayesian and frequentist schools), $P(c|x)$ is the adjusted probability, and the adjustment factor is $\frac{P(x|c)}{P(x)}$.
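To make the diagnostic example concrete, a worked instance with illustrative numbers (sensitivity 99.9%, false-positive rate 0.1%, prevalence one in a thousand):

$$P(\text{ill} \mid +) = \frac{P(+ \mid \text{ill})\,P(\text{ill})}{P(+ \mid \text{ill})\,P(\text{ill}) + P(+ \mid \text{healthy})\,P(\text{healthy})} = \frac{0.999 \times 0.001}{0.999 \times 0.001 + 0.001 \times 0.999} = 50\%,$$

so even at a prevalence of one in a thousand, only half of the positives are truly ill, and at any lower prevalence more than half of the positive results are false alarms.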
This article uses two examples to practice multithreaded programming with pthreads, mainly covering two parts:
1. Parallel computation of the value of π by partitioning the data.
2. Thread-pool development based on the producer-consumer pattern, with the business logic simplified so the focus stays on thread management and synchronization.
Calculating Pi
Conceptual Overview
Based on the Leibniz formula, approximate $\pi$ by summing a very large number of terms across multiple threads. The data is partitioned so that each thread handles one slice of the terms. Because multiple threads would conflict when updating a shared global result, mutexes and semaphores are used so that threads add their local results to the global total in an orderly way; a condensed sketch follows.
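A condensed sketch of the first part, assuming four threads, each summing a contiguous range of Leibniz terms, with a mutex guarding the global result; all names and constants are illustrative, and this is plain host-side C (compile with `-pthread`):

```c
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4
#define TERMS 400000000L          /* total Leibniz terms: pi/4 = 1 - 1/3 + 1/5 - ... */

static double global_sum = 0.0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread sums its own contiguous range of terms, then adds
 * the local result to the global sum under the mutex. */
static void *partial_sum(void *arg) {
    long tid = (long)arg;
    long chunk = TERMS / N_THREADS;
    long begin = tid * chunk;
    long end = (tid == N_THREADS - 1) ? TERMS : begin + chunk;
    double local = 0.0;
    for (long k = begin; k < end; ++k)
        local += (k % 2 == 0 ? 1.0 : -1.0) / (2.0 * k + 1.0);
    pthread_mutex_lock(&lock);
    global_sum += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t threads[N_THREADS];
    for (long t = 0; t < N_THREADS; ++t)
        pthread_create(&threads[t], NULL, partial_sum, (void *)t);
    for (long t = 0; t < N_THREADS; ++t)
        pthread_join(threads[t], NULL);
    printf("pi ~= %.10f\n", 4.0 * global_sum);
    return 0;
}
```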
Vector
"The introduction of numbers as coordinates is an act of violence."
And on the flip side, it gives people a language to describe space, and to manipulate space, using numbers that can be crunched and run through a computer. Bold claim: linear algebra lets programmers manipulate space. $i$ and $j$ are the basis vectors; any vector can be seen as a linear combination of them. Collinear vectors are linearly dependent, and the space they span is just a line (or the origin alone); non-collinear vectors are linearly independent, and the space they span is the whole plane.
Matrix
A kind of linear transformation: a matrix's columns record where the basis vectors land, and linearity determines everything else.
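In symbols, for the 2D case:

$$\vec{v} = x\,\vec{i} + y\,\vec{j}, \qquad A\vec{v} = x\,(A\vec{i}) + y\,(A\vec{j}) = x\begin{bmatrix} a \\ c \end{bmatrix} + y\begin{bmatrix} b \\ d \end{bmatrix}, \quad A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$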