More of my philosophy about quantum computing and about matrix operations and about scalability and more of my thoughts..

Newsgroups: comp.programming.threads
Date: Tue, 1 Nov 2022 16:12:38 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=173.178.84.155; posting-account=R-6XjwoAAACnHXTO3L-lyPW6wRsSmYW9
NNTP-Posting-Host: 173.178.84.155
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bf790be9-daf6-4631-b62b-751ad418e2e6n@googlegroups.com>
Subject: More of my philosophy about quantum computing and about matrix operations and about scalability and more of my thoughts..
From: aminer68@gmail.com (Amine Moulay Ramdane)
Injection-Date: Tue, 01 Nov 2022 23:12:38 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hello,

More of my philosophy about quantum computing and about matrix operations and about scalability and more of my thoughts..

I am a white Arab, and I think I am smart, since I have also invented many scalable algorithms..

I think I am highly smart, since I have passed two certified IQ tests and I have scored "above" 115 IQ. I have just looked at the following video about the powerful parallel quantum computer of IBM from the USA that will soon be available in the cloud, and I invite you to look at it:

Quantum Computing: Now Widely Available!

https://www.youtube.com/watch?v=laqpfQ8-jFI

I have also just read the following paper, and it says that powerful quantum algorithms for matrix operations and for linear systems of equations are available. So, as you will notice in the paper below, many matrix operations, and also a solver for linear systems of equations, can be run on a quantum computer:

Quantum algorithms for matrix operations and linear systems of equations

Read more here:

https://arxiv.org/pdf/2202.04888.pdf

So I think that IBM will do the same for their powerful parallel quantum computer that will be available in the cloud, but I think that you will of course have to pay for it, since it will be commercial. However, I think that there is a weakness in this kind of configuration of the powerful quantum computer from IBM: the cost of internet bandwidth is decreasing exponentially, but the latency of accessing the internet is not. That is why I think that people will still use classical computers for the many mathematical applications, using operations such as matrix operations and linear systems of equations, that need much lower latency. So I think that the business of classical computers will still be great in the future, even with the coming of the powerful parallel quantum computer of IBM. As you notice, this kind of business does not depend only on Moore’s Law and Bezos’ Law, but also on the latency of accessing the internet, so read my following thoughts about Moore’s Law and about Bezos’ Law:

More of my philosophy about Moore’s law and about Bezos’ Law..

For RAM chips and flash memory, Moore's Law means that in eighteen months you will pay the same price as today for twice as much storage. Other computing components are also seeing their price/performance improve exponentially; data storage, for example, doubles every twelve months..

More about Moore’s law and about Bezos’ Law..

"Parallel code is the recipe for unlocking Moore’s Law"

And:

"BEZOS’ LAW

The Cost of Cloud Computing will be cut in half every 18 months - Bezos’ Law

Like Moore’s law, Bezos’ Law is about exponential improvement over time. If you look at AWS history, they drop prices constantly. In 2013 alone they’ve already had 9 price drops. The difference, however, between Bezos’ and Moore’s law is this: Bezos’ law is the first law that isn’t anchored in technical innovation. Rather, Bezos’ law is anchored in confidence and market dynamics, and will only hold true so long as Amazon is not the aggregate dominant force in Cloud Computing (50%+ market share). Monopolies don’t cut prices."
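
As a small worked example (my own illustration, not from the quoted article), both laws share the same exponential form: a cost that halves every 18 months follows cost(t) = cost0 * 2^(-t/18), with t in months. A minimal C++ sketch:

#include <cmath>
#include <cstdio>

int main() {
    // Hypothetical starting cost in arbitrary units (assumed value).
    const double cost0 = 100.0;
    // Print the projected cost at each 18-month halving step.
    for (int months = 0; months <= 72; months += 18)
        std::printf("after %2d months: %6.2f\n",
                    months, cost0 * std::pow(2.0, -months / 18.0));
    // After 72 months (four halvings) the cost is 1/16 of the original.
    return 0;
}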

More of my philosophy about matrix-matrix multiplication and about scalability and more of my thoughts..

I think that the time complexity of the Strassen algorithm for matrix-matrix multiplication is around O(N^2.8074), while the time complexity of the naive algorithm is O(N^3). For practical matrix sizes that is not a big difference, since Strassen's larger constant factors and irregular memory accesses eat much of the asymptotic gain. So I think I will soon implement the parallel blocked matrix-matrix multiplication, and I will implement it with a new algorithm that also uses Intel AVX-512 and fused multiply-add, and that uses the prefetch instructions shown below to bring data into the caches so as to gain around 22% in speed; so I think that overall it will have around the same speed as parallel BLAS. And I say that pipelining greatly increases throughput in modern CPUs such as x86 CPUs, and another common pipelining scenario is the FMA, or fused multiply-add, which is a fundamental part of the instruction set of some processors: the basic load-operate-store sequence simply lengthens by one step to become load-multiply-add-store. The FMA is possible only if the hardware supports it, as it does in the case of the Intel Xeon Phi, for example, as well as in Skylake etc.
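
Here is a minimal sketch, under my own assumptions, of the kind of kernel described above: a blocked matrix-matrix multiplication in C++ using AVX-512 fused multiply-add intrinsics and software prefetching. The matrix size N, the block size BS, and the prefetch distance are illustrative choices, not the author's actual parameters, and this is not the author's library code:

#include <immintrin.h>
#include <cstddef>

constexpr std::size_t N  = 1024; // matrix dimension (assumed multiple of 8)
constexpr std::size_t BS = 64;   // cache-friendly block size (tunable)

// C += A * B, with A, B, C row-major N x N matrices of doubles.
void blocked_matmul(const double* A, const double* B, double* C) {
    for (std::size_t ii = 0; ii < N; ii += BS)
      for (std::size_t kk = 0; kk < N; kk += BS)
        for (std::size_t jj = 0; jj < N; jj += BS)
          for (std::size_t i = ii; i < ii + BS; ++i)
            for (std::size_t k = kk; k < kk + BS; ++k) {
                // Broadcast A[i][k] into all 8 vector lanes.
                __m512d a = _mm512_set1_pd(A[i * N + k]);
                for (std::size_t j = jj; j < jj + BS; j += 8) {
                    // Prefetch the next cache line of B into L1; prefetches
                    // never fault, so running past the block is harmless.
                    _mm_prefetch((const char*)&B[k * N + j + 8], _MM_HINT_T0);
                    __m512d b = _mm512_loadu_pd(&B[k * N + j]);
                    __m512d c = _mm512_loadu_pd(&C[i * N + j]);
                    // One fused multiply-add instruction: c = a * b + c.
                    c = _mm512_fmadd_pd(a, b, c);
                    _mm512_storeu_pd(&C[i * N + j], c);
                }
            }
}

This needs an AVX-512 capable CPU (with g++, compile with -mavx512f); a parallel version would simply split the outer ii loop across threads.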

More of my philosophy about matrix-vector multiplication of large matrices and about scalability and more of my thoughts..

The matrix-vector multiplication of large matrices is completely limited by memory bandwidth, as I explain below, so vector extensions like SSE or AVX are usually not necessary for matrix-vector multiplication of large matrices. It is interesting that matrix-matrix multiplications don't have this kind of problem with memory bandwidth. Companies like Intel or AMD typically show benchmarks of matrix-matrix multiplications, and they show how nicely those scale to many more cores, but they never show matrix-vector multiplications. And notice that my powerful open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library, which scales very well, is also memory-bound, and the matrices for it are usually big, but my new algorithm for it is efficiently cache-aware and efficiently NUMA-aware, and I have implemented it for both dense and sparse matrices.
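
To make that concrete, here is a minimal sketch (my own illustration, not the author's library code) of a plain matrix-vector multiply; every element of the matrix is loaded exactly once and used for a single multiply-add, so there is no cache reuse of the matrix and the loads dominate:

#include <cstddef>
#include <vector>

// y = A * x, with A a row-major n x n matrix of doubles.
void matvec(const std::vector<double>& A, const std::vector<double>& x,
            std::vector<double>& y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            sum += A[i * n + j] * x[j]; // 2 flops per 8-byte load of A
        y[i] = sum;
    }
}

The arithmetic intensity is about 2 flops per 8 bytes of matrix traffic, roughly 0.25 flop/byte, so once the memory channels are saturated, wider SIMD or more cores cannot make it faster; matrix-matrix multiplication, by contrast, does O(n^3) flops on O(n^2) data and can reuse blocks from cache.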

More of my philosophy about the efficient Matrix-Vector multiplication algorithm in MPI and about scalability and more of my thoughts..

Matrix-vector multiplication is an absolutely fundamental operation, with countless applications in computer science and scientific computing, so efficient algorithms for it are of paramount importance. Notice that for matrix-vector multiplication, n^2 time is certainly required for an n x n dense matrix, but you have to be smart, since in MPI computing, and also on exascale supercomputer systems, you don't take into account only this n^2 time: the algorithm also has to be efficiently cache-aware, and it has to have a good complexity for how much memory is used by the parallel MPI processes. Notice carefully with me that you must not send both a whole row of the matrix and the whole vector to each parallel MPI process; you have to know how to reduce this complexity efficiently, for example by dividing each row of the matrix and dividing the vector, and sending a part of the row and the matching part of the vector to each parallel MPI process, as illustrated in the sketch below. And I think that in an efficient algorithm for matrix-vector multiplication, the time for the additions is dominated by the communication time.

Of course, the implementation of my powerful open source software of Parallel C++ Conjugate Gradient Linear System Solver Library, which scales very well, is also smart, since it is efficiently cache-aware and efficiently NUMA-aware, and it implements both the dense and the sparse cases. As I show below, it scales well on the memory channels, so it scales well on my 16-core dual Xeon with 8 memory channels, and it will scale well on the 16-socket HPE NonStop X systems or the 16-socket HPE Integrity Superdome X with above 512 cores and 64 memory channels. So I invite you to read carefully about, and to download, my open source project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well from my website here:

https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library
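
Here is a minimal MPI sketch in C++ of the distribution described above, under my own assumptions (it is not the author's implementation, and for brevity it assumes n is divisible by the number of processes): each process holds only a slice of the columns of the matrix and the matching slice of the vector, computes a partial result, and the partial results are summed with MPI_Reduce, so no process ever needs a full row plus the full vector:

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 8;           // small demo size (assumed divisible by size)
    const int cols = n / size; // columns owned by each process

    // Each process fills its own column slice of A and slice of x.
    // Demo data: A[i][j] = 1 and x[j] = j, so y[i] = 0 + 1 + ... + (n-1).
    std::vector<double> A_slice(n * cols, 1.0);
    std::vector<double> x_slice(cols);
    for (int j = 0; j < cols; ++j)
        x_slice[j] = double(rank * cols + j);

    // Partial y using only this process's part of each row of A.
    std::vector<double> y_partial(n, 0.0), y(n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < cols; ++j)
            y_partial[i] += A_slice[i * cols + j] * x_slice[j];

    // Sum the partial results of all processes onto process 0.
    MPI_Reduce(y_partial.data(), y.data(), n, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("y[0] = %g (expected %g)\n", y[0], n * (n - 1) / 2.0);
    MPI_Finalize();
    return 0;
}

With this layout each process stores n*cols matrix entries and cols vector entries instead of full rows plus the full vector, and the only communication is the final reduction, whose time indeed dominates the local additions as the process count grows.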

MPI will continue to be a viable programming model on exascale supercomputer systems, so I will soon implement many algorithms in MPI for Delphi and FreePascal and I will provide you with them. I am currently implementing an efficient matrix-vector multiplication algorithm in MPI, and you have to know that an efficient matrix-vector multiplication algorithm is really important for scientific applications. Of course, I will also soon implement many other interesting algorithms in MPI for Delphi and FreePascal and provide you with them, so stay tuned!

More of my philosophy about the memory bottleneck and about scalability and more of my thoughts..

I think I am highly smart, since I have passed two certified IQ tests and I have scored "above" 115 IQ, and I am also specialized in parallel computing. I know that a large cache can reduce the main-memory bottleneck of Amdahl's Law, but you have to understand what I am saying, since my powerful open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library above, which scales very well, is also memory-bound, and the matrices for it are usually big. Sparse linear system solvers are ubiquitous in high performance computing (HPC) and are often the most computationally intensive parts of scientific computing codes. A few of the many applications relying on sparse linear solvers include fusion energy simulation, space weather simulation, climate modeling, environmental modeling, the finite element method, and large-scale reservoir simulations to enhance oil recovery in the oil and gas industry. That is why I am speaking about how many memory channels come in the 16-socket HPE NonStop X systems or the 16-socket HPE Integrity Superdome X: as you notice, they can come with more than 512 cores and with 64 memory channels. Also, I have just benchmarked my Scalable Varfiler, and it is scaling above 7x on my 16-core dual Xeon processor; it is scaling well since I have 8 memory channels, and I invite you to look carefully at my powerful Scalable Varfiler.
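
As a rough back-of-the-envelope illustration of why memory channels, not cores, set the cap for such memory-bound kernels (my own assumed numbers, not a benchmark of the author's library):

#include <cstdio>

int main() {
    const double channels  = 8.0;  // e.g. a dual-socket Xeon, 4 per socket
    const double gb_per_ch = 21.3; // assumed DDR4-2666: ~21.3 GB/s/channel
    const double intensity = 0.25; // matrix-vector multiply, flop/byte
    const double bw = channels * gb_per_ch; // ~170 GB/s aggregate
    // A bandwidth-bound kernel cannot exceed bandwidth * intensity.
    std::printf("bandwidth ~ %.0f GB/s -> matvec cap ~ %.0f GFLOP/s\n",
                bw, bw * intensity);
    // 64 channels on a 16-socket system raise this cap roughly 8x,
    // which is why such kernels scale with channels, not core count.
    return 0;
}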

