Asked By – Kyle Brandt
How do you find the top correlations in a correlation matrix with Pandas? There are many answers on how to do this with R (Show correlations as an ordered list, not as a large matrix or Efficient way to get highly correlated pairs from large data set in Python or R), but I am wondering how to do it with pandas? In my case the matrix is 4460×4460, so can’t do it visually.
Now we will see solution for issue: List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?
You can use
DataFrame.values to get an numpy array of the data and then use NumPy functions such as
argsort() to get the most correlated pairs.
But if you want to do this in pandas, you can
unstack and sort the DataFrame:
import pandas as pd import numpy as np shape = (50, 4460) data = np.random.normal(size=shape) data[:, 1000] += data[:, 2000] df = pd.DataFrame(data) c = df.corr().abs() s = c.unstack() so = s.sort_values(kind="quicksort") print so[-4470:-4460]
Here is the output:
2192 1522 0.636198 1522 2192 0.636198 3677 2027 0.641817 2027 3677 0.641817 242 130 0.646760 130 242 0.646760 1171 2733 0.670048 2733 1171 0.670048 1000 2000 0.742340 2000 1000 0.742340 dtype: float64
This question is answered By – HYRY
This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0