Fix Python – How to “select distinct” across multiple data frame columns in pandas?

I’m looking for a way to do the equivalent of the SQL query
SELECT DISTINCT col1, col2 FROM dataframe_table

The pandas sql comparison doesn’t have anything about distinct.
.unique() only works for a single column, so I suppose I could concat the columns, or put them in a list/tuple and compare that way, but this seems like something pandas should do i….
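A minimal sketch of one common approach, assuming a frame with the col1 and col2 columns from the SQL above (the sample values and col3 are made up for illustration): select the columns, then call drop_duplicates().

```python
import pandas as pd

# Made-up sample data; only col1 and col2 take part in the DISTINCT.
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 2],
    "col2": ["a", "a", "b", "c", "c"],
    "col3": [10, 20, 30, 40, 50],
})

# Keep the two columns of interest, then drop repeated (col1, col2) pairs.
distinct = df[["col1", "col2"]].drop_duplicates().reset_index(drop=True)
print(distinct)
```

drop_duplicates() keeps the first occurrence of each pair by default, which matches what SELECT DISTINCT returns apart from row order.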

Fix Python – Drop all duplicate rows across multiple columns in Python Pandas

The pandas drop_duplicates function is great for “uniquifying” a dataframe. However, one of the keyword arguments to pass is take_last=True or take_last=False, while I would like to drop all rows which are duplicates across a subset of columns. Is this possible?
     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A

As an example, ….
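A minimal sketch using the frame above. In current pandas the keyword is keep rather than take_last, and keep=False drops every row that has a duplicate. The choice of columns A and C as the subset is an assumption, since the excerpt is cut off before the example.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# keep=False marks all members of a duplicate group for removal,
# not just the later (or earlier) occurrences.
# The subset ["A", "C"] is assumed for illustration.
result = df.drop_duplicates(subset=["A", "C"], keep=False)
print(result)
```

With this subset, rows 0 and 1 share the pair (foo, A), so both are removed and rows 2 and 3 remain.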

Fix Python – How do I get a list of all the duplicate items using pandas in python?

I have a list of items that likely has some export issues. I would like to get a list of the duplicate items so I can manually compare them. When I try to use the pandas duplicated method, it only returns the first duplicate. Is there a way to get all of the duplicates and not just the first one?
A small subsection of my dataset looks like this:
….
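A minimal sketch, assuming the goal is to see every member of each duplicate group: passing keep=False to duplicated() flags all occurrences instead of all but one. The item and qty columns and their values are hypothetical, since the original dataset is not shown.

```python
import pandas as pd

# Hypothetical data standing in for the elided dataset.
df = pd.DataFrame({
    "item": ["a", "b", "a", "c", "b"],
    "qty": [1, 2, 3, 4, 5],
})

# keep=False marks every row whose "item" value appears more than once.
mask = df.duplicated(subset="item", keep=False)

# A stable sort groups the duplicates together for side-by-side comparison.
dupes = df[mask].sort_values("item", kind="mergesort")
print(dupes)
```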

Fix Python – Remove duplicates by column A, keeping the row with the highest value in column B

I have a dataframe with repeat values in column A. I want to drop duplicates, keeping the row with the highest value in column B.
So this:
A B
1 10
1 20
2 30
2 40
3 10

Should turn into this:
A B
1 20
2 40
3 10

I’m guessing there’s probably an easy way to do this—maybe as easy as sorting the DataFrame before dropping duplicates—but I don’t know ….
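The sort-then-drop idea can be sketched as follows, using the data from the question. A groupby/idxmax variant is included as an alternative; both are illustrations rather than the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [10, 20, 30, 40, 10]})

# Sort by B so the largest B in each A group comes last, then keep that row.
by_sort = (
    df.sort_values("B")
      .drop_duplicates(subset="A", keep="last")
      .sort_values("A")
      .reset_index(drop=True)
)

# Alternative: look up the index label of the maximum B within each A group.
by_idxmax = df.loc[df.groupby("A")["B"].idxmax()].reset_index(drop=True)

print(by_sort)
```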

Fix Python – Remove pandas rows with duplicate indices

How to remove rows with duplicate index values?
In the weather DataFrame below, sometimes a scientist goes back and corrects observations — not by editing the erroneous rows, but by appending a duplicate row to the end of a file.
I’m reading some automated weather data from the web (observations occur every 5 minutes, and compiled into monthly fi….
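A minimal sketch, assuming the corrected observation is the one that appears last in the file: Index.duplicated(keep="last") flags every earlier occurrence of a repeated index value, and negating the mask keeps the rest. The timestamps and the temp column are made up for illustration.

```python
import pandas as pd

# Hypothetical observations; the 00:05 reading is appended again as a correction.
index = pd.to_datetime([
    "2001-01-01 00:00",
    "2001-01-01 00:05",
    "2001-01-01 00:10",
    "2001-01-01 00:05",
])
df = pd.DataFrame({"temp": [10.0, 11.0, 12.0, 11.5]}, index=index)

# True for index values that occur again later in the frame.
superseded = df.index.duplicated(keep="last")

# Drop the superseded rows and restore chronological order.
deduped = df[~superseded].sort_index()
print(deduped)
```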

[Fixed] *args calling plt.plot() for each optional argument

I’m having some trouble with passing an arbitrary number of arguments to plt.plot(). For each argument I try to pass via *args my function calls plt.plot() two times creating duplicates for each optional argument. The read_n_plot function is meant to read some datafile, spit out the data as lists, and create a plot of said data. If I want to create a plot of an XRD standard for a crystal I want to make it easily distinguishable from whatever experimental data I’m comparing it with.
From what I understand, *args is a tuple containing all the positional arguments passed to my function. But I can’t figure out how to pass everything inside *args through as is, rather than calling plt.plot(arg1) -> plt.plot(arg2) -> etc.
Got any hints I could try?
Here is my code:
import matplotlib.pyplot as plt

#def read_n_plot(datafile, color, thickness, style = '-', *args):
def read_n_plot(datafile, *args, **kwargs):
    vinkel = []
    intensitet = []
    with open(datafile, encoding='utf8', errors='ignore') as f:
        if datafile.endswith('.int'):
            next(f); next(f)
        lines = f.readlines()
    for line in lines:
        if line and line[0].isalpha():
            continue
        data = line.split()
        theta, counts = float(data[0]), float(data[1])
        vinkel.append(theta)
        intensitet.append(counts)
    intensitet_norm = [i/max(intensitet) for i in intensitet]
    plt.plot(vinkel, intensitet_norm, label = datafile, *args, **kwargs)
    return vinkel, intensitet_norm

plt.figure(figsize=(16,9))
read_n_plot('NaCl_data.xy', 'k', '--', lw = 1.0)
plt.legend(loc='best')
plt.xlabel(r'$2\theta$')
plt.xlim(0, 65)
plt.ylabel('Intensitet (a.u.)')
plt.tick_params(left=None)
plt.yticks([])
plt.show()

The example below uses standard powder XRD data for NaCl. It seems the '--' argument for linestyle didn’t get through at all. The legend says there are two plots.
[Figure: NaCl XRD data, showing the duplicate plot]
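One likely explanation, going by Matplotlib's documented call form plot([x], y, [fmt], ...): each data group accepts at most one format string, so passing 'k' and '--' as two separate positional arguments makes the second one start a new data group instead of setting the line style. A minimal sketch of two ways around that follows. The plot_normalized helper and the sample values are hypothetical stand-ins for read_n_plot and the data file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_normalized(x, y, *args, **kwargs):
    """Hypothetical helper: normalize y to its maximum and plot one line."""
    peak = max(y)
    y_norm = [value / peak for value in y]
    # *args is forwarded unchanged, so it should hold a single fmt string.
    return plt.plot(x, y_norm, *args, **kwargs)


# Made-up sample values standing in for the XRD data file.
angles = [10.0, 20.0, 30.0, 40.0]
counts = [5.0, 50.0, 100.0, 20.0]

plt.figure()
# Option 1: color and line style combined in one format string.
lines_fmt = plot_normalized(angles, counts, "k--", lw=1.0, label="fmt string")
# Option 2: the same styling spelled out as keyword arguments.
lines_kw = plot_normalized(angles, counts, color="k", linestyle="--", label="keywords")
```

Each call returns a list containing a single Line2D, so each call adds only one legend entry.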