Matplotlib bar chart from two variable column - Pandas data manipulation (mostly)

NOTE: Solution Posted Below!!!
I have a time-indexed table with a column BLIP that has only two values, "XX" and "YY". The goal is to show a count of "XX" and "YY", with the "YY" bars drawn below the x axis. I'm trying to build the right data structure from a pandas table using code from Wes McKinney's book on data analysis (pg 26, I think):
df = base_df.drop(columns=dropcols).set_index('Created')
group = ['f2','BLIP']
df0 = df_minus.groupby(group)
agg_counts = df0.size().unstack().fillna(0)
indexer = agg_counts.sum(1).argsort()
count_subset = agg_counts.take(indexer).copy()
table = count_subset.groupby('BLIP').resample('MS').count().unstack('BLIP')['BLIP']
chart = table.plot.bar(title = chart_title, x=None, color = ['green', 'red', 'grey']);
The line
agg_counts = df0.size().unstack().fillna(0)
results in the following error:
TypeError: 'numpy.int32' object is not callable
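The post does not confirm what caused this error. One way to get a "not callable" TypeError from `.size()` is to call it on a plain DataFrame, where `size` is an integer attribute, instead of on a groupby object, where `size()` is a method. The sketch below uses made-up data and only illustrates that difference; it is an assumption, not a diagnosis of the code above.

```python
import pandas as pd

# Made-up rows standing in for the question's f2 and BLIP columns.
df = pd.DataFrame({"f2": ["Y", "R", "Y"], "BLIP": ["XX", "YY", "XX"]})

# On a groupby object, size() is a method that counts rows per group.
counts = df.groupby(["f2", "BLIP"]).size()

# On a DataFrame, size is an integer attribute (rows * columns), not a method.
total_cells = df.size

# Calling that integer raises a TypeError of the "object is not callable" kind.
try:
    df.size()
    raised = False
except TypeError:
    raised = True
```

If `df0` had somehow been bound to a DataFrame rather than to the result of `groupby`, the call `df0.size()` would fail in this way.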
I found this gem of a snippet here, but can't find the documentation to decipher it.
data['values'].plot(kind='bar', color=data.positive.map({True: 'g', False: 'r'}))
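As a rough reading of that snippet: `Series.map` with a dict replaces each value with the dict entry for it, so a boolean column becomes a column of color codes, one per bar. The sketch below uses a made-up `data` frame (the numbers are illustrative, not from the question) to show what the `color=` argument ends up receiving.

```python
import pandas as pd

# Hypothetical data: one numeric column and a boolean flag derived from it.
data = pd.DataFrame({"values": [3, -2, 5, -1]})
data["positive"] = data["values"] > 0

# map() looks each boolean up in the dict and returns one color code per row.
colors = data["positive"].map({True: "g", False: "r"})

# colors.tolist() is ['g', 'r', 'g', 'r']
```

Passing `color=colors` to the bar plot then colors each bar according to its own row.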
This seems like it would be very simple, but I'm quite wrapped around the axle on this.
The pandas table format is something like
create_date  f1  f2  f3  BLIP  f5...
dt_stamp     X   Y   Z   XX    K1
dt_stamp     S   R   Y   YY    K3
dt_stamp     P   P   T   XX    K1
and so on.
Per Jesse's suggestion I tried
df_plus =df[df['BLIP']=='XX']
df_minus=df[df['BLIP']=='YY']
ax = plt.axes()
ax.bar(df_plus.index, df_plus['BLIP'], width=0.4, color='g')
ax.bar(df_minus.index, df_minus['BLIP'], width=0.4, color='r')
ax.autoscale()
plt.show()
This resulted in
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Solution in its entirety
import matplotlib.pyplot as plt
df = base_df
plt.clf()
fig = plt.figure()
width = 8
height = 6
fig.set_size_inches(width, height)
chart_title = 'YTD CR Trend Summary'
df_plus =df[df['BLIP'] == 'XX']
df_minus = df[df['BLIP'] == 'YY']
p = df_plus.resample('MS').count()['BLIP'].fillna(0)
n = df_minus.resample('MS').count()['BLIP'].apply(lambda x: int(-x))
print(chart_title, len(df), p.sum(), n.sum())
ax = plt.axes(label=chart_title)  # label suppresses warning; reuses the figure sized above
if p.sum() != 0:
    ax.bar(p.index, p, width=10, color='g')
if n.sum() != 0:
    ax.bar(n.index, n, width=10, color='r')
plt.suptitle(chart_title, fontsize=11)
filename = f'{graph_images_dir}{chart_title}.png'
print(f'Saving {filename}')
plt.savefig(filename, bbox_inches='tight', pad_inches=0.5, dpi=200)
plt.show()
It seems as if I have to go from this 3NF dataframe to some sort of object that gives me BLIP.value_counts by month, and then somehow map those counts onto the plot properly.
– Harvey
yesterday
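One possible way to get "BLIP counts by month" in a single step is to group on the month start and on BLIP together, then unstack BLIP into columns. The sketch below is an alternative to the solution above, not the method used there, and it runs on made-up dates and values. It assumes the frame has a DatetimeIndex, as in the question.

```python
import pandas as pd

# Synthetic stand-in for the question's data (dates and values are made up).
df = pd.DataFrame(
    {"BLIP": ["XX", "YY", "XX", "XX", "YY", "YY"]},
    index=pd.to_datetime(
        ["2019-01-03", "2019-01-15", "2019-01-20",
         "2019-02-02", "2019-02-10", "2019-03-05"]
    ),
)
df.index.name = "Created"

# Count rows per month start and per BLIP value, with one column per value.
# fill_value=0 covers months in which one of the two values never occurs.
table = df.groupby([pd.Grouper(freq="MS"), "BLIP"]).size().unstack(fill_value=0)

# Flip the sign of the "YY" counts so those bars extend below the x axis.
table["YY"] = -table["YY"]
```

The resulting `table["XX"]` and `table["YY"]` columns share one monthly index, so each can be passed to `ax.bar(table.index, ...)` without a length mismatch.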
1 Answer
You can plot it manually using matplotlib:
import matplotlib.pyplot as plt
ax = plt.axes()
ax.bar(table.index, table['XX'], width=0.4, color='g')
ax.bar(table.index, table['YY'], width=0.4, color='r')
I divided the table into two dataframes, df_pos and df_neg, then tried ax.bar(table.index, df_pos['BLIP'], width=0.4, color='g') and got this error: ValueError: shape mismatch: objects cannot be broadcast to a single shape
– Harvey
yesterday
That means your table.index does not have the same length as df_pos['BLIP']. You can use df_pos.index instead.
– Jesse Bakker
yesterday
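The length mismatch described in this comment can be reproduced in a few lines. The sketch below uses made-up numbers and hypothetical names (`x_positions`, `df_pos_blip`): three x positions and two heights make `ax.bar` raise a ValueError, while using the filtered data's own index keeps both arguments the same length.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

x_positions = [0, 1, 2]                         # stand-in for table.index (3 entries)
df_pos_blip = pd.Series([5, 3], index=[0, 2])   # stand-in for a filtered column (2 rows)

fig, ax = plt.subplots()

# Three x positions against two heights cannot be broadcast together.
try:
    ax.bar(x_positions, df_pos_blip, width=0.4, color="g")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# The filtered data's own index always matches its length.
bars = ax.bar(df_pos_blip.index, df_pos_blip, width=0.4, color="g")
```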
Solution posted above. Jesse's suggestion is correct, but table['XX'] should be table['BLIP']. Warning: do not code when sleep deprived!!
– Harvey
19 hours ago
I would like to know if I can use the data.positive.map to directly map to the "XX" values in BLIP or if I have to create a new field with True and False values.
– Harvey
yesterday
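On the mapping question: `Series.map` accepts any dict whose keys match the column's values, so a separate True/False field should not be needed. A minimal sketch, assuming BLIP holds only "XX" and "YY" (the sample values are made up):

```python
import pandas as pd

# Hypothetical BLIP column with the two values from the question.
blip = pd.Series(["XX", "YY", "XX"], name="BLIP")

# Map the strings straight to color codes, with no helper column.
colors = blip.map({"XX": "g", "YY": "r"})

# The same idea gives a sign per row, for drawing "YY" below the axis.
signs = blip.map({"XX": 1, "YY": -1})
```

Any value missing from the dict maps to NaN, so the dict should cover every value that can appear in the column.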