Channel: User Anton Protopopov - Stack Overflow
Browsing all 38 articles

Comment by Anton Protopopov on Take n rows from a spark dataframe and pass to...

@jamiet head returns the first n rows, like take, whereas limit restricts the resulting Spark DataFrame to a specified number of rows. In that case limit is probably more appropriate.

Comment by Anton Protopopov on How do I convert a list in a Pandas DF into a...

@Shoof @IanS I edited the answer to add timings, and added a new method using str.join, which comes in second place after .apply(', '.join).
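
A minimal sketch of the two approaches named in this comment. The column name `tags` and the sample lists are hypothetical, since the original question's data is not shown here, and no timings are reproduced:

```python
import pandas as pd

# Hypothetical data: each cell holds a list of strings.
df = pd.DataFrame({"tags": [["a", "b"], ["c"], ["d", "e", "f"]]})

# Approach 1: apply the bound method ', '.join to every list.
joined_apply = df["tags"].apply(", ".join)

# Approach 2: the .str accessor performs the same join element-wise.
joined_str = df["tags"].str.join(", ")
```

Both produce a Series of comma-separated strings; which one is faster would need to be measured on the actual data.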

Comment by Anton Protopopov on How to drop rows of Pandas DataFrame whose...

how='all' is redundant here: because you are subsetting the dataframe on only one field, both 'all' and 'any' will have the same effect.
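
To illustrate the point, here is a small sketch with a hypothetical frame and a hypothetical column name `score`. With a single column in `subset`, "all listed columns are NaN" and "any listed column is NaN" describe the same rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the "score" column is checked for NaN.
df = pd.DataFrame({"score": [np.nan, 1.5, np.nan], "name": ["x", "y", "z"]})

by_all = df.dropna(subset=["score"], how="all")
by_any = df.dropna(subset=["score"], how="any")

# With one column in the subset, both calls keep the same rows.
same_result = by_all.equals(by_any)
```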

Comment by Anton Protopopov on pandas: convert index type in multiindex...

@Rockbar id is the same as df1.index. I edited the answer accordingly.

Comment by Anton Protopopov on convert nan value to zero

@TehTris you're right, thanks. I changed it to b = np.where(np.isnan(a), 0, a), which I think is more straightforward than the version with ~.
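
A short sketch of the expression from this comment on a hypothetical array. The `~` variant shown for comparison is an assumption about what the earlier version of the answer looked like, not a quote from it:

```python
import numpy as np

# Hypothetical input with two NaN entries.
a = np.array([1.0, np.nan, 3.0, np.nan])

# Where a is NaN take 0, otherwise keep the original value.
b = np.where(np.isnan(a), 0, a)

# Assumed earlier form: negate the mask and swap the two branches.
b_inverted = np.where(~np.isnan(a), a, 0)
```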

Comment by Anton Protopopov on obtaining last value of dataframe column...

@cikatomo are you sure that you are using iloc and not loc?
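
The distinction matters when the index is not the default 0..n-1 range. A minimal sketch with a hypothetical frame whose index labels are 5, 6 and 7:

```python
import pandas as pd

# Hypothetical frame with a non-default integer index.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[5, 6, 7])

# iloc is positional, so -1 means "last row".
last_value = df["value"].iloc[-1]

# loc is label-based, so -1 is looked up as an index label and is missing here.
try:
    df["value"].loc[-1]
    loc_failed = False
except KeyError:
    loc_failed = True
```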

Comment by Anton Protopopov on Fillna in multiple columns in place in Python...

@Lenwood yes, you could follow the link for dtype.kind to check that.

Comment by Anton Protopopov on Make new column in Panda dataframe by adding...

You could drop list(df.columns), as it's redundant here, so the final code should look like df['sum'] = df.sum(axis=1)
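
A small sketch showing why the explicit column list is unnecessary, using a hypothetical all-numeric frame:

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Selecting every column first gives the same row sums as summing directly.
explicit = df[list(df.columns)].sum(axis=1)

# Shorter form suggested in the comment.
df["sum"] = df.sum(axis=1)
```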

Comment by Anton Protopopov on Replacing part of string in python pandas...

@ArthurD.Howland the code from the answer should work for those cases.

Comment by Anton Protopopov on How to replace NaN values in a dataframe column

@ShyamBhimani it should replace only NaN, i.e. values where np.isnan is True.
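
As a quick check of that behaviour, here is a sketch on a hypothetical Series that mixes NaN with zero and negative values; the fill value 99.0 is arbitrary:

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN alongside ordinary values such as 0.0 and -2.5.
s = pd.Series([1.0, np.nan, 0.0, -2.5, np.nan])

# Only the NaN entries are replaced; every other value is left untouched.
filled = s.fillna(99.0)
```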

Comment by Anton Protopopov on How can I make pandas dataframe column headers...

I guess it's easier to write df.columns.astype(str).str.lower() in that case, although it may be a bit verbose.
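
A minimal sketch of that expression, assuming the case where not every header is a string (the column labels below are hypothetical). Casting to `str` first means the integer label is also converted to text:

```python
import pandas as pd

# Hypothetical headers: two strings and one integer label.
df = pd.DataFrame([[1, 2, 3]], columns=["Name", 2021, "CITY"])

# Convert every label to str, then lower-case them all.
df.columns = df.columns.astype(str).str.lower()
```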

Comment by Anton Protopopov on Extract column value based on another column...

@ssuhas76 sure, if df.loc[df['B'] == 3, 'A'].iloc[0] is a hashable value (float, int, string), it can be used for fillna.
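
A sketch of how the extracted scalar could be passed to fillna. The frame below is hypothetical; only the column names `A` and `B` and the lookup expression come from the comment:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column A has one missing value.
df = pd.DataFrame({"A": [10.0, 20.0, np.nan], "B": [1, 3, 5]})

# Take the value of A from the first row where B equals 3.
value = df.loc[df["B"] == 3, "A"].iloc[0]

# Use that scalar to fill the missing entries in A.
df["A"] = df["A"].fillna(value)
```

Note that `.iloc[0]` raises an IndexError if no row matches the condition, so a real script may want to check for that first.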

Answer by Anton Protopopov for Plotting percentage in seaborn bar plot

You could use your own function in the sns.barplot estimator, as described in the docs:

    estimator : callable that maps vector -> scalar, optional
        Statistical function to estimate within each categorical bin.

For you...

Seaborn heatmap to plotly failed

I'm getting a plotly error when converting a seaborn.heatmap figure to plotly. I'm doing that in a Jupyter notebook with the following code:

    %matplotlib inline
    import numpy as np
    import seaborn as sns
    import...

How to get all parameters of estimator in PySpark

I have a RandomForestRegressor and a GBTRegressor, and I'd like to get all of their parameters. The only way I found to do it is with several get methods, like:

    from pyspark.ml.regression import...

Answer by Anton Protopopov for Syntax for PALIVE in Python like pnbd.PAlive...

You could use the conditional_probability_alive method from the lifetimes package. You need to pass frequency, recency, and T for each customer. For example, for BetaGeoFitter (BG/NBD model):

    from lifetimes...

Answer by Anton Protopopov for Getting indices of a specific value in numpy...

You could use numpy.argwhere, as @chappers pointed out in the comment:

    arr = np.array([0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1])

    In [34]: np.argwhere(arr == 0).flatten()
    Out[34]:
    array([ 0, 1,...
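
For a result that is easy to check by hand, here is the same call on a shorter, hypothetical array. `np.flatnonzero` is shown as an alternative that is not part of the original answer; it returns the flat indices directly:

```python
import numpy as np

# Hypothetical short array, so the expected indices are obvious.
arr = np.array([0, 1, 0, 0, 1])

# argwhere returns one row per match; flatten turns that into a 1-D index array.
zero_idx = np.argwhere(arr == 0).flatten()

# Alternative (not from the answer): flat indices of the True entries.
zero_idx_alt = np.flatnonzero(arr == 0)
```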

Answer by Anton Protopopov for pySpark Kafka Direct Streaming update...

I wrote some functions to save and read Kafka offsets with the Python kazoo library. The first function gets a singleton of the Kazoo client:

    ZOOKEEPER_SERVERS = "127.0.0.1:2181"

    def get_zookeeper_instance():
        from...

Answer by Anton Protopopov for How to truncate the time on a datetime object?

You could use pandas for that (although it may be overkill for the task). You can use round, floor and ceil, as with ordinary numbers, together with any pandas frequency from the offset aliases:

    import pandas as...
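
Since the answer's own code is cut off here, the following is only a sketch of the idea under stated assumptions: the datetime value is hypothetical, and the `"D"` and `"min"` frequency strings are just two of the available offset aliases:

```python
from datetime import datetime

import pandas as pd

# Hypothetical input datetime.
dt = datetime(2016, 3, 5, 14, 37, 52)
ts = pd.Timestamp(dt)

# floor to the day drops the time component entirely.
start_of_day = ts.floor("D")

# floor to the minute drops only the seconds.
start_of_minute = ts.floor("min")

# ceil to the day moves forward to the next midnight.
next_midnight = ts.ceil("D")

# Convert back to a plain datetime if pandas types are not wanted.
truncated = start_of_day.to_pydatetime()
```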

Answer by Anton Protopopov for Remove non-numeric rows in one column with pandas

You could use the standard string method isnumeric and apply it to each value in your id column:

    import pandas as pd
    from io import StringIO

    data = """id,name
    1,A
    2,B
    3,C
    tt,D
    4,E
    5,F
    de,G"""

    df =...
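
The answer is truncated after `df =`, so the rest of it is not known. The sketch below is one plausible way to apply isnumeric to the id column, reusing the sample data above; the use of `pd.read_csv` and the variable names are assumptions:

```python
from io import StringIO

import pandas as pd

data = """id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G"""

# Assumption: the sample is parsed as CSV. Because some ids are not numbers,
# the whole id column is read as strings.
df = pd.read_csv(StringIO(data))

# Keep only the rows whose id consists solely of numeric characters.
numeric_mask = df["id"].apply(lambda value: value.isnumeric())
numeric_rows = df[numeric_mask]
```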
