pyspark.pandas.Series.apply
- Series.apply(func, args=(), **kwds)
- Invoke a function on the values of the Series.

The function can be a Python function that only works on the Series.

Note: this API executes the function once to infer the return type, which is potentially expensive, for instance, when the dataset is created after aggregations or sorting.

To avoid this, specify the return type in func, for instance, as below:

>>> def square(x) -> np.int32:
...     return x ** 2

pandas-on-Spark then uses the return type hint and does not try to infer the type.

- Parameters
- func : function
- Python function to apply. A return type hint is strongly recommended. Without one, pandas-on-Spark executes the function once to infer the return type.
- args : tuple
- Positional arguments passed to func after the series value. 
- **kwds
- Additional keyword arguments passed to func. 
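The following is a rough, local mental model of the contract described above. It is not the pandas-on-Spark implementation, which runs func on Spark partitions. The sketch uses hypothetical helpers, `apply_locally` and `resolve_return_type`, over a plain Python list. The result type is taken from func's return annotation when one exists. Otherwise func is called once on the first value to infer it, which is the extra execution the note warns about. Each value is then passed as `func(value, *args, **kwds)`. The helper names and the use of builtin `int`/`float` hints instead of NumPy types are assumptions made for illustration.

```python
import inspect
from typing import Any, Callable, List, Tuple


def resolve_return_type(func: Callable[..., Any], sample: Any,
                        args: Tuple[Any, ...] = (), **kwds: Any) -> Any:
    """Hypothetical helper: prefer the return type hint, otherwise infer it.

    Inference calls func one extra time; that extra call is the cost the
    note about type inference warns about.
    """
    annotation = inspect.signature(func).return_annotation
    if annotation is not inspect.Signature.empty:
        return annotation  # hint present: func is not called here
    return type(func(sample, *args, **kwds))  # no hint: run func once to infer


def apply_locally(values: List[Any], func: Callable[..., Any],
                  args: Tuple[Any, ...] = (), **kwds: Any) -> Tuple[Any, List[Any]]:
    """Hypothetical stand-in for Series.apply over a plain Python list.

    Each value is passed as the first argument, followed by *args and **kwds.
    Assumes values is non-empty (the first value is the inference sample).
    """
    result_type = resolve_return_type(func, values[0], args, **kwds)
    results = [func(value, *args, **kwds) for value in values]
    return result_type, results


def subtract_custom_value(x, custom_value) -> int:
    return x - custom_value


def add_custom_values(x, **kwargs):  # no return hint, so the type is inferred
    for month in kwargs:
        x += kwargs[month]
    return x


if __name__ == "__main__":
    temps = [20, 21, 12]
    print(apply_locally(temps, subtract_custom_value, args=(5,)))
    print(apply_locally(temps, add_custom_values, june=30, july=20, august=25))
```

With the temperatures used in the examples, this yields `[15, 16, 7]` for `args=(5,)` and `[95, 96, 87]` for the three monthly keyword arguments. Only the second call pays for an extra inference call, because `add_custom_values` has no return hint.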
 
- Returns
- Series
 
- See also
- Series.aggregate
- Only perform aggregating type operations. 
- Series.transform
- Only perform transforming type operations. 
- DataFrame.apply
- The equivalent function for DataFrame. 
- Examples

Create a Series with typical summer temperatures for each city.

>>> s = ps.Series([20, 21, 12],
...               index=['London', 'New York', 'Helsinki'])
>>> s
London      20
New York    21
Helsinki    12
dtype: int64

Square the values by defining a function and passing it as an argument to apply().

>>> def square(x) -> np.int64:
...     return x ** 2
>>> s.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64

Define a custom function that needs additional positional arguments, and pass these arguments using the args keyword argument.

>>> def subtract_custom_value(x, custom_value) -> np.int64:
...     return x - custom_value

>>> s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64

Define a custom function that takes keyword arguments, and pass these arguments to apply().

>>> def add_custom_values(x, **kwargs) -> np.int64:
...     for month in kwargs:
...         x += kwargs[month]
...     return x

>>> s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64

Use a function from the NumPy library.

>>> def numpy_log(col) -> np.float64:
...     return np.log(col)
>>> s.apply(numpy_log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64

You can omit the type hint and let pandas-on-Spark infer the type.

>>> s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
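The doctests above assume that `import numpy as np` and `import pyspark.pandas as ps` have already been run. pandas-on-Spark follows the pandas API for `Series.apply`, so one way to sanity-check a function before running it on Spark is to call it through plain pandas first. The following is a sketch that assumes pandas and NumPy are installed locally. Plain pandas does not read the return type hints, so the hints matter only once the same functions are applied to a `ps.Series`.

```python
import numpy as np
import pandas as pd

# Same data as the examples, as a plain pandas Series (local, no Spark needed).
s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])


def square(x) -> np.int64:
    return x ** 2


def subtract_custom_value(x, custom_value) -> np.int64:
    return x - custom_value


def add_custom_values(x, **kwargs) -> np.int64:
    # Add every keyword argument's value to x.
    for month in kwargs:
        x += kwargs[month]
    return x


squared = s.apply(square)
shifted = s.apply(subtract_custom_value, args=(5,))
summed = s.apply(add_custom_values, june=30, july=20, august=25)
logged = s.apply(np.log)

print(squared.tolist(), shifted.tolist(), summed.tolist())
print(logged.round(6).tolist())
```

To run the same checks on Spark, import `pyspark.pandas as ps` and build the Series with `ps.Series` instead of `pd.Series`. The function definitions can stay unchanged.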