pyspark.pandas.Series.describe
- Series.describe(percentiles=None)
- Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
- Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.
- Parameters
- percentiles : list of float in range [0.0, 1.0], default [0.25, 0.5, 0.75]
- A list of percentiles to be computed. 
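The percentiles parameter maps onto index labels such as 25% or 85% in the result. The following is a minimal plain-Python sketch of that mapping, not the pyspark.pandas implementation: `percentile_labels` is a hypothetical helper, and the assumption that 50% is always included is taken only from the custom-percentiles example on this page, where `[0.85, 0.15]` yields the rows 15%, 50% and 85%.

```python
def percentile_labels(percentiles=None):
    """Return sorted index labels such as '25%' for a list of percentiles.

    Hypothetical helper for illustration; not part of pyspark.pandas.
    """
    if percentiles is None:
        percentiles = [0.25, 0.5, 0.75]  # documented default
    for p in percentiles:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"percentile {p} is outside [0.0, 1.0]")
    # Assumption: the median is always reported, as in the page's example
    # where percentiles=[0.85, 0.15] still produces a 50% row.
    values = sorted(set(percentiles) | {0.5})
    # ':g' drops trailing zeros and float noise, so 0.15 becomes '15%'.
    return [f"{p * 100:g}%" for p in values]


print(percentile_labels())              # ['25%', '50%', '75%']
print(percentile_labels([0.85, 0.15]))  # ['15%', '50%', '85%']
```

The labels come back in ascending order regardless of the order in which the percentiles were passed, which matches the row order shown in the examples.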
 
- Returns
- DataFrame
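- Summary statistics of the DataFrame provided. As the Series examples on this page show, describing a Series yields a Series of the same statistics.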
 
- See also
- DataFrame.count
- Count number of non-NA/null observations. 
- DataFrame.max
- Maximum of the values in the object. 
- DataFrame.min
- Minimum of the values in the object. 
- DataFrame.mean
- Mean of the values. 
- DataFrame.std
- Standard deviation of the observations. 
- Notes
- For numeric data, the result’s index will include count, mean, std, min, 25%, 50%, 75%, max.
- For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.
- Examples
- Describing a numeric Series.

    >>> import pyspark.pandas as ps
    >>> s = ps.Series([1, 2, 3])
    >>> s.describe()
    count    3.0
    mean     2.0
    std      1.0
    min      1.0
    25%      1.0
    50%      2.0
    75%      3.0
    max      3.0
    dtype: float64

- Describing a DataFrame. Only numeric fields are returned.

    >>> df = ps.DataFrame({'numeric1': [1, 2, 3],
    ...                    'numeric2': [4.0, 5.0, 6.0],
    ...                    'object': ['a', 'b', 'c']
    ...                   },
    ...                   columns=['numeric1', 'numeric2', 'object'])
    >>> df.describe()
           numeric1  numeric2
    count       3.0       3.0
    mean        2.0       5.0
    std         1.0       1.0
    min         1.0       4.0
    25%         1.0       4.0
    50%         2.0       5.0
    75%         3.0       6.0
    max         3.0       6.0

- For multi-index columns:

    >>> df.columns = [('num', 'a'), ('num', 'b'), ('obj', 'c')]
    >>> df.describe()
           num
             a    b
    count  3.0  3.0
    mean   2.0  5.0
    std    1.0  1.0
    min    1.0  4.0
    25%    1.0  4.0
    50%    2.0  5.0
    75%    3.0  6.0
    max    3.0  6.0

    >>> df[('num', 'b')].describe()
    count    3.0
    mean     5.0
    std      1.0
    min      4.0
    25%      4.0
    50%      5.0
    75%      6.0
    max      6.0
    Name: (num, b), dtype: float64

- Describing a DataFrame and selecting custom percentiles.

    >>> df = ps.DataFrame({'numeric1': [1, 2, 3],
    ...                    'numeric2': [4.0, 5.0, 6.0]
    ...                   },
    ...                   columns=['numeric1', 'numeric2'])
    >>> df.describe(percentiles=[0.85, 0.15])
           numeric1  numeric2
    count       3.0       3.0
    mean        2.0       5.0
    std         1.0       1.0
    min         1.0       4.0
    15%         1.0       4.0
    50%         2.0       5.0
    85%         3.0       6.0
    max         3.0       6.0

- Describing a column from a DataFrame by accessing it as an attribute.

    >>> df.numeric1.describe()
    count    3.0
    mean     2.0
    std      1.0
    min      1.0
    25%      1.0
    50%      2.0
    75%      3.0
    max      3.0
    Name: numeric1, dtype: float64

- Describing a column from a DataFrame by accessing it as an attribute and selecting custom percentiles.

    >>> df.numeric1.describe(percentiles=[0.85, 0.15])
    count    3.0
    mean     2.0
    std      1.0
    min      1.0
    15%      1.0
    50%      2.0
    85%      3.0
    max      3.0
    Name: numeric1, dtype: float64
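To make the index entries listed in the Notes concrete, here is a minimal plain-Python sketch of what each non-percentile statistic means. It is not how pyspark.pandas computes them: `numeric_summary` and `object_summary` are hypothetical helper names, the percentile rows are left out, and the use of the sample standard deviation is an assumption based on the page's example, where `[1, 2, 3]` gives a std of 1.0.

```python
import math
import statistics
from collections import Counter


def numeric_summary(values):
    """count, mean, std, min and max of the non-NaN values (percentile rows omitted)."""
    # Missing values are excluded before any statistic is computed.
    clean = [float(v) for v in values if v is not None and not math.isnan(v)]
    return {
        "count": float(len(clean)),
        "mean": statistics.mean(clean),
        # Assumption: sample standard deviation (n - 1 denominator);
        # statistics.stdev needs at least two values.
        "std": statistics.stdev(clean),
        "min": min(clean),
        "max": max(clean),
    }


def object_summary(values):
    """count, unique, top and freq of the non-missing values."""
    clean = [v for v in values if v is not None]
    # top is the most common value and freq is how often it occurs.
    # Ties are broken by first occurrence here, which is an assumption.
    top, freq = Counter(clean).most_common(1)[0]
    return {"count": len(clean), "unique": len(set(clean)), "top": top, "freq": freq}


print(numeric_summary([1, 2, 3]))
print(object_summary(["a", "b", "a", None]))
```

For `[1, 2, 3]` the numeric helper reproduces the count, mean, std, min and max rows of the first example above. The percentile rows are left out deliberately, since this page does not say how they are computed.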