pyspark.RDD.zipWithUniqueId#
- RDD.zipWithUniqueId()[source]#
- Zips this RDD with generated unique Long ids.
- Items in the kth partition get ids k, n+k, 2*n+k, …, where n is the number of partitions. The resulting ids may therefore have gaps, but unlike zipWithIndex(), this method does not trigger a Spark job.
- New in version 1.2.0.
- See also
  - RDD.zipWithIndex()
- Examples
  >>> sc.parallelize(["a", "b", "c", "d", "e"], 3).zipWithUniqueId().collect()
  [('a', 0), ('b', 1), ('c', 4), ('d', 2), ('e', 5)]
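The id scheme can be reproduced without Spark. A minimal sketch in plain Python (the helper `zip_with_unique_id` is hypothetical, not part of the PySpark API) that assigns item i of partition k the id i * n + k, matching the example output above under the assumption that `parallelize` splits the five elements into partitions ["a"], ["b", "c"], ["d", "e"]:

```python
def zip_with_unique_id(partitions):
    # Mimic RDD.zipWithUniqueId: the item at position i of
    # partition k receives id i * n + k, where n is the number
    # of partitions. Gaps appear when partition sizes differ.
    n = len(partitions)
    return [
        (item, i * n + k)
        for k, part in enumerate(partitions)
        for i, item in enumerate(part)
    ]

print(zip_with_unique_id([["a"], ["b", "c"], ["d", "e"]]))
# [('a', 0), ('b', 1), ('c', 4), ('d', 2), ('e', 5)]
```

Because each id is congruent to its partition index modulo n, ids are unique across partitions without any cross-partition coordination, which is why no Spark job is needed.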