Fix Python – Convert pyspark string to date format


Asked By – Jenks

I have a date pyspark dataframe with a string column in the format of MM-dd-yyyy and I am attempting to convert this into a date column.

I tried:'new_date')).show()

And I get a string of nulls. Can anyone help?

Now we will see solution for issue: Convert pyspark string to date format


Update (1/10/2018):

For Spark 2.2+ the best way to do this is probably using the to_date or to_timestamp functions, which both support the format argument. From the docs:

>>> from pyspark.sql.functions import to_timestamp
>>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
>>>, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()
[Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]

Original Answer (for Spark < 2.2)

It is possible (preferrable?) to do this without a udf:

from pyspark.sql.functions import unix_timestamp, from_unixtime

df = spark.createDataFrame(
    [("11/25/1991",), ("11/24/1991",), ("11/30/1991",)], 

df2 =
    from_unixtime(unix_timestamp('date_str', 'MM/dd/yyy')).alias('date')

#DataFrame[date_str: string, date: timestamp]
#|date_str  |date               |
#|11/25/1991|1991-11-25 00:00:00|
#|11/24/1991|1991-11-24 00:00:00|
#|11/30/1991|1991-11-30 00:00:00|

This question is answered By – santon

This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0