Working with Dates
Convert timestamps to the ISO 8601 date format, add a time attribute to a date attribute to get a single timestamp, and obtain the number of days between two dates using transformation functions.
You can use the following formats and functions to work with dates.
Datetime Formats
Tamr Core uses the following standards:
- The Java SimpleDateFormat for formatting
date
andtime
pattern strings. - The ISO 8601 standard.
You can specify how date and time values are formatted in your data. If your data does not use one of the Predefined Date Formats, you use the following syntax to provide the format that is used:
- Date and time must be separated by a colon
:
character, such asdd/MM/yyyy:HH:mm
. - Other separators that occur between values, such as - or spaces, can be entered as they appear in your data.
- Times that use the 12-hour (AM/PM) clock are converted to the 24-hour timekeeping system.
You use the following case-sensitive placeholders to represent the components of date and time values:
dd
is a zero-padded two-digit day, such as25
.d
is valid for single-digit days.MMM
is a three-letter month, such asJAN
.MM
is a zero-padded two-digit month, such as01
.M
is valid for single-digit months.yyyy
is the four-digit year, such as1999
.HH
is the two-digit hour of a 24-hour clock time, such as17
.hh
is the two-digit hour of a 12-hour clock time, such as08
(seea
below for AM/PM).mm
is the two-digit minute, such as01
.ss
is the two-digit second, such as30
.S
indicates additional units; add oneS
for each decimal place.Z
is the timezone in UTC. You can also use+HH:mm
or-HH:mm
for the UTC offset, such as−05:00
for New York on standard time.a
, including the leading space, is the AM/PM indicator for times that use the 12-hour clock.
Converting Timestamps to ISO 8601
The following functions convert timestamps (that is, a date, time, or both) in a variety of formats to the ISO 8601 date format. To convert timestamps in different formats to the ISO format, these functions apply any optional formats that you specify to your data first, and then use a predefined list of standard, frequently-used formats. As a result, these functions can convert timestamps in multiple formats to ISO 8601 format.
Important: Prior to v2020.018, the
datetime_to_iso
anddate_and_time_to_iso
functions required explicit definitions for the datetime formats to convert. Beginning with v2020.018, these functions automatically use the predefined list of datetime formats in addition to your explicitly defined formats. As a result, the output of these functions may change.
datetime_to_iso
datetime_to_iso
If your timestamp is a single datetime string stored in one attribute, this function converts it to ISO 8601 format. If you explicitly specify one or more optional formats, you do not have to include nanoseconds. If you do, and a datetime attribute does not include them, they are filled with zeroes.
Examples:
The example that follows relies on the list of predefined formats to convert your timestamps to the ISO format.
SELECT *,
datetime_to_iso(get(<datetime_attribute>, 0),
array())
AS <iso_formatted_timestamp>
The example that follows explicitly specifies two datetime formats to convert before any formats that match those specified in the predefined list. The format array is case-sensitive, so notice that the input dataset uses the 24-hour clock for times.
SELECT *,
datetime_to_iso(get(<datetime_attribute>, 0),
array('yyyy-MM-dd-:HH:mm:ss.S','yyyy-MM-dd :HH:mm:ss.S'))
AS <iso_formatted_timestamp>
For more information, see DATETIME_TO_ISO.
date_and_time_to_iso
date_and_time_to_iso
If your timestamp is separated into two different attributes for date and time, you can use this function to convert them to an ISO 8601 timestamp in a single attribute. If you explicitly specify one or more optional formats to convert, you must include the colon :
separator even if there are no values for one of the input attributes.
Examples:
The example that follows relies on the list of predefined formats to convert values to the ISO format.
SELECT *,
date_and_time_to_iso(get(<date_attribute>, 0), get(<time_attribute>,0),
array())
AS <iso_formatted_timestamp>
The example that follows explicitly specifies a date format to convert before any formats that match those supplied in the predefined list. In this example, the input dataset uses the 12-hour clock for times.
SELECT *,
date_and_time_to_iso(get(<date_attribute>, 0), get(<time_attribute>,0),
array('dd MM yyyy:', 'dd/MM/yyyy:hh:mm a'))
AS <iso_formatted_timestamp>
For more information, see DATE_AND_TIME_TO_ISO.
Predefined Date Formats
Tamr Core automatically converts values in the following formats to ISO 8601 format. If your timestamps use a format that does not appear on this list, or if formats should be converted in a different sequence than listed below, you should specify that format in the array()
.
For example, if your timestamps include month, day, and year values in the format MM dd yyyy
, be sure to specify it. That format is not included in the predefined list.
Tip: To find any records that have a timestamp value in a format that is not listed here, you can preview the effect of using the function with an empty array()
first, then filter to records that result in a null to explore the format of the original timestamp. You can then explicitly include that format in the array.
Only the year is specified:
datetime_to_iso, time is empty | date_and_time_to_iso, time is empty |
---|---|
'yy' | 'yy:' |
'yyyy' | 'yyyy:' |
'-yyyy' | '-yyyy:' |
'----yyyy' | '----yyyy:' |
'-----yy' | '-----yy:' |
'-----yyyy' | '-----yyyy:' |
'-------yyyy' | '-------yyyy:' |
'------yyyy' | '------yyyy:' |
Only the month and year are specified:
datetime_to_iso, time is empty | date_and_time_to_iso, time is empty |
---|---|
'--MMMyyyy' | '--MMMyyyy:' |
'--MMMyy' | '--MMMyy:' |
'-MMMyyyy' | '-MMMyyyy: |
'-MMMyy' | '-MMMyy:' |
'MMMyyyy' | 'MMMyyyy:' |
'yyyyMMM' | 'yyyyMMM:' |
'yyyy-MMM' | 'yyyy-MMM:' |
'yyyy-MM' | 'yyyy-MM:' |
Only the day, month and year are specified:
This table presents the formats that each function attempts to convert, in the sequence attempted. Blank cells indicate differences between the lists the functions use to convert day, month, and year values without a time.
datetime_to_iso, time is empty | date_and_time_to_iso, time is empty |
---|---|
'dd-MMM-yy' | 'dd-MMM-yy:' |
'd-MMM-yy' | 'd-MMM-yy:' |
'd-MM-yyyy' | 'd-MM-yyyy:' |
'dd-MM-yyyy' | 'dd-MM-yyyy:' |
'dd-MMM-yy' | 'dd-MMM-yy:' |
'ddMMMyy' | 'ddMMMyy:' |
'ddMMMyyyy' | 'ddMMMyyyy:' |
'dMMMyy' | 'dMMMyy:' |
'dMMMyyyy' | 'dMMMyyyy:' |
'yyyy-MM-d' | 'yyyy-MM-d:' |
'yyyy-MM-dd' | 'yyyy-MM-dd:' |
'yyyyMMd' | 'yyyyMMd:' |
'yyyyMMdd' | 'yyyyMMdd:' |
'd MM yy' | |
'dd MM yyyy' | 'dd MM yyyy:' |
'd MM yyyy' | 'd MM yyyy:' |
'dMMMyy' | 'dMMMyy:' |
'dMMMyyyy' | 'dMMMyyyy:' |
'ddMMMyy' | 'ddMMMyy:' |
'ddMMMyyyy' | 'ddMMMyyyy:' |
'dd MMM yy' | |
'dd MMM yyyy' | |
'dd-MMM-yy' | |
'dd-MMM-yyy' | |
'd-MMM-yy' | |
'd-MMM-yyyy’ |
Only the year and time are specified:
- 'yyyy:HH:mm:ss'
- 'yy:HH:mm:ss'
Only the month, year and time are specified:
- 'yyyyMMM:HH:mm:ss:'
- '--MMMyyyy:HH:mm:ss'
- 'MMMyyyy:HH:mm:ss'
The day, month, year, and time are specified:
- 'yyyy-MM-d:HH:mm:ss'
- 'yyyy-MM-dd:HH:mm:ss'
- 'yyyyMMM:HH:mm:ss'
- 'd/M/yyyy hh:mm:ss a'
- 'dd/MM/yyyy hh:mm:ss a'
- 'yyyy-MM-dd HH:mm:ss.S:'
- 'yyyy-MM-dd-HH:mm:ss'
Example Date Formats
In this example, the 4th of July 1990 is represented in a source dataset in the following ways. The syntax to use to explicitly specify each format follows for each one.
Data | Format |
---|---|
07/04/1990 | MM/dd/yyyy |
07-04-1990 | MM-dd-yyyy |
7/4/90 | M/d/yy |
4 JUL 90 | d MMM yy |
Working with Dates in ISO 8601 Format
Other functions can act on strings that are already in ISO 8601 format. These are:
iso_datetime_add
iso_datetime_add
Add or subtract time from a date. Valid units are: SEC
(seconds), MIN
(minutes), HRS
(hours), DAY
(days), WKS
(weeks), YRS
(years), and MTH
(months).
Note: Durations of months and years must be whole numbers, since months and years, such as leap years, can vary in length.
For example, to add four hours to a date:
iso_datetime_add(<date_attribute>, 4.0, 'hrs') AS <date_attribute>
To get the day before:
iso_datetime_add(<date_attribute>, -1.0, 'day') AS <date_attribute>
iso_difference_days
iso_difference_days
Get the number of days between two dates in ISO 8601 format. Returns a negative number if <date1_attribute>
is the later date of the two.
iso_difference_days(<date1_attribute>,<date2_attribute>) AS <duration_attribute>
duration_to_iso
duration_to_iso
Convert a number to ISO 8601 duration format. Valid units are SEC
(seconds), MIN
(minutes), HRS
(hours), DAY
(days), and WKS
(weeks).
duration_to_iso(<duration_attribute>, 'DAY', true) AS <duration_attribute>
dayofmonth
dayofmonth
Calaculate the day of month (1 to 31) of a date in ISO 8601 format.
dayofmonth(<date_attribute>) AS <date_attribute>
Dealing with Two-Digit Years
A known issue is that the dayofmonth
function inherits an issue from Java 8: any two-digit year passed to the function is treated as a year in the range 2000-2099. If your data includes dates prior to 2000, Tamr recommends this workaround.
In the following example, Date
is the name of the column or attribute that has the date string being converted.
- Convert the
Date
to the ISO 8601 standard using the functiondatetime_to_iso
. - Next, use a
case
expression to detect any year into the future, such as 2080, and if the year is greater than 2030, convert it to a year in the past, or a 19XX year, such as 1980.
In this example, the year 2030 is used as as a cutoff point, but you can change it to another year. You can also use the current year as the cutoff point by using the function runtime_date.
// This SELECT converts Date into the ISO standard. You need to match the array formats to match your input data.
SELECT *, datetime_to_iso(Date, array('dd-MMM-yy')) AS my_iso_date;
// This CASE expression checks the year of my_iso_date and converts from a 20xx year to a 19xx year if the year is greater than 2030.
// The year 2030 is an example and should be updated for your use case.
SELECT *,
CASE WHEN year(my_iso_date) > 2030
THEN substr2(iso_datetime_add(my_iso_date || 'T:00:00', -100.0, 'YRS'), 0, -9)
ELSE my_iso_date
END AS my_iso_date;
Working with Unix Time
In cases where a timestamp is in the form of seconds (or milliseconds) from epoch, you can use the iso_datetime_add
function to convert to ISO format. For example, if your data contains a timestamp in seconds from epoch in a date
column you can use the following transformation to convert it to ISO format and place it in the iso_date
column:
SELECT *, iso_datetime_add('1970-01-01T00:00:00', date, 'sec') as iso_date;
This example assumes a timezone of UTC for the input date.
See Unix time.
Updated over 2 years ago