Problem description & analysis
Below is the CSV file emp.csv:
EMPID,FIRSTNAME,LASTNAME,EMAIL,UPDATEDATE
123,John,Smith,john.smith01@email.com,01/01/2020
234,Bruce,Waye,bruce.wayne@wayneenterprises.com,02/02/2020
123,John,Smith,john.smith02@email.com,02/15/2020
345,Clark,Kent,clark.kent@dailyplanet.com,02/16/2020
123,John,Smith,john.smith03@email.com,02/20/2020
In the above file, the UPDATEDATE field is sorted in ascending order and the EMPID field contains duplicates.
We want to split the records in the CSV file into two parts:
Write the newest record of each employee, that is, the record with the latest UPDATEDATE value within each EMPID group, to the EMP table;
Write the remaining records, which are the difference between the full file and the newest records, to the EMP_HIS table in their original order.
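To make the intended split concrete, here is a minimal Python sketch of the same logic applied to the sample data. It is only an illustration, not the esProc script; the function name `split_records` and the `EMP_CSV` constant are made up for this example, and it assumes, as stated above, that the file is already ordered by UPDATEDATE.

```python
import csv
import io
from datetime import datetime

# Sample data copied verbatim from emp.csv above.
EMP_CSV = """EMPID,FIRSTNAME,LASTNAME,EMAIL,UPDATEDATE
123,John,Smith,john.smith01@email.com,01/01/2020
234,Bruce,Waye,bruce.wayne@wayneenterprises.com,02/02/2020
123,John,Smith,john.smith02@email.com,02/15/2020
345,Clark,Kent,clark.kent@dailyplanet.com,02/16/2020
123,John,Smith,john.smith03@email.com,02/20/2020
"""


def split_records(csv_text):
    """Return (latest, history): the newest row per EMPID and all other rows in file order."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # The source dates use the MM/dd/yyyy format.
        row["UPDATEDATE"] = datetime.strptime(row["UPDATEDATE"], "%m/%d/%Y").date()

    # Assumption: rows are ordered by UPDATEDATE, so the last row seen
    # for an EMPID is also its newest one.
    last_index = {row["EMPID"]: i for i, row in enumerate(rows)}
    keep = set(last_index.values())

    latest = [rows[i] for i in sorted(keep)]
    history = [row for i, row in enumerate(rows) if i not in keep]
    return latest, history


latest, history = split_records(EMP_CSV)
for row in latest:
    print("EMP    ", row["EMPID"], row["EMAIL"], row["UPDATEDATE"])
for row in history:
    print("EMP_HIS", row["EMPID"], row["EMAIL"], row["UPDATEDATE"])
```

For the sample file, the EMP part holds one row each for employees 234, 345 and 123 (the john.smith03 row), and the EMP_HIS part holds the two older rows of employee 123.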
Solution
Write the following script p1.dfx in esProc:
A1 Import data from the CSV file, reading the first row as column headers and parsing UPDATEDATE as a date (the original date format is MM/dd/yyyy).
A2 Group A1’s table by EMPID and get the last member of each group. Because the file is ordered by UPDATEDATE, this is the record with the latest UPDATEDATE value.
A3 Get the rest of the records, which is the difference between A1 and A2.
A4 Connect to the database.
A5 Write the latest employee records to the EMP table: update the existing row matched by EMPID, or insert a new row if there is no match.
A6 Write the historical employee records to the EMP_HIS table: update the existing row matched by EMPID and UPDATEDATE, or insert a new row if there is no match.
A7 Close the database connection.
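The update-or-insert behavior in steps A4 to A7 can be sketched in Python as well. This is a rough illustration under several assumptions, not what esProc does internally: it uses an in-memory SQLite database as a stand-in for the real one, the table schemas and primary keys are guessed from the key fields named in the steps, and the helper `upsert` is hypothetical.

```python
import sqlite3


def upsert(conn, table, rows):
    """Insert each row, replacing any existing row that has the same primary key."""
    # The table name comes from fixed constants below, never from user input.
    conn.executemany(
        f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?, ?, ?)", rows
    )


# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")

# Assumed schemas: EMP is keyed by EMPID, EMP_HIS by (EMPID, UPDATEDATE).
conn.execute(
    "CREATE TABLE EMP (EMPID INTEGER PRIMARY KEY, FIRSTNAME TEXT, "
    "LASTNAME TEXT, EMAIL TEXT, UPDATEDATE TEXT)"
)
conn.execute(
    "CREATE TABLE EMP_HIS (EMPID INTEGER, FIRSTNAME TEXT, LASTNAME TEXT, "
    "EMAIL TEXT, UPDATEDATE TEXT, PRIMARY KEY (EMPID, UPDATEDATE))"
)

# Two latest records and two historical records from the sample file.
latest = [
    (123, "John", "Smith", "john.smith03@email.com", "2020-02-20"),
    (345, "Clark", "Kent", "clark.kent@dailyplanet.com", "2020-02-16"),
]
history = [
    (123, "John", "Smith", "john.smith01@email.com", "2020-01-01"),
    (123, "John", "Smith", "john.smith02@email.com", "2020-02-15"),
]

upsert(conn, "EMP", latest)
upsert(conn, "EMP_HIS", history)
upsert(conn, "EMP", latest)  # running the load again does not duplicate rows
conn.commit()

emp_count = conn.execute("SELECT COUNT(*) FROM EMP").fetchone()[0]
his_count = conn.execute("SELECT COUNT(*) FROM EMP_HIS").fetchone()[0]
conn.close()
```

Keying EMP_HIS on both EMPID and UPDATEDATE is what lets one employee keep several historical rows while still making a repeated load safe.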
To learn how to integrate the script with a Java program, see How to Call an SPL Script in Java.