CSV2Array / StringSplit soll Text innerhalb von "" ignorieren

  • Hallo!

    Ich erstelle momentan ein Script, dass mir als Endziel mal erstellte Adresslisten aus unserer Warenwirtschaft automatisch in unser Mail-System exportiert, um dann dort personalisierte Mailings erstellen zu können.

    Dazu wird die Adressliste als *.csv exportiert. Mein Script importiert dann diese Liste, zieht sich die wichtigen Infos wie Mail-Adresse, Anrede etc aus der csv raus und schreibt danach eine Import-Datei fürs Mail-System.

    Dass ganze funktioniert auch gut...mit einer Ausnahme:
    Wenn in der Warenwirtschaft der in einem Adress-Satz irgendwo ein Komma gesetzt ist (z.B. im Firmenname), haut mein Import-Script nicht mehr hin, da leider in der csv die einzelnen Felder auch mit Kommas getrennt werden. Damit verschiebt sich logischerweise die Reihenfolge im Ziel-Array und ich habe einen ungültigen Adress-Satz.

    Ursprünglich habe ich die _CSV2Array-Funktion ( http://www.thorsten-willert.de/Themen/AutoIt/_CSV2Array.au3?a ) benutzt.

    Dann habe ich angefangen, mir eine eigene Funktion zum einlesen der csv in ein Array zu schreiben - bin dann aber auf das gleiche Problem gestossen...wie bringe ich StringSplit bei, Trennzeichen innerhalb von "" zu ignorieren?

    Oder ist der Weg über Stringsplit der falsche?

    Danke für einen Denkanstoss :)

    Einmal editiert, zuletzt von Winnie (17. Februar 2010 um 13:06)

  • Meine Idee wäre es, alle Kommas innerhalb von Strings per StringRegExpReplace durch [KOMMA] zu ersetzen. Nachden der Array eingelesen wurde, wird das ganze wieder zurückersetzt ( da geht dann auch normales StringReplace ).

    Am besten wäre es, du postets mal eine Muster CSV-Datei, damit ich mal einen Regulären Ausdruck dafür schreiben kann.
    Es muss keine Orginal - CSV mit echten Daten sein, aber das Muster sollte das gleiche sein.

    LG SEuBo

  • Wenn ich deine Erklärung richtig verstanden habe, sollte folgender StringRegExp helfen:

    [autoit]


    $string = '"Test, GmbH Test",Musterstraße 1,01234 Musterhausen'

    [/autoit][autoit][/autoit][autoit]

    $split = StringRegExp($string, '((?:")(?:.+?)(?:")|(?:.+?))(?:,|$)' , 3)
    MsgBox(0, "Treffer:", $split[0] & ";" & $split[1] & ";" & $split[2])

    [/autoit]

    Es wird an jedemKomma gesplittet, das nicht innerhalb von Anführungszeichen "" liegt.

  • Hmm - erstmal danke für deine Idee....leider sehe ich noch nicht ganz der Vorteil, erst eine Zeile der CSV einzulesen, dann alle Kommas durch einen anderen Ausdruck zu ersetzen und später dann wieder alles zurückzusetzen...habe ich dann nich weiterhin das Problem, zwischen "guten" und "schlechten" Kommas zu differenzieren?

    Ach moment, jetzt beim Schreiben der obigen Zeilen verstehe ich glaub ich, wie du es meinst:
    Mit StringRegExpReplace NUR die Kommas innerhalb der "" ersetzen?

    Anyway, hier mal die Beispiel-CSV:

    [autoit]

    absbank,absblz,absfax,abskonto,absort,absplz,abstele,ad01,ad02,ad03,ad04,ad05,ad06,ad07,ad08,ad09,ad10,adr_http,adr_mobil,adr_pmobil,adr_telex,adzu2,adzu3,adzu4,anrede,ans_abt,ans_email,ans_fax,ans_firma0,ans_firma1,ans_firma2,ans_funk,ans_http,ans_land,ans_mobil,ans_ort,ans_pemail,ans_pfax,ans_plz_st,ans_pmobil,ans_postf,ans_ptele,ans_strass,ans_tele,ans_telex,bank,barcode,blz,curtimest,date,datetime,email,fax,firma0,firma1,firma2,firma3,firma4,firmierung,gruppe,kto,landkurz,landlang,matchcode,nummer,ort,pemail,pfax,plz,postfach,strasse,telefon1,telefon2,vemail,vfax,wfkid,wfk_text,wfpid,wfp_termin,wfp_zeit
    "Deutsche Bank","1234567890","555-555555","0987654321","somewhere","12345","555-5555","Mega-Money-Company","Herrn","Long Avenue 222","12345 Town","","","","","","","www.mega.de","","","","","","","Herrn","Buchhaltung","bigboss@megacompany.de","555-55555",1,"Boss","Big","","www.megacompany.de","D","","","","","","","","","","555-55555","","Deutsche Bank","","123456789",29.01.2010 11:34:46,29.01.2010,29.01.2010 12:34:48,"bigbos@megacompany.de","555-5555","","Mega Company GbR","","","","Mega Company GbR","KUN","12345","","","MEGACOMP",13117,"Town","","","12345","","Long Street 123","555-5555","","bigboss@megacompany.com","555-55555",0,"",0,. .,""
    "Deutsche Bank","1234567890","555-555555","0987654321","somewhere","12345","555-5555","Mega-Money-Company","Herrn","Long Avenue 222","12345 Town","","","","","","","www.mega.de","","","","","","","Herrn","Buchhaltung","bigboss@megacompany.de","555-55555",1,"Boss","Big","","www.megacompany.de","D","","","","","","","","","","555-55555","","Deutsche Bank","","123456789",29.01.2010 11:34:46,29.01.2010,29.01.2010 12:34:48,"bigbos@megacompany.de","555-5555","","Mega Company GbR","","","","Mega Company GbR","KUN","12345","","","MEGACOMP",13117,"Town","","","12345","","Long Street 123","555-5555","","bigboss@megacompany.com","555-55555",0,"",0,. .,""

    [/autoit]

    Danke schonmal!

  • Schau mal hier:

    (Ich hab noch an ein paar zufälligen Stellen kommas hingemacht, nicht wundern ;) )

    Spoiler anzeigen
    [autoit]

    $sString = 'absbank,absblz,absfax,abskonto,absort,absplz,abstele,ad01,ad02,ad03,ad04,ad05,ad06,ad07,ad08,ad09,ad10,adr_http,adr_mobil,adr_pmobil,adr_telex,adzu2,adzu3,adzu4,anrede,ans_abt,ans_email,ans_fax,ans_firma0,ans_firma1,ans_firma2,ans_funk,ans_http,ans_land,ans_mobil,ans_ort,ans_pemail,ans_pfax,ans_plz_st,ans_pmobil,ans_postf,ans_ptele,ans_strass,ans_tele,ans_telex,bank,barcode,blz,curtimest,date,datetime,email,fax,firma0,firma1,firma2,firma3,firma4,firmierung,gruppe,kto,landkurz,landlang,matchcode,nummer,ort,pemail,pfax,plz,postfach,strasse,telefon1,telefon2,vemail,vfax,wfkid,wfk_text,wfpid,wfp_termin,wfp_zeit' & @CRLF
    $sString &= '"Deutsche Bank","1234567890","555-555555","0987654321","somewhere","12345","555-5555","Mega-Money,Company","Herrn","Long Avenue 222","12345 Town","","","","","","","www.mega.de","","","","","","","Herrn","Buch,haltung","bigboss@megacompany.de","555-55555",1,"Boss","Big","","www.megacompany.de","D","","","","","","","","","","555-55555","","Deutsche Bank","","123456789",29.01.2010 11:34:46,29.01.2010,29.01.2010 12:34:48,"bigbos@megacompany.de","555-5555","","Mega Company GbR","","","","Mega Company GbR","KUN","12345","","","MEGACOMP",13117,"Town","","","12345","","Long Street 123","555-5555","","bigboss@megacompany.com","555-55555",0,"",0,. .,""' & @CRLF
    $sString &= '"Deutsc,he Bank","1234567890","555-555555","09876,54321","somewhere","12345","555-5555","Mega-Money-Company","Herrn","Long Avenue 222","12345 Town","","","","","","","www.mega.de","","","","","","","Herrn","Buchhaltung","bigboss@megacompany.de","555-55555",1,"Boss","Big","","www.mega,company.de","D","","","","","","","","","","555-55555","","Deutsche Bank","","123456789",29.01.2010 11:34:46,29.01.2010,29.01.2010 12:34:48,"bigbos@megacompany.de","555-5555","","Mega Company GbR","","","","Mega Company GbR","KUN","12345","","","MEGACOMP",13117,"Town","","","12345","","Long Street 123","555-5555","","bigboss@megacompany.com","555-55555",0,"",0,. .,""'

    [/autoit] [autoit][/autoit] [autoit][/autoit] [autoit]

    ConsoleWrite($sString & @CRLF)
    $sString = StringRegExpReplace($sString,'(?<=")([^,"]+?),([^,"]+?)(?=")',"\1[KOMMA]\2")
    ConsoleWrite(@CRLF)
    ConsoleWrite($sString & @CRLF)

    [/autoit]
  • Super-Sache, dass *freu*

    Ich werde das gleich mal in mein Script basteln und entprechend anpassen - ich denke, damit ist mein Problem gelöst...ich Danke dir!

  • Versteh immernoch nicht warum man was ersetzen soll? Hier mal mein Ansatz:
    (Die abgeänderte Funktion von Thorsten Willert)


    Spoiler anzeigen
    [autoit]


    #include <Array.au3>

    [/autoit] [autoit][/autoit] [autoit]

    $hFile = @ScriptDir & "\Test.csv"
    $split = _CSV2Array($hFile, ",", False)
    If @error then MsgBox(0, @error, "")
    _ArrayDisplay($split)

    [/autoit] [autoit][/autoit] [autoit]

    ; #FUNCTION# ===================================================================
    ; Name ..........: _CSV2Array
    ; Description ...:
    ; AutoIt Version : V3.3.0.0
    ; Syntax ........: _CSV2Array($hFile[, $cSeperator = "auto"[, $bFilterString = True[, $iColumnMode = 0]]])
    ; Parameter(s): .: $hFile - Handle for the CSV file to Read
    ; $cSeperator - Optional: (Default = "auto") : Tries to find the separator char ;) or , or TAB or | or space)
    ; | Data-seperator-char
    ; | Empty-string = Opt("GUIDataSeparatorChar")
    ; $bFilterString - Optional: (Default = True) : Removes leading and trailing " or '
    ; $iColumnMode - Optional: (Default = 0) :
    ; | 0: Sets error if lines have different columns and @extended to the csv-line number
    ; | 1: returns lines with different columns numbers comparing to the first line, too
    ; | 2: removing all columns > column numbers in the first line
    ; Return Value ..: Success - 2-dim Array
    ; Failure - 0
    ; @ERROR - 1: error file read
    ; @ERROR - 2: different number of columns / @EXTENDED = CSV-line
    ; - 3: parameter error
    ; Author(s) .....: Thorsten Willert
    ; Date ..........: Mon Dec 07 18:59:46 CET 2009
    ; ==============================================================================
    Func _CSV2Array($hFile, $cSeperator = "auto", $bFilterString = True, $iColumnMode = 0)
    Local $s = FileRead($hFile)
    If @error Then Return SetError(1)

    [/autoit] [autoit][/autoit] [autoit]

    If $cSeperator = Default Then $cSeperator = "auto"
    If Not $cSeperator Then $cSeperator = Opt("GUIDataSeparatorChar")

    [/autoit] [autoit][/autoit] [autoit]

    ; searching the line-seperator and splitting the lines into an array
    Local $aLines
    If StringInStr($s, @CRLF) Then
    $aLines = StringSplit($s, @CRLF, 1)
    ElseIf StringInStr($s, @CR) Then
    $aLines = StringSplit($s, @CR)
    Else
    $aLines = StringSplit($s, @LF)
    EndIf

    [/autoit] [autoit][/autoit] [autoit]

    ; searching the delimiter in the first line
    Local $aTMP
    If $cSeperator = "auto" Then
    Local $iMax = 0
    Local $iC[5] = [0, 0, 0, 0, 0]
    Local $sC[5] = [";", ",", @TAB, "|", " "]

    [/autoit] [autoit][/autoit] [autoit]

    $aTMP = StringRegExp($aLines[1], ";", 3)
    If Not @error Then $iC[0] = UBound($aTMP)
    $aTMP = StringRegExp($aLines[1], ",", 3)
    If Not @error Then $iC[1] = UBound($aTMP)
    $aTMP = StringRegExp($aLines[1], "\t", 3)
    If Not @error Then $iC[2] = UBound($aTMP)
    $aTMP = StringRegExp($aLines[1], "\|", 3)
    If Not @error Then $iC[3] = UBound($aTMP)
    $aTMP = StringRegExp($aLines[1], "[ ]", 3)
    If Not @error Then $iC[4] = UBound($aTMP)

    [/autoit] [autoit][/autoit] [autoit]

    For $i = 0 To UBound($sC) - 1
    If $iC[$i] > $iMax Then
    $iMax = $iC[$i]
    $cSeperator = $sC[$i]
    EndIf
    Next
    EndIf

    [/autoit] [autoit][/autoit] [autoit]

    ; creating 2-dim array based on the number of data in the first line
    $aTMP = StringRegExp($aLines[1], '((?:")(?:.*?)(?:")|(?:.+?))(?:' & $cSeperator & '|$)' , 3)
    Local $iCol = Ubound($aTMP)
    Local $aRet[$aLines[0]][$iCol]

    [/autoit] [autoit][/autoit] [autoit]

    ; splitting and filling the lines
    For $i = 1 To $aLines[0]
    $aTMP = StringRegExp($aLines[$i], '((?:")(?:.*?)(?:")|(?:.+?))(?:' & $cSeperator & '|$)' , 3)
    If @error Then ContinueLoop
    If $aTMP[0] > $iCol Then
    Switch $iColumnMode
    Case 0
    Return SetError(2, $i)
    Case 1
    ReDim $aRet[$aLines[0] - 1][$aTMP[0]]
    Case 2
    $aTMP[0] = $iCol
    Case Else
    Return SetError(3)
    EndSwitch
    EndIf
    For $j = 0 To Ubound($aTMP) - 1
    $aTMP[$j] = StringStripWS($aTMP[$j], 3)

    [/autoit] [autoit][/autoit] [autoit]

    ; removing leading and trailing " or '
    If $bFilterString Then $aTMP[$j] = StringRegExpReplace($aTMP[$j], '^("|'')(.*?)\1$', '$2')

    [/autoit] [autoit][/autoit] [autoit]

    $aRet[$i - 1][$j] = $aTMP[$j]
    Next ; /cols
    Next ; /lines

    [/autoit] [autoit][/autoit] [autoit]

    Return $aRet
    EndFunc ;==>_CSV2Array

    [/autoit]