Remove duplicate notes from Joplin

When I move notes from the Evernote import folder to my target folders in Joplin, it creates too many duplicate files. To resolve this, I used near-duplicate detection with the tool duometer: https://github.com/pmandera/duometer

I run the tool with ./duometer -t 0.6 -i /path/to/my/Notes/Joplin -o relatorio (my Joplin notes are synced to the file system).

I use 0.6 as the similarity threshold; see the documentation: https://www.pawelmandera.com/2015/11/06/en-duometer-tutorial/

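Then I run the bash script below, which applies a few important rules: both files of a pair must exist, their first lines (the titles) must be identical, and only then is the smaller of the two deleted.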

 #!/bin/bash  
   
 # Record the start time to calculate the elapsed time in seconds
 start=$(date +%s)
   
   
 n1=0 # number of lines read from the report
 n2=0 # number of pairs that pass the first if (file checks)
 n3=0 # number of pairs that pass the second if (same first line)
 deleted1=0 # number of files deleted via FILENAME
 deleted2=0 # number of files deleted via FILENAME2
   
 # read the duometer report line by line (tab-separated; the first two fields are the file paths)
 while IFS=$'\t' read -r first second nonimportant ; do  
 FILENAME="${first}";  
 FILENAME2="${second}";  
 n1=$(($n1 + 1));  
   
 # check that both paths are non-empty, both files exist, and the two paths are not the same
 if [ -n "$FILENAME" ] && [ -n "$FILENAME2" ] && [ -f "$FILENAME" ] && [ -f "$FILENAME2" ] && [ "$FILENAME" != "$FILENAME2" ] ; then  
 FILESIZE=$(stat -c%s "$FILENAME");  
 #FILEDATE=$(stat -c %y "$FILENAME");  
 FILESIZE2=$(stat -c%s "$FILENAME2");  
 #FILEDATE2=$(stat -c%y "$FILENAME2");  
 #echo "$FILENAME\n $FILENAME2"  
 #break;  
   
 # read the normalized first line of each of the two similar notes, to check whether the titles match
 content=$(  
  sed '  
   s/[[:space:]]\{1,\}/ /g; # turn sequences of spacing characters into one SPC  
   s/[^[:print:]]//g; # remove non-printable characters  
   s/^ //; s/ $//; # remove leading and trailing space  
   q; # quit after first line' < "$FILENAME"  
 )  
   
 content2=$(  
  sed '  
   s/[[:space:]]\{1,\}/ /g; # turn sequences of spacing characters into one SPC  
   s/[^[:print:]]//g; # remove non-printable characters  
   s/^ //; s/ $//; # remove leading and trailing space  
   q; # quit after first line' < "$FILENAME2"  
 )  
   
 n2=$(($n2 + 1));  
   
 # continue only if both first lines are non-empty and identical
 if [ "$content" = "$content2" ] && [ -n "$content" ] && [ -n "$content2" ]; then  
 n3=$(($n3 + 1));  
   
 # delete the smaller note (the first one when both sizes are equal)
 if (( FILESIZE2 >= FILESIZE )); then
   rm -f -- "$FILENAME";
   deleted1=$(($deleted1 + 1));
 else
   rm -f -- "$FILENAME2";
   deleted2=$(($deleted2 + 1));
 fi
 fi  
 fi  
   
 # echo "$FILEDATE <-> $FILESIZE <<||>>$FILEDATE2 <-> $FILESIZE2 \n";  
 done < ./relatorio; # relatorio is the name of the duometer report file
   
   
 # Record the end time to calculate the elapsed time in seconds
 end=$(date +%s)

 echo "n1: $n1 <-> n2: $n2 <-> n3: $n3 <-> deleted1: $deleted1 <-> deleted2: $deleted2 <-> Time(S): $((end-start))";
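
Because the script deletes files, it can help to preview its decisions first. The following is a minimal dry-run sketch of the same rules, not the script itself: it only prints the path that would be removed for each pair. The function names `first_line`, `file_to_delete` and `dry_run` are hypothetical, and `wc -c` is swapped in for `stat -c%s` as an assumption to avoid the GNU-only option.

```shell
#!/bin/bash
# Dry-run sketch: prints the files the rules would delete; removes nothing.
# All function names here are hypothetical, not part of the original script.

# Print the first line of a file with whitespace runs collapsed,
# non-printable characters removed, and leading/trailing spaces trimmed.
first_line() {
  sed -e 's/[[:space:]]\{1,\}/ /g' \
      -e 's/[^[:print:]]//g' \
      -e 's/^ //' -e 's/ $//' \
      -e 'q' < "$1"
}

# Print the path that would be deleted for a pair: the smaller file,
# or the first one when both sizes are equal.
# Assumption: wc -c is used instead of stat -c%s to get the size in bytes.
file_to_delete() {
  local size1 size2
  size1=$(wc -c < "$1")
  size2=$(wc -c < "$2")
  if (( size2 >= size1 )); then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$2"
  fi
}

# Read a tab-separated report (first two fields are paths) and list
# the deletion candidates, one per qualifying pair.
dry_run() {
  local report=$1 first second rest title1 title2
  while IFS=$'\t' read -r first second rest; do
    # skip pairs where a file is missing or both paths are the same
    [ -f "$first" ] && [ -f "$second" ] && [ "$first" != "$second" ] || continue
    title1=$(first_line "$first")
    title2=$(first_line "$second")
    if [ -n "$title1" ] && [ "$title1" = "$title2" ]; then
      file_to_delete "$first" "$second"
    fi
  done < "$report"
}
```

After sourcing these functions, `dry_run ./relatorio | sort -u` would list each candidate once. Since nothing is removed during a dry run, a file that appears in several pairs can be listed more than once, whereas the real script skips later pairs once one of their files is gone.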

My results (I did not lose a single note :)):


n1: 265835 <-> n2: 98362 <-> n3: 1126 <-> deleted1: 1122 <-> deleted2: 4 <-> Time(S): 1070
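
As a quick sanity check on these numbers, the two deletion counters should add up to n3, because every pair that passes the second if leads to exactly one deletion. A small sketch of that arithmetic, with the values copied from the result line (the variable names `total_deleted`, `per_10000`, `minutes` and `rest` are only illustrative):

```shell
#!/bin/bash
# Values copied from the result line above.
n1=265835; n3=1126; deleted1=1122; deleted2=4; seconds=1070

total_deleted=$((deleted1 + deleted2))  # should equal n3
per_10000=$((n3 * 10000 / n1))          # report lines per 10000 that led to a deletion (integer division)
minutes=$((seconds / 60))
rest=$((seconds % 60))

echo "deleted: $total_deleted, per 10000 lines: $per_10000, time: ${minutes}m ${rest}s"
# prints: deleted: 1126, per 10000 lines: 42, time: 17m 50s
```

So only about 0.4% of the pairs reported by duometer ended in a deletion, and the whole run took a little under 18 minutes.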

NOTE: I am not liable for any loss resulting from the use of this method. Apologies for my English.
