Topic on Project:Support desk

Delete unused images

Zackmann08 (talkcontribs)

Hello, I am trying to delete unused images from my wiki. I found the following code on a forum somewhere but I cannot get it to work:

lynx -dump http://wiki_site/wiki/index.php/Special:UnusedFiles | grep "/File:" | gawk '{print $2}' | sort | uniq | gawk -F\File '{print "File"$2}' > ./delete_files
php /var/www/html/wiki/maintenance/deleteBatch.php ./delete_files

It keeps saying that the "lynx" and "gawk" commands were not found. I get how to use the deleteBatch.php script. What I really need is an easy way to get a list of all the files that show up in "Special:UnusedFiles".

Any help is appreciated.

Allen4names (talkcontribs)

I would try something simpler. For example, I would start with the following.

wget -q -O - http://localhost/wiki/index.php/Special:UnusedFiles | grep "/File:" - > ./delete_files && gedit ./delete_files

Then use gedit to clean things up before using the deleteBatch.php script. I think that "gawk '{print $2}'" should be "gawk '{print $2}' -- -" and that "sort | uniq" should be "sort -u -", but I don't use the gawk command very much, so you may need someone more familiar with it to help you.
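
For what it's worth, "sort | uniq" and "sort -u" should give the same result for this job, so either form ought to be fine. A minimal sketch to check that on your own machine; the titles below are made up purely for illustration:

```shell
# Hypothetical sample input: duplicate titles in arbitrary order.
titles='File:B.jpg
File:A.jpg
File:B.jpg'

# Classic form: sort first, then drop adjacent duplicates.
with_uniq=$(printf '%s\n' "$titles" | sort | uniq)

# Shorter form: let sort drop the duplicates itself.
with_sort_u=$(printf '%s\n' "$titles" | sort -u)

# Show the de-duplicated list.
printf '%s\n' "$with_sort_u"
```

Both variables should end up holding the same two lines.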

Zackmann08 (talkcontribs)

First off, thank you very much for the help! I was actually able to get SOMETHING to run using that. The file (./delete_files) is turning up with tons of extra information though. Is there any easy way to isolate the file names? For example, in the following line (which represents a single file):

<div class="thumb" style="width: 150px;"><div style="margin:15px auto;"><a href="/wiki/File:WIP.jpg" class="image"><img alt="WIP.jpg" src="/images/thumb/b/b0/WIP.jpg/119px-WIP.jpg" width="119" height="120" /></a></div></div>
<a href="/wiki/File:WIP.jpg" title="File:WIP.jpg">WIP.jpg</a><br />

I would like to just have:
File:WIP.jpg

Any quick and easy way to do that?

Allen4names (talkcontribs)

You could try the following.

cp -n delete_files delete_files.bak | replace 'href=\"/wiki/' '
' '\" class' '
' -- delete_files && cat delete_files | grep 'File' - | sort -u - > delete_files_sorted

Copy and paste this as a single line. Please let me know if this works.

Zackmann08 (talkcontribs)

Not quite... I get the following. (I limited it to only include 2 of the many files listed)

delete_files:

			<div class="thumb" style="width: 150px;"><div style="margin:34.5px auto;"><a File:ISR.jpg="image"><img alt="ISR.jpg" src="/images/thumb/2/2e/ISR.jpg/120px-ISR.jpg" width="120" height="81" /></a></div></div>
<a File:ISR.jpg" title="File:ISR.jpg">ISR.jpg</a><br />
			<div class="thumb" style="width: 150px;"><div style="margin:36.5px auto;"><a File:ITA.jpg="image"><img alt="ITA.jpg" src="/images/thumb/1/1f/ITA.jpg/120px-ITA.jpg" width="120" height="77" /></a></div></div>
<a File:ITA.jpg" title="File:ITA.jpg">ITA.jpg</a><br />

delete_files_sorted:

<a File:ISR.jpg" title="File:ISR.jpg">ISR.jpg</a><br />
<a File:ITA.jpg" title="File:ITA.jpg">ITA.jpg</a><br />
			<div class="thumb" style="width: 150px;"><div style="margin:34.5px auto;"><a File:ISR.jpg="image"><img alt="ISR.jpg" src="/images/thumb/2/2e/ISR.jpg/120px-ISR.jpg" width="120" height="81" /></a></div></div>
			<div class="thumb" style="width: 150px;"><div style="margin:36.5px auto;"><a File:ITA.jpg="image"><img alt="ITA.jpg" src="/images/thumb/1/1f/ITA.jpg/120px-ITA.jpg" width="120" height="77" /></a></div></div>

Thank you SO MUCH for your help!

Allen4names (talkcontribs)

By "as a single line" I meant that you should enter the code so that it looks like the three lines below.

$ cp -n delete_files delete_files.bak | replace 'href=\"/wiki/' '
> ' '\" class' '
> ' -- delete_files && cat delete_files | grep 'File' - | sort -u - > delete_files_sorted

I am sorry if my mistake caused you to misunderstand what I meant.

Zackmann08 (talkcontribs)

Once again, let me just thank you for taking so much time to help me with this. I did misunderstand you but that was my bad. It is ALMOST there.... now what I am getting is:

File:ISR.jpg
File:ISR.jpg" title="File:ISR.jpg">ISR.jpg</a><br />
File:ITA.jpg
File:ITA.jpg" title="File:ITA.jpg">ITA.jpg</a><br />

Cannot thank you enough...

Allen4names (talkcontribs)

I think this should do it.

cp -n delete_files delete_files.bak | replace '/' '
' '\"' '
' -- delete_files && cat delete_files | grep 'File' - | sort -u - > delete_files_sorted

Here is something else you could try as a batch script. I have commented out one of the lines for testing.

#!/bin/bash
cd ~
deleteFiles=$(wget -q -O - http://localhost/wiki/index.php/Special:UnusedFiles | replace '/' '
' '\"' '
' | grep 'File' - | sort -u -)
if [ "$deleteFiles" = '' ]; then
exit 1
fi
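# deleteBatch.php wants a list file (or, as assumed here, the list on standard input), not the titles as an argument.
#printf '%s\n' "$deleteFiles" | php /var/www/html/wiki/maintenance/deleteBatch.php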
exit 0

This script should be run from the command line for testing. Hopefully this will work for you.
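
If the "replace" command is not installed on your system, here is a rough alternative using only grep, sed and sort. It is a sketch, not a tested replacement: the function name extract_unused_titles is made up, and it assumes the links look like href="/wiki/File:NAME" as in the sample lines quoted earlier in this thread.

```shell
#!/bin/bash
# Sketch: read Special:UnusedFiles HTML on stdin, print unique "File:..." titles.
# Assumes links look like href="/wiki/File:NAME" (taken from the samples in this thread).
extract_unused_titles() {
    grep -o 'href="/wiki/File:[^"]*"' |
        sed -e 's|^href="/wiki/||' -e 's|"$||' |
        sort -u
}

# Try it on two lines shaped like the ones quoted earlier in the thread.
printf '%s\n' \
    '<a href="/wiki/File:WIP.jpg" class="image"><img alt="WIP.jpg" /></a>' \
    '<a href="/wiki/File:WIP.jpg" title="File:WIP.jpg">WIP.jpg</a><br />' |
    extract_unused_titles
```

For the real page, something like "wget -q -O - http://localhost/wiki/index.php/Special:UnusedFiles | extract_unused_titles > ./delete_files" should produce the list. File names containing spaces or special characters will probably show up URL-encoded in the href, so check the list by hand before deleting anything.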

Zackmann08 (talkcontribs)

You, sir, are a god amongst men. THANK YOU SO MUCH!!!!! Worked perfectly. :-)

193.190.210.16 (talkcontribs)

I know this thread dates back to 2012, but in case it helps others, I suggest a very minor fix to it:

grep 'File:' -

instead of

grep 'File' -

which could otherwise catch some unrelated garbage on the page (which in turn would probably get discarded by deleteBatch.php, but "better safe than sorry")
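
A small sketch of the difference, using made-up lines. The "^" anchor is an extra assumption on my part: it only makes sense once every title sits at the start of its own line, as in the output posted above.

```shell
# Hypothetical lines: two real titles plus one stray match.
lines='File:ISR.jpg
Special:ListFiles
File:ITA.jpg'

# Loose pattern: also matches "ListFiles".
loose=$(printf '%s\n' "$lines" | grep 'File')

# Stricter pattern: only lines that start with "File:".
strict=$(printf '%s\n' "$lines" | grep '^File:')

printf '%s\n' "$strict"
```

Here the loose pattern keeps all three lines, while the stricter one keeps only the two titles.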
