archiving twitter

<eli$2211192053@qaz.wtf>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: comp.misc
Subject: archiving twitter
Date: Sun, 20 Nov 2022 04:09:27 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2211192053@qaz.wtf>
Injection-Date: Sun, 20 Nov 2022 04:09:27 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="1798"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Sun, 20 Nov 2022 04:09 UTC

This is not a super polished method (set of methods), but will likely
help people out.

You can download an archive of your own account easily with Twitter's
own tools. People are reporting that it takes about 48 hours from
request to completion.

The completed archive is a ZIP file intended to work as a web page in a
browser. I have not actually tried that, but unzipped it and started to
use the files inside.

In the zip there's an assets/ directory with stuff to support the "as a
web page" view, including, apparently, PNG files for every emoji.
There's also a data/ directory that is personal to your account.

Of note in the data directory:

All your Tweets in JSON:
data/tweets.js

All the images & video for your tweets (includes retweets):
data/tweets_media/

All your Direct Messages in JSON:
data/direct-messages.js

All the images & video for your messages:
data/direct_messages_media/

List of accounts following you:
data/follower.js

List of accounts you follow:
data/following.js

List of tweets you have liked:
data/like.js
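
A caveat on the .js files, as a hedged sketch: in archives of this kind
the files usually begin with a JavaScript assignment (something like
window.YTD.tweets.part0 = ) ahead of the JSON array, so strict JSON
tools may reject them as-is. That prefix is an assumption here, not
something stated above; check with head -c 80 data/tweets.js first. The
helper name js_to_json is made up for illustration.

```shell
# Strip a leading "name = " JavaScript assignment from the first line so
# the rest can be fed to a JSON tool. Assumes the assignment, if present,
# sits at the very start of line 1 and the JSON begins with [ or {.
# Files that already start with [ or { pass through unchanged.
js_to_json() {
    sed '1s/^[^[{]*= *//' "$1"
}
```

For example, js_to_json data/tweets.js | jq length should then print the
number of entries, if jq is installed.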

Gotchas / warnings / limitations:

1. There *does not seem* to be a list of your bookmarks.

2. The archive does not contain the alt text you may have put on images.
(Alt text was limited to 1500 characters instead of 280, so it was
handy sometimes for squeezing more text into a tweet, even if
partially hidden.)

3. Images in the media folders might not be the largest size twitter has
for your account.

4. Some JSON files have both Twitter short links (https://t.co/...) and
expanded URLs, while some just have the short links.

For point 2: there's an archiver tool here from people who do alt-text
type stuff in general:

https://archive.alt-text.org/
https://github.com/alt-text-org/tweet-alt-archive

For point 3: There's a tool here you can run to get full size images:

https://github.com/timhutton/twitter-archive-parser

For point 4: I've looped over mine with a simple shell script. Basically

# GNU grep has -o to print only the part of the line that matches
for link in $( grep -h -o 'https*://t.co/[a-zA-Z0-9]*' \
                   data/tweets.js data/like.js |
               sort -u ); do
    printf "\n%s: " "$link"
    curl -w '%{redirect_url}' -o /dev/null -s "$link"
    sleep 5
done > expanded-tco.links
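
The loop above leaves one "short: expanded" pair per line in
expanded-tco.links. A minimal sketch for looking one up afterwards;
expand_tco is a hypothetical helper name, and it assumes the file was
written exactly as by that loop.

```shell
# Print the expanded URL recorded for one t.co short link.
# $1 = short link, $2 = mapping file (defaults to expanded-tco.links).
# Prints nothing if the short link is not in the file.
expand_tco() {
    awk -v want="$1" -F': ' '$1 == want { print $2; exit }' \
        "${2:-expanded-tco.links}"
}
```

Usage would be along the lines of: expand_tco https://t.co/SOMEID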

For point 1: I haven't found anything except manual work yet to get the
bookmarks.

"Okay, GREAT!" you say, "But what about archiving stuff that is not in
my account? Like what if I want to save my liked tweets with images and
video? Or tweets I've posted to Usenet over the years? Or someone else's
account's public tweets?"

Here's a list of tools the data hoarders of Reddit have collected:
https://www.reddit.com/r/DataHoarder/comments/yy7tig/backup_twitter_now_multiple_critical_infra_teams/

Personally I like Social Network Scraper, snscrape, from that list. It's
python3 and in pip:

$ sudo apt-get install python3-pip # eg for Ubuntu
$ pip3 install snscrape

Take care that *where* pip installs it is on your $PATH, and then
you are ready to go. The usage example for snscrape is a bit vague.
I've found there are two useful modes: entire account and single tweet.

$ account=NanoRaptor
$ snscrape --jsonl twitter-user $account > $account.json

Verify $account.json looks good (for some accounts I'm not getting
much) then extract media URLs:

$ jq -r '.media[] | .fullUrl' $account.json 2>/dev/null > image.links
$ jq -r '.media[] | .variants[] | .url' \
$account.json 2>/dev/null > video.links

Use 2>/dev/null because you'll get a ton of "Cannot iterate over null"
errors for tweets without images or video. The video.links will include
a lot of alternatives for some tweets, and just a single one for
others. I don't have a good way of picking "best" automatically.
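
An alternative sketch that avoids the error spam in the first place:
jq's ? operator makes .media[]? produce nothing when the field is null,
and // empty skips entries where the field is missing instead of
printing "null". The function names image_links and video_links are
made up here; the field names are the ones used above.

```shell
# Same extraction as above, but null .media / .variants yield no output
# rather than a "Cannot iterate over null" error, so stderr stays useful.
# Arguments are one or more snscrape --jsonl output files.
image_links() { jq -r '.media[]? | .fullUrl // empty' "$@"; }
video_links() { jq -r '.media[]? | .variants[]? | .url // empty' "$@"; }
```

For example: image_links $account.json > image.links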

For the single tweet mode, I've been using snscrape like this:

# links.txt is a list of URLs one per line, like
# https://twitter.com/Uriji1/status/1398430745035747336

$ for id in $( rev links.txt | cut -f 1 -d / | rev ) ; do
      # you'll get a Traceback stackdump for deleted
      # links or deleted accounts
      snscrape --jsonl twitter-tweet "$id" > "$id.json"

      jq -r '.media[] | .fullUrl' "$id.json" >> image.links 2>/dev/null
      jq -r '.media[] | .variants[] | .url' "$id.json" >> video.links 2>/dev/null
  done
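
The rev | cut | rev pipeline takes the last /-separated field of each
URL. A sketch of the same step with shell parameter expansion, which
also copes with a trailing slash or a ?s=20 style query string on a
copied link (whether your links.txt has those is an assumption).
tweet_id is a hypothetical helper name.

```shell
# Print the status ID (last path component) of a tweet URL.
tweet_id() {
    id=${1%%\?*}                # drop any ?query suffix
    id=${id%/}                  # drop one trailing slash, if present
    printf '%s\n' "${id##*/}"   # keep what follows the last /
}
```

It could replace the command substitution with something like:
while IFS= read -r url; do id=$(tweet_id "$url"); ...; done < links.txt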

Download the images. The links look like:

# source tweet:
# https://twitter.com/Uriji1/status/1398430745035747336
https://pbs.twimg.com/media/E2g5AncXEAQdqcP?format=jpg&name=large
https://pbs.twimg.com/media/E2g5C_3WUAEBipg?format=jpg&name=large
https://pbs.twimg.com/media/E2g5EMbWYAQE6ha?format=jpg&name=large

This finds a suffix and isolates the ID of the file.

$ for line in $( sort -u image.links ) ; do
      case "$line" in
          *format=jpg*) suf=jpg ;;
          *format=png*) suf=png ;;
          *) suf=other ;;
      esac

      burl=${line%?format=*}   # ${variable%GLOB} remove from end
      id=${burl#*/media/}      # ${variable#GLOB} remove from start

      curl -o "$id.$suf" "$line"
  done

Download the simple case videos.

# source tweet:
# https://twitter.com/silentmoviegifs/status/1517383816884727809
https://video.twimg.com/tweet_video/FQ7UL8wXwAACEGL.mp4

$ for line in $( grep /tweet_video/ video.links | sort -u ) ; do
      curl -O "$line"
  done

Hard case ones look like this, multiple variants for the same file:

# source tweet:
# https://twitter.com/AppleIIBot/status/1588678248023871489

https://video.twimg.com/ext_tw_video/1588678211571154944/pu/vid/364x270/jid0Xz9s7x4J79mH.mp4?tag=12
https://video.twimg.com/ext_tw_video/1588678211571154944/pu/vid/850x630/yYoWWRnA3mXgW0oo.mp4?tag=12
https://video.twimg.com/ext_tw_video/1588678211571154944/pu/pl/ttqk_5h8PGDIB0IW.m3u8?tag=12&container=fmp4
https://video.twimg.com/ext_tw_video/1588678211571154944/pu/vid/484x360/4jU3htO-XaNrR9y7.mp4?tag=12

I haven't started to deal with those yet. I suspect the /vid/WWWxHHH/
format will be the easiest to deal with: select the largest width by
height for a given /ext_tw_video/IDNUMBER/.
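
A minimal sketch of that selection, assuming the URL layout shown above
holds for all such videos. best_variants is a hypothetical name, and
"largest" is taken to mean the largest width times height.

```shell
# Pick the largest /vid/WIDTHxHEIGHT/ variant for each /ext_tw_video/ID/.
# Reads URLs on stdin, one per line; prints one URL per video ID
# (output order is unspecified, pipe through sort if it matters).
# Lines without a /vid/WxH/ part, such as the .m3u8 playlist, are skipped.
best_variants() {
    awk '
    match($0, /\/ext_tw_video\/[0-9]+\//) {
        # "/ext_tw_video/" is 14 characters; the match also ends in "/"
        id = substr($0, RSTART + 14, RLENGTH - 15)
        if (!match($0, /\/vid\/[0-9]+x[0-9]+\//)) next
        # "/vid/" is 5 characters; the match also ends in "/"
        dims = substr($0, RSTART + 5, RLENGTH - 6)
        split(dims, wh, "x")
        area = wh[1] * wh[2]
        if (area > best[id]) { best[id] = area; url[id] = $0 }
    }
    END { for (id in url) print url[id] }
    '
}
```

Something like best_variants < video.links | sort > best-video.links
would then give one link per video to feed to curl.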

Happy archiving, and share tips you may have found.

Elijah
------
has 1.5G in ~/twitter/ so far

Re: archiving twitter

<rJicESMyzmhMWogxR@bongo-ra.co>

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.misc
Subject: Re: archiving twitter
Date: Sun, 20 Nov 2022 05:47:53 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <rJicESMyzmhMWogxR@bongo-ra.co>
References: <eli$2211192053@qaz.wtf>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 20 Nov 2022 05:47:53 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="02a55de305f1690b61c64e3e3e763235";
logging-data="3642160"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/uVGVORZNaVa/BHY3DiL4"
Cancel-Lock: sha1:DTSTv2aMZkP1ns5BIDh0V7sIVos=
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
In-Reply-To: <eli$2211192053@qaz.wtf>
 by: Spiros Bousbouras - Sun, 20 Nov 2022 05:47 UTC

On Sun, 20 Nov 2022 04:09:27 -0000 (UTC)
Eli the Bearded <*@eli.users.panix.com> wrote:
> This is not a super polished method (set of methods), but will likely
> help people out.
>
> You can download an archive of your own account easily with Twitter's
> own tools. People are reporting that it takes about 48 hours from
> request to completion.

You mean it takes 48 hours regardless of how much one has on their account ?
Whether one has 1 tweet or thousands of them ?

> The completed archive is a ZIP file intended to work as a web page in a
> browser. I have not actually tried that, but unzipped it and started to
> use the files inside.

Is there a way for someone to progressively get newer stuff or is the only
way for someone who wants a complete personal copy to redownload everything
from scratch every now and again ?

--
vlaho.ninja/prog

Re: archiving twitter

<eli$2211200354@qaz.wtf>

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: *@eli.users.panix.com (Eli the Bearded)
Newsgroups: comp.misc
Subject: Re: archiving twitter
Date: Sun, 20 Nov 2022 08:54:37 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2211200354@qaz.wtf>
References: <eli$2211192053@qaz.wtf> <rJicESMyzmhMWogxR@bongo-ra.co>
Injection-Date: Sun, 20 Nov 2022 08:54:37 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="panix5.panix.com:166.84.1.5";
logging-data="27454"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
 by: Eli the Bearded - Sun, 20 Nov 2022 08:54 UTC

In comp.misc, Spiros Bousbouras <spibou@gmail.com> wrote:
> You mean it takes 48 hours regardless of how much one has on their
> account ? Whether one has 1 tweet or thousands of them ?

Apparently. Not sure if deliberate design, or just a long queue with few
resources devoted to the queue.

> Is there a way for someone to progressively get newer stuff or is the
> only way for someone who wants a complete personal copy to redownload
> everything from scratch every now and again ?

No progressive updates. And I think one request for full backup per
week.

I strongly believe Twitter is on borrowed time and will be collapsing
soon. Musk is bringing Trump back, World Cup is coming, legal challenges
will get nasty soon (automatic copyright violation moderation has stopped
working, so DMCA requests will surge).

I don't believe it will make it to 2023, and getting to December is
iffy.

Weekly updates are a minor concern.

Elijah
------
unsure how well Twitter will even do with end of year tax filings for ex-staff

Re: archiving twitter

<637a9505@news.ausics.net>

Message-ID: <637a9505@news.ausics.net>
From: not@telling.you.invalid (Computer Nerd Kev)
Subject: Re: archiving twitter
Newsgroups: comp.misc
References: <eli$2211192053@qaz.wtf> <rJicESMyzmhMWogxR@bongo-ra.co> <eli$2211200354@qaz.wtf>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 21 Nov 2022 06:58:46 +1000
Organization: Ausics - https://www.ausics.net
Lines: 18
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: Computer Nerd Kev - Sun, 20 Nov 2022 20:58 UTC

Eli the Bearded <*@eli.users.panix.com> wrote:
>
> I strongly believe Twitter is on borrowed time and will be collapsing
> soon. Musk is bringing Trump back, World Cup is coming, legal challenges
> will get nasty soon (automatic copyright violation moderation has stopped
> working, so DMCA requests will surge).
>
> I don't believe it will make it to 2023, and getting to December is
> iffy.

From what I hear it sounds like he's making Twitter somewhat more
similar to Usenet so far as policies go, so in principle I can't
object. But they alienated me years ago when their webpages stopped
displaying in Dillo anyway (not that I ever viewed it often).

--
__ __
#_ < |\| |< _#
