WAS to Archive-It Metadata Migration March 11, 2015.

1 WAS to Archive-It Metadata Migration March 11, 2015

2 WAS -> Archive-It
WAS Project/Archive: 3 levels of hierarchy
– Project
– Site (can contain 1 or more Seed URLs)
– Seed URL
Archive-It Collection: 2 levels of hierarchy
– Collection
– Seed URL

3 [Figure: example Sites labeled “2 Seed URLs per Site”, “1 Seed URL per Site”, “2 Seed URLs per Site”]

4 Multiple seeds: the hierarchy flattens out, and each Seed URL gets all of the Site metadata
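The flattening described on this slide can be pictured with a short sketch. It is only an illustration of the idea, not the migration tool itself; the `Site` class, the field names, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A WAS Site: one metadata record shared by one or more Seed URLs."""
    name: str
    metadata: dict
    seed_urls: list = field(default_factory=list)


def flatten_sites(sites):
    """Return one record per Seed URL.

    Each seed receives its own copy of the full Site metadata, so the
    Site level disappears and only Collection > Seed URL remains.
    """
    records = []
    for site in sites:
        for url in site.seed_urls:
            records.append({"seed_url": url, "metadata": dict(site.metadata)})
    return records


# Hypothetical example data: one Site with 2 seeds, one Site with 1 seed.
sites = [
    Site("Example Site", {"Title": "Example Site", "Creator": "Example Org"},
         ["http://example.org/", "http://www.example.org/"]),
    Site("Other Site", {"Title": "Other Site"}, ["http://example.com/"]),
]
flat = flatten_sites(sites)
for record in flat:
    print(record["seed_url"], record["metadata"])
```

Two Sites with three Seed URLs in total become three seed records, and the two seeds of the first Site carry identical metadata.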

5 BEFORE starting, you should… Delete sites (seeds) that you have never captured, or that you captured but whose captures you have all deleted. These are probably sitting under ‘never captured’ or ‘inactive sites’

6 How to move: Move one project (collection) at a time. When you sit down, start and finish the move of a project in the same sitting. You don’t have to do all projects/collections in one day

7 Run two reports (Administration > Project Admin)
1. Click “Archive-It Seed Export” > Export Seeds
2. Click “Archive-It Seed Metadata Export” > export metadata
Coming soon in your accounts

8 Export Seeds

9 Seeds export from WAS
– It is in .txt format; open it with Notepad.
– Your seeds will be segmented by crawl frequency, e.g., “Seeds with custom schedule of 1x per year”.
– You will copy and paste URLs from the .txt document and upload them in chunks by frequency.
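If copying by hand is error-prone, the chunks can also be separated with a small script. This is a minimal sketch under assumptions: the exact layout of the export is not shown here, so the code assumes that seed URLs start with http:// or https:// and that any other non-blank line is a section heading. The sample text, including the "4x per year" heading, is made up for illustration.

```python
def group_seeds_by_heading(text):
    """Group seed URLs under the heading line that precedes them.

    Assumption: URLs start with http:// or https://, and any other
    non-blank line is a section heading such as
    "Seeds with custom schedule of 1x per year".
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower().startswith(("http://", "https://")):
            if current is None:
                # URL found before any heading: keep it in a fallback group.
                current = "(no heading)"
                groups[current] = []
            groups[current].append(line)
        else:
            current = line
            groups.setdefault(current, [])
    return groups


# Hypothetical sample of an exported seeds file.
sample = """\
Seeds with custom schedule of 1x per year
http://example.org/
http://example.org/reports/

Seeds with custom schedule of 4x per year
http://example.com/news/
"""

groups = group_seeds_by_heading(sample)
for heading, urls in groups.items():
    print(heading)
    print("\n".join(urls))
    print()
```

Each printed group is one chunk to paste into the Archive-It seed box before choosing the matching frequency.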

10 Example text file

11 Consult the WAS - Archive-It mapping document to decide on the equivalent frequency: https://wiki.library.ucsf.edu/download/attachments/351243364/MappingofWAStoArchive-It.pdf?version=1&modificationDate=1422304077000&api=v2

12 Create a Collection (project) in Archive-It

13 Create Collection (aka Project)

14 Select frequency: for now leave at “One-Time”, then click Next

15 Enter Collection level metadata. This metadata displays in the public site. You can go back and fully enter this later

16 Topics will appear in public site (along with any Subjects you have)

17 Example display

18 In order to create a collection, you must upload seeds.

19 If you have Historical seeds, upload those FIRST(!) Historical sites/seeds are seeds where the seed URL has changed over the life of the captures. They will be at the top of your seeds.txt document. Do these first because it is then easiest to do a ‘bulk edit’ and select ‘deactivate’.

20 Example seeds list with Historical Seeds

21 Copy and paste seeds from the .txt file into the box. Leave ‘Default’ selected > Next

22 VERY important:
1. Ignore this error for ALL your seed uploads.
2. “URL is correct; use as is” MUST be checked regardless of the error you see. If it is not selected for any seeds, go through now and change it for all instances.

23 Another example: click “URL is correct; use as is” for all

24 Collection created

25 Bulk Edit Historical Seeds (where applicable)

26 Under “Seed Management” click “All”

27 Click the top box to select all. Note: ‘select all’ only selects what is displayed. If there are more than 400 items, the rest are on another page, and you will have to repeat the step on each page
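To estimate how many times the select-all and bulk-edit steps need repeating, a quick calculation helps. The sketch below assumes a page size of 400, inferred from the note on this slide; the seed count is a made-up example.

```python
import math

PAGE_SIZE = 400  # items shown per page, inferred from the note on this slide


def pages_to_bulk_edit(seed_count, page_size=PAGE_SIZE):
    """Number of pages on which 'select all' + bulk edit must be repeated."""
    return math.ceil(seed_count / page_size)


# Hypothetical collection with 950 seeds: pages of 400, 400, and 150.
print(pages_to_bulk_edit(950))
```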

28 Click “bulk edit”

29 Choose “Deactivate”

30 Go back to bulk edit > Add Metadata. Suggestion: add a Notes field, if you don’t already have one, where you note that these are historical seeds. You will most likely never want to crawl these again, so you may want to keep track of them

31

32 Add a custom field

33 Go back to Collection management and repeat for the next frequency in your seed list

34 Back to the seeds.txt file. Leave as ‘one-time’; they will not crawl until you say ‘crawl now’

35 Copy and paste seeds into box. Leave ‘Default’ selected > Next

36 For this case, choose Quarterly

37 Import metadata

38 Click “ALL seeds” > Import metadata

39 Upload the metadata file > Upload File (leave default setting)

40 You could stop here and do the cleanup at a later date

41 Metadata cleanup
– If there is a WAS field that is not in Archive-It, Archive-It creates a custom field on import.
– All fields will display in the public interface by default.
– The following fields may be in your upload, but they should ALL be made private: Note, Scope, Robots honored, Max crawl seconds, Capture frequency, Seed type, Site ID

42 How to make fields private in Archive-It:
1. Go to Admin (link in the upper right corner)
2. Account Settings
3. In the text box toward the bottom of the page called 'Private Metadata Fields', enter all these fields: Note, Scope, Robots honored, Max crawl seconds, Capture frequency, Seed type, Site ID
4. NB: Enter each field name on a separate line, in all lower-case letters.
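The text to paste into the 'Private Metadata Fields' box can be prepared ahead of time. This small sketch simply applies the two formatting rules from the slide (one name per line, all lower case) to the field list given there; the function name is made up for illustration.

```python
# Field names as listed on the slide.
WAS_FIELDS = [
    "Note", "Scope", "Robots honored", "Max crawl seconds",
    "Capture frequency", "Seed type", "Site ID",
]


def private_fields_text(fields):
    """Lower-case each field name and put one name on each line."""
    return "\n".join(name.strip().lower() for name in fields)


print(private_fields_text(WAS_FIELDS))
```

The printed block can be copied as-is into the text box.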

43

44

45 Scope –> Seed Type What about Directory only? What about Page only?

46 NB. Archive-It offers a lot of additional scoping options for crawls. View: Help Documentation (linked at the top right of the collection page)

47 Directory is not a separate scoping option in Archive-It (it is handled through the trailing slash, /). NO action is needed by you, except to QA.
WAS – Directory crawls:
– Rosalie.com/presentations: we will add the ending slash for you if you didn’t
– Rosalie.com/presentations/: it moves over as is
– Rosalie.com/presentations.html: it will crawl as host
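For QA, the three directory cases on this slide can be expressed as a small check. This is a sketch of the rule as stated, not the code the migration uses. Two things are assumptions: a last path segment containing a dot is treated as a file name, and a seed with no path at all is treated as a host crawl.

```python
def migrate_directory_seed(seed):
    """Return (seed as it should land in Archive-It, expected crawl scope).

    Assumption: a last path segment containing a dot (e.g. "x.html")
    is treated as a file name; the real migration may decide differently.
    """
    scheme, sep, rest = seed.partition("://")
    if not sep:
        # No scheme given, e.g. "Rosalie.com/presentations".
        scheme, rest = "", seed
    prefix = scheme + "://" if scheme else ""
    host, slash, path = rest.partition("/")
    if not slash or path == "":
        # Bare host: not covered by the slide; assumed to crawl as host.
        return seed, "host"
    if path.endswith("/"):
        return seed, "directory"  # moves over as is
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        return seed, "host"  # file-style URL: crawls as host
    return prefix + host + "/" + path + "/", "directory"  # ending slash added


for seed in ("Rosalie.com/presentations",
             "Rosalie.com/presentations/",
             "Rosalie.com/presentations.html"):
    print(seed, "->", migrate_directory_seed(seed))
```

Comparing this expected result with what appears in Archive-It after the move is one way to spot seeds that did not come over with the intended scope.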

48 What about ‘page only’ crawls? For ‘Page only’ you will have to manually go back and change the crawl scope (seed type). You can find these by opening the metadata export. It is in .ods format, which you can open in Google Docs, with most versions of Excel, or by downloading OpenOffice. Do NOT edit the .ods file before doing the metadata upload; make a copy. Then sort the “scope” column to find the relevant URLs. How to change it: Page: click on Settings > Crawl one page only (can also be bulk edited)
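As an alternative to sorting by hand, the page-only seeds can be filtered out of a copy of the export. The sketch below makes several assumptions that are not confirmed by the slides: the copy has been saved as CSV from the spreadsheet program, the columns are named "url" and "scope", and page-only seeds are marked with the value "Page only". Adjust the names to match the actual file.

```python
import csv
import io


def page_only_urls(csv_text, scope_column="scope", url_column="url",
                   page_value="page only"):
    """Return URLs whose scope cell matches page_value (case-insensitive).

    The column names and the scope value are hypothetical defaults.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[url_column]
        for row in reader
        if (row.get(scope_column) or "").strip().lower() == page_value
    ]


# Hypothetical CSV copy of the metadata export (never edit the original).
sample = """url,scope
http://example.org/,Host
http://example.org/about.html,Page only
http://example.com/reports/,Directory
"""
print(page_only_urls(sample))
```

The resulting list gives the seeds whose setting needs changing to "Crawl one page only" in Archive-It.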

49 Change Frequency under Settings > Seed Type

50 When will my crawls start? When you start them.

51 When do I shut off WAS crawls?
1. FIRST set up your crawls in Archive-It.
2. Make sure daily crawls are running.
3. Then you can stop your WAS crawls.

52 VERY important: Do NOT make any edits to WAS data, crawls, ANYTHING once you have moved a project to Archive-It!

53 Batch shut off crawling in WAS

54 Sites > Manage Sites > “all” > “select all” > “Reschedule Selected”

55 Select “off” and click “Reschedule”

56 Send CDL your info

57 After you have created all your collections:
1. Send Rosalie this info for each collection:
a) CollectionId
b) AccountId
AND
2. Add Rosalie as a user to your account (for now)

58 CollectionId and AccountId in URL

59 Where’s my data? Archive-It will work with CDL staff to move over your data. Timeline: May/June 2015

60 Resources
– WAS – Archive-It Migration wiki: https://wiki.library.ucsf.edu/display/UCLCKG/WAS+-%3E+Archive-it+Migration
– Mapping of terms and metadata, WAS - Archive-It: https://wiki.library.ucsf.edu/download/attachments/351243364/MappingofWAStoArchive-It.pdf?version=1&modificationDate=1422304077000&api=v2

